Universal database naming conventions

3/16/2023

Naming objects is much more than attaching a word to a face. In data modeling, especially in a multi-tier, team-based, fast-growing environment, the name of an object is crucial because it defines the object. It's almost as if we take for granted a license to be different. So why can't we humans follow simple naming conventions? When it comes to naming tables and columns in particular, we suddenly get creative.

Although all of that earned him the reputation of a “Cool Dude”, it was not possible to follow his style in our Data Warehouse. So I had to be the geek and explain to the team the way we name database objects here.

One observation I made from this experience is that when we lay out standards for everyone, progress is rapid. The computer industry has long experienced the value of standardization. When we standardized TCP/IP, we could connect all the computers, and the internet was born. When we standardized HTML, the World Wide Web was born. When we standardized disk formatting, we got drives that could be read by both Unix and Windows systems; CDs, DVDs and flash drives all have a standard format for information exchange. I could give you many more examples, but you get the point.

Generating syntactically correct and unambiguous names for proteins is a challenging yet vital task in functional annotation. Proteins are often named based on homology to known proteins, many of which have problematic names. To address the need for high-quality protein names, and to capture our significant experience correcting protein names manually, we have developed the Protein Naming Utility (PNU). The PNU is a web-based database for storing and applying naming rules that identify and correct syntactically incorrect protein names or replace synonyms with their preferred name. It allows users to generate and manage collections of naming rules, optionally building upon the growing body of rules generated at JCVI. Since communities often enforce disparate conventions for naming proteins, the PNU supports grouping rules into user-managed collections. Users can check their protein names against a selected PNU rule collection, generating both statistics and corrected names. The PNU can also be used to correct GenBank table files prior to submission to GenBank. Currently, the database features 3080 manual rules entered by JCVI Bioinformatics Analysts as well as 7458 automatically imported names.

During the annotation phase of a typical modern genomics project, functional names are assigned to identified genes and proteins in an automated or semi-automated fashion. Ideally, before such names are submitted to public sequence databases, they should be manually reviewed by experts to ensure that they are consistent, syntactically correct and unambiguous. However, with the scale of genomic data produced by next-generation sequencing technology and with increasingly automated functional annotation processes, manual correction of names is no longer feasible. New proteins are often named based on homology to existing proteins, and many existing proteins have syntactically incorrect or ambiguous names, producing transitive annotation errors. Consequently, poor-quality names have proliferated in both public databases and the scientific literature. This issue is further complicated by the prevalence of ambiguous names resulting from the lack of interspecies naming conventions (1).

The need for consistent and unambiguous names has led to the development of a number of conventions for naming genes and proteins. In addition, the biological text mining community has created dictionaries to resolve gene/protein synonyms to improve the identification of genes and proteins in scientific articles (1, 8). The Broad Institute has developed BioNames, a tool to resolve these difficulties using collections of hard-coded regular expressions. Here, we present our solution to this problem in the form of the Protein Naming Utility (PNU), a web-based database to store and apply customizable sets of naming rules to correct and standardize gene and protein names within an annotated genome or metagenome. The database provides an intuitive web interface that allows users to create and maintain their own naming rules and organize these rules in projects that can be shared with the community.
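To make the idea of checking names against a rule collection more concrete, here is a minimal Python sketch. It is not the PNU's actual implementation or rule format: the `NamingRule` class, the `check_names` function and the three example rules are all hypothetical, chosen only to illustrate how a set of regular-expression rules might fix syntax, replace a synonym with a preferred name, and report simple statistics.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingRule:
    """One hypothetical rule: a regex pattern and its replacement text."""
    pattern: str
    replacement: str


# Illustrative rules only; they are assumptions, not rules from the PNU database.
EXAMPLE_COLLECTION = [
    NamingRule(r"\s+", " "),                                   # collapse runs of whitespace
    NamingRule(r"^unknown protein$", "hypothetical protein"),  # synonym -> preferred name
    NamingRule(r"^(.*), putative$", r"putative \1"),           # move trailing qualifier to the front
]


def correct_name(name, rules):
    """Apply every rule in order (case-insensitively) and return the corrected name."""
    corrected = name.strip()
    for rule in rules:
        corrected = re.sub(rule.pattern, rule.replacement, corrected, flags=re.IGNORECASE)
    return corrected


def check_names(names, rules):
    """Return the corrected names plus a count of how many names were changed."""
    corrected = [correct_name(n, rules) for n in names]
    changed = sum(1 for old, new in zip(names, corrected) if old != new)
    return corrected, {"total": len(names), "changed": changed}


names = ["Unknown  protein", "kinase, putative", "DNA polymerase"]
corrected, stats = check_names(names, EXAMPLE_COLLECTION)
print(corrected, stats)
```

Keeping the rules as plain data rather than hard-coding them in the function is what would let different communities maintain their own collections, which is the property the text attributes to the PNU.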
