Data Sources

Update Policy: The enzyme information, the metagenomic and bacterial genomic data, and the publicly available reference datasets used in MetaBioME are planned to be updated sometime in 2015.

Commercially useful enzymes (CUEs) were curated using information available from the following resources:

  1. ENZYME: Enzyme nomenclature database at ExPASy
  2. Enzyme Nomenclature: Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB)
  3. BRENDA*: The Comprehensive Enzyme Information System
  4. Swiss-Prot: Protein knowledgebase
  5. NCBI PubMed: Comprehensive resource of literature

   Note: Protein sequences were retrieved from the Swiss-Prot database.

*BRENDA: References

  • Chang, A., Scheer, M., Grote, A., Schomburg, I. & Schomburg, D. (2009). BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009. Nucleic Acids Res, 37, D588-D592.
  • Barthelmes, J., Ebeling, C., Chang, A., Schomburg, I. & Schomburg, D. (2007). BRENDA, AMENDA and FRENDA: the enzyme information system in 2007. Nucleic Acids Res, 35, D511-D514.
  • Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G. & Schomburg, D. (2004). BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res, 32, D431-D433.
  • Schomburg, I., Chang, A. & Schomburg, D. (2002). BRENDA, enzyme data and metabolic information. Nucleic Acids Res, 30, 47-49.
  • Schomburg, I., Chang, A., Hofmann, O., Ebeling, C., Ehrentreich, F. & Schomburg, D. (2002). BRENDA: a resource for enzyme data and metabolic information. Trends Biochem Sci, 27, 54-56.

Metagenomic Data

Publicly available metagenomic data from ten environments has been analyzed in the present version (1.0) of the database.

The metagenomic data was downloaded from NCBI.

Completed Bacterial Genomes

The sequences for 971 completed bacterial genomes were downloaded from NCBI.

ORF Prediction

Complete and partial ORFs were predicted in the metagenomic sequences using SuperGene, with a minimum length of 50 amino acids (150 nucleotides) required to call a gene. The SuperGene algorithm is part of our in-house iMetaSys pipeline, which integrates the Glimmer and MetaGene gene prediction software. SuperGene labels each ORF as ‘Exact’ (the same start and end predicted by both methods), ‘End_match’ (only the end matches and the start differs), or ‘Unique’ (predicted by only one method), and thus provides additional confidence information for the predicted ORFs. ‘Exact’ ORFs are predicted with higher confidence and have more reliable start and end positions because they are predicted by two independent methods. For ‘End_match’ cases, the longer ORF is retained to ensure that no part of an ORF is left out, even if some extra sequence is included in the initial prediction.
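
The labelling rules described above can be illustrated with a short Python sketch. This is not the SuperGene implementation, only a minimal reading of the rules under stated assumptions: the names `Orf` and `merge_calls` are hypothetical, `start` and `end` are taken to be strand-oriented coordinates (translation start and stop end), each predictor is assumed to report at most one ORF per contig, strand and end position, and the 150-nucleotide minimum is assumed to apply to the ORF that is finally kept.

```python
from dataclasses import dataclass

MIN_ORF_NT = 150  # 50 amino acids, the minimum length stated in the text


@dataclass(frozen=True)
class Orf:
    """Hypothetical ORF record; not a SuperGene or iMetaSys data structure."""
    contig: str
    strand: str  # "+" or "-"
    start: int   # coordinate of the translation start (assumed convention)
    end: int     # coordinate of the stop end (assumed convention)

    @property
    def length(self) -> int:
        # abs() lets the same formula serve both strands
        return abs(self.end - self.start) + 1

    @property
    def end_key(self):
        # Two predictions are compared only if they share contig, strand and end
        return (self.contig, self.strand, self.end)


def merge_calls(glimmer, metagene, min_nt=MIN_ORF_NT):
    """Return (orf, label) pairs labelled 'Exact', 'End_match' or 'Unique'."""
    # Assumes at most one MetaGene ORF per end position
    by_end = {orf.end_key: orf for orf in metagene}
    calls = []
    for g in glimmer:
        m = by_end.pop(g.end_key, None)
        if m is None:
            calls.append((g, "Unique"))      # predicted by Glimmer only
        elif m.start == g.start:
            calls.append((g, "Exact"))       # same start and end in both
        else:
            # Same end, different start: keep the longer of the two ORFs
            calls.append((max(g, m, key=lambda o: o.length), "End_match"))
    # Whatever is left was predicted by MetaGene only
    calls.extend((m, "Unique") for m in by_end.values())
    return [(orf, label) for orf, label in calls if orf.length >= min_nt]
```

For example, given a Glimmer ORF at 400..900 and a MetaGene ORF at 340..900 on the same strand of the same contig, the sketch returns the 340..900 ORF labelled ‘End_match’, since the ends agree and the MetaGene prediction is longer.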

Web User Interface and Metabase Development

The web-based user interface and the back-end database, called ‘Metabase’, were developed using the open-source LAMP stack: Linux (RHEL 4), Apache HTTP Server (version 2.2.8), MySQL (version 5.0.45), PHP (version 5.2.4), and Perl (version 5.8.5). Client-side scripting was done using XHTML and jQuery, and server-side scripting was done using PHP and XML. The external applications BLAT (v34), BLAST (version 2.2.17), and MAFFT (version 6.240) are provided for analysis.



Version 1.0, Copyright © 2009-2019 Laboratory for Integrated Bioinformatics, RIKEN, Japan