

The key idea of MetaBioME is to develop a computational resource for finding novel homologs to known CUEs by mining metagenomic datasets and completed bacterial genomes using homology-based approaches, along with advanced analysis options for facilitating validation of the results. Broadly, it has two main components:

  1. Curated database of Commercially Useful Enzymes (CUEs)
  2. Comprehensive bio-mining options to search for novel homologs to known CUEs from metagenomic datasets and completed bacterial genomes.

For comprehensive querying, we have designed four query pages which are described in the following sections.


  • Includes a manually curated dataset of 510 CUEs classified into 9 application categories.
  • Unique search engine for mining metagenomic and bacterial genomic data for discovering novel homologs to known CUEs
  • Lists novel homologs for CUEs from ten metagenomic and 971 completed bacterial genome datasets using the ‘MetaSearch’ query page.
  • CUEs can be explored using the ‘CUEsXplorer’ query page.
  • Genes or proteins of interest can be searched for in metagenomic or bacterial genomic datasets using the ‘MetaAlign’ option (BLAST/BLAT). Users can upload their own genomic or metagenomic sequences to search for homologs to known CUEs.
  • To facilitate analysis, ORFs have been mapped on metagenomic contigs using online visualization tools available on the ‘Profile’ page.
  • Comprehensive analysis of novel homologs can be performed using the information from sequence alignments, conserved motifs, ORFs display, and conserved domains.

The ‘MetaSearch’ query page is the main feature of MetaBioME and is designed to identify novel homologs for the existing set of CUEs from metagenomic datasets or completed bacterial genomes. It houses a set of CUEs classified into nine application categories, which helps users select CUEs of interest by area of application.

The following query page shows the available options with an example query (CUEs involved in ‘Environment’ from ‘Soil’ metagenomic source).

This selected set of enzymes can be searched for in the available metagenomic datasets which can be selected by environmental source (shown above) or project name (shown below).

By selecting ‘metagenomic source’, a combined search will be made in all metagenomic datasets which come from that source; e.g., a query for ‘Sludge’ will search both of the sludge datasets, namely the US EBPR sludge metagenome and the OZ EBPR sludge metagenome.

A search for CUEs can also be made from the sequences of 971 complete bacterial genomes as shown below.

Queries can also be made by selecting different attributes such as EC number, enzyme name, Swiss-Prot ID, biochemical pathway, and substrate or products using the form below.

Multiple keywords can be submitted using standard Boolean operators (AND and OR) in the 'Enzyme Name or keywords', 'Biochemical Pathway' and 'Substrate or Products' search fields. Options are also provided for limiting the number of results by specifying the threshold coverage or E-value.
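The Boolean keyword matching described above can be illustrated with a minimal Python sketch. It is not MetaBioME's implementation: the function name `matches`, the substring matching, and the precedence rule (AND binding more tightly than OR) are all assumptions made for illustration.

```python
import re


def matches(query: str, text: str) -> bool:
    """Return True if `text` satisfies keywords joined by AND / OR.

    Assumptions (not taken from MetaBioME): operators are upper-case,
    AND binds more tightly than OR, and keywords match as
    case-insensitive substrings.
    """
    haystack = text.lower()
    # Split on OR first so that each OR branch is a group of AND terms.
    for or_group in re.split(r"\s+OR\s+", query.strip()):
        terms = [t.strip().lower() for t in re.split(r"\s+AND\s+", or_group)]
        # Every term in the group must be non-empty and present in the text.
        if all(t and t in haystack for t in terms):
            return True
    return False


# Hypothetical enzyme description used only to exercise the function.
description = "Alkaline lipase from Bacillus"
found = matches("cellulase OR lipase AND alkaline", description)
```

Here the query is read as `cellulase OR (lipase AND alkaline)`; a real search form may apply a different precedence.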

An ‘Advanced Search’ option is available below the nine application categories as shown in the figure below.

By selecting ‘Advanced Search’, users can select any EC class in any of the application categories. Users can also select whole application categories or EC classes or any combination of the two. Clicking the 'Show CUEs' button, as shown above, displays a list of all enzymes (EC numbers) belonging to the selection.

On submission of a sample query for CUEs involved in ‘Environment’ from the ‘Soil’ metagenomic source, MetaBioME compares all Swiss-Prot sequences known for the EC numbers in the ‘Environment’ application category against the ORFs predicted in contigs of the metagenomic datasets from the ‘Soil’ source. The resulting ‘MetaResults’ page, shown below, displays the qualified hits as a table sorted by percent coverage (completeness of the alignment). For each EC number, it lists the best-matching Swiss-Prot IDs that show at least 90% alignment coverage with the predicted ORFs from the metagenomic contigs.
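The coverage filter and sort order can be sketched in Python as follows. This is an illustration only: the `Hit` class, its field names, and the definition of coverage as the aligned fraction of the Swiss-Prot query sequence are assumptions, and the identifiers in the example are placeholders rather than real Swiss-Prot IDs.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    enzyme_id: str    # placeholder for a Swiss-Prot ID
    orf_id: str       # placeholder for a predicted ORF on a contig
    query_len: int    # length of the enzyme (query) sequence
    aligned_len: int  # number of query residues covered by the alignment

    @property
    def coverage(self) -> float:
        # Assumed definition: percentage of the query covered by the alignment.
        return 100.0 * self.aligned_len / self.query_len


def qualified_hits(hits, min_coverage=90.0):
    """Keep hits at or above the coverage threshold, best coverage first."""
    kept = [h for h in hits if h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: h.coverage, reverse=True)


# Made-up example values, not MetaBioME results.
example = [
    Hit("ENZ_A", "orf1", query_len=200, aligned_len=190),
    Hit("ENZ_B", "orf2", query_len=200, aligned_len=150),
    Hit("ENZ_C", "orf3", query_len=100, aligned_len=99),
]
ranked = qualified_hits(example)
```

With these made-up values, `ENZ_B` falls below the 90% threshold and is dropped, and the remaining hits are ordered by decreasing coverage.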

The results are shown below for the above sample query.

Comprehensive information for each match can be retrieved from the MetaResults page by clicking on the Swiss-Prot ID link which opens up the ‘MetaBioME profile’ page.

For the above sample query and selection of the CUE represented by Swiss-Prot sequence ‘Q51758’, the profile page (shown in parts below) summarizes information about the enzyme, its biochemical reaction and pathway (KEGG), a curated functional summary, links to external databases, and the selected metagenomic dataset with the contig containing the homologous ORF. Examples of this information are shown below.

This is followed by an expandable table of ORFs predicted in the above contig and a description of the best ORF that showed the closest match with the Swiss-Prot sequence of the CUE. Next is a contig view window displaying the best matching ORF (green coloured) along with other predicted ORFs (optional) as directional arrows indicating the orientation of the ORFs on the contig.

The following figure displays details of the best (homologous) ORF and all ORFs predicted in the contig.

The following figure displays all the ORFs predicted in the metagenomic contig.


Each arrow can be clicked on to retrieve both the nucleotide and protein sequences of the predicted ORFs as shown below.
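For readers who want to reproduce this step offline, the following Python sketch shows one way to obtain the nucleotide and protein sequence of an ORF from a contig, taking the ORF's orientation into account. It is an illustration under stated assumptions, not the code behind the ‘Profile’ page: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive on the forward strand, and translation uses the standard genetic code.

```python
# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_sequences(contig: str, start: int, end: int, strand: str):
    """Return (nucleotide, protein) for an ORF on a contig.

    `start` and `end` are assumed to be 1-based, inclusive, forward-strand
    coordinates; `strand` is "+" or "-".
    """
    nt = contig[start - 1:end]
    if strand == "-":
        nt = reverse_complement(nt)
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3  # ignore a trailing partial codon
    # Codons containing ambiguous bases are translated as "X".
    protein = "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3)
    )
    return nt, protein.rstrip("*")  # drop the terminal stop codon


# Toy contig with a three-codon ORF (ATG AAA TAG) at positions 3 to 11.
nucleotide, protein = orf_sequences("CCATGAAATAGGG", 3, 11, "+")
```

An ORF on the reverse strand is handled by reverse-complementing the slice before translation, which mirrors the directional arrows in the contig view.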

Next is an alignment view of the ORF with the enzyme sequence, displaying sequence similarity. This is followed by a summary of the closest match of the novel homologous ORF to a known finished bacterial genome which helps to determine if the novel ORF shares significant sequence similarity with any protein from any known genome. The next table provides information on the closest available PDB structure and displays the 3-D protein structure (if available).

The following figure shows an alignment of the selected CUE with novel homologous ORFs predicted from metagenomic datasets. This is followed by a summary of other Swiss-Prot IDs for the same EC number which showed lower similarity.

To give a useful indicator of the quality of each result, we include a ‘MetaBioME Rating’, which rates the predicted homologous ORF on a scale of 1-5 stars, with a single star indicating the weakest match and five stars indicating the best match. For the above sample query, the match is rated good (four stars), with an identity of 31.99% and a coverage of 99.12%.

In the case of a good or best match, users can perform an ‘Advanced Analysis’ (as shown above), which helps to confirm the quality of the results by using an additional suite of options. Users can check the alignment of the Swiss-Prot sequence of the selected CUE with that of the homologous ORF as shown below.

Since conserved motifs likely play a key role in the activity of an enzyme, all Swiss-Prot sequences belonging to the same EC number can be aligned together or with the homologous ORF to find the overall sequence homology among the sequences. This could help in the identification of conserved regions in Swiss-Prot sequences of the enzyme and in determining if the homologous ORF also possesses the same conserved regions as shown below.
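The idea of checking whether the homologous ORF retains conserved regions can be sketched in Python. The sketch assumes the sequences have already been aligned by a multiple sequence alignment tool (so all rows have equal length and gaps are written as `-`), and it uses the strictest possible definition of conservation, namely columns where every enzyme sequence has the same residue. The function names and the toy alignment are hypothetical.

```python
def conserved_columns(aligned):
    """Map column index -> residue for columns identical in all sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    return {
        i: column[0]
        for i, column in enumerate(zip(*aligned))
        if column[0] != "-" and len(set(column)) == 1
    }


def shared_with_orf(aligned_enzymes, aligned_orf):
    """Return conserved column indices where the ORF has the same residue."""
    conserved = conserved_columns(aligned_enzymes)
    if len(aligned_orf) != len(aligned_enzymes[0]):
        raise ValueError("ORF row must come from the same alignment")
    return [i for i, residue in conserved.items() if aligned_orf[i] == residue]


# Toy alignment rows, invented for illustration.
enzymes = ["GDSLS", "GDSAS", "GDS-S"]
retained = shared_with_orf(enzymes, "GDTLS")
```

In this toy case the enzymes are fully conserved at columns 0, 1, 2 and 4, and the ORF retains the conserved residue at columns 0, 1 and 4 but not at column 2.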

As another functional confirmation, users can also look for the presence of conserved domains in the homologous ORF by aligning the sequence against the NCBI Conserved Domain Database (CDD). Additionally, the user can also check if the Swiss-Prot sequence of the CUE is present in any other (meta)genomic dataset by carrying out a homology search against other metagenomic datasets or completed bacterial genomes. These additional options are helpful in examining the homology of the novel ORF with the known CUE.

This query page provides options for browsing the CUEs database with respect to application category or EC classification.

Shown below is an example selection for ‘Application category’.

Shown below is an example selection for ‘EC classification’ with a selection of EC class 5.

Upon choosing an application category or an EC class as shown above, a list of all CUEs belonging to that selection is populated in the list box provided below. Users can select any EC number for detailed information about that enzyme. The complete list of all CUEs can also be downloaded from the link at the top of this query page. Detailed information for the above query is shown below.

This query page provides users with an option to search for all known enzymes as available in the six EC classes, irrespective of their known role as a CUE, in the metagenomic and bacterial genomic datasets. Upon selection of an EC class, a list box containing all EC numbers belonging to that class opens up as shown below.

Selecting an EC number from this list box provides an expanded page with information about the enzyme, biochemical reaction and pathway, and list of all Swiss-Prot IDs belonging to that EC number, as shown below.

Any representative Swiss-Prot sequence can be selected and searched for in one or more metagenomic and bacterial genomic datasets. The resulting ‘MetaSearch Results’ page is shown below.

The ‘MetaBioME Profile’ pages for the submitted query are similar to those explained in the MetaSearch section.

MetaAlign is an application powered by the BLAT (faster and less sensitive) and BLAST (slower and more sensitive) sequence alignment tools. It provides the user an option to carry out homology-based searches for single or multiple (multi-fasta format) submitted nucleotide or protein sequences against the metagenomic or bacterial sequences. Users can also upload their own genomic or metagenomic sequences to search for the presence of CUE homologs.
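Before submitting multiple sequences, it can be useful to check that a file is valid multi-FASTA. The following Python sketch parses multi-FASTA text into a dictionary of sequences. It is independent of MetaAlign; the function name and the choice to use the first word of each header as the sequence name are assumptions.

```python
def parse_fasta(text: str) -> dict:
    """Parse multi-FASTA text into {sequence name: sequence}.

    The sequence name is taken to be the first whitespace-delimited word
    after '>' (an assumption; the rest of the header is ignored).
    """
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split(maxsplit=1)
            if not words:
                raise ValueError("empty FASTA header")
            current = words[0]
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current].append(line)
    # Join wrapped sequence lines into one string per record.
    return {name: "".join(parts) for name, parts in records.items()}


# Invented two-record example.
sample = ">seq1 example nucleotide\nATGC\nGGTA\n>seq2\nMKV\n"
sequences = parse_fasta(sample)
```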

Larger files containing multiple sequences can also be uploaded, with an email being sent to the user on completion of analysis. The searches can be limited by selecting the threshold E-value and the output format can be specified as ‘tabular’ or ‘full’ (complete alignment).
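Results in tabular form can be post-processed with a short script. The Python sketch below assumes the ‘tabular’ output follows the standard 12-column, tab-separated BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value, bit score); whether MetaAlign emits exactly these columns is an assumption, and BLAT-based output may differ. The function name and the example rows are invented.

```python
COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def filter_tabular(text: str, max_evalue: float = 1e-5):
    """Parse tab-separated hits and keep those with E-value <= max_evalue."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns: {line!r}")
        row = dict(zip(COLUMNS, fields))
        # Convert the numeric columns used for filtering and ranking.
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        if row["evalue"] <= max_evalue:
            hits.append(row)
    return hits


# Invented rows for illustration, not real search output.
report = (
    "q1\torfA\t42.0\t340\t190\t5\t1\t338\t10\t350\t1e-30\t150\n"
    "q1\torfB\t25.0\t100\t72\t3\t1\t100\t5\t104\t0.5\t30\n"
)
significant = filter_tabular(report, max_evalue=1e-5)
```

Applying the threshold after the search, rather than only at submission time, makes it easy to try several cut-offs on the same result file.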

How to cite MetaBioME:
MetaBioME: a database to explore commercially useful enzymes in metagenomic datasets. Vineet K. Sharma, Naveen Kumar, Tulika Prakash, Todd D. Taylor. Nucleic Acids Research 2010 Jan;38(Database issue):D468-72. Epub 2009 Nov 11.


Version 1.0, Copyright © 2009-2019 Laboratory for Integrated Bioinformatics, RIKEN, Japan