ConjoinG is a database of 800 Conjoined Genes identified in the human genome.
A Conjoined Gene (CG) is defined as a gene formed at the time of transcription by combining at least part of one exon from each of two or more distinct (parent) genes that lie on the same chromosome, in the same orientation, and are otherwise translated independently into different proteins. In some cases, the transcripts formed by CGs are translated into chimeric or completely novel proteins.
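The positional part of this definition (distinct parent genes, same chromosome, same orientation) can be sketched as a simple check. This is only an illustrative sketch: the `Gene` record, the `could_form_conjoined_gene` function, and the gene symbols are hypothetical and are not part of ConjoinG. The check does not cover the transcript-level evidence (shared exons in an actual mRNA or EST) that the definition also requires.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal record for a parent gene."""
    symbol: str
    chromosome: str
    strand: str  # "+" or "-"


def could_form_conjoined_gene(parents: Sequence[Gene]) -> bool:
    """Return True if the parents meet the positional criteria of the definition:
    at least two distinct genes, all on one chromosome, all in one orientation."""
    if len({g.symbol for g in parents}) < 2:
        return False  # a CG needs two or more distinct parent genes
    same_chromosome = len({g.chromosome for g in parents}) == 1
    same_orientation = len({g.strand for g in parents}) == 1
    return same_chromosome and same_orientation


# Hypothetical parent genes, for illustration only.
gene_a = Gene("GENE_A", "chr1", "+")
gene_b = Gene("GENE_B", "chr1", "+")
gene_c = Gene("GENE_C", "chr1", "-")

print(could_form_conjoined_gene([gene_a, gene_b]))  # same chromosome and strand
print(could_form_conjoined_gene([gene_a, gene_c]))  # opposite strands
```

Passing this check is necessary but not sufficient: a pair of genes only counts as a CG when a transcript combining exons from both parents is actually observed.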
The following image shows some randomly selected examples of CGs identified in the human genome.
Of the 800 CGs presently included in ConjoinG, 751 were identified by our approach and 49 were identified by other groups. The 800 CGs are formed by connecting a total of 1,542 known, separate parent genes. Of the 751 CGs identified by our approach, 353 representative CGs were subjected to experimental validation by RT-PCR and sequencing in 16 human tissues; 291 of these (82%) were confirmed by sequencing the expected conjoined mRNA in at least one of the selected tissues.
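The counts quoted above can be cross-checked with a few lines of arithmetic. The variable names below are chosen for illustration; the numbers are the ones given in the text.

```python
# Counts taken from the text above.
total_cgs = 800
identified_by_our_approach = 751
identified_by_other_groups = 49
tested_by_rt_pcr = 353
confirmed_by_sequencing = 291

# The two sources of CGs should add up to the database total.
sources_sum = identified_by_our_approach + identified_by_other_groups

# Fraction of tested CGs confirmed in at least one tissue.
confirmation_rate = confirmed_by_sequencing / tested_by_rt_pcr

print(sources_sum == total_cgs)
print(f"{confirmation_rate:.1%}")  # 82.4%, reported as 82% in the text
```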
The main contents of the ConjoinG database include the following:
- Detailed information about the parent genes which form the CGs, such as:
  - Genomic location from the NCBI RefSeq database
  - Functional information from the HGNC, KOGs, and Gene Ontology databases
  - Sequences (mRNA, EST, and CDS) from the NCBI RefSeq database
  - Tissue expression details from the NCBI UniGene database
  - Information about implication in diseases from the NCBI OMIM database, etc.
- Detailed information about the CGs, such as:
  - Genomic location from the NCBI RefSeq database
  - Sequences (mRNA, EST, and, when available, CDS) from the NCBI RefSeq database
  - Information about the type of protein formed by the CG transcript, such as chimeric, novel, or the same as that of the 5’ or 3’ parent gene
  - Nonsense-mediated decay (NMD) prediction status
  - Conservation in other vertebrate genomes such as chimpanzee, rhesus macaque, dog, mouse, etc.
  - Exon-intron splicing patterns observed in CGs
  - Details about the experiments used to validate the CGs in human tissues, such as the primers used for the RT-PCR experiments, the sequences of the PCR products that mapped back to the genome, etc.
  - Tissue expression information from both our experiments and the NCBI GenBank database, etc.