Computational prediction of eukaryotic protein-coding genes

Zhang, Michael Q.

doi:10.1038/nrg890

Review Article
Published: 01 September 2002

Nature Reviews Genetics volume 3, pages 698–709 (2002)

Key Points

With the recent explosion in the availability of genome data, gene-finding programs have proliferated. However, the accuracy with which genes can be predicted is still far from satisfactory. This review provides background information and surveys the latest developments in gene-prediction programs. It also highlights the problems that face the gene-prediction field and discusses future research goals.
The main characteristic of a eukaryotic gene is its organization into exons and introns. The 'exon-definition' model explains how the splicing machinery recognizes exons in a sea of intronic DNA. It indicates that an internal exon is initially recognized by a chain of interacting splicing factors that span it. The binding of these factors to pre-mRNA is responsible for the non-random nucleotide patterns that form the molecular basis of all exon-recognition algorithms.
Correctly identifying the boundaries of a gene is essential when searching for several genes in a large genomic region. It is relatively easy to find internal exons, but many gene-prediction programs fail to identify gene boundaries. Determining the 3′ end of a gene is easier than determining its 5′ end, mainly because of the difficulty of identifying the promoter and transcriptional start-site sequences, and because the 5′ ends of cDNA sequences are often truncated.
As current gene-prediction programs are biased towards intron-containing genes, they might have missed many intronless genes. Pseudogenes have also caused many false-positive exon predictions. Developing better and more specialized algorithms to recognize intronless genes and pseudogenes is becoming increasingly important.
Hidden Markov model (HMM)-based programs can be used to predict multiple genes, partial genes and genes on both strands, all at the same time. These features are essential when annotating genomes or large chunks of sequence data, such as large contigs, in an automated fashion.
Comparing the genomes of several closely related species makes conserved regulatory regions easier to identify. For this reason, making use of comparative genomic data is an important future challenge for the gene-prediction field.
More functional genomics methods for finding genes are desperately needed to improve gene prediction. Only with sufficient mechanistic data can gene prediction be transformed from being statistical to being biological in nature. The field is working towards the ultimate dynamic model that can identify the consecutive exons of a gene, from its 5′ to its 3′ ends, as if they were being co-transcriptionally recognized and spliced.
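
The key point about HMM-based programs can be made concrete with a toy example. The sketch below is not Genscan or any published gene finder: it assumes a two-state model ("exon" and "intron") with invented transition and emission probabilities, and uses the standard Viterbi algorithm to label each base with its most probable state. Real programs use many more states, signal models for splice sites, and explicit length distributions.

```python
import math

STATES = ("exon", "intron")

# Hypothetical parameters chosen for illustration only; they are not taken
# from any trained model or from the article.
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},    # G+C-rich toy "coding" state
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # A+T-rich toy "non-coding" state
}


def viterbi(seq):
    """Return the most probable state path for seq under the toy model."""
    if not seq:
        return []
    # score[s] holds the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    labels = viterbi("GCGCGCGCATATATAT")
    print("".join("E" if s == "exon" else "i" for s in labels))
```

Working in log space avoids numerical underflow on long sequences, which matters when the same recursion is applied to large contigs.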

Abstract

The human genome sequence is the book of our life. Buried in this large volume are our genes, which are scattered as small DNA fragments throughout the genome and comprise a small percentage of the total text. Finding these indistinct 'needles' in a vast genomic 'haystack' can be extremely challenging. In response to this challenge, computational approaches that predict the location and structure of genes have proliferated in recent years. Here, I discuss these approaches and explain why they have become essential for the analyses of newly sequenced genomes.

**Figure 1: The central dogma of gene expression.**

**Figure 4: Different states and transitions in the Genscan hidden Markov model.**

**Figure 5: A generalized pair hidden Markov model.**

Acknowledgements

My lab is supported by National Institutes of Health (NIH) grants. I thank L. Pachter and M. Alexandersson for providing their manuscript before publication; and R. Guigo and M. Brent for presenting their recent comparative analysis of human and mouse drafts at the 1% Workshop of NIH/NHGRI in July 2002. I also thank the anonymous reviewers for many helpful suggestions.

Author information

Authors and Affiliations

Watson School of Biological Sciences, Cold Spring Harbor Laboratory, 1 Bungtown Road, PO Box 100, Cold Spring Harbor, New York 11724, USA
Michael Q. Zhang

Glossary

REFSEQ: The NCBI Reference Sequence project (RefSeq) provides curated gene, mRNA and protein sequences that reflect current knowledge about a sequence and its function, and that are available in the GenBank and NCBI databases.
TRAINING DATA SET: The known examples of an object (for example, an exon) that are used to train prediction algorithms, so that they learn the rules for predicting an object. They can be positive training sets (consisting of true objects, such as exons) or negative training sets (consisting of false objects, such as pseudoexons).
SPLICEOSOME: A ribonucleoprotein complex that is involved in splicing nuclear pre-mRNA. It is composed of five small nuclear ribonucleoproteins (snRNPs) and more than 50 non-snRNPs, which recognize and assemble on exon–intron boundaries to catalyse intron processing of the pre-mRNA.
ISOCHORE: A large region of mammalian genomic DNA sequence in which C+G compositions are relatively uniform.
LOG-NORMAL DISTRIBUTION: The distribution of a random variable, the logarithm of which follows a normal distribution. A normal log (length) implies a strong fixed-length selection pressure.
EXON LENGTH DISTRIBUTION: A statistical distribution of exon sizes.
NONSENSE-MEDIATED DECAY (NMD): A pathway ensuring that mRNAs that have premature stop codons are eliminated as templates for translation.
PSEUDOEXON: A pre-mRNA sequence that resembles an exon, both in its size and in the presence of flanking splice-site sequences, but that is never recognized as an exon by the splicing machinery (the spliceosome).
KOZAK SEQUENCE: The consensus sequence for initiation of translation in vertebrates.
PSEUDOGENE: A DNA sequence that was derived originally from a functional protein-coding gene that has lost its function, owing to the presence of one or more inactivating mutations.
BLASTX: Basic Local Alignment Search Tool (BLAST) is a computer program for comparing DNA and protein sequences. The BLASTX version compares a nucleotide query sequence that is translated in all six reading frames with a protein sequence database.
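
The translation step described in the BLASTX entry can be illustrated with a short sketch. This is not BLAST code: it only shows, assuming the standard genetic code and hypothetical helper names (`reverse_complement`, `translate`, `six_frame_translations`), how a nucleotide query yields six conceptual protein sequences, three per strand, that could then be compared with a protein database.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are kept as `*` rather than truncating the translation, because a similarity search over genomic DNA needs to see coding segments on either side of a stop.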

Cite this article

Zhang, M. Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3, 698–709 (2002). https://doi.org/10.1038/nrg890

Issue Date: 01 September 2002
DOI: https://doi.org/10.1038/nrg890
