Abstract
With more than 1000 prokaryotic genome sequencing projects ongoing or already finished, comprehensive comparative analysis of the gene content of these genomes has become viable. For such an analysis to be meaningful, gene predictions for the various genomes should be as accurate as possible, and current annotations still leave room for improvement. Improving the state of genome annotation requires automated gene identification methods that cope with the influence of artifacts, such as genomic GC content.
We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryote genome sequences. It implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. Using several examples, we demonstrate how this resource can easily be used to identify promising candidates for genes that are currently missing from genome annotations. All data sets are available online.
Acknowledgements
The authors would like to thank Ross Overbeek and Gordon Pusch for initiating the project and valuable comments and discussion. We would also like to thank Niels Larsen for making the SearchforRNAs program available to us. Burkhard Linke is funded by the Deutsche Forschungsgemeinschaft (DFG PU28/25-3). Lutz Krause is supported by the DFG Graduiertenkolleg 635 Bioinformatik. Heiko Neuweger is funded by the EU (GOCECT-2004-505403).
The authors have no conflicts of interest that are directly relevant to the content of this article.
Additional information
Availability: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU General Public License. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR, and on the system requirements, can be found on the GenDB project page: http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation
These authors contributed equally to this article.
About this article
Cite this article
Linke, B., McHardy, A.C., Neuweger, H. et al. REGANOR. Appl-Bioinformatics 5, 193–198 (2006). https://doi.org/10.2165/00822942-200605030-00008
DOI: https://doi.org/10.2165/00822942-200605030-00008