Abstract
sRNAFinder is a new gene prediction system for systematic identification of noncoding genes in bacteria. Most noncoding RNAs in prokaryotes belong to a class of genes denoted as small RNAs (sRNAs). In the model organism Escherichia coli, over 70 sRNA genes have been identified, and the existence of many more has been hypothesized. While various sources of information have proven useful for prediction of novel sRNA genes, most computational approaches do not take advantage of the disparate sources of data available for identifying these noncoding RNA genes. We present a general probabilistic method for predicting sRNA genes in bacteria. The method, based on a general Markov model, is implemented in the computational tool sRNAFinder. sRNAFinder incorporates heterogeneous data sources for gene prediction, including primary sequence data, transcript expression data from microarray experiments, and conserved RNA structure information as determined from comparative genomics analysis. We demonstrate that sRNAFinder improves upon current tools for identifying small, noncoding genes in bacteria.
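To illustrate how a Markov model can combine heterogeneous evidence, the following minimal sketch labels each genome position as background or sRNA. It is a deliberately simplified stand-in, not the sRNAFinder implementation: it assumes a plain two-state hidden Markov model decoded with the Viterbi algorithm, it treats the three evidence types (base composition, expression signal, structural conservation) as conditionally independent given the state, and every name and probability in it is hypothetical.

```python
import math

STATES = ("background", "sRNA")

# All parameters below are hypothetical values chosen for illustration only.
LOG_START = {"background": math.log(0.9), "sRNA": math.log(0.1)}
LOG_TRANS = {
    "background": {"background": math.log(0.9), "sRNA": math.log(0.1)},
    "sRNA": {"background": math.log(0.1), "sRNA": math.log(0.9)},
}
P_GC = {"background": 0.5, "sRNA": 0.6}         # primary sequence evidence
P_EXPRESSED = {"background": 0.1, "sRNA": 0.8}  # microarray expression evidence
P_CONSERVED = {"background": 0.2, "sRNA": 0.7}  # conserved-structure evidence


def bernoulli_log(p, observed):
    """Log-probability of a yes/no observation with success probability p."""
    return math.log(p if observed else 1.0 - p)


def log_emission(state, obs):
    """Sum the log-likelihoods of the three evidence sources (assumed independent)."""
    base, expressed, conserved = obs
    return (
        bernoulli_log(P_GC[state], base in "GC")
        + bernoulli_log(P_EXPRESSED[state], expressed)
        + bernoulli_log(P_CONSERVED[state], conserved)
    )


def viterbi(observations):
    """Return the most probable state path for a list of (base, expressed, conserved)."""
    if not observations:
        return []
    score = {s: LOG_START[s] + log_emission(s, observations[0]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this position.
            best_prev = max(STATES, key=lambda r: score[r] + LOG_TRANS[r][s])
            pointer[s] = best_prev
            new_score[s] = (
                score[best_prev] + LOG_TRANS[best_prev][s] + log_emission(s, obs)
            )
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: four expressed, conserved, GC-rich positions flanked by background.
toy = [("A", False, False)] * 3 + [("G", True, True)] * 4 + [("A", False, False)] * 3
toy_path = viterbi(toy)
```

Working in log space keeps the product of many small probabilities numerically stable, and adding a further evidence source under this independence assumption only requires one more term in `log_emission`.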
Cite this article
Tjaden, B. Prediction of small, noncoding RNAs in bacteria using heterogeneous data. J. Math. Biol. 56, 183–200 (2008). https://doi.org/10.1007/s00285-007-0079-5
DOI: https://doi.org/10.1007/s00285-007-0079-5