Computational Gene Prediction Using Multiple Sources of Evidence

Jonathan E. Allen; Mihaela Pertea; Steven L. Salzberg

doi:10.1101/gr.1562804

¹ The Institute for Genomic Research, Rockville, Maryland 20850, USA
² Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA

Abstract

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.
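To make the idea of combining evidence concrete, the following is a minimal sketch of one simple scheme: per-base weighted voting across evidence sources. It is not one of the three Combiner algorithms, which this abstract does not specify. The source names, weights, threshold, and function names are all hypothetical, and sensitivity and specificity are computed at the nucleotide level using the convention common in gene-finding evaluations (specificity as TP / (TP + FP)), which is assumed here rather than taken from the article.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in sequence coordinates


def combine_by_vote(seq_len: int,
                    evidence: Dict[str, List[Interval]],
                    weights: Dict[str, float],
                    threshold: float = 0.5) -> List[bool]:
    """Label each base as coding when the weighted share of sources
    covering it exceeds `threshold` (illustrative scheme only)."""
    total = sum(weights[source] for source in evidence)
    score = [0.0] * seq_len
    for source, intervals in evidence.items():
        covered = set()  # count each source at most once per base
        for start, end in intervals:
            covered.update(range(max(0, start), min(seq_len, end)))
        for pos in covered:
            score[pos] += weights[source]
    return [s / total > threshold for s in score]


def sensitivity_specificity(predicted: List[bool],
                            truth: List[bool]) -> Tuple[float, float]:
    """Nucleotide-level Sn = TP/(TP+FN) and Sp = TP/(TP+FP)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


# Hypothetical toy input: two ab initio predictions and one EST alignment
# on a 10-base sequence, with the alignment weighted more heavily.
evidence = {
    "gene_finder_a": [(2, 6)],
    "gene_finder_b": [(3, 8)],
    "est_alignment": [(3, 6)],
}
weights = {"gene_finder_a": 1.0, "gene_finder_b": 1.0, "est_alignment": 2.0}

labels = combine_by_vote(10, evidence, weights)
truth = [3 <= i < 7 for i in range(10)]  # assumed true coding region
sn, sp = sensitivity_specificity(labels, truth)
```

In this toy case only bases 3 to 5 are supported by more than half of the total weight, so the combined call is narrower than either gene finder alone. A real combiner would also need to respect reading frame and splice-site consistency, which this sketch ignores.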

Footnotes

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1562804.
³ Corresponding author. E-mail jallen@tigr.org; fax (301) 838-0208.
- Accepted November 4, 2003.
- Received May 20, 2003.
Cold Spring Harbor Laboratory Press