Quality scores and SNP detection in sequencing-by-synthesis systems

  1. William Brockman^1,3,4,
  2. Pablo Alvarez^1,3,5,
  3. Sarah Young^1,
  4. Manuel Garber^1,
  5. Georgia Giannoukos^1,
  6. William L. Lee^1,
  7. Carsten Russ^1,
  8. Eric S. Lander^1,2,
  9. Chad Nusbaum^1, and
  10. David B. Jaffe^1,6

  1 Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, USA;
  2 Whitehead Institute for Biomedical Research, MIT, Cambridge, Massachusetts 02139, USA
  3 These authors contributed equally to this work.

Abstract

Promising new sequencing technologies, based on sequencing-by-synthesis (SBS), are starting to deliver large amounts of DNA sequence at very low cost. Polymorphism detection is a key application. We describe general methods for improved quality scores and accurate automated polymorphism detection, and apply them to data from the Roche (454) Genome Sequencer 20. We assess our methods using known-truth data sets, which is critical to the validity of the assessments. We developed informative, base-by-base error predictors for this sequencer and used a variant of the phred binning algorithm to combine them into a single empirically derived quality score. These quality scores are more useful than those produced by the system software: They both better predict actual error rates and identify many more high-quality bases. We developed a SNP detection method, with variants for low coverage, high coverage, and PCR amplicon applications, and evaluated it on known-truth data sets. We demonstrate good specificity in single reads, and excellent specificity (no false positives in 215 kb of genome) in high-coverage data.
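
As a rough illustration of what an empirically derived quality score means, the sketch below groups bases from a known-truth data set by their predictor values, measures the observed error rate in each group, and converts it to the phred scale, Q = -10 log10(error rate). It is a minimal sketch and not the authors' method: the abstract describes a variant of the phred binning algorithm but gives no details, so the function names, the representation of a predictor bin as a dictionary key, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def phred(error_rate):
    """Convert an error probability to a phred-scaled quality score."""
    return -10.0 * math.log10(error_rate)


def build_quality_table(observations):
    """Map each predictor bin to an empirical integer quality score.

    observations: iterable of (bin_key, is_error) pairs taken from a
    known-truth data set. bin_key is a hypothetical hashable label for a
    combination of predictor values; is_error says whether the base call
    disagreed with the known truth.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_key -> [errors, total]
    for bin_key, is_error in observations:
        counts[bin_key][0] += int(is_error)
        counts[bin_key][1] += 1

    table = {}
    for bin_key, (errors, total) in counts.items():
        # Assumption: add-one smoothing, so a bin with no observed errors
        # still receives a finite score bounded by its sample size.
        rate = (errors + 1) / (total + 2)
        table[bin_key] = int(phred(rate))  # truncate to an integer score
    return table
```

With this smoothing, a bin's score is capped by how many bases it contains, so a high score can only come from a bin with enough known-truth observations to support it.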

Footnotes

  • 4 Present addresses: Google, Inc., Cambridge, MA 02142, USA;

  • 5 Akamai, Cambridge, MA 02142, USA.

  • 6 Corresponding author. E-mail jaffe{at}broad.mit.edu; fax (617) 452-4588.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.070227.107.

    • Received August 10, 2007.
    • Accepted January 15, 2008.