Automating sequence-based detection and genotyping of SNPs from diploid samples

Stephens, Matthew; Sloan, James S; Robertson, P D; Scheet, Paul; Nickerson, Deborah A

doi:10.1038/ng1746

Technical Report
Published: 19 February 2006

Nature Genetics volume 38, pages 375–381 (2006)

Abstract

The detection of sequence variation, for which DNA sequencing has emerged as the most sensitive and automated approach, forms the basis of all genetic analysis. Here we describe and illustrate an algorithm that accurately detects and genotypes SNPs from fluorescence-based sequence data. Because the algorithm focuses particularly on detecting SNPs through the identification of heterozygous individuals, it is especially well suited to the detection of SNPs in diploid samples obtained after DNA amplification. It is substantially more accurate than existing approaches and, notably, provides a useful quantitative measure of its confidence in each potential SNP detected and in each genotype called. Calls assigned the highest confidence are sufficiently reliable to remove the need for manual review in several contexts. For example, for sequence data from 47–90 individuals sequenced on both the forward and reverse strands, the highest-confidence calls from our algorithm detected 93% of all SNPs and 100% of high-frequency SNPs, with no false positive SNPs identified and 99.9% genotyping accuracy. This algorithm is implemented in a software package, PolyPhred version 5.0, which is freely available for academic use.

**Figure 1: Sequence traces (chromatograms) for four individuals.**

**Figure 2: Removal of systematic variation in peak height improves discrimination between heterozygotes and homozygotes.**

**Figure 3: Missed SNP rate versus false discovery rate for different data sets.**

**Figure 4: Dependence of performance on sequence quality.**


Acknowledgements

The authors thank past and present members of the Nickerson lab for compiling the databases that were used to develop, train and test our algorithm. This work was supported by US National Institutes of Health (NIH) grants (1RO1HG/LM-02585 to M.S., and ES-15478 and HL-66682 to D.A.N.). P.S. was supported by an NIH training grant (T32 HG00035-06).

Author information

Authors and Affiliations

Department of Statistics, University of Washington, Seattle, 98195, Washington, USA
Matthew Stephens & Paul Scheet
Department of Genome Sciences, University of Washington, Seattle, 98195, Washington, USA
Matthew Stephens, James S Sloan, P D Robertson & Deborah A Nickerson

Corresponding author

Correspondence to Matthew Stephens.

Ethics declarations

Competing interests

POLYPHRED is freely available for academic purposes, but a licensing fee is charged for commercial use, which predominantly funds further software and methods development.

About this article

Cite this article

Stephens, M., Sloan, J., Robertson, P. et al. Automating sequence-based detection and genotyping of SNPs from diploid samples. Nat Genet 38, 375–381 (2006). https://doi.org/10.1038/ng1746

Received: 16 October 2005
Accepted: 12 January 2006
Published: 19 February 2006
Issue Date: 01 March 2006
DOI: https://doi.org/10.1038/ng1746

Supplementary information

Supplementary Fig. 1

Supplementary Fig. 2

Supplementary Fig. 3

Supplementary Fig. 4

Supplementary Methods (PDF 83 kb)
