  • Technical Report

Automating resequencing-based detection of insertion-deletion polymorphisms

Abstract

Structural and insertion-deletion (indel) variants have received considerable recent attention, partly because of their phenotypic consequences. Among these variants, the most common are small indels (1–30 bp). Identifying and genotyping indels using sequence traces obtained from diploid samples requires extensive manual review, which makes large-scale studies inconvenient. We report a new algorithm, implemented in available software (PolyPhred version 6.0), to help automate detection and genotyping of indels from sequence traces. The algorithm identifies heterozygous individuals, which permits the discovery of low-frequency indels. It finds 80% of all indel polymorphisms with almost no false positives and finds 97% with a false discovery rate of 10%. Additionally, genotyping accuracy exceeds 99%, and it correctly infers indel length in 96% of the cases. Using this approach, we identify indels in the HapMap ENCODE regions, providing the first report of these polymorphisms in this data set.
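To make the idea of a heterozygous indel trace concrete, the following is a minimal sketch and not the PolyPhred 6.0 algorithm. It assumes a noiseless trace in which every position downstream of a heterozygous deletion shows two superimposed bases: the reference base and the base from the shifted allele. Under that assumption, the indel length is the shift that best explains the observed pairs. The function names `simulate_het_deletion` and `infer_deletion`, the toy reference sequence, and the 30 bp upper bound (taken from the "small indel" range in the abstract) are all illustrative choices.

```python
from typing import FrozenSet, List, Optional, Tuple


def simulate_het_deletion(ref: str, start: int, length: int) -> List[FrozenSet[str]]:
    """Toy model of a mixed trace: the set of bases seen at each position
    when one allele matches `ref` and the other lacks ref[start:start+length]."""
    deleted = ref[:start] + ref[start + length:]
    return [frozenset((ref[i], deleted[i])) for i in range(len(deleted))]


def infer_deletion(
    ref: str, observed: List[FrozenSet[str]], max_len: int = 30
) -> Optional[Tuple[int, int]]:
    """Return (first mixed position, inferred deletion length), or None
    if the trace shows no mixed positions (hypothetical helper)."""
    first_mixed = next(
        (i for i, bases in enumerate(observed) if len(bases) > 1), None
    )
    if first_mixed is None:
        return None

    best_len, best_score = 0, -1
    for d in range(1, max_len + 1):
        # Count downstream positions explained by "reference base plus
        # the reference base d positions ahead".
        score = sum(
            1
            for i in range(first_mixed, len(observed))
            if i + d < len(ref) and observed[i] == frozenset((ref[i], ref[i + d]))
        )
        if score > best_score:
            best_len, best_score = d, score
    return first_mixed, best_len


ref = "GATTACAGGCTTCAAGTCCGATGCA"  # made-up sequence, not from the paper
trace = simulate_het_deletion(ref, start=8, length=3)
print(infer_deletion(ref, trace))  # (8, 3)
```

Real traces are noisy, the first visibly mixed position can lie downstream of the true indel start when neighboring bases repeat, and insertions relative to the reference need the mirror-image comparison, so this sketch only illustrates the shift-matching intuition.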

Figure 1: An example of how our algorithm identifies a heterozygous indel trace.
Figure 2: Indel detection and genotyping accuracy.
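The accuracy figures quoted in the abstract pair a detection rate with a false discovery rate. As a hedged illustration of how those two quantities trade off when a score threshold is moved, the sketch below computes both from a list of scored calls. The function name `sensitivity_and_fdr`, the scores, and the truth labels are invented for this example and are not data from the paper.

```python
from typing import List, Tuple


def sensitivity_and_fdr(
    calls: List[Tuple[float, bool]], threshold: float, total_true: int
) -> Tuple[float, float]:
    """Given (score, is_real_indel) pairs, return the fraction of real
    indels recovered and the fraction of accepted calls that are false."""
    accepted = [is_real for score, is_real in calls if score >= threshold]
    true_pos = sum(accepted)
    false_pos = len(accepted) - true_pos
    sensitivity = true_pos / total_true if total_true else 0.0
    fdr = false_pos / len(accepted) if accepted else 0.0
    return sensitivity, fdr


# Toy scored calls (made up): five real indels and two false candidates.
calls = [
    (0.99, True), (0.95, True), (0.90, True), (0.80, True),
    (0.60, False), (0.55, True), (0.30, False),
]

# A strict threshold accepts fewer calls but no false ones.
print(sensitivity_and_fdr(calls, threshold=0.7, total_true=5))
# A looser threshold recovers every real indel at the cost of one false call.
print(sensitivity_and_fdr(calls, threshold=0.5, total_true=5))
```

Lowering the threshold raises sensitivity and the false discovery rate together, which is the same kind of trade-off the abstract reports between its two operating points.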

Acknowledgements

The authors thank the past and present members of the SeattleSNPs team for their efforts in variation discovery and the PolyPhred development team, including J. Sloan and P. Robertson. This work was supported by grants from the US National Institutes of Health (HL66682 to D.A.N. and HG/LM02585 to M.S.).

Author information

Corresponding authors

Correspondence to Tushar R Bhangale or Deborah A Nickerson.

Ethics declarations

Competing interests

PolyPhred is freely available for academic purposes, but a licensing fee is charged for commercial use, which predominantly funds further software and methods development.

Supplementary information

Supplementary Fig. 1

Trace signal patterns. (PDF 76 kb)

Supplementary Fig. 2

Lengths and LD characteristics of ENCODE indels. (PDF 285 kb)

Supplementary Fig. 3

Application of the DPA. (PDF 225 kb)

Supplementary Table 1

ENCODE indels found in different functional regions of genes. (PDF 49 kb)

Supplementary Table 2

Chromosomal locations of the 1,125 indels identified in the ENCODE regions. (PDF 200 kb)

Supplementary Table 3

The independent variables and their estimated effect parameters in the logistic regression model used for identifying heterozygous indel traces. (PDF 86 kb)

Supplementary Table 4

The independent variables and their estimated effect parameters in the logistic regression model used for identifying indel loci. (PDF 83 kb)

Supplementary Methods (PDF 101 kb)

About this article

Cite this article

Bhangale, T., Stephens, M. & Nickerson, D. Automating resequencing-based detection of insertion-deletion polymorphisms. Nat Genet 38, 1457–1462 (2006). https://doi.org/10.1038/ng1925

  • DOI: https://doi.org/10.1038/ng1925
