
BreakDancer: an algorithm for high-resolution mapping of genomic structural variation

Abstract

Detection and characterization of genomic structural variation are important for understanding the landscape of genetic variation in human populations and in complex diseases such as cancer. Recent studies demonstrate the feasibility of detecting structural variation using next-generation, short-insert, paired-end sequencing reads. However, the utility of these reads is not entirely clear, nor are the analysis methods with which accurate detection can be achieved. The algorithm BreakDancer predicts a wide variety of structural variants including insertion-deletions (indels), inversions and translocations. We examined BreakDancer's performance in simulation, in comparison with other methods and in analyses of a sample from an individual with acute myeloid leukemia and of samples from the 1,000 Genomes trio individuals. BreakDancer sensitively and accurately detected indels ranging from 10 base pairs to 1 megabase pair that are difficult to detect via a single conventional approach.


Figure 1: Overview of BreakDancer algorithm.
Figure 2: Performance of BreakDancer in simulation.
Figure 3: Size distribution of deletions detected in the genome of an individual with AML.
Figure 4: Accuracy of predicted variant sizes. Plotted are variant sizes predicted by BreakDancer and by local assembly (estimated) versus true sizes determined from the PCR resequencing (validated).

Acknowledgements

We thank the Genomics of AML Program Project Grant team at Washington University Medical School (US National Cancer Institute PO1 CA101937; principal investigator, T.J.L.) and the 1,000 Genomes Consortium members for providing the data. We thank members of the 1,000 Genomes structural variation group and H. Li for methodology discussions; D. Bentley and M. Ross (Illumina), C. Alkan and J. Kidd (University of Washington), and Y. Li and H. Zheng (Beijing Genome Institute) for providing validation data; and A. Chinwalla, D. Dooling, S. Smith, J. Eldred, C. Harris, L. Cook, V. Magrini, Y. Tang, H. Schmidt, C. Haipek, G. Elliott and R. Abbott for assistance. This work was supported by the National Human Genome Research Institute (HG003079; principal investigator, R.K.W.).

Author information

Contributions

E.R.M., R.K.W., L.D. and T.J.L.: project conception and oversight. K.C.: algorithm design and implementation. J.W.W.: variant assembly. J.M.K., M.D.M. and R.S.F.: experimental validation. C.S.P. and L.D.: primer design. S.D.M. and D.P.L.: Illumina library preparation. Q.Z. and M.C.W.: statistical insight. J.W.W., D.E.L., X.S., and D.P.L.: variant characterization and visualization. K.C., E.R.M., M.C.W., L.D. and J.W.W.: manuscript preparation.

Corresponding author

Correspondence to Ken Chen.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12, Supplementary Tables 2, 4–8 and Supplementary Note (PDF 1386 kb)

Supplementary Table 1

List of structural variants detected in simulation. (XLS 72 kb)

Supplementary Table 3

A list of AML2 structural variants detected by BreakDancer, refined by local assembly and validated via PCR resequencing. (XLS 99 kb)

Supplementary Software

The BreakDancer software package encompasses two algorithms: BreakDancerMax detects large structural variants (deletions, insertions, inversions, and intra- and interchromosomal translocations), and BreakDancerMini detects small (10–100 bp) insertions and deletions. (ZIP 16 kb)
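To make the variant categories above concrete, the following is a minimal sketch of how a single paired-end read might be sorted into one of them from its mapped positions and strand orientations. It illustrates the general paired-end mapping idea only and is not BreakDancer's implementation: the ReadPair type, the classify_pair function, the expected "+/-" orientation and the three-standard-deviation cutoff are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical record for one mapped read pair (not a BreakDancer type)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Label a read pair by comparing it with the expected library geometry.

    Assumes a library in which a normal pair maps to one chromosome with the
    leftmost read on '+' and the rightmost on '-', separated by roughly
    mean_insert bases. The n_sd cutoff is an illustrative choice.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal_translocation"

    # Order the two reads by mapped position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )

    if left_strand == right_strand:
        # Both reads on the same strand suggests an inverted segment.
        return "inversion"
    if (left_strand, right_strand) != ("+", "-"):
        # Reads point away from each other; which category this maps to
        # is left open in this sketch.
        return "everted"

    # Simplification: the span is measured between read start positions.
    span = right_pos - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"   # reads map farther apart than expected
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"  # reads map closer together than expected
    return "normal"


if __name__ == "__main__":
    pair = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
    print(classify_pair(pair, mean_insert=300, sd_insert=30))
```

A real caller would not act on single pairs; it would require several anomalous pairs supporting the same region before reporting a variant, and small indels in the 10-100 bp range generally need a statistical comparison of insert-size distributions rather than a fixed cutoff.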

About this article

Cite this article

Chen, K., Wallis, J., McLellan, M. et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 6, 677–681 (2009). https://doi.org/10.1038/nmeth.1363

  • DOI: https://doi.org/10.1038/nmeth.1363
