Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history

  1. Philip M. Kim1,8,
  2. Hugo Y.K. Lam2,8,
  3. Alexander E. Urban3,
  4. Jan O. Korbel1,7,
  5. Jason Affourtit4,
  6. Fabian Grubert5,
  7. Xueying Chen1,
  8. Sherman Weissman5,
  9. Michael Snyder3, and
  10. Mark B. Gerstein1,2,6,9
  1 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA;
  2 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA;
  3 Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA;
  4 454 Life Sciences, Branford, Connecticut 06405, USA;
  5 Department of Genetics, Yale University, New Haven, Connecticut 06520, USA;
  6 Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA;
  7 European Molecular Biology Laboratory, 69117 Heidelberg, Germany
  8 These authors contributed equally to this work.

Abstract

Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which decreases for younger SDs and is absent entirely for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed “power-law” distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in predominant formation mechanism occurred in recent history: ∼40 million years ago, during the “Alu burst” in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then by the newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.
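
The kind of co-occurrence measurement described in the abstract (for example, what fraction of CNV breakpoints lie in or near SDs or Alu elements) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `merge_intervals` and `fraction_near`, the `window` parameter, and the toy coordinates are all assumptions introduced here for illustration, and a real analysis would also need a per-chromosome treatment and a null model to assess significance.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of opening a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]

def fraction_near(breakpoints, features, window=0):
    """Fraction of breakpoints lying within `window` bp of any feature.

    `features` are half-open (start, end) intervals on one chromosome,
    e.g. hypothetical SD or Alu annotations.
    """
    # Pad every feature by the window, then merge so lookups are unambiguous.
    padded = merge_intervals((max(0, s - window), e + window) for s, e in features)
    starts = [s for s, _ in padded]
    hits = 0
    for pos in breakpoints:
        # Rightmost padded interval that starts at or before the breakpoint.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < padded[i][1]:
            hits += 1
    return hits / len(breakpoints) if breakpoints else 0.0

# Toy data (invented coordinates, single chromosome).
features = [(100, 200), (150, 300), (1000, 1100)]
breakpoints = [50, 120, 299, 300, 950, 1050]

exact = fraction_near(breakpoints, features)             # breakpoints inside a feature
nearby = fraction_near(breakpoints, features, window=50)  # within 50 bp of a feature
```

Comparing such an observed fraction with the same statistic computed on shuffled feature positions is one common way to judge whether an association is stronger than expected by chance; the choice of null model is left open here.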

Footnotes

  • 9 Corresponding author.

    9 E-mail mark.gerstein{at}yale.edu; fax (360) 838-7861.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.081422.108.

    • Received May 27, 2008.
    • Accepted September 30, 2008.
  • Freely available online through the Genome Research Open Access option.
