Reduced purifying selection prevails over positive selection in human copy number variant evolution

  1. Duc-Quang Nguyen1,3,
  2. Caleb Webber1,3,4,
  3. Jayne Hehir-Kwa2,
  4. Rolph Pfundt2,
  5. Joris Veltman2, and
  6. Chris P. Ponting1
  1 MRC Functional Genomics Unit, University of Oxford, Department of Physiology, Anatomy and Genetics, Oxford OX1 3QX, United Kingdom;
  2 Department of Human Genetics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen 6500 HB, The Netherlands
  3 These authors contributed equally to this work.

Abstract

Copy number variation is a dominant contributor to genomic variation and may frequently underlie an individual’s variable susceptibilities to disease. Here we question our previous proposition that copy number variants (CNVs) are often retained in the human population because of their adaptive benefit. We show that genic biases of CNVs are best explained, not by positive selection, but by reduced efficiency of selection in eliminating deleterious changes from the human population. Of four CNV data sets examined, three exhibit significant increases in protein evolutionary rates. These increases appear to be attributable to the frequent coincidence of CNVs with segmental duplications (SDs) that recombine infrequently. Furthermore, human orthologs of mouse genes, which, when disrupted, result in pre- or postnatal lethality, are unusually depleted in CNVs. Together, these findings support a model of reduced purifying selection (Hill–Robertson interference) within copy number variable regions that are enriched in nonessential genes, allowing both the fixation of slightly deleterious substitutions and increased drift of CNV alleles. Additionally, all four CNV sets exhibited increased rates of interspecies chromosomal rearrangement and nucleotide substitution and an increased gene density. We observe that sequences with high G+C contents are most prone to copy number variation. In particular, frequently duplicated human SD sequence, or CNVs that are large and/or observed frequently, tend to be elevated in G+C content. In contrast, SD sequences that appear fixed in the human population lie more frequently within low G+C sequence. These findings provide an overarching view of how CNVs arise and segregate in the human population.

Footnotes

  • 4 Corresponding author.

    4 E-mail caleb.webber{at}dpag.ox.ac.uk; fax 44-1865-285862.

  • [Supplemental material is available online at www.genome.org. The BAC array data from this study have been submitted to Gene Expression Omnibus [GEO] (www.ncbi.nlm.nih.gov/geo) under accession no. GSE7391.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.077289.108.

    • Received February 11, 2008.
    • Accepted July 23, 2008.