The difficulty of avoiding false positives in genome scans for natural selection

  Swapan Mallick1,2,3, Sante Gnerre2, Paul Muller1,2 and David Reich1,2,3

  1 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
  2 Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

    Abstract

    Several studies have found evidence for more positive selection on the chimpanzee lineage compared with the human lineage since the two species split. A potential concern, however, is that these findings may simply reflect artifacts of the data: inaccuracies in the underlying chimpanzee genome sequence, which is of lower quality than the human sequence. To test this hypothesis, we generated de novo genome assemblies of chimpanzee and macaque and aligned them with human. We also implemented a novel bioinformatic procedure for producing alignments of closely related species that uses synteny information to remove misassembled and misaligned regions, and sequence quality scores to remove nucleotides that are less reliable. We applied this procedure to re-examine 59 genes recently identified as candidates for positive selection in chimpanzees. The great majority of these signals disappear after application of our new bioinformatic procedure. We also carried out laboratory-based resequencing of 10 of the regions in multiple chimpanzees and humans, and found that our alignments were correct wherever there was a conflict with the published results. These results call into question previous findings that there has been more positive selection in chimpanzees than in humans since the two species diverged. Our study also highlights the challenges of searching the extreme tails of distributions for signals of natural selection. Inaccuracies in the genome sequence at even a tiny fraction of genes can produce false-positive signals, which make it difficult to identify loci that have genuinely been targets of selection.

    Footnotes

    • 3 Corresponding authors.

      E-mail reich{at}genetics.med.harvard.edu; fax (617) 432-7663.

      E-mail shop{at}broad.mit.edu; fax (617) 432-7663.

    • [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) under accession nos. FJ821202–FJ821288.]

    • Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.086512.108.

      • Received September 11, 2008.
      • Accepted March 11, 2009.
