Human-specific insertions and deletions inferred from mammalian genome sequences

  1. Feng-Chi Chen1,2,
  2. Chueng-Jong Chen2,
  3. Wen-Hsiung Li2,3,4, and
  4. Trees-Juen Chuang2,4
  1. Division of Biostatistics and Bioinformatics, National Health Research Institute, Miaoli County 350, Taiwan;
  2. Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan;
  3. Department of Ecology and Evolution, University of Chicago, Illinois 60637, USA

Abstract

It has been suggested that insertions and deletions (indels) have contributed more to the sequence divergence between the human and chimpanzee genomes than single-nucleotide changes have (3% vs. 1.2%). However, although large indels between the two genomes have been studied, no systematic analysis of small indels (i.e., indels ≤ 100 bp) has been published. In this study, we first found that the false-positive rate of small indels inferred from human–chimpanzee pairwise sequence alignments is high, suggesting that the chimpanzee draft genome is not sufficiently accurate for our purpose. We therefore inferred only human-specific indels, using multiple sequence alignments of mammalian genomes. We identified >840,000 “small” indels, which affect >7000 UCSC-annotated human genes (>11,000 transcripts). These indels, however, amount to only ∼0.21% sequence change in the human lineage for the regions compared, whereas in pseudogenes indels contribute a sequence divergence of 1.40%, suggesting that most of the indels that occurred in genic regions have been eliminated. Functional analysis reveals that the genes whose coding exons have been affected by human-specific indels are enriched in transcription and translation regulatory activities but are underrepresented in catalytic and transporter activities, cellular and physiological processes, and extracellular region/matrix. This functional bias suggests that human-specific indels might have contributed to uniquely human traits by causing changes at the RNA and protein levels.
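To illustrate why an outgroup-based multiple alignment lets one assign an indel to the human lineage, here is a minimal sketch of a simple parsimony rule. It is not the authors' pipeline: the function names, the `Indel` record, the "all non-human species must agree" rule and the normalization by alignment length are all hypothetical assumptions. The rule is as follows. A run of columns where human has a gap and every other species has a base is called a human deletion. A run where human has a base and every other species has a gap is called a human insertion. Only runs of ≤ 100 columns are kept, matching the paper's "small" indel cutoff.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass
class Indel:
    kind: str    # "insertion" or "deletion" on the human lineage
    start: int   # 0-based alignment column where the run begins
    length: int  # number of alignment columns in the run


def column_kind(human_char: str, other_chars: list[str]) -> str | None:
    """Hypothetical parsimony rule for a single alignment column."""
    human_gap = human_char == "-"
    other_gaps = [c == "-" for c in other_chars]
    if human_gap and not any(other_gaps):
        return "deletion"   # all outgroups share a base that human lacks
    if not human_gap and all(other_gaps):
        return "insertion"  # human carries a base absent from all outgroups
    return None             # shared or ambiguous state: not human-specific


def human_specific_indels(aln: dict[str, str], max_len: int = 100) -> list[Indel]:
    """Call human-specific indels from a gapped multiple alignment (sketch)."""
    human = aln["human"]
    others = [seq for name, seq in aln.items() if name != "human"]
    assert all(len(s) == len(human) for s in others), "rows must be equal length"

    # Classify every column, then merge consecutive columns of the same kind.
    kinds = [column_kind(human[k], [s[k] for s in others]) for k in range(len(human))]
    calls, pos = [], 0
    for kind, run in groupby(kinds):
        length = len(list(run))
        if kind is not None and length <= max_len:
            calls.append(Indel(kind, pos, length))
        pos += length
    return calls


def indel_divergence(calls: list[Indel], compared_length: int) -> float:
    """Fraction of compared sites changed by indels (assumed normalization)."""
    return sum(c.length for c in calls) / compared_length


if __name__ == "__main__":
    toy = {
        "human":   "ACGT--ACGTAC",
        "chimp":   "ACGTTTACG-AC",
        "macaque": "ACGTTTACG-AC",
    }
    found = human_specific_indels(toy)
    print(found)
    print(indel_divergence(found, len(toy["human"])))
```

On the toy alignment, the 2-bp gap shared by no outgroup is reported as a human deletion. The extra `T` found only in human is reported as an insertion. Requiring agreement among all non-human species is a deliberately strict choice. A column where chimpanzee alone carries a gap is left unassigned rather than guessed, which reflects, in simplified form, the concern above about errors in pairwise human–chimpanzee alignments.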

Footnotes

  • 4 Corresponding authors.

    4 E-mail wli{at}uchicago.edu; fax (773) 702-9740.

    4 E-mail trees{at}gate.sinica.edu.tw; fax (886) 2-27898757.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5429606

    • Received April 24, 2006.
    • Accepted August 30, 2006.