Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome

Zhaolei Zhang, Paul M. Harrison, Yin Liu, and Mark Gerstein¹

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA

Abstract

Processed pseudogenes were created by reverse transcription of mRNAs; they provide snapshots of ancient genes that existed in the genome millions of years ago. To find them in the present-day human genome, we developed a pipeline using features such as intron absence, frame disruption, polyadenylation, and truncation. This has enabled us to identify ∼8000 processed pseudogenes in recent genome drafts (distributed from http://pseudogene.org). Overall, processed pseudogenes are very similar to their closest corresponding human gene, being 94% complete in coding regions, with sequence similarity of 75% for amino acids and 86% for nucleotides. Their chromosomal distribution appears random and dispersed, with the number on each chromosome proportional to its length, suggesting sustained “bombardment” over evolution. However, the distribution does vary with GC content: processed pseudogenes occur mostly in regions of intermediate GC content. This is similar to Alus but contrasts with functional genes and L1 repeats. Pseudogenes, moreover, have age profiles similar to Alus. The number of pseudogenes associated with a given gene follows a power-law relationship, with a few genes giving rise to many pseudogenes and most giving rise to few. The prevalence of processed pseudogenes agrees well with germ-line gene expression. Highly expressed ribosomal proteins account for ∼20% of the total. Other notables include cyclophilin-A, keratin, GAPDH, and cytochrome c.
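The power-law relationship mentioned above can be illustrated with a minimal sketch. It assumes the form N(k) = a · k^(−b), where N(k) is the number of genes with k processed pseudogenes, and estimates b by ordinary least squares on log-log axes. The abstract does not state how the authors fitted the relationship, so this is not their procedure; the function name `fit_power_law` and the input data are hypothetical, and the counts are synthetic values built to follow an exact power law, not results from the paper.

```python
import math
from collections import Counter


def fit_power_law(pseudogene_counts):
    """Fit N(k) = a * k**(-b) by least squares on log-log axes.

    pseudogene_counts: one positive integer per gene, giving the number of
    processed pseudogenes assigned to that gene (hypothetical input format).
    Returns (b, a). Simple log-log regression is an illustrative choice only.
    """
    # N(k): how many genes have exactly k pseudogenes (genes with k = 0 are skipped).
    freq = Counter(k for k in pseudogene_counts if k > 0)
    if len(freq) < 2:
        raise ValueError("need at least two distinct pseudogene counts")

    xs = [math.log(k) for k in freq]
    ys = [math.log(n) for n in freq.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least-squares slope and intercept of log N(k) against log k.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic data (not from the paper): N(k) = 1000 * k**-2 at k = 1, 2, 5, 10.
synthetic_counts = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
exponent, prefactor = fit_power_law(synthetic_counts)
print(exponent, prefactor)
```

On real counts, a regression on log-log axes is sensitive to the sparse tail of the distribution, so it is best treated as a quick visual check rather than a rigorous estimate of the exponent.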

Footnotes

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1429003.

  • 1 Corresponding author. E-MAIL Mark.Gerstein{at}yale.edu; FAX (360) 838-7861.

  • Received April 11, 2003; accepted September 18, 2003.