The human genome has 49 cytochrome c pseudogenes, including a relic of a primordial gene that still functions in mouse
Introduction
Cytochrome c (cyc) is a central component of the electron transfer chain in the cell, and is involved in both aerobic and anaerobic respiration. It is also involved in other cellular processes such as apoptosis (Kluck et al., 1997) and heme biosynthesis (Biel and Biel, 1990). It is a ubiquitous protein, found in all eukaryotes and prokaryotes. Because of its importance, relatively small size (104 amino acids in mammals) and ease of isolation, cyc has been very intensively studied. Cyc has also been used as a paradigm in the study of the evolution of protein sequence and structure (Chothia and Lesk, 1985, Wu et al., 1986, Mills, 1991). The amino acid sequences of cyc from many species are now available (Banci et al., 1999); the sequences among vertebrates are especially conserved except among primates, where acceleration in non-synonymous mutation has been observed (Evans and Scarpulla, 1988, Grossman et al., 2001).
Screening of genomic DNA libraries revealed multiple copies of processed cytochrome c pseudogenes in mammalian genomes (Scarpulla et al., 1982, Scarpulla, 1984), including 11 copies in human (Evans and Scarpulla, 1988). Processed pseudogenes are disabled copies of functional genes that do not produce a functional, full-length protein (Vanin, 1985, Mighell et al., 2000, Harrison et al., 2002a). It is believed that they arose from LINE1-mediated retrotransposition, i.e. reverse transcription of mRNA transcripts followed by integration into genomic DNA, presumably in the germ line (Kazazian and Moran, 1998, Esnault et al., 2000). They are characterized by a complete lack of introns, the presence of small flanking direct repeats and a polyadenine tract near the 3′ end (provided that they have not decayed). The existence of pseudogenes in the genome can obscure the identification and cloning of functional genes; however, pseudogenes can also provide a fossil record of gene sequences that existed at various times during evolution.
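One of the hallmarks listed above, the polyadenine tract near the 3′ end, is simple to check computationally. The following Python fragment is a minimal sketch only and is not taken from the pipeline used in this study: the function name `find_polya_tract`, the 10-nucleotide threshold and the example sequence are all assumptions made for illustration.

```python
import re


def find_polya_tract(downstream, min_len=10):
    """Return (start, end) of the first run of at least `min_len` adenines.

    `downstream` is the genomic sequence immediately 3' of a candidate
    pseudogene. The default threshold of 10 is an illustrative assumption,
    not a value reported in this paper. Returns None if no run is found.
    """
    pattern = re.compile("A{%d,}" % min_len)
    match = pattern.search(downstream.upper())
    return match.span() if match else None


# Hypothetical downstream sequence: 7 nt of flank, 15 adenines, 3 nt of flank.
example = "GCTTAGC" + "A" * 15 + "GTC"
print(find_polya_tract(example))
```

A tract that has decayed through point mutations would be missed by this exact-match search, so a practical screen would need to tolerate interruptions in the run.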
Previously, we identified over 2000 ribosomal protein (RP) pseudogenes in the human genome (Harrison et al., 2002b, Zhang et al., 2002), most of which had been overlooked by DNA hybridization experiments. Motivated by this discovery of an unexpectedly large number of additional pseudogenes, we carried out a similarly comprehensive survey of human cytochrome c pseudogenes. Our study provides a complete molecular record of the recent evolution of this gene and demonstrates the importance of examining pseudogenic sequences. It also documents a specific instance of a gene disappearing and leaving only a fossil pseudogene in its place.
Materials and methods
The basic procedures of our pseudogene discovery pipeline have been previously described (Zhang et al., 2002). A brief overview is given below.
The human cyc pseudogene population
A total of 50 cyc-homologous loci were identified in the human genome, comprising 49 pseudogenes (denoted HCP) and one intron-containing functional gene (denoted HCS). The HCS gene is located on chromosome 7 (cytogenetic band 7p15.3, see Fig. 1); the annotation was confirmed by the perfect alignment of the exons, intron, and the 5′ and 3′ regions with the previously reported nucleotide sequence (Evans and Scarpulla, 1988; GenBank ID: 181241). It is known that the HCS gene contains two
Discussion
The 49 cyc pseudogenes we describe here present an evolutionary record of the human cytochrome c gene; our findings strongly support the hypothesis that this gene has evolved at a very rapid rate in the recent human lineage. The sequence information we report here will not only help researchers design better HCS-specific probes that avoid pseudogene complications, but will also be useful in calibrating and estimating various evolutionary and phylogenetic models. The discovery of the common
Acknowledgements
M.G. acknowledges NIH grant 2P01GM54160-04. Z.Z. thanks Dr. Paul Harrison for comments on the manuscript and Dr. Duncan Milburn and Nat Echols for computational help.
References (39)
- et al. Helix movements and the reconstruction of the haem pocket during the evolution of the cytochrome c family. J. Mol. Biol. (1985)
- et al. Molecular evolution of aerobic energy metabolism in primates. Mol. Phylogenet. Evol. (2001)
- et al. A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution. J. Mol. Biol. (2002)
- et al. Alu sequences. FEBS Lett. (1997)
- et al. Vertebrate pseudogenes. FEBS Lett. (2000)
- Cytochrome c: gene structure, homology and ancestral relationships. J. Theor. Biol. (1991)
- Comparison of DNA sequences with protein sequences. Genomics (1997)
- Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. (1999)
- et al. Structure and expression of rodent genes encoding the testis-specific cytochrome c. Differences in gene structure and evolution between somatic and testicular variants. J. Biol. Chem. (1988)
- et al. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. (1993)