Trends in Genetics
Do transposable elements really contribute to proteomes?
Introduction
It is widely accepted that transposable elements (TEs; see Glossary) have had a major impact on the evolution of mammalian genomes. A well-documented example is the human genome, almost half of which is derived from TEs [1]. TEs were initially regarded as ‘junk’ [2], ‘selfish’, and ‘parasite’ pieces of DNA [3, 4, 5]. Gradually, scientists realized that TEs should be regarded as ‘seeds of evolution’ [6] and ‘genomic treasures’ [7] because they seem to enhance an organism's evolvability in many ways. TEs are active genomic components that can promote recombination [8, 9] and provide ready-to-use motifs, such as transcriptional regulatory elements, polyadenylation and splicing signals, and even protein-coding sequences [10, 11, 12, 13, 14].
The contribution of TEs to coding regions is of particular interest, because they can directly influence the phenotype by altering protein sequences. This aspect was documented, however, only at the transcript level, and the presence of TE-encoded fragments was not confirmed at the protein level [15]. Because of the important evolutionary implications, we attempted to clarify the issue of TE contribution to metazoan proteomes using computational methods and publicly available data by searching for TE cassettes in functionally well-characterized proteins. We found evidence indicating that functional proteins can indeed contain TE cassettes, but only those derived from old TEs. Those derived from young TEs, such as Alu short interspersed elements (SINEs) and L1 long interspersed elements (LINE1s), seem to disrupt the functionality of the proteins into which they are inserted.
Section snippets
TE fragments were found in coding regions of many transcripts but not in functional proteins
More than a decade ago, a few studies reported that some mRNAs contain TE cassettes in their coding regions [16, 17, 18], which sometimes resulted in disease phenotypes such as gyrate atrophy of the choroid and retina [19]. These observations led to the hypothesis that, in other cases, TE exaptation could have neutral effects or even enhance fitness and, therefore, might increase protein variability with positive evolutionary consequences [20]. Since then, several studies discovered TE
Identifying proteins with TE-encoded fragments
We searched for TE cassettes only in functionally well-characterized proteins, to eliminate the uncertainty of translation associated with most transcripts (see the online supplementary material for more details). Among the 3764 Protein Data Bank (PDB; http://www.rcsb.org/pdb/) entries with non-redundant protein chains, we found only 24 proteins with fragments encoded by putative TE cassettes (Tables 1, S1 in the online supplementary material). No additional examples were identified in the
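To put the scale of this result in perspective, the short sketch below computes the share of the surveyed PDB entries that carry a putative TE cassette. It uses only the two counts quoted above (24 of 3764); the function and variable names are illustrative assumptions, not part of the original analysis.

```python
def fraction_with_te_cassettes(n_with_te: int, n_total: int) -> float:
    """Return the fraction of surveyed proteins that contain a TE cassette."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_with_te <= n_total:
        raise ValueError("n_with_te must lie between 0 and n_total")
    return n_with_te / n_total


# Counts quoted in the text: non-redundant PDB entries surveyed, and
# proteins found with fragments encoded by putative TE cassettes.
PDB_ENTRIES_SURVEYED = 3764
PROTEINS_WITH_TE_CASSETTES = 24

share = fraction_with_te_cassettes(PROTEINS_WITH_TE_CASSETTES, PDB_ENTRIES_SURVEYED)
print(f"{share:.2%} of surveyed entries contain a putative TE cassette")
```

In other words, fewer than one in a hundred of the well-characterized proteins surveyed showed evidence of a TE-encoded fragment.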
TE exaptation: when did it happen?
A second argument supporting the validity of the L3 cassette is provided by the origin of PTPN1. It is known that PTP diversification occurred by a series of duplication events during early vertebrate evolution [37, 38, 39]. This can explain why PTPN1 is located ∼7.3 Mb apart from PTPRT on chromosome 20q, similar to their closest homologs, PTPN2 and PTPRM, respectively (Ref. [39]; Figure 2), which are located ∼4.4 Mb apart on chromosome 18p. The most likely scenario is that an intra-chromosomal
TE exaptation: how did it happen?
According to Ohno [40], gene duplications create the raw material for evolutionary ‘innovations’. He argues that newly duplicated genes are free of functional constraints and can undergo significant changes until they acquire new specific functions. Given that duplications are the documented source of PTP diversification, as discussed earlier, it is easy to imagine that the future PTPN1 could have acquired a TE fragment after the activation of a cryptic splice site in a manner similar
Concluding remarks and directions for further research
The confirmation that TEs are present at the protein level is by no means a surprise, and they are certainly not the only category of DNA sequence to be exapted successfully into functional proteins. Hayashi et al. showed that any random sequence could acquire biological functions if it had sufficient time to evolve [43]. It is, however, their prevalence and mobility within genomes that make TEs important players in molecular and genomic evolution.
Acknowledgements
We thank Dimitra Chalkia for her contribution to the construction of MME and ARFIP2 phylogenies, and her constructive criticisms on the initial versions of the article. We are grateful to Vamsi Veeramachaneni for setting up the initial protein data set; to Jordi Bella and Paul McEwan for their help with protein structure visualization tools; to Nikolas Nikolaidis, Jongmin Nam and Arthur Lesk for critically reading the article; and to the reviewers for their useful comments and suggestions. V.G.
Glossary
- Transposable elements (TEs): all DNA segments that have the ability to move or multiply within genomes generating self-copies interspersed with non-repetitive DNA. The term is often used for referring to copies of such elements that lost the ability to move or multiply once integrated at a new genomic location because of either mutation or fragmentation. For those segments, ‘TE-derived sequences’ or ‘transposed elements’ would better describe the current status of the sequence. The more general
References (57)
Mobile elements and mammalian genome evolution
Curr. Opin. Genet. Dev. (2003)
Genomic scrap yard: how genomes utilize all that junk
Gene (2000)
Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions
Trends Genet. (2003)
Complex controls: the role of alternative promoters in mammalian genome
Trends Genet. (2003)
Transposable elements as a significant source of transcription regulating signals
Gene (2006)
Transposable elements encoding functional proteins: pitfalls in unprocessed genomic data?
FEBS Lett. (2002)
Isolation and sequence analysis of a cDNA clone encoding the fifth complement component
J. Biol. Chem. (1985)
Alu sequences in the coding regions of mRNA: a source of protein variability
Trends Genet. (1994)
Transposable elements are found in a large number of human protein-coding genes
Trends Genet. (2001)
Are non-functional, unfolded proteins (‘junk proteins’) common in the genome?
FEBS Lett. (2003)
Translation control: bridging the gap between genomics and proteomics?
Trends Biochem. Sci.
Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution
Curr. Biol.
An Alu cassette in the human epithelial sodium channel
Biochim. Biophys. Acta
Alu repeats and human disease
Mol. Genet. Metab.
Crystal structure of a pivotal domain of human syncytin-2, a 40 million years old endogenous retrovirus fusogenic envelope gene captured by primates
J. Mol. Biol.
Eukaryotic transposable elements and genome evolution
Trends Genet.
Initial sequencing and analysis of the human genome
Nature
So much ‘junk’ DNA in our genome
Brookhaven Symp. Biol.
Selfish genes, the phenotype paradigm and genome evolution
Nature
Selfish DNA: the ultimate parasite
Nature
Selfish DNA: a sexually-transmitted nuclear parasite
Genetics
Retroposons – seeds of evolution
Science
Mining treasures from ‘junk DNA’
Science
Two autosomal dominant neuropathies result from reciprocal DNA duplication/deletion of a region on chromosome 17
Hum. Mol. Genet.
Mobile elements: drivers of genome evolution
Science
Cloning of decay-accelerating factor suggests novel use of splicing to generate two proteins
Nature
A human rel proto-oncogene cDNA containing an Alu fragment as a potential coding exon
Oncogene
Splice-mediated insertion of an Alu sequence inactivates ornithine delta-aminotransferase: a role for Alu elements in human mutation
Proc. Natl. Acad. Sci. U. S. A.