Trends in Biochemical Sciences
The insertion of palindromic repeats in the evolution of proteins
Section snippets
Coding potential of a palindromic sequence
Within a palindromic sequence, the left half and the right half of the sequence from the same DNA strand are, by definition, complementary to each other. How could such a sequence therefore emerge by chance in the course of evolution? Although the detailed mechanisms are still unknown, the duplication of a DNA segment followed by its inverted insertion at one of the extremities of the original segment is the most probable scenario for the generation of a palindromic sequence (Fig. 2). There is
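The duplication-and-inversion scenario described above can be illustrated with a minimal Python sketch. It assumes the inverted copy is appended at the 3′ end of the original segment; the function names and the example segment are hypothetical and chosen only for illustration, not taken from the article.

```python
# Minimal sketch: build a DNA palindrome by duplicating a segment and
# appending its inverted (reverse-complemented) copy at the 3' end.
# The names and the example segment are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_palindrome(segment: str) -> str:
    """Append the inverted copy of `segment` to its own 3' end."""
    return segment + reverse_complement(segment)


def is_palindromic(seq: str) -> bool:
    """A DNA palindrome reads the same as its own reverse complement."""
    return seq == reverse_complement(seq)


segment = "ATGGCTAAAGGT"  # hypothetical 12-nt segment
palindrome = make_palindrome(segment)
print(palindrome, is_palindromic(palindrome))
```

In this construction the right half of the result is, by design, the reverse complement of the left half, which is the property the definition above requires.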
Palindromic ORFs lead to well-behaved putative proteins
In addition to a higher probability of being more ‘open’ than the other antisense reading frames, RF−1 corresponds to amino acid frequencies close to the composition of actual proteins [9]. This is shown in Table 1, where the χ² value was computed to measure the difference between the typical composition of actual proteins (RF+1) and proteins derived from other frames. Using this criterion, the amino acid composition derived from RF−1 is closest to that of normal proteins (RF+1). This implies
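A composition comparison of this kind can be sketched as follows. This is a minimal illustration that assumes a Pearson-type χ² statistic over the 20 amino acids; the exact form of the statistic used for Table 1 is not given in this excerpt, and the function names are hypothetical.

```python
# Minimal sketch: Pearson-type chi-square distance between the amino acid
# composition of a peptide and a reference composition.
# Assumption: the statistic is sum((O - E)^2 / E) over the 20 amino acids;
# the excerpt does not state which form the article's Table 1 uses.
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(peptide: str) -> dict:
    """Relative frequency of each of the 20 amino acids in `peptide`."""
    counts = Counter(peptide)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def chi_square(peptide: str, reference: dict) -> float:
    """Sum of (observed - expected)^2 / expected over amino acid counts."""
    counts = Counter(peptide)
    n = sum(counts[aa] for aa in AMINO_ACIDS)
    total = 0.0
    for aa in AMINO_ACIDS:
        expected = n * reference[aa]
        if expected == 0:
            continue  # skip residues absent from the reference
        total += (counts[aa] - expected) ** 2 / expected
    return total


# Hypothetical flat reference (5% per residue), for illustration only.
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
print(chi_square("ACDEFGHIKLMNPQRSTVWY", uniform))
```

A smaller value means the composition derived from a given frame lies closer to the reference; in practice the reference would be the observed composition of actual proteins rather than the flat one used here.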
Protein folding
Blalock's molecular recognition theory [14, 15] claims that a peptide derived from the antisense RF−1 exhibits a more than random binding affinity to the peptide derived from the sense RF+1. Still controversial, this theory is based on a tendency for the ‘antipeptides’ encoded on the antisense strand (RF−1) to exhibit hydropathy profiles that are somewhat complementary to the protein encoded by the sense ORF (RF+1) [15]. Although the mechanisms of the molecular interaction between ‘complementary’
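The hydropathy relationship between a sense peptide and its antipeptide can be explored with a short sketch. It rests on several assumptions that are not from the article: the Kyte-Doolittle hydropathy scale, the standard genetic code, an antisense frame whose codons are in register with the sense codons, and a hypothetical 12-nt example sequence.

```python
# Minimal sketch: correlate per-residue hydropathy of a sense peptide with
# that of the peptide read from the codon-aligned antisense strand.
# Assumptions: Kyte-Doolittle scale, standard genetic code, sequence length
# a multiple of 3 so that sense and antisense codons are in register.
from itertools import product
from math import sqrt

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def pearson(xs, ys) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def hydropathy_correlation(sense_dna: str) -> float:
    """Pearson correlation between sense and antisense residue hydropathies."""
    if len(sense_dna) % 3:
        raise ValueError("length must be a multiple of 3 for codon alignment")
    sense = translate(sense_dna)
    # Reverse the antipeptide so residue i pairs with the complementary codon.
    anti = translate(reverse_complement(sense_dna))[::-1]
    pairs = [(KYTE_DOOLITTLE[s], KYTE_DOOLITTLE[a])
             for s, a in zip(sense, anti) if "*" not in (s, a)]
    return pearson(*zip(*pairs))


print(hydropathy_correlation("AAACTGGAAATC"))  # hypothetical sequence
```

A negative correlation on a given sequence would be consistent with the "somewhat complementary" profiles described above; a single toy sequence is of course no test of the theory itself.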
Structure of the RPEs
The arguments outlined previously suggest that palindromic elements generated in the RF+1/RF−1 configuration: (1) have a high coding probability; (2) probably lead to a soluble peptide; and (3) might tend to adopt a compactly folded, self-contained domain-like structure. This leads to the prediction that RF+1/RF−1 should be the dominant configuration for the identified RPEs.
Testing this prediction with the RPE sequences of today is not straightforward as they have accumulated
Peptide insertion as a good evolutionary strategy
In contrast to other repeats, RPE insertions show no preference for noncoding sequences versus coding sequences. Within protein coding regions, the insertion sites of the RPE-derived peptides always appear to be at the surface of the protein structures [1, 3]. In a typical bacterial genome, noncoding sequences and ORFs represent about 20% and 80% of the sequence, respectively. Considering that a quarter of a protein sequence corresponds to its surface residues [21], the target-sequence sizes
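As a back-of-the-envelope check on the figures quoted above (a sketch using only the stated 20%, 80% and one-quarter values, not the authors' own calculation, which is cut off in this excerpt), the fraction of the genome that encodes protein surface residues can be compared with the noncoding fraction:

```latex
\[
f_{\text{surface}} \approx f_{\text{ORF}} \times \tfrac{1}{4} = 0.80 \times 0.25 = 0.20,
\qquad
f_{\text{noncoding}} \approx 0.20 .
\]
```

Under these rounded figures, the two insertion targets would be of comparable size; the symbols $f_{\text{surface}}$, $f_{\text{ORF}}$ and $f_{\text{noncoding}}$ are introduced here for illustration only.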
Creating new proteins from old repeats
The contribution of noncoding repeated elements to the evolution of proteins has been repeatedly argued and remains controversial. It is clear that their mobility and selfish amplification enable them to play a major role in the plasticity of genomic sequences. Short tandem repeats of DNA oligomers, such as microsatellites, are abundant in both prokaryotic and eukaryotic genomes [30, 31]. Their expansion mechanism is thought to involve slipped-strand mispairing, which might be the result of
Concluding remarks
Until now, a clear case of a well-conserved large repeat family identified at high frequency in both the coding and noncoding fractions of a genome was missing. This is now provided by RPE-1 and, to a lesser extent, RPE-2 and RPE-3. These repeats exhibit a palindromic structure (required for mobility and amplification), a high-entropy sequence (required for real protein creativity), a length compatible with stable self-contained folding (35–50 residues), and evidence for multiple insertions
Note added in proof
For additional speculations about proteins arising from opposite strands of the same gene see Carter, C.W. and Duax, W.L. (2002) Did tRNA synthetase classes arise on opposite strands of the same gene? Mol. Cell. 10, 705–708.
Acknowledgements
We would like to thank Chantal Abergel for helpful discussions and for allowing access to her experimental work on Rickettsia palindromic element-containing proteins before publication. We also thank Karsten Suhre and David Pollock for their critical reading of this article.
References (42)
Novel intergenic repeats of Escherichia coli K-12
Res. Microbiol.
(1999)
G and C accumulation at silent positions of codons produces additional ORFs
Trends Genet.
(1995)
- et al.
Hydropathic anti-complementarity of amino acids based on the genetic code
Biochem. Biophys. Res. Commun.
(1984)
Assembly of exons from unitary transposable genetic elements: implications for the evolution of protein–protein interactions
J. Theor. Biol.
(1998)
Sequences with ‘unusual’ amino acid compositions
Curr. Opin. Struct. Biol.
(1994)
- et al.
Analysis of insertions/deletions in protein structures
J. Mol. Biol.
(1992)
- et al.
Glutamine, alanine or glycine repeats inserted into the loop of a protein have minimal effects on stability and folding rates
J. Mol. Biol.
(1997)
Insertion of foreign random sequences of 120 amino acid residues into an active enzyme
FEBS Lett.
(1997)
PH domain: the first anniversary
Trends Biochem. Sci.
(1994)
Interspersed repeats and other mementos of transposable elements in mammalian genomes
Curr. Opin. Genet. Dev.
(1999)
A complete Alu element within the coding sequence of a central gene
Cell
Transposable elements are found in a large number of human protein-coding genes
Trends Genet.
Selfish DNA in protein-coding genes of Rickettsia
Science
Mechanisms of evolution in Rickettsia conorii and R. prowazekii
Science
Protein coding palindromes are a unique but recurrent feature in Rickettsia
Genome Res.
Repeated sequences
Genome deterioration: loss of repeated sequences and accumulation of junk DNA
Genetica
Structure, function, and evolution of bacterial reverse transcriptase
Virus Genes
Open reading frames in the antisense strands of genes coding for glycolytic enzymes in Saccharomyces cerevisiae
Mol. Gen. Genet.
A frame-specific symmetry of complementary strands of DNA suggests the existence of genes on the antisense strand
J. Mol. Evol.
Sense in antisense?
J. Mol. Evol.
Cited by (39)
Insertions and deletions in protein evolution and engineering
2022, Biotechnology Advances
Citation excerpt: Examples of the latter have been detected in the genome of Rickettsia conorii, where mobile palindromic repeat elements were discovered that are capable of insertion in open reading frames (ORFs). Surprisingly, the mobile elements persistently appear at the surface of the proteins coded by those ORFs (Claverie and Ogata, 2003). In this way, the original fold and function of the scaffold proteins are unaffected by the insertion.
Efficient biosynthesis of 1-cyanocyclohexaneacetic acid using a highly soluble nitrilase by N-terminus modification of novel peptide tags
2021, Biochemical Engineering Journal
Citation excerpt: Palindromic sequences, which have a high tendency to form helixes and relatively low structural complexity, typically consist of multiple repeating units containing one or two polar amino acids, and have net positive or negative charges [27]. The insertion and migration of palindromic sequences probably play an important role in the evolution of proteins, and have a significant impact on protein folding and solubility [28]. Nitrilase (EC 3.5.5.1) can efficiently convert cyano groups to carboxyl groups in a one-step reaction, which plays an important role in the synthesis of fine chemicals and carboxylic acids.
Paradoxical evolution of rickettsial genomes
2019, Ticks and Tick-borne Diseases
Citation excerpt: Bacteria may use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. The presence of a mobile element inserted in many unrelated genes also suggests the potential role of selfish DNA in rickettsial genome for de novo creation of new protein sequences during the course of evolution, suggesting an implication in the dynamics of genome evolution (Claverie and Ogata, 2003). Moreover, genomic comparison also enabled the identification of several copies of Ankyrin and Tetratricopeptide (TPR)-repeats in rickettsiae.
Alphaproteobacteria species as a source and target of lateral sequence transfers
2014, Trends in Microbiology
Citation excerpt: DNA sequence gains can also be related to duplications, particularly through proliferation of DNA repeats and palindromic sequences [12]. Repeated palindromic elements of 100–150 base pairs are known to occur in Rickettsiales genomes [12,13], and they are considered to be selfish DNA. The stable integration of transferred sequences into a new genome depends on four features: (i) the opportunity to encounter other species to exchange genetic elements, which is favored in bacterial communities with a sympatric lifestyle over cells living isolated allopatric lifestyles; (ii) the ability or power to integrate into host cells due to the existence of a mobilome; (iii) a tRNA repertoire that allows the translation and use of transferred sequences that results in gene expression; and (iv) the use of a gene product that allows for positive selection (Box 3).
Life span extension via eIF4G inhibition is mediated by posttranscriptional remodeling of stress response gene expression in C. elegans
2011, Cell Metabolism
Citation excerpt: It has been proposed that natural selection favors shorter genetic coding sequence length for higher transcriptional efficiency, efficient protein synthesis, and the avoidance of deleterious mutation accumulation. However, imparting new or improved functions to a protein usually requires elongating its coding sequence (Zhang, 2000, Akashi, 2003, Claverie and Ogata, 2003). We speculate that longer genes in eukaryotes are important for responding to changing environmental conditions and evolved later in time than those necessary for the most basic functions of growth and reproduction.