The insertion of palindromic repeats in the evolution of proteins

https://doi.org/10.1016/S0968-0004(02)00036-1

Abstract

The current theory of protein evolution is that all contemporary proteins are derived from an ancestral subset. However, each newly sequenced genome exhibits many genes with no detectable homologues in other species, leading to the paradoxical picture of a universal ancestor with more genes than any of its progeny. Standard explanations suggest that fast-evolving genes might disappear into the ‘twilight zone’ of sequence similarity. Regardless of the size of the original ancestral subset, its origin and the potential mechanisms of its subsequent enlargement are rarely addressed. Sequencing of the Rickettsia conorii genome recently led to the discovery of three families of repeat–mobile elements frequently inserted into the middle of protein-coding genes. Although such elements have not yet been identified in other species of bacteria, this discovery has provided the first clear evidence for the de novo creation of long protein segments (up to 50 amino acid residues) by repeat insertion. Based on previous results and theories on the coding potential of palindromic elements, we speculate that their insertion and mobility might have played a significant role in the early stages of protein evolution.

Section snippets

Coding potential of a palindromic sequence

Within a palindromic sequence, the left half and the right half of the sequence from the same DNA strand are, by definition, complementary to each other. How, then, could such a sequence emerge by chance in the course of evolution? Although the detailed mechanisms are still unknown, the duplication of a DNA segment followed by its inverted insertion at one of the extremities of the original segment is the most probable scenario for the generation of a palindromic sequence (Fig. 2). There is
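The duplication-and-inversion scenario described above can be sketched in a few lines of Python. This is only an illustration: the 15-nucleotide segment is hypothetical, and the sketch assumes that RF−1 denotes the antisense frame read in register with the sense codons, which the snippet above does not spell out. Under that assumption, the right half of the palindrome encodes the RF−1 translation of the left half.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate in frame +1, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def make_palindrome(segment):
    """Duplicate a segment and append the copy in inverted orientation."""
    return segment + reverse_complement(segment)


segment = "ATGGCTAAAGAACTG"  # hypothetical segment, 5 codons
palindrome = make_palindrome(segment)

# The palindrome reads identically on both strands.
print(reverse_complement(palindrome) == palindrome)

# Left half: RF+1 of the segment; right half: its in-register antisense frame.
print(translate(segment), translate(reverse_complement(segment)))
print(translate(palindrome))
```

Because the segment length is a multiple of three, the codons of the inverted copy stay in register with those of the original, so a single open reading frame can span both halves provided the antisense frame contains no stop codon.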

Palindromic ORFs lead to well-behaved putative proteins

In addition to a higher probability of being more ‘open’ than the other antisense reading frames, RF−1 corresponds to amino acid frequencies close to the composition of actual proteins [9]. This is shown in Table 1, where the χ² value was computed to measure the difference between the typical composition of actual proteins (RF+1) and proteins derived from other frames. Using this criterion, the amino acid composition derived from RF−1 is closest to that of normal proteins (RF+1). This implies
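A composition comparison of this kind can be sketched as follows. The exact statistic behind Table 1 (counts versus frequencies, any scaling) is not given in the snippet, so this assumes the usual Pearson form, the sum over amino acids of (observed − expected)² / expected. The function names and the toy numbers are illustrative only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(peptide, alphabet=AMINO_ACIDS):
    """Relative frequency of each residue; stop symbols and unknowns are ignored."""
    counts = {a: peptide.count(a) for a in alphabet}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("peptide contains no residues from the alphabet")
    return {a: counts[a] / total for a in alphabet}


def chi_square(observed, expected):
    """Pearson-style distance between two compositions (assumed form)."""
    return sum(
        (observed[a] - expected[a]) ** 2 / expected[a]
        for a in expected
        if expected[a] > 0  # skip residues absent from the reference
    )


# Toy two-letter example with made-up frequencies.
reference = {"A": 0.25, "G": 0.75}
candidate = {"A": 0.50, "G": 0.50}
print(chi_square(candidate, reference))
```

A smaller value means the candidate frame's composition is closer to the reference; identical compositions give zero.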

Protein folding

Blalock's molecular recognition theory [14, 15] claims that a peptide derived from the antisense RF−1 exhibits a higher-than-random binding affinity for the peptide derived from the sense RF+1. Still controversial, this theory is based on a tendency for the ‘antipeptides’ encoded on the antisense strand (RF−1) to exhibit hydropathy profiles that are somewhat complementary to the protein encoded by the sense ORF (RF+1) [15]. Although the mechanisms of the molecular interaction between ‘complementary’

Structure of the RPEs

The arguments outlined previously suggest that palindromic elements generated in the RF+1/RF−1 configuration: (1) have a high coding probability; (2) probably lead to a soluble peptide; and (3) might tend to adopt a compactly folded, self-contained, domain-like structure. This leads to the prediction that RF+1/RF−1 should be the dominant configuration for the identified RPEs.

Testing this prediction with the RPE sequences of today is not straightforward as they have accumulated

Peptide insertion as a good evolutionary strategy

In contrast to other repeats, RPE insertions show no preference for noncoding sequences versus coding sequences. Within protein coding regions, the insertion sites of the RPE-derived peptides always appear to be at the surface of the protein structures [1, 3]. In a typical bacterial genome, noncoding sequences and ORFs represent about 20% and 80% of the sequence, respectively. Considering that a quarter of a protein sequence corresponds to its surface residues [21], the target-sequence sizes
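The arithmetic implied by these figures can be checked quickly. The sketch below uses only the three round numbers quoted above and assumes, as a simplification, that surface-encoding positions are a uniform quarter of all coding sequence; the variable names are illustrative.

```python
# Approximate genome fractions quoted in the text.
noncoding_fraction = 0.20   # noncoding sequence
coding_fraction = 0.80      # ORFs
surface_fraction = 0.25     # share of a protein sequence at the surface [21]

# Fraction of the genome that encodes surface residues (simplifying assumption:
# the surface share applies uniformly across all coding sequence).
surface_coding_target = coding_fraction * surface_fraction

print(f"noncoding target:      {noncoding_fraction:.2f}")
print(f"surface-coding target: {surface_coding_target:.2f}")
```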

Creating new proteins from old repeats

The contribution of noncoding repeated elements to the evolution of proteins has been recurrently argued and remains controversial. It is clear that their mobility and selfish amplification enable them to play a major role in the plasticity of genomic sequences. Short tandem repeats of DNA oligomers, such as microsatellites, are abundant in both prokaryotic and eukaryotic genomes [30, 31]. Their expansion mechanism is thought to involve slipped-strand mispairing, which might be the result of

Concluding remarks

Until now, a clear case of a well-conserved large repeat family identified at high frequency in both the coding and noncoding fractions of a genome was missing. This is now provided by RPE-1 and, to a lesser extent, RPE-2 and RPE-3. These repeats exhibit a palindromic structure (required for mobility and amplification), a high-entropy sequence (required for real protein creativity), a length compatible with stable self-contained folding (35–50 residues), and evidence for multiple insertions

Note added in proof

For additional speculations about proteins arising from opposite strands of the same gene see Carter, C.W. and Duax, W.L. (2002) Did tRNA synthetase classes arise on opposite strands of the same gene? Mol. Cell. 10, 705–708.

Acknowledgements

We would like to thank Chantal Abergel for helpful discussions and for allowing access to her experimental work on Rickettsia palindromic element-containing proteins before publication. We also thank Karsten Suhre and David Pollock for their critical reading of this article.

References (42)

  • H. Margalit. A complete Alu element within the coding sequence of a central gene. Cell (1994)
  • A. Nekrutenko et al. Transposable elements are found in a large number of human protein-coding genes. Trends Genet. (2001)
  • H. Ogata. Selfish DNA in protein-coding genes of Rickettsia. Science (2000)
  • H. Ogata. Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science (2001)
  • H. Ogata. Protein coding palindromes are a unique but recurrent feature in Rickettsia. Genome Res. (2002)
  • S. Bachellier. Repeated sequences
  • A.C. Frank. Genome deterioration: loss of repeated sequences and accumulation of junk DNA. Genetica (2002)
  • S. Inouye et al. Structure, function, and evolution of bacterial reverse transcriptase. Virus Genes (1995)
  • E. Boles et al. Open reading frames in the antisense strands of genes coding for glycolytic enzymes in Saccharomyces cerevisiae. Mol. Gen. Genet. (1994)
  • T. Yomo et al. A frame-specific symmetry of complementary strands of DNA suggests the existence of genes on the antisense strand. J. Mol. Evol. (1994)
  • D.R. Forsdyke. Sense in antisense? J. Mol. Evol. (1995)

Cited by (39)

    • Insertions and deletions in protein evolution and engineering

      2022, Biotechnology Advances
      Citation Excerpt :

      Examples of the latter have been detected in the genome of Rickettsia conorii, where mobile palindromic repeat elements were discovered that are capable of insertion in open reading frames (ORFs). Surprisingly, the mobile elements persistently appear at the surface of the proteins coded by those ORFs (Claverie and Ogata, 2003). In this way, the original fold and function of the scaffold proteins are unaffected by the insertion.

    • Efficient biosynthesis of 1-cyanocyclohexaneacetic acid using a highly soluble nitrilase by N-terminus modification of novel peptide tags

      2021, Biochemical Engineering Journal
      Citation Excerpt :

      Palindromic sequences, which have a high tendency to form helixes and relatively low structural complexity, typically consist of multiple repeating units containing one or two polar amino acids, and have net positive or negative charges [27]. The insertion and migration of palindromic sequences probably play an important role in the evolution of proteins, and have a significant impact on protein folding and solubility [28]. Nitrilase (EC 3.5.5.1) can efficiently convert cyano groups to carboxyl groups in a one-step reaction, which plays an important role in the synthesis of fine chemicals and carboxylic acids.

    • Paradoxical evolution of rickettsial genomes

      2019, Ticks and Tick-borne Diseases
      Citation Excerpt :

      Bacteria may use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. The presence of a mobile element inserted in many unrelated genes also suggests the potential role of selfish DNA in rickettsial genome for de novo creation of new protein sequences during the course of evolution, suggesting an implication in the dynamics of genome evolution (Claverie and Ogata, 2003). Moreover, genomic comparison also enabled the identification of several copies of Ankyrin and Tetratricopeptide (TPR)-repeats in rickettsiae.

    • Alphaproteobacteria species as a source and target of lateral sequence transfers

      2014, Trends in Microbiology
      Citation Excerpt :

      DNA sequence gains can also be related to duplications, particularly through proliferation of DNA repeats and palindromic sequences [12]. Repeated palindromic elements of 100–150 base pairs are known to occur in Rickettsiales genomes [12,13], and they are considered to be selfish DNA. The stable integration of transferred sequences into a new genome depends on four features: (i) the opportunity to encounter other species to exchange genetic elements, which is favored in bacterial communities with a sympatric lifestyle over cells living isolated allopatric lifestyles; (ii) the ability or power to integrate into host cells due to the existence of a mobilome; (iii) a tRNA repertoire that allows the translation and use of transferred sequences that results in gene expression; and (iv) the use of a gene product that allows for positive selection (Box 3).

    • Life span extension via eIF4G inhibition is mediated by posttranscriptional remodeling of stress response gene expression in C. elegans

      2011, Cell Metabolism
      Citation Excerpt :

      It has been proposed that natural selection favors shorter genetic coding sequence length for higher transcriptional efficiency, efficient protein synthesis, and the avoidance of deleterious mutation accumulation. However, imparting new or improved functions to a protein usually requires elongating its coding sequence (Zhang, 2000, Akashi, 2003, Claverie and Ogata, 2003). We speculate that longer genes in eukaryotes are important for responding to changing environmental conditions and evolved later in time than those necessary for the most basic functions of growth and reproduction.
