Elsevier

Gene

Volume 317, 23 October 2003, Pages 39-47

Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content

https://doi.org/10.1016/S0378-1119(03)00660-7

Abstract

A number of recent studies have shown that thermophilic prokaryotes have distinguishable patterns of both synonymous codon usage and amino acid composition, indicating the action of natural selection related to thermophily. On the other hand, several other studies of whole genomes have illustrated that nucleotide bias can have dramatic effects on synonymous codon usage and also on the amino acid composition of the encoded proteins. This raises the possibility that the thermophile-specific patterns observed at both the codon and protein levels are merely reflections of a single underlying effect at the level of nucleotide composition. Moreover, such an effect at the nucleotide level might be due entirely to mutational bias. In this study, we have compared the genomes of thermophiles and mesophiles at three levels: nucleotide content, codon usage and amino acid composition. Our results indicate that the genomes of thermophiles are distinguishable from those of mesophiles at all three levels and that the codon and amino acid frequency differences cannot be explained simply by the patterns of nucleotide composition. At the nucleotide level, we see a consistent tendency for the frequency of adenine to increase at all codon positions within the thermophiles. Thermophiles are also distinguished by their pattern of synonymous codon usage for several amino acids, particularly arginine and isoleucine. At the protein level, the most dramatic effect is a two-fold decrease in the frequency of glutamine residues among thermophiles. These results indicate that adaptation to growth at high temperature requires a coordinated set of evolutionary changes affecting (i) mRNA thermostability, (ii) the stability of codon–anticodon interactions and (iii) the thermostability of the protein products. We conclude that elevated growth temperature imposes selective constraints at all three molecular levels: nucleotide content, codon usage and amino acid composition. In addition to these multiple selective effects, however, the genomes of both thermophiles and mesophiles are often subject to superimposed large changes in composition due to mutational bias.

Introduction

It has been known for more than 20 years that different genomes each have their own characteristic patterns of synonymous codon usage (Grantham et al., 1980), but it has not been easy to provide a satisfactory explanation for the particular pattern found in a given genome. Among single-celled organisms, highly expressed genes often use a more restricted set of “preferred” synonymous codons than less highly expressed genes (Gouy and Gautier, 1982), suggesting that codon usage patterns have a functional significance. In many cases, codon usage has been shown to mirror the distribution of tRNA abundances (Ikemura, 1981; Shields and Sharp, 1987; Stenico et al., 1994), indicating that the “preferred” codons are those that match the more abundant anticodons. Although this correlation with tRNA abundance can explain the codon frequencies, it raises the question of what selective forces determine the variation in anticodon frequencies between genomes. Indeed, it has been suggested that variation in codon frequencies might be the cause rather than the consequence of varying tRNA abundance (Bulmer, 1991). Recently, a consistent difference in the pattern of synonymous codon usage between thermophilic and mesophilic prokaryotes has been reported (Kanaya et al., 2001; Lynn et al., 2002), and there is strong evidence that this difference is the result of selection linked to thermophily (Lynn et al., 2002).
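Synonymous codon usage is usually summarized as the relative frequency of each codon within its synonymous family. The 59 synonymous codons counted in the Results are the 61 sense codons of the standard genetic code minus the single codons for methionine (ATG) and tryptophan (TGG). The Python sketch below illustrates that bookkeeping, assuming the standard code (NCBI translation table 1) and treating the six Arg, Leu and Ser codons each as a single family; `relative_synonymous_frequencies` is a hypothetical helper, not the procedure used in this study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1) in TCAG order: TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group sense codons by amino acid; "*" marks stop codons.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)

# Met and Trp have one codon each, so they carry no synonymous-usage information.
SYNONYMOUS = {aa: codons for aa, codons in FAMILIES.items() if len(codons) > 1}


def relative_synonymous_frequencies(seqs):
    """Hypothetical helper: frequency of each codon within its synonymous family."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        # Read in frame from the first base, ignoring any trailing partial codon.
        for i in range(0, len(s) - len(s) % 3, 3):
            counts[s[i:i + 3]] += 1
    rel = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            rel[c] = counts[c] / total if total else 0.0
    return rel


print(sum(len(c) for c in SYNONYMOUS.values()))  # 59
```

For example, `relative_synonymous_frequencies(["AGAAGAAGGCGT"])` assigns 0.5 to AGA and 0.25 each to AGG and CGT within the six-codon arginine family.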

Other recent studies (e.g., Kreil and Ouzounis, 2001) have shown that thermophiles and mesophiles can also be distinguished by the amino acid composition of their proteomes, and several authors have tried to relate these differences to functional adaptation (Haney et al., 1999; Vieille et al., 2001; Gromiha, 2001; Dalhus et al., 2002; Zavala et al., 2002).
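Proteome amino acid composition can be estimated directly from coding sequences by translating each in-frame codon and pooling the counts; the glutamine result in the abstract is a statement about this kind of frequency. The sketch below is illustrative only: it assumes the standard genetic code, skips stop codons and codons containing ambiguous bases, and does not special-case alternative start codons (a leading GTG, for instance, is counted as Val rather than Met). `aa_composition` is a hypothetical name.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1) in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def aa_composition(seqs):
    """Hypothetical helper: pooled amino acid frequencies of translated coding sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - len(s) % 3, 3):
            aa = CODON_TABLE.get(s[i:i + 3])  # None for codons with N or other symbols
            if aa is not None and aa != "*":
                counts[aa] += 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


comp = aa_composition(["ATGCAACAGAAATAA"])  # Met, Gln, Gln, Lys, stop
print(comp["Q"])  # 0.5
```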

In addition to variation in synonymous codon usage and in the amino acid content of encoded proteins, genomes may also differ greatly in their nucleotide composition. Several recent genomic surveys have demonstrated that variation in nucleotide composition can have very significant effects, both on the patterns of codon usage (Stenico et al., 1994; Frank and Lobry, 1999; Sueoka and Kawanishi, 2000; Kanaya et al., 2001; Grocock and Sharp, 2002; Lynn et al., 2002) and on the amino acid composition of the encoded proteins (Lobry, 1997; Singer and Hickey, 2000; Knight et al., 2001; Kreil and Ouzounis, 2001). In some cases, mutational bias at the nucleotide level has been shown to cause parallel changes in both codon usage and amino acid content (e.g., Singer and Hickey, 2000; Knight et al., 2001). One of the best-studied examples of mutational bias involves variation in the G+C content of genomes, although other forms of nucleotide bias, such as strand-specific biases, can also affect codon usage (McInerney, 1998). In the case of thermophiles, it has been suggested that a nucleotide bias in favor of purines may explain the apparent adaptation at the level of amino acid sequence (Lao and Forsdyke, 2000).
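The claim that nucleotide bias alone can shift amino acid content can be made concrete with a simple null model: if the three codon positions were filled independently according to position-specific base frequencies, the expected amino acid composition would follow from the genetic code alone. Comparing observed composition against such an expectation is one way to ask whether protein-level differences exceed what nucleotide composition predicts. The sketch below assumes the standard genetic code and independence among codon positions, renormalizes over the 61 sense codons, and uses a hypothetical function name; it is not necessarily the method used in this study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expected_aa_composition(pos_freqs):
    """Expected amino acid frequencies under independent, position-specific base usage.

    pos_freqs: three dicts (codon positions 1, 2, 3) mapping "A", "C", "G", "T"
    to frequencies. Stop codons are excluded and the result is renormalized.
    """
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        w = pos_freqs[0][codon[0]] * pos_freqs[1][codon[1]] * pos_freqs[2][codon[2]]
        weights[aa] = weights.get(aa, 0.0) + w
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


# Hypothetical compositions: uniform bases versus an A-enriched genome.
uniform = [dict.fromkeys("ACGT", 0.25)] * 3
a_rich = [{"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}] * 3
print(round(expected_aa_composition(uniform)["K"], 4))  # 0.0328 (= 2/61)
print(round(expected_aa_composition(a_rich)["K"], 4))  # 0.1026
```

Under uniform base usage each amino acid's expectation is simply its codon count divided by 61; raising adenine at every position inflates lysine (AAA, AAG) roughly threefold, which is the kind of composition-driven shift the null model is meant to quantify.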

The goal of this study was to compare the genome sequences of mesophiles and thermophiles, and to look for parallel patterns of change at the nucleotide, codon and amino acid levels. Since the effects of G+C variation have already been very well documented, we deliberately minimized this effect by limiting our analysis to those genomes with intermediate levels of G+C.
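Because the comparison is made at the nucleotide level as well as at the codon and amino acid levels, base composition is most informative when tabulated separately for each codon position; the abstract's observation that adenine increases at all codon positions in thermophiles is a statement about these per-position frequencies. Tallying A and G separately, rather than only total purines, shows whether a purine excess is driven by one base. A minimal Python sketch, assuming in-frame coding sequences and using hypothetical helper names:

```python
from collections import Counter


def position_composition(seqs):
    """Fraction of A, C, G and T at codon positions 1, 2 and 3, pooled over sequences."""
    counts = [Counter(), Counter(), Counter()]
    for s in seqs:
        s = s.upper()
        # Only complete codons are counted; i % 3 gives the codon position (0, 1 or 2).
        for i in range(len(s) - len(s) % 3):
            counts[i % 3][s[i]] += 1
    result = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")  # ambiguous bases are ignored
        result.append({b: c[b] / total if total else 0.0 for b in "ACGT"})
    return result


def purine_fraction(freqs):
    """Purine (A + G) fraction at one codon position."""
    return freqs["A"] + freqs["G"]


pos = position_composition(["AAGCTA"])  # codons AAG, CTA
print([purine_fraction(p) for p in pos])  # [0.5, 0.5, 1.0]
```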


Data acquisition

The coding sequences (*.ffn files) from completely sequenced bacterial genomes were downloaded from the GenBank FTP site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria), including the coding sequences from sequenced plasmids when they were available. We then measured the overall nucleotide content of each genome and selected only those genomes with intermediate G+C content, in order to reduce the effects of GC bias on our analyses. The 16 genomes that were retained for analysis are listed in Table 1.
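The selection step can be sketched as follows: parse each *.ffn file (FASTA format), pool the bases of all coding sequences, and keep genomes whose overall G+C fraction falls within a chosen window. The text does not state the G+C bounds that defined “intermediate”, so the `low` and `high` values in the example are hypothetical placeholders, as are the function names.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text, such as the contents of a *.ffn file, into a list of sequences."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append(line.upper())
    if current:
        seqs.append("".join(current))
    return seqs


def gc_content(seqs):
    """Overall G+C fraction among unambiguous bases, pooled across all coding sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s)
    acgt = sum(counts[b] for b in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


def select_intermediate_gc(genomes, low, high):
    """Keep genomes (name -> list of sequences) whose G+C fraction lies in [low, high]."""
    selected = {}
    for name, seqs in genomes.items():
        gc = gc_content(seqs)
        if low <= gc <= high:
            selected[name] = gc
    return selected


demo = ">gene1\nATGC\nGC\n>gene2\nATAT\n"
print(gc_content(read_fasta(demo)))  # 0.4
# Hypothetical bounds, for illustration only.
print(select_intermediate_gc({"a": ["GGCC"], "b": ["ATGC"], "c": ["AAAT"]}, 0.3, 0.7))  # {'b': 0.5}
```

Pooling bases across all genes, rather than averaging per-gene G+C values, weights long genes more heavily; either choice is defensible, and the text does not say which was used.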

Results

First, we analyzed the patterns of synonymous codon usage among the 16 genomes (see Table 2 and Fig. 1). In the pairwise comparisons (shown in Table 2), we observed significant differences (p<0.01) in the frequencies of 23 codons (out of a total of 59 synonymous codons). Among the thermophiles, there were increases in the relative frequencies of 11 codons (GGA, AGG, AGA, AAG, AAC, ATA, TAC, TTC, CAC, CTT and CTC) and decreases in 12 codons (AAT, ATT, ATC, TAT, TTG, TTT, CGG, CGA, CGT, CGC and …

Discussion

Our analyses revealed several significant differences between the genome sequences of mesophiles and thermophiles. At the level of nucleotide composition, we confirmed the finding of Lao and Forsdyke (2000) that the coding sequences of thermophiles are relatively rich in purines, although we found that this increase was due almost entirely to an increased frequency of adenine (A). This confirms the suggestions of Schultes et al. (1997) and Wang and Hickey (2002) that purines—and especially

Acknowledgements

This work was supported by a Research Grant from NSERC Canada (D.A.H.) and graduate scholarships from the University of Ottawa and NSERC (G.A.C.S.).

References (39)

• M. Bulmer. The selection-mutation-drift theory of synonymous codon usage. Genetics (1991).
• J.H. Cate et al. RNA tertiary structure mediation by adenosine platforms. Science (1996).
• G.D. Clarke et al. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores. J. Bacteriol. (2002).
• G. Deckert. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature (1998).
• W. Deng. Genome sequence of Yersinia pestis KIM. J. Bacteriol. (2002).
• M. Dumontier et al. Species-specific protein sequence and fold optimizations. BMC Bioinformatics (2002).
• J. Felsenstein. Phylogenies and the comparative method. Am. Nat. (1985).
• J. Felsenstein. PHYLIP (Phylogeny Inference Package) version 3.6a3. Distributed by the author (2002).
• R.A. Fisher. The use of multiple measurements in taxonomic problems. Ann. Eugen. (1936).

Supplementary data associated with this article can be found in the online version at doi:10.1016/S0378-1119(03)00660-7.

¹ Present address: Department of Genetics, Trinity College, Dublin.
