Elsevier

Gene

Volume 238, Issue 1, 30 September 1999, Pages 65-77

Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms

https://doi.org/10.1016/S0378-1119(99)00297-8

Abstract

In the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should be such that A=T and C=G (this state is called Parity Rule type 2, PR2). At a genome scale, i.e. when considering the base composition of a whole genome, PR2 is a good approximation, but there are local and systematic deviations. The question is whether these deviations are a consequence of an underlying bias in mutation or selection. We have tried to review published hypotheses to classify them within the mutational or selective group. This dichotomy is, however, too crude because there is at least one hypothesis based simultaneously upon mutation and selection.

Introduction

Chargaff (1950) experimentally determined A=T and G=C equimolar frequencies when analysing both DNA strands together. Three years later, Watson and Crick (1953) determined the DNA secondary structure and stated the base-pairing rules that explain such frequencies. More surprisingly, these equalities are still observed within each strand (Lin and Chargaff, 1967). Under no-strand-bias conditions, when mutation and selection have equal effects on both strands, there are six possible substitution rates instead of 12, as stated in the Parity Rule type 1, PR1 (Sueoka, 1995). The rationale for this is as follows: since substitution rates are scored on one strand, a change such as T to C in a given strand results from either a T→C substitution on that strand or an A→G on the complementary strand. The type-2 parity rule, PR2, can be formally derived from PR1 (Lobry, 1995) to give the base frequencies within each strand at equilibrium: A=T and G=C. Moreover, convergence to PR2 is also expected when the substitution rates are not constant over time (Lobry and Lobry, 1999). Any deviation from PR2 implies asymmetric substitution: the result of different mutation rates, different selective pressures, or both, between the two strands of DNA. There are two principal ways of studying asymmetric substitution: phylogenetic reconstruction of base substitution and detection of deviations from PR2.
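The link between PR1 and PR2 can be illustrated numerically. The following is a minimal sketch, not the formal derivation of Lobry (1995): it assumes a simple discrete-time substitution model, and the six rate values, the starting composition and the names `pr1_rates` and `iterate` are hypothetical choices made only for illustration. Under PR1 each rate is shared with its complementary change, and the within-strand composition drifts towards A=T and C=G; tripling C→T alone (a strand bias) breaks this equality.

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

def pr1_rates(six):
    """Expand six free rates into the 12 substitution rates implied by PR1."""
    rates = {}
    for (x, y), r in six.items():
        rates[(x, y)] = r
        rates[(COMP[x], COMP[y])] = r  # complementary change gets the same rate
    return rates

def iterate(freqs, rates, steps):
    """Apply the per-step substitution probabilities to base frequencies."""
    for _ in range(steps):
        new = dict(freqs)
        for (x, y), r in rates.items():
            flow = freqs[x] * r  # fraction of sites changing from x to y
            new[x] -= flow
            new[y] += flow
        freqs = new
    return freqs

# Hypothetical per-step rates (illustrative values only).
six = {("A", "C"): 0.001, ("A", "G"): 0.004, ("A", "T"): 0.0005,
       ("C", "A"): 0.002, ("C", "G"): 0.0007, ("C", "T"): 0.006}
start = {"A": 0.7, "C": 0.1, "G": 0.15, "T": 0.05}

rates = pr1_rates(six)
eq = iterate(start, rates, 20000)  # no strand bias: A=T and C=G are approached

biased_rates = dict(rates)
biased_rates[("C", "T")] = 3 * rates[("C", "T")]  # C->T no longer equals G->A
biased = iterate(start, biased_rates, 20000)      # PR2 is violated

print({b: round(f, 4) for b, f in eq.items()})
print({b: round(f, 4) for b, f in biased.items()})
```

The starting composition is deliberately far from PR2, so that the convergence is due to the symmetry of the rates and not to the initial state.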

In the first method, asymmetries are detected by aligning homologous sequences, estimating the substitution matrix and comparing the frequencies of complementary changes. Wu and Maeda (1987) used this method to test for asymmetric substitution in a region of the β-globin complex of primates. They did detect asymmetries, but since the origins and termini of replication for their data were not known, their results are not reliable, as shown by Bulmer (1991), who re-examined an adjacent region of the human β-globin complex. Francino et al. (1996) used the same method to search for asymmetric substitution in eubacteria. They found a difference between complementary changes C→T and G→A when scoring substitutions on the coding strand. The advantage of this method is that it directly detects the number and type of substitutions, but access to a suitable data set is rather limited because of the difficulty of finding orthologous sequences with an adequate divergence time.
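The comparison of complementary changes can be sketched as follows. This is a toy illustration only: it assumes that an ancestral sequence is already available and aligned to its descendant, whereas the studies cited above infer substitutions from a phylogeny. The function names and the two short sequences are hypothetical.

```python
from collections import Counter

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

def substitution_counts(ancestor, descendant):
    """Count each change x->y scored on the given strand."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, d in zip(ancestor, descendant):
        if a != d and a in COMP and d in COMP:  # skip gaps and ambiguous bases
            counts[(a, d)] += 1
    return counts

def complementary_pairs(counts):
    """Pair each change with its complement, e.g. C->T with G->A."""
    pairs = {}
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                continue
            key = tuple(sorted([(x, y), (COMP[x], COMP[y])]))
            pairs[key] = (counts[key[0]], counts[key[1]])
    return pairs

# Hypothetical aligned sequences, for illustration only.
ancestor = "ACGTCCGGAT"
descendant = "ATGTTCGAAT"

counts = substitution_counts(ancestor, descendant)
pairs = complementary_pairs(counts)
print(pairs[(("C", "T"), ("G", "A"))])  # counts of C->T and of G->A
```

Raw counts are shown for simplicity. A real comparison would normalise each count by the frequency of the source base and test whether the difference within a pair is statistically significant.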

The second method (Lobry, 1996a) analyses DNA sequences for deviations from A=T and G=C frequencies. In 1990, such deviations in SV40 were interpreted as evidence for asymmetric mutation pressure because of a polarity switch at the origin of replication (Filipski, 1990). GC and AT skews are measured, for example, as the quantities (C−G)/(G+C) and (A−T)/(A+T) along a DNA sequence using a sliding window. Lobry (1996a) showed the existence of GC and AT skews in the genome of Haemophilus influenzae and in parts of the Escherichia coli and Bacillus subtilis genomes. The disadvantage of this method is that it is indirect, but the increasing number of completely sequenced genomes allows an extensive analysis of the variation in nucleotide composition within and between genomes. Several recent studies used this method to analyse mitochondrial, viral and bacterial genomes (with emphasis on the latter) for compositional asymmetries, revealing systematic deviations from PR2 in these genomes.
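A sliding-window skew calculation can be sketched as follows, using the sign convention given in the text, (C−G)/(G+C) and (A−T)/(A+T). The function names, the window size, the step and the toy sequence are assumptions chosen for illustration; published analyses use much larger windows on real genomes.

```python
def skew(window, x, y):
    """Return (x - y) / (x + y) for base counts in window, or None if both are absent."""
    nx, ny = window.count(x), window.count(y)
    return (nx - ny) / (nx + ny) if nx + ny else None

def sliding_skews(seq, size, step):
    """Return (start, GC skew, AT skew) for each window along seq."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - size + 1, step):
        w = seq[start:start + size]
        out.append((start, skew(w, "C", "G"), skew(w, "A", "T")))
    return out

# Toy sequence: a G- and T-rich half followed by a C- and A-rich half.
toy = "GGGTGGTTGC" + "CCCACCAACG"
result = sliding_skews(toy, size=10, step=10)
for start, gc, at in result:
    print(start, round(gc, 3), round(at, 3))
```

In the toy sequence both skews are negative in the first window and positive in the second, which is the kind of sign switch discussed for replication origins and termini.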

In bacteria, the deviations switch sign at the origin and terminus of replication, such that the leading strand of replication is generally richer in G than in C, and in T than in A. Absolute values for AT skews tend to be lower than for GC skews. Grigoriev (1998) measured skews over all bases and found that the leading strand generally contains more G than C. Third codon position skews show a GT-rich leading strand in all eubacteria (Francino and Ochman, 1997, McLean et al., 1998) except in the Mycoplasma species, where it is CT-rich (McLean et al., 1998), and in Synechocystis, where no skew was detected (McLean et al., 1998, Mrázek and Karlin, 1998). Archaebacteria generally do not skew (Karlin, 1998, Mrázek and Karlin, 1998), except for M. thermoautotrophicum, where a weak skew has been detected (McLean et al., 1998, Rocha et al., 1998). Compositional studies of all genome positions in several bacteria report a correlation between purine and coding strand excess (Freeman et al., 1998), and an excess of keto bases (GT) over amino bases (AC) in the leading strand (Freeman et al., 1998, Perrière et al., 1996). Rocha et al. (1998) observed compositional asymmetries between the leading and lagging strand genes at the level of nucleotides, codons and amino acids. Additionally, a strand compositional asymmetry was confirmed in the complete genomes of B. subtilis (Kunst et al., 1997), E. coli (Blattner et al., 1997), Rickettsia prowazekii (Andersson et al., 1998) and Treponema pallidum (Fraser et al., 1998).

The deviation divides the chromosome into two segments that are homogenous for GC(AT) skews, called chirochores (Lobry, 1996a) in analogy with isochores (Bernardi, 1989), which are domains of mammalian chromosomes with homogenous GC content. Chirochores coincide with replichores (Blattner et al., 1997), so that skews switch sign at the origin and terminus of replication. This polarity switch allows for the confirmation of the origin of replication. The method was used to predict the origin in Mycoplasma genitalium (Lobry, 1996b), R. prowazekii (Andersson et al., 1998), T. pallidum (Fraser et al., 1998) and Borrelia burgdorferi (Fraser et al., 1997), where the origin could not be detected (because of a lack of consensus patterns) or had not yet been detected experimentally. There is now experimental evidence that the replication origin is located where it was predicted in B. burgdorferi (Picardeau et al., 1999).
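How a polarity switch can point to a replication origin can be sketched with a cumulative skew, in the spirit of cumulative skew diagrams. This is an illustrative assumption and not necessarily the procedure used in the predictions cited above. The sketch keeps the sign convention C−G. If the leading strand is richer in G than in C, the running sum should rise while the analysed strand is the lagging strand and fall once it becomes the leading strand, so the peak would mark a candidate origin and the trough a candidate terminus. The function names and the toy sequence are hypothetical.

```python
def cumulative_cg(seq):
    """Running sum of +1 per C and -1 per G along the sequence."""
    total, walk = 0, []
    for base in seq.upper():
        if base == "C":
            total += 1
        elif base == "G":
            total -= 1
        walk.append(total)
    return walk

def switch_points(seq):
    """Return the 0-based indices of the peak and trough of the cumulative C-G walk."""
    walk = cumulative_cg(seq)
    peak = max(range(len(walk)), key=walk.__getitem__)    # first maximum
    trough = min(range(len(walk)), key=walk.__getitem__)  # first minimum
    return peak, trough

# Toy sequence: C-rich, then G-rich, then C-rich again.
toy = "CCAC" * 5 + "GGTG" * 10 + "CCAC" * 5
peak, trough = switch_points(toy)
print(peak, trough)
```

On a real circular chromosome the walk depends on where the sequence record starts, but the positions of the two extremes do not, which is why the cumulative form is convenient for locating the switch.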

Perhaps the clearest skews are seen in mitochondria: studies of the nucleotide composition of mitochondrial genomes (Jermiin et al., 1995, Perna and Kocher, 1995, Reyes et al., 1998, Tanaka and Ozawa, 1994) all report patterns of asymmetric substitution.

An early study of the bacteriophage λ genome (Daniels et al., 1983) reveals base distribution skew in this molecule, but gives no biological interpretation for the skew. Recently, Mrázek and Karlin (1998) observed asymmetric substitution in some herpesviruses and in the phages λ and T7, and Grigoriev (1998) detected skew in adenovirus type 40.

During revision of this article, several additional publications appeared, showing the existence of strand asymmetries for instance in chloroplasts and ds DNA viruses (see Note added in proof).

Generally, there are two ways of looking at the evolutionary changes of nucleotide composition: the selectionist and the neutralist point of view. These hypotheses differ in their estimate of the role of selection in base substitution. The neutralist hypothesis assumes that the average composition of non-coding DNA depends on a bias of selectively neutral mutations which accumulate during evolution. As an example, there are two main theories that explain the origin of isochores. According to the selectionist hypothesis, isochores are the result of positive selection for GC content as an adaptation to the high body temperature in warm-blooded vertebrates (Bernardi et al., 1985). GC content would thereby be the result of positive selection for the functional advantages of the GC content itself. The mutational hypothesis assumes that the compositional biases of mutagenic processes are different in structurally and functionally distinct segments of DNA (Sueoka, 1988, Sueoka, 1992). This hypothesis is based on directional mutation pressure, and must therefore be regarded as being neutral rather than selective. Similarly, selective and mutational theories can be developed for the origin of strand-specific nucleotide composition (although the selective hypotheses do not assume that asymmetric substitution is positively selected for because of a functional advantage of the asymmetry itself). Even though the mechanisms creating such patterns are not fully understood, recent publications provide us with several plausible hypotheses, which have been partly summarised in two recent papers (Francino and Ochman, 1997, Mrázek and Karlin, 1998). The idea of this review is to investigate current hypotheses and classify them as mutational or selective. There is, however, at least one hypothesis that must be regarded as based on both mutation and selection.

Bias on local scale

In organisms in which a large proportion of the genome consists of coding sequence (prokaryotes, mitochondria, chloroplasts and viruses), selective bias acting on a local scale can potentially influence global nucleotide composition. Because of the low proportion of control sequences and different species of RNA that do not translate into proteins, only protein coding sequences will be considered here.

Mutational mechanisms

Two important facts strongly suggest that strand asymmetries could be caused by mutational mechanisms. First, the violation of PR2 is pronounced at third codon positions and in intergenic regions (Lobry, 1996a), where the selective pressure should be nearly neutral or at least weak. Second, the GC and AT deviations switch sign at the origin and terminus of replication, which suggests a coupling with replication, repair or both.

Combination of selection and mutation

There is at least one possible mechanism that involves both selection and mutation. Francino et al. (1996) and Francino and Ochman (1997) suggested that processes which distinguish between the transcribed and non-transcribed strands can account for DNA asymmetry. Transcription alone would not distinguish between the leading and lagging strands, but in combination with biased gene orientation (discussed in Section 2.2.2), transcription-induced mutations could generate the compositional asymmetry between

Discussion

Compositional studies of bacterial, mitochondrial and viral genomes have established the existence of deviations from the frequencies A=T and G=C expected under no-strand-bias conditions. Skew values differ depending on what part of the genome is studied and different genomes conform differently to the predicted models. Therefore, compositional asymmetry could be a result of superposition of different mechanisms that influence base composition to different extents, and act differently in

Note added in proof

During the revision of this paper, some additional articles of great interest were published. A study by Grigoriev (1999, Virus Res. 60, 1–19) reveals compositional asymmetry between the leading and lagging strands in 22 complete sequences of ds DNA viruses. Possible contributions of transcription and replication (and their associated repair mechanisms) are discussed along with other potential sources of strand bias. Similarly, in the chloroplast genome of Euglena gracilis (Morton, 1999, Proc. Natl.

References (103)

  • L.V. Mendelman

    Base mispair extension kinetics. Comparison of DNA polymerase alpha and reverse transcriptase

    J. Biol. Chem.

    (1990)
  • G.J. Olsen et al.

    Archaeal genomics: an overview

    Cell

    (1997)
  • J.D. Roberts et al.

    Mispair, site-, and strand-specific error rates during simian virus 40 origin-dependent replication in vitro with excess deoxythymine triphosphate

    J. Biol. Chem.

    (1994)
  • S.L. Salzberg et al.

    Skewed oligomers and origins of replication

    Gene

    (1998)
  • R.M. Schaaper

    Base selection, proofreading and mismatch repair during DNA replication in Escherichia coli

    J. Biol. Chem.

    (1993)
  • P.M. Sharp et al.

    Codon usage and genome evolution

    Curr. Opin. Genet. Dev.

    (1994)
  • M. Tanaka et al.

    Strand asymmetry in human mitochondrial DNA mutations

    Genomics

    (1994)
  • E.N. Trifonov

    Translation framing code and frame-monitoring mechanism as suggested by the analysis of mRNA and 16 S rRNA nucleotide sequences

    J. Mol. Biol.

    (1987)
  • T.V. Wang et al.

    Discontinuous DNA replication in a lig-7 strain of Escherichia coli is not the result of mismatch repair, nucleotide-excision repair, or the base-excision repair of DNA uracil

    Biochem. Biophys. Res. Comm.

    (1989)
  • S. Andersson

    Sequence and organization of the human mitochondrial genome

    Nature

    (1981)
  • S.G. Andersson

    The genome sequence of Rickettsia prowazekii and the origin of mitochondria

    Nature

    (1998)
  • T.A. Baker et al.

    Genetics and enzymology of DNA replication in Escherichia coli

    Annu. Rev. Genet.

    (1992)
  • A. Beletskii et al.

    Transcription-induced mutations: increase in C to T mutations in the non-transcribed strand during transcription in Escherichia coli

    Proc. Natl. Acad. Sci. USA

    (1996)
  • A. Beletskii et al.

    Correlation between transcription and C to T mutations in the non-transcribed DNA strand

    Biol. Chem.

    (1998)
  • G. Bernardi

    The mosaic genome of warm-blooded vertebrates

    Science

    (1985)
  • G. Bernardi

    The isochore organization of the human genome

    Annu. Rev. Genet.

    (1989)
  • F.R. Blattner

    The complete genome sequence of Escherichia coli K-12

    Science

    (1997)
  • J.L. Boore et al.

    Deducing the pattern of arthropod phylogeny from mitochondrial DNA rearrangements

    Nature

    (1995)
  • M. Bulmer

    Strand symmetry of mutation rates in the β-globin region

    J. Mol. Evol.

    (1991)
  • E. Chargaff

    Chemical specificity of nucleic acids and mechanism of their enzymatic degradation

    Experientia

    (1950)
  • S.T. Cole

    Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

    Nature

    (1998)
  • F. Cornet et al.

    Restriction of the activity of the recombination site dif to a small zone of the Escherichia coli chromosome

    Genes Dev.

    (1996)
  • D.L. Daniels et al.

    Features of bacteriophage lambda: analysis of the complete nucleotide sequence

  • A. Datta et al.

    Association of increased spontaneous mutation rates with high levels of transcription in yeast

    Science

    (1995)
  • H. Echols et al.

    Fidelity mechanisms in DNA replication

    Annu. Rev. Biochem.

    (1991)
  • A.R. Fersht et al.

    DNA polymerase accuracy and spontaneous mutation rates: frequencies of purine·purine, purine·pyrimidine and pyrimidine·pyrimidine mismatches during DNA replication

    Proc. Natl. Acad. Sci. USA

    (1981)
  • I.J. Fijalkowska et al.

    Mutants in the Exo I motif of Escherichia coli dnaQ: defective proofreading and inviability due to error catastrophe

    Proc. Natl. Acad. Sci. USA

    (1996)
  • I.J. Fijalkowska et al.

    Unequal fidelity of leading and lagging strand DNA replication on the Escherichia coli chromosome

    Proc. Natl. Acad. Sci. USA

    (1998)
  • J. Filipski

    Evolution of DNA sequence, contributions of mutational bias and selection to the origin of chromosomal compartments

  • R.D. Fleischmann

    Whole-genome random sequencing and assembly of Haemophilus influenzae Rd

    Science

    (1995)
  • M.P. Francino et al.

    Asymmetries generated by transcription-coupled repair in enterobacterial genes

    Science

    (1996)
  • G.K. Frank et al.

    G and T nucleotide content show specie invariant negative correlation for all three codon positions

    J. Biomol. Struct. Dynam.

    (1997)
  • C.M. Fraser

    The minimal gene complement of Mycoplasma genitalium

    Science

    (1995)
  • C.M. Fraser

    Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi

    Nature

    (1997)
  • C.M. Fraser

    Complete genome sequence of Treponema pallidum, the syphilis spirochete

    Science

    (1998)
  • L.A. Frederico et al.

    A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy

    Biochemistry

    (1990)
  • J.M. Freeman et al.

    Patterns of genome organization in bacteria

    Science

    (1998)
  • M. Furusawa et al.

    Asymmetrical DNA replication promotes evolution: disparity theory of evolution

    Genetica

    (1998)
  • M. Gouy et al.

    Codon usage in bacteria: correlation with gene expressivity

    Nucleic Acids Res.

    (1982)
  • A. Grigoriev

    Analysing genomes with cumulative skew diagrams

    Nucleic Acids Res.

    (1998)