Ribosomal ITS sequences and plant phylogenetic inference

https://doi.org/10.1016/S1055-7903(03)00208-2

Abstract

One of the most popular sequences for phylogenetic inference at the generic and infrageneric levels in plants is the internal transcribed spacer (ITS) region of the 18S–5.8S–26S nuclear ribosomal cistron. The prominence of this source of nuclear DNA sequence data is underscored by a survey of phylogenetic publications involving comparisons at the genus level or below, which reveals that of 244 papers published over the last five years, 66% included ITS sequence data. Perhaps even more striking is the fact that 34% of all published phylogenetic hypotheses have been based exclusively on ITS sequences. Notwithstanding the many important contributions of ITS sequence data to phylogenetic understanding and knowledge of genome relationships, a number of molecular genetic processes impact ITS sequences in ways that may mislead phylogenetic inference. These molecular genetic processes are reviewed here, drawing attention to both underlying mechanisms and phylogenetic implications. Among the most prevalent complications for phylogenetic inference is the existence in many plant genomes of extensive sequence variation, arising from ancient or recent array duplication events, genomic harboring of pseudogenes in various states of decay, and/or incomplete intra- or inter-array homogenization. These phenomena separately and collectively create a network of paralogous sequence relationships potentially confounding accurate phylogenetic reconstruction. Homoplasy is shown to be higher in ITS than in other DNA sequence data sets, most likely because of orthology/paralogy conflation, compensatory base changes, problems in alignment due to indel accumulation, sequencing errors, or some combination of these phenomena. Despite the near-universal usage of ITS sequence data in plant phylogenetic studies, its complex and unpredictable evolutionary behavior reduces its utility for phylogenetic analysis. It is suggested that more robust insights are likely to emerge from the use of single-copy or low-copy nuclear genes.

Introduction

As testified by the launching of the journal in which these words appear, molecular sequence data have revolutionized phylogenetic analysis. Since the late 1980s, and at a seemingly ever-increasing pace over the last decade, molecular phylogenetic hypotheses have been put forward for nearly all groups of organisms. In plants, the majority of sequence-based molecular phylogenetic studies, particularly in the early years, were based exclusively on genes and spacers from the plastid genome (Catalán et al., 1997; Clegg, 1993; Olmstead and Palmer, 1994; Olmstead and Reeves, 1995; Soltis et al., 1998), most notably rbcL (Chase et al., 1993). With increasing recognition of the dangers inherent in relying exclusively on what typically are uniparentally inherited sequences for phylogenetic inference (Rieseberg and Soltis, 1991; Rieseberg and Wendel, 1993), widespread enthusiasm developed in the plant systematics community for the inclusion of sequence data from nuclear markers. For reasons enumerated below but accelerated by sociological factors, a single kind of nuclear locus experienced a meteoric rise in popularity, becoming almost a sine qua non for phylogenetic inference at generic and infrageneric levels in plants. Accordingly, this tool, the internal transcribed spacer (ITS) region of the 18S–5.8S–26S nuclear ribosomal cistron, is now extensively employed around the globe, having first been utilized scarcely a decade ago (Baldwin, 1992, 1993).

To illustrate just how popular ITS sequence-based phylogenetic analyses have become since the early review by Baldwin et al. (1995), we surveyed plant phylogenetic publications during the last five years in several of the most prominent systematics and evolution journals. This tabulation revealed that of 244 papers involving comparisons at the genus level or below, fully two-thirds (66%) included ITS sequence data. Perhaps even more striking is the fact that more than one third (34%) of all published phylogenetic hypotheses have been based exclusively on ITS sequences.

Why has ITS-based phylogenetic analysis come to dominate plant molecular phylogenetic methodology? There appear to be at least two sets of reasons: one based on its presumed advantageous properties for phylogenetic inference, the other stemming from a rather powerful bandwagon effect, whereby ITS utilization was accelerated in the community by usage itself, without much explicit challenge of the appropriateness of the tool. We will not delve further into this latter set of sociological factors here, but instead reiterate the long-noted (Baldwin et al., 1995) properties of ITS loci that were claimed to be advantageous for purposes of phylogenetic reconstruction:

Biparental inheritance. Since 18S–26S rDNA arrays reside in the nuclear genome, ITS sequences are biparentally inherited, and are thus distinguished from the cpDNA loci in widespread use. Some of the earlier studies demonstrated how valuable this property is for revealing past cases of reticulation, hybrid speciation, and parentage of polyploids (Baldwin, 1992; Baldwin et al., 1995; Kim and Jansen, 1994; Rieseberg et al., 1990; Rieseberg and Soltis, 1991; Rieseberg and Wendel, 1993; Wendel et al., 1995).

Universality. White et al. (1990) described a set of primers that was useful for amplifying ITS sequences from most plant and fungal phyla. This obviated the need for primer design or prior sequence knowledge, meaning that ITS sequence data could be more readily obtained than perhaps any other nuclear marker.

Simplicity. Nuclear ribosomal genes are constituents of individual 18S–5.8S–26S repeats, which typically are in the size range of about 10 kb. These repeats are tandemly reiterated at one or more chromosomal loci per haploid complement. Because there are hundreds to thousands of nuclear rDNA repeats in plant genomes, they are more easily isolated than most low-copy nuclear loci, requiring little experimental expertise to successfully amplify. In plants, ITS sequences vary in length from approximately 500–700 bp in angiosperms (Baldwin et al., 1995) to 1500–3700 bp in some gymnosperms (Bobola et al., 1992; Germano and Klein, 1999; Liston et al., 1996; Maggini et al., 2000; Marrocco et al., 1996). Excluding gymnosperms, both the high copy number and the small size of the target DNA fragment facilitate ITS amplification by PCR, even permitting the use of ancient material, herbarium specimens, and samples other than from living material (much as with cpDNA).

Intragenomic uniformity. It has long been recognized that multigene families in general and ITS sequences in particular may be subject to a phenomenon termed concerted evolution (Ainouche and Bayer, 1997; Brochmann et al., 1996; Elder and Turner, 1995; Franzke and Mummenhoff, 1999; Fuertes Aguilar et al., 1999a; Hillis et al., 1991; Roelofs et al., 1997; Schlötterer and Tautz, 1994; Wendel et al., 1995; Zimmer et al., 1980, among others). Concerted evolution occurs when sequence differences among reiterated copies in the genome, which should be accumulating their own distinct mutations, become homogenized to the same sequence type by mechanisms such as high-frequency unequal crossing over or gene conversion. When carried to completion, this process eliminates both sequence variation within genomes and potentially confounding variation, leaving only species- and clade-specific character-state changes to inform phylogenetic reconstruction efforts.

Intergenomic variability. An early observation was that ITS sequence variation levels are suitable for phylogenetic inference at the specific, generic or even family levels (Baldwin, 1992; Baldwin et al., 1995). Baldwin and others noted that the variation at hierarchical levels at which most phylogeneticists work (generic and sub-generic) is attributable mostly to nucleotide polymorphisms, but that insertion–deletion polymorphisms (indels) are also common. They further reported divergence values that ranged from 0 to 39% in pairwise comparisons between taxa, with 5–59% of these being potentially phylogenetically informative (Baldwin et al., 1995).

Low functional constraint. It was thought that since the ITS sequences are removed via splicing during transcript processing, they would be subject to reasonably mild functional constraints, which in turn would offer a preponderance of nucleotide sites that would evolve essentially neutrally. The functionality of ITS is related to specific cleavage of the primary transcript within ITS-1 and ITS-2 during maturation of the small subunit (SSU), 5.8S, and the large subunit (LSU) ribosomal RNAs (Hadjiolova et al., 1984, 1994; Musters et al., 1990; Nashimoto et al., 1988; Veldman et al., 1981; van Nues et al., 1994). Although this maturation and splicing process depends on the secondary structure of ITS, implying some degree of conservation at the sequence or at the structure level (Mai and Coleman, 1997), the presumption of limited functional constraint was widely adopted and further justified by observations of extensive nucleotide and length variation.
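The homogenization described under "Intragenomic uniformity" above can be made concrete with a toy simulation. The sketch below is only an illustration under strong simplifying assumptions that are not taken from the review: a single array of 20 repeat copies, no new mutation, no unequal crossing over, and gene conversion modeled as one randomly chosen copy overwriting another. The function name and all parameter values are hypothetical.

```python
import random


def simulate_gene_conversion(n_copies=20, n_events=2000, seed=0):
    """Toy model of concerted evolution in one tandem array.

    Each repeat copy starts with its own integer label, standing in for a
    distinct sequence variant. At every step a randomly chosen donor copy
    overwrites a randomly chosen recipient copy (a stand-in for gene
    conversion). No new variants arise by mutation.
    Returns the number of distinct variants present after each event.
    """
    rng = random.Random(seed)
    array = list(range(n_copies))        # every copy initially unique
    history = [len(set(array))]
    for _ in range(n_events):
        donor, recipient = rng.sample(range(n_copies), 2)
        array[recipient] = array[donor]  # recipient converted to donor type
        history.append(len(set(array)))
    return history


if __name__ == "__main__":
    history = simulate_gene_conversion()
    print("distinct variants at start:", history[0])
    print("distinct variants at end:  ", history[-1])
```

In this model the number of distinct variants can only stay the same or fall, which is the sense in which homogenization "carried to completion" removes intragenomic variation. Stopping the loop early leaves several variants in the array, which corresponds to the incomplete homogenization mentioned in the abstract.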

The foregoing list of properties constitutes an impressive set of advantages for experimental design, and so it is not surprising that ITS-based phylogenetics rapidly grew in popularity. Given the prevalence of ITS sequence data in plant phylogenetic analyses, it seems prudent to pause and reflect upon these and other molecular evolutionary properties that are relevant to its utilization. We were motivated by the realization that the several advantages noted above may be counterbalanced by phenomena that are expected to confound phylogenetic analyses, and that a deeper understanding of process might lead to enhanced insight into evolutionary history. Some of the relevant phenomena have been revealed or informed by the actual process of phylogenetic inference, where unexpected results or incongruent topologies were recovered. We review here molecular genetic processes that impact ITS sequence variation, drawing attention to the implications for phylogenetic inference.
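For readers unfamiliar with how pairwise divergence values such as the 0 to 39% range cited under "Intergenomic variability" are obtained, the following is a minimal sketch of an uncorrected pairwise distance (p-distance) between two aligned sequences. It is not the procedure used in any of the cited studies. The function name, the decision to skip columns containing gaps or ambiguity symbols, and the two example sequences are all assumptions made for illustration; the sequences are invented and are not real ITS data.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Uncorrected pairwise divergence between two aligned sequences.

    Columns where either sequence has a gap or ambiguity symbol are
    skipped (an assumption; indels can be treated in other ways).
    Returns the proportion of differing sites among compared columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented aligned fragments, for illustration only.
taxon_1 = "ACGTTGCA-ACGTACGTTA"
taxon_2 = "ACGTCGCATACGAACGTTA"
print(f"{100 * p_distance(taxon_1, taxon_2):.1f}% divergence")
```

Because the gapped column is excluded, alignment decisions around indels change both the numerator and the denominator. This is one reason alignment problems due to indel accumulation matter for distance estimates.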

Multiple rDNA arrays

It has long been known that 18S–26S rDNA arrays and their RNA products constitute an essential component of eukaryotic NORs (nucleolus organizing regions). The number and distribution of NOR loci in eukaryotic genomes are variable, as is their size (Brown et al., 1993; Panzera et al., 1996; Pedersen and Linde-Laursen, 1994; Tartof and Dawid, 1976; Vanzela et al., 1998; Worton et al., 1988). Moreover, the number and genomic location of NOR arrays are evolutionarily labile within families and genera

Conclusions

Our purpose in writing this review was to illuminate and bring attention to the many molecular evolutionary and organism-level processes that may impact sequence variation for ITS repeats in plants, and thereby hopefully contribute to a more informed utilization in phylogenetic analyses. Although we have discussed a rather lengthy list of phenomena that may generate intragenomic sequence variation, to a certain extent the problem this creates for phylogenetic analysis is the same regardless of

Acknowledgments

Financial support was provided by the National Science Foundation and the Spanish Ministry of Education, Culture and Sports.

References (181)

  • M. Mayol et al. Why nuclear ribosomal DNA spacers (ITS) tell different stories in Quercus. Mol. Phylogenet. Evol. (2001)
  • S.P. Adams et al. Ribosomal DNA evolution and phylogeny in Aloe (Asphodelaceae). Am. J. Bot. (2000)
  • M.L. Ainouche et al. On the origins of the tetraploid Bromus species (section Bromus, Poaceae): Insights from the internal transcribed spacer sequences of nuclear ribosomal DNA. Genome (1997)
  • I. Álvarez Fernández et al. A phylogenetic analysis of Doronicum (Asteraceae, Senecioneae) based on morphological, nuclear ribosomal (ITS), and chloroplast (trnL-F) evidence. Mol. Phylogenet. Evol. (2001)
  • K. Anamthawat-Jónsson. Molecular cytogenetics of introgressive hybridization in plants. Methods Cell Sci. (2001)
  • K. Anamthawat-Jónsson et al. Genomic and genetic relationships among species of Leymus (Poaceae: Triticeae) inferred from 18S–26S ribosomal genes. Am. J. Bot. (2001)
  • N. Arnheim. Concerted evolution of multigene families
  • V.E.T.M. Ashworth. Phylogenetic relationships in Phoradendreae (Viscaceae) inferred from three regions of the nuclear ribosomal cistron. I. Major lineages and paraphyly of Phoradendron. Syst. Bot. (2000)
  • Bailey, C.D., Hughes, C.E., Harris, S.A., Carr, T.G., 2002. Characterization of angiosperm nrDNA polymorphism: Paralogy...
  • B.G. Baldwin. Molecular phylogenetics of Calycadenia (Compositae) based on ITS sequences of nuclear ribosomal DNA: Chromosomal and morphological evolution reexamined. Am. J. Bot. (1993)
  • B.G. Baldwin et al. The ITS region of nuclear ribosomal DNA: A valuable source of evidence on angiosperm phylogeny. Ann. Missouri Bot. Gard. (1995)
  • T.J. Barkman et al. Hybrid origin and parentage of Dendrochilum acuiferum (Orchidaceae) inferred in a phylogenetic context using nuclear and plastid DNA sequence data. Syst. Bot. (2002)
  • D.A. Baum et al. A phylogenetic analysis of Epilobium (Onagraceae) based on nuclear ribosomal DNA sequences. Syst. Bot. (1994)
  • D.A. Baum et al. Biogeography and floral evolution of baobabs (Adansonia, Bombacaceae) as inferred from multiple data sets. Syst. Biol. (1998)
  • A. Baumel et al. Molecular investigations in populations of Spartina anglica C.E. Hubbard (Poaceae) invading coastal Brittany (France). Mol. Ecol. (2001)
  • P.M. Beardsley et al. Redefining Phrymaceae: The placement of Mimulus, tribe Mimuleae, and Phryma. Am. J. Bot. (2002)
  • B. Bechmann et al. Improvement of PCR amplified DNA sequencing with the aid of detergents. Nucleic Acids Res. (1990)
  • M.S. Bobola et al. Five major nuclear ribosomal repeats represent a large and variable fraction of the genomic DNA of Picea rubens and P. mariana. Mol. Biol. Evol. (1992)
  • E. Bortiri et al. Phylogeny and Systematics of Prunus (Rosaceae) as determined by sequence analysis of ITS and the chloroplast trnL-trnF spacer DNA. Syst. Bot. (2001)
  • E. Bortiri et al. The phylogenetic utility of nucleotide sequences of sorbitol 6-phosphate dehydrogenase in Prunus (Rosaceae). Am. J. Bot. (2002)
  • C. Brochmann et al. A classic example of postglacial allopolyploid speciation re-examined using RAPD markers and nucleotide sequences: Saxifraga osloensis (Saxifragaceae). Symb. Bot. Ups. (1996)
  • G.R. Brown et al. Preliminary karyotype and chromosomal localization of ribosomal DNA sites in white spruce using fluorescence in situ hybridization. Genome (1993)
  • E.S. Buckler et al. Zea ribosomal repeat evolution and substitution patterns. Mol. Biol. Evol. (1996)
  • E.S. Buckler et al. Zea systematics: Ribosomal ITS evidence. Mol. Biol. Evol. (1996)
  • E.S. Buckler et al. The evolution of ribosomal DNA: Divergent paralogues and phylogenetic implications. Genetics (1997)
  • F.J. Camacho et al. Endophytic fungal DNA, the source of contamination in spruce needle DNA. Mol. Ecol. (1997)
  • C.S. Campbell et al. Persistent nuclear ribosomal DNA sequence polymorphism in the Amelanchier agamic complex (Rosaceae). Mol. Biol. Evol. (1997)
  • M.W. Chase et al. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Ann. Missouri Bot. Gard. (1993)
  • M.T. Clegg. Chloroplast gene sequences and the study of plant evolution. Proc. Natl. Acad. Sci. USA (1993)
  • J.A. Clevinger et al. Phylogenetic analysis of Silphium and subtribe Engelmanniinae (Asteraceae: Heliantheae) based on ITS and ETS sequence data. Am. J. Bot. (2000)
  • H.P. Comes et al. Molecular phylogeography, reticulation, and lineage sorting in Mediterranean Senecio sect. Senecio (Asteraceae). Evolution Int. J. Org. Evolution (2001)
  • J.A. Compton et al. Phylogeny and circumscription of tribe Actaeae (Ranunculaceae). Syst. Bot. (2002)
  • R.C. Cronn et al. Duplicated genes evolve independently after polyploid formation in cotton. Proc. Natl. Acad. Sci. USA (1999)
  • R.C. Cronn et al. PCR-mediated recombination in amplification products derived from polyploid cotton. Theor. Appl. Genet. (2002)
  • R.C. Cronn et al. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. Am. J. Bot. (2002)
  • K. Dagne et al. Number and sites of rDNA loci of Guizotia abyssinica (L. f.) Cass. as determined by fluorescence in situ hybridization. Hereditas (2000)
  • K.J. Danna et al. 5S rRNA genes in tribe Phaseoleae: Array size, number, and dynamics. Genome (1996)
  • J.I. Davis et al. Data decisiveness, data quality, and incongruence in phylogenetic analysis: An example from the monocotyledons using mitochondrial atpA sequences. Syst. Biol. (1998)
  • M.T. Dixon et al. Ribosomal RNA secondary structure: Compensatory mutations and implications for phylogenetic analysis. Mol. Biol. Evol. (1993)
  • J.J. Doyle. Gene trees and species trees: Molecular systematics as one-character taxonomy. Syst. Bot. (1992)