Ribosomal ITS sequences and plant phylogenetic inference
Introduction
As testified by the launching of the journal in which these words appear, molecular sequence data have revolutionized phylogenetic analysis. Since the late 1980s, and at a seemingly ever-increasing pace over the last decade, molecular phylogenetic hypotheses have been put forward for nearly all groups of organisms. In plants, the majority of sequence-based molecular phylogenetic studies, particularly in the early years, were based exclusively on genes and spacers from the plastid genome (Catalán et al., 1997; Clegg, 1993; Olmstead and Palmer, 1994; Olmstead and Reeves, 1995; Soltis et al., 1998), most notably rbcL (Chase et al., 1993). With increasing recognition of the dangers inherent in relying exclusively on what typically are uniparentally inherited sequences for phylogenetic inference (Rieseberg and Soltis, 1991; Rieseberg and Wendel, 1993), widespread enthusiasm developed in the plant systematics community for the inclusion of sequence data from nuclear markers. For reasons enumerated below but accelerated by sociological factors, a single kind of nuclear locus experienced a meteoric rise in popularity, becoming almost a sine qua non for phylogenetic inference at generic and infrageneric levels in plants. Accordingly, this tool, the internal transcribed spacer (ITS) region of the 18S–5.8S–26S nuclear ribosomal cistron, is now extensively employed around the globe, having first been utilized scarcely a decade ago (Baldwin, 1992, 1993).
To illustrate just how popular ITS sequence-based phylogenetic analyses have become since the early review by Baldwin et al. (1995), we surveyed plant phylogenetic publications during the last five years in several of the most prominent systematics and evolution journals. This tabulation revealed that of 244 papers, fully two-thirds (66%) involving comparisons at the genus level or below included ITS sequence data. Perhaps even more striking is the fact that more than one-third (34%) of all published phylogenetic hypotheses have been based exclusively on ITS sequences.
Why has ITS-based phylogenetic analysis come to dominate plant molecular phylogenetic methodology? Apparently for at least two sets of reasons: one based on its presumed advantageous properties for phylogenetic inference, the other stemming from a rather powerful bandwagon effect, whereby ITS utilization was accelerated in the community by usage itself, without much explicit challenge of the appropriateness of the tool. We will not delve further into this latter set of sociological factors here, but instead reiterate the long-noted (Baldwin et al., 1995) properties of ITS loci that were claimed to be advantageous for purposes of phylogenetic reconstruction:
• Biparental inheritance. Since 18S–26S rDNA arrays reside in the nuclear genome, ITS sequences are biparentally inherited, and are thus distinguished from the cpDNA loci in widespread use. Some of the earlier studies demonstrated how valuable this property is for revealing past cases of reticulation, hybrid speciation, and parentage of polyploids (Baldwin, 1992; Baldwin et al., 1995; Kim and Jansen, 1994; Rieseberg et al., 1990; Rieseberg and Soltis, 1991; Rieseberg and Wendel, 1993; Wendel et al., 1995).
• Universality. White et al. (1990) described a set of primers that was useful for amplifying ITS sequences from most plant and fungal phyla. This obviated the need for primer design or prior sequence knowledge, meaning that ITS sequence data could be more readily obtained than perhaps any other nuclear marker.
• Simplicity. Nuclear ribosomal genes are constituents of individual 18S–5.8S–26S repeats, which typically are in the size range of about 10 kb. These repeats are tandemly reiterated at one or more chromosomal loci per haploid complement. Because there are hundreds to thousands of nuclear rDNA repeats in plant genomes, they are more easily isolated than most low-copy nuclear loci, requiring little experimental expertise to successfully amplify. In plants, ITS sequences vary in length from approximately 500–700 bp in angiosperms (Baldwin et al., 1995) to 1500–3700 bp in some gymnosperms (Bobola et al., 1992; Germano and Klein, 1999; Liston et al., 1996; Maggini et al., 2000; Marrocco et al., 1996). Excluding gymnosperms, both the high copy number and the small size of the target DNA fragment facilitate ITS amplification by PCR, even permitting the use of ancient material, herbarium specimens, and samples other than from living material (much as with cpDNA).
• Intragenomic uniformity. It has long been recognized that multigene families in general and ITS sequences in particular may be subject to a phenomenon termed concerted evolution (Ainouche and Bayer, 1997; Brochmann et al., 1996; Elder and Turner, 1995; Franzke and Mummenhoff, 1999; Fuertes Aguilar et al., 1999a; Hillis et al., 1991; Roelofs et al., 1997; Schlötterer and Tautz, 1994; Wendel et al., 1995; Zimmer et al., 1980, among others). Concerted evolution occurs when sequence differences among reiterated copies in the genome, which should be accumulating their own distinct mutations, become homogenized to the same sequence type by mechanisms such as high-frequency unequal crossing over or gene conversion. When carried to completion, this process eliminates both sequence variation within genomes and potentially confounding variation, leaving only species- and clade-specific character-state changes to inform phylogenetic reconstruction efforts.
• Intergenomic variability. An early observation was that ITS sequence variation levels are suitable for phylogenetic inference at the specific, generic or even family levels (Baldwin, 1992; Baldwin et al., 1995). Baldwin and others noted that the variation at hierarchical levels at which most phylogeneticists work (generic and sub-generic) is attributable mostly to nucleotide polymorphisms, but that insertion–deletion polymorphisms (indels) are also common. They further reported divergence values that ranged from 0 to 39% in pairwise comparisons between taxa, with 5–59% of these being potentially phylogenetically informative (Baldwin et al., 1995).
• Low functional constraint. It was thought that since the ITS sequences are removed via splicing during transcript processing, they would be subject to reasonably mild functional constraints, which in turn would offer a preponderance of nucleotide sites that would evolve essentially neutrally. The functionality of ITS is related to specific cleavage of the primary transcript within ITS-1 and ITS-2 during maturation of the small subunit (SSU), 5.8S, and large subunit (LSU) ribosomal RNAs (Hadjiolova et al., 1984, 1994; Musters et al., 1990; Nashimoto et al., 1988; Veldman et al., 1981; van Nues et al., 1994). Although this maturation and splicing process depends on the secondary structure of ITS, implying some degree of conservation at the sequence or structure level (Mai and Coleman, 1997), the presumption of limited functional constraint was widely adopted and further justified by observations of extensive nucleotide and length variation.
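The quantities mentioned under "Intergenomic variability" (pairwise divergence and the proportion of potentially informative sites) can be illustrated with a minimal Python sketch. It rests on several assumptions: the four-taxon alignment is an invented toy example, not real ITS data; divergence is computed as an uncorrected p-distance, which is not necessarily the measure used in the studies cited above; and gaps, `?`, and `N` are treated as missing data. The function names are hypothetical.

```python
from itertools import combinations

# Assumption: these symbols are treated as missing data and skipped.
MISSING = {"-", "?", "N"}


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites, ignoring missing positions."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two sequences."""
    counts = {}
    for base in column:
        base = base.upper()
        if base not in MISSING:
            counts[base] = counts.get(base, 0) + 1
    return sum(1 for n in counts.values() if n >= 2) >= 2


def summarize(alignment):
    """Return pairwise p-distances and the fraction of informative sites."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    distances = {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(names, 2)
    }
    columns = zip(*(alignment[name] for name in names))
    informative = sum(is_parsimony_informative(col) for col in columns)
    return distances, informative / length


# Invented toy alignment (hypothetical taxa, not real ITS sequences).
toy_alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTAT",
    "taxonC": "ATGTACGAAT",
    "taxonD": "ATGTAC-AAC",
}

distances, informative_fraction = summarize(toy_alignment)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
print("informative fraction:", informative_fraction)
```

Counting a site as informative only when at least two states are each shared by at least two sequences follows the usual parsimony criterion. Real analyses would typically apply a substitution model to correct distances for multiple hits and would treat indels explicitly.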
The foregoing list of properties constitutes an impressive set of advantages for experimental design, and so it is not surprising that ITS-based phylogenetics rapidly grew in popularity. Given the prevalence of ITS sequence data in plant phylogenetic analyses, it seems prudent to pause and reflect upon these and other molecular evolutionary properties that are relevant to its utilization. We were motivated by the realization that the several advantages noted above may be counterbalanced by phenomena that are expected to confound phylogenetic analyses, and that a deeper understanding of process might lead to enhanced insight into evolutionary history. Some of the relevant phenomena have been revealed or informed by the actual process of phylogenetic inference, where unexpected results or incongruent topologies were recovered. We review here molecular genetic processes that impact ITS sequence variation, drawing attention to the implications for phylogenetic inference.
Multiple rDNA arrays
It has long been known that 18S–26S rDNA arrays and their RNA products constitute an essential component of eukaryotic NORs (nucleolus organizing regions). The number and distribution of NOR loci in eukaryotic genomes are variable, as is their size (Brown et al., 1993; Panzera et al., 1996; Pedersen and Linde-Laursen, 1994; Tartof and Dawid, 1976; Vanzela et al., 1998; Worton et al., 1988). Moreover, the number and genomic location of NOR arrays are evolutionarily labile within families and genera
Conclusions
Our purpose in writing this review was to illuminate and bring attention to the many molecular evolutionary and organism-level processes that may impact sequence variation for ITS repeats in plants, and thereby hopefully contribute to a more informed utilization in phylogenetic analyses. Although we have discussed a rather lengthy list of phenomena that may generate intragenomic sequence variation, to a certain extent the problem this creates for phylogenetic analysis is the same regardless of
Acknowledgments
Financial support was provided by the National Science Foundation and the Spanish Ministry of Education, Culture and Sports.
References (181)
- et al. Potential phylogenetic utility of the low-copy nuclear gene pistillata in dicotyledonous plants: Comparison to nrDNA ITS and trnL intron in Sphaerocardamum and other Brassicaceae. Mol. Phylogenet. Evol. (1999)
- Phylogenetic utility of the internal transcribed spacers of nuclear ribosomal DNA in plants: An example from the Compositae. Mol. Phylogenet. Evol. (1992)
- et al. Origin of Macaronesian Sideritis L. (Lamioideae: Lamiaceae) inferred from nuclear and chloroplast sequence datasets. Mol. Phylogenet. Evol. (2002)
- et al. Phylogeny of Poaceae subfamily Pooideae based on chloroplast ndhF gene sequences. Mol. Phylogenet. Evol. (1997)
- et al. Variation in the nrDNA ITS of Pinus subsection Cembroides: Implications for molecular systematic studies of pine species complexes. Mol. Phylogenet. Evol. (2001)
- et al. How slippage-derived sequences are incorporated into rRNA variable-region secondary structure: Implications for phylogeny reconstruction. Mol. Phylogenet. Evol. (2000)
- PCR with 7-deaza-2’-deoxyguanosine triphosphate
- et al. Molecular evolution and phylogenetic utility of the chloroplast rpl16 intron in Chusquea and the Bambusoideae (Poaceae). Mol. Phylogenet. Evol. (1997)
- et al. Three nonorthologous ITS1 types are present in a polypore fungus Trichaptum abietinum. Mol. Phylogenet. Evol. (2002)
- et al. Phylogeny, biogeography and processes of molecular differentiation in Quercus subgenus Quercus (Fagaceae). Mol. Phylogenet. Evol. (1999)