Monique Turmel, Christian Otis, Claude Lemieux, The Chloroplast Genome Sequence of Chara vulgaris Sheds New Light into the Closest Green Algal Relatives of Land Plants, Molecular Biology and Evolution, Volume 23, Issue 6, June 2006, Pages 1324–1338, https://doi.org/10.1093/molbev/msk018
Abstract
The phylum Streptophyta comprises all land plants and six monophyletic groups of charophycean green algae (Mesostigmatales, Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales, and Charales). Phylogenetic analyses of four genes encoded in three cellular compartments suggest that the Charales are sister to land plants and that charophycean green algae evolved progressively toward an increasing cellular complexity. To validate this phylogenetic hypothesis and to understand how and when the highly conservative pattern displayed by land plant chloroplast DNAs (cpDNAs) originated in the Streptophyta, we have determined the complete chloroplast genome sequence (184,933 bp) of a representative of the Charales, Chara vulgaris, and compared this genome to those of Mesostigma (Mesostigmatales), Chlorokybus (Chlorokybales), Staurastrum and Zygnema (Zygnematales), Chaetosphaeridium (Coleochaetales), and selected land plants. The phylogenies we inferred from 76 cpDNA-encoded proteins and genes using various methods favor the hypothesis that the Charales diverged before the Coleochaetales and Zygnematales. The Zygnematales were identified as sister to land plants in the best tree topology (T1), whereas Chaetosphaeridium (T2) or a clade uniting the Zygnematales and Chaetosphaeridium (T3) occupied this position in alternative topologies. Chara remained at the same basal position in trees including more land plant taxa and inferred from 56 proteins/genes. Phylogenetic inference from gene order data yielded two most parsimonious trees displaying the T1 and T3 topologies. Analyses of additional structural cpDNA features (gene order, gene content, intron content, and indels in coding regions) provided better support for T1 than for the topology of the above-mentioned four-gene tree. Our structural analyses also revealed that many of the features conserved in land plant cpDNAs were inherited from their green algal ancestors. 
The intron content data predicted that at least 15 of the 21 land plant group II introns were gained early during the evolution of streptophytes and that a single intron was acquired during the transition from charophycean green algae to land plants. Analyses of genome rearrangements based on inversions predicted no alteration in gene order during the transition from charophycean green algae to land plants.
Introduction
More than 470 MYA, green algae belonging to the class Charophyceae emerged from their aquatic habitat to colonize the land (Kenrick and Crane 1997; Graham, Cook, and Busse 2000; Lewis and McCourt 2004; Sanderson et al. 2004), giving rise to the more than 500,000 land plant species currently found on our planet. In contrast to the large diversity of land plants, only a few thousand charophycean species are living today. These algae exhibit great variability in cellular organization and reproduction (Lewis and McCourt 2004), and together with the land plants, they form the green plant lineage Streptophyta (Bremer et al. 1987). Most, if not all, of the other extant green algae (more than 10,000 species) belong to the sister lineage Chlorophyta (Lewis and McCourt 2004). Six monophyletic groups of charophycean green algae are currently recognized: the Mesostigmatales (Karol et al. 2001), Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales, and Charales (Mattox and Stewart 1984), given here in order of increasing cellular complexity. Whether the unicellular flagellate Mesostigma viride (Mesostigmatales) belongs to the Streptophyta remains controversial: some phylogenetic analyses placed this alga at the base of the Streptophyta (Bhattacharya et al. 1998; Marin and Melkonian 1999; Karol et al. 2001; Martin et al. 2002), whereas others placed it before the divergence of the Chlorophyta and Streptophyta (Lemieux, Otis, and Turmel 2000; Turmel et al. 2002; Turmel, Otis, and Lemieux 2002a; Martin et al. 2005).
On the basis of morphological characters, the Charales or the Coleochaetales have been proposed to share a sister relationship with land plants (Qiu and Palmer 1999; Chapman and Waters 2002). Recent analyses of the combined sequences of four genes from the nucleus (18S rRNA gene), chloroplast (atpB and rbcL), and mitochondria (nad5) of 25 charophycean green algae, eight land plants, and five chlorophytes revealed that the Charales and land plants form a highly supported clade; however, moderate bootstrap support was observed for the positions of the other charophycean groups (Karol et al. 2001; McCourt, Delwiche, and Karol 2004). The best trees inferred by Bayesian inference and maximum likelihood (ML) in this four-gene analysis support an evolutionary trend toward increasing cellular complexity (McCourt, Delwiche, and Karol 2004). All previously reported phylogenies of charophycean green algae were reconstructed from a smaller number of genes, showed poor resolution, and yielded conflicting topologies, thus providing no conclusive information regarding the branching order of charophycean lineages and their specific relationships with land plants (Qiu and Palmer 1999; Chapman and Waters 2002; McCourt, Delwiche, and Karol 2004).
To unravel the phylogenetic relationships among charophycean lineages and to understand how and when the highly conservative pattern displayed by land plant chloroplast DNAs (cpDNAs) originated in the Streptophyta, we have undertaken the sequencing of the chloroplast genome from representatives of all charophycean lineages. To date, we have reported the cpDNA sequences of Mesostigma (Mesostigmatales) (Lemieux, Otis, and Turmel 2000), Chaetosphaeridium globosum (Coleochaetales) (Turmel, Otis, and Lemieux 2002b), Staurastrum punctulatum, and Zygnema circumcarinatum (Zygnematales) (Turmel, Otis, and Lemieux 2005). Comparative analyses of Mesostigma cpDNA (137 genes, no introns) with its land plant counterparts (110–120 genes, about 20 introns) revealed that the chloroplast genome underwent substantial changes in its architecture during the evolution of streptophytes (namely gene losses, intron insertions, and scrambling in gene order). At the levels of gene content (125 genes), intron composition (18 introns), and gene order, Chaetosphaeridium cpDNA is highly similar to land plant cpDNAs. Mesostigma, Chaetosphaeridium, and most land plant cpDNAs exhibit a quadripartite structure that is characterized by the presence of two copies of an rRNA-containing inverted repeat (IR) separated by large single-copy (LSC) and small single-copy (SSC) regions. With a few exceptions, all the genes they have in common reside in corresponding genomic regions. Although the chloroplast genomes of the zygnematalean algae Staurastrum and Zygnema closely resemble their Chaetosphaeridium and bryophyte counterparts at the gene content level, they feature substantial differences in overall structure, gene order, and intron content (8 and 13 introns in Staurastrum and Zygnema, respectively).
Like the partially characterized cpDNA of Spirogyra maxima (Manhart, Hoshaw, and Palmer 1990), they lack a large IR, suggesting that loss of one copy of the IR sequence occurred very early during the evolution of this group of charophycean green algae. From these observations, we inferred that the chloroplast genome of the last common ancestor of Staurastrum and Zygnema carried at least 16 of the approximately 20 introns found in land plant cpDNAs and, at the gene order level, bore more similarity to its land plant counterparts than to either zygnematalean cpDNA.
In the present study, we report the complete cpDNA sequence of Chara vulgaris (Charales). We have compared this genome sequence with that of Chlorokybus atmophyticus (Chlorokybales) and with those previously reported for the above-mentioned charophycean green algae and selected land plants. Our phylogenetic analyses of shared proteins and genes as well as our analyses of independent sets of structural genomic data provide robust support for the notion that the Charales occupy a basal position relative to both the Coleochaetales and Zygnematales, thus challenging the current view of the phylogeny of charophycean green algae. Our comparative analyses of structural genomic data also indicate that the chloroplast genome remained largely unchanged at the gene order, gene content, and intron composition levels during the transition from charophycean green algae to land plants.
Materials and Methods
DNA Cloning, Sequencing, and Sequence Analysis
Chara vulgaris was collected from a pond located in Quebec City (Quebec, Canada); a voucher (no. QFA468020) of this plant material is held at the herbarium Louis-Marie of Laval University. A random clone library was prepared from a fraction containing both cpDNA and mitochondrial DNA (Turmel, Otis, and Lemieux 2003). DNA templates were obtained with the QIAprep 96 Miniprep kit (Qiagen Inc., Mississauga, Canada) and sequenced as described previously (Turmel, Otis, and Lemieux 2005). Sequences were assembled using SEQUENCHER 4.1.1 (Gene Codes Corporation, Ann Arbor, Mich.). Genes were identified using a custom-built suite of bioinformatic tools that automatically executes the following three steps: (1) open reading frames (ORFs) were found using GETORF in EMBOSS (Rice, Longden, and Bleasby 2000), (2) their translated products were identified by BlastP (Altschul et al. 1990) searches against a local database of cpDNA-encoded proteins or the nonredundant database at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Blast/), and (3) consecutive 100-bp segments of the genome sequence were analyzed with BlastN and BlastX (Altschul et al. 1990) to identify genes. tRNA genes were localized using tRNAscan-SE (Lowe and Eddy 1997). Intron boundaries were determined by modeling intron secondary structures (Michel, Umesono, and Ozeki 1989; Michel and Westhof 1990) and by comparing intron-containing genes with intronless homologs using FRAMEALIGN of the Genetics Computer Group (Madison, Wis.) software (version 10.3) package. Repeated sequence elements were searched using GenAlyzer (Choudhuri et al. 2004) and Vmatch (http://www.vmatch.de/).
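Gene-finding steps (1) and (3) can be illustrated with a minimal sketch. The helpers below (`find_orfs`, `segments`, and the 30-codon minimum) are hypothetical stand-ins, not GETORF or the authors' custom suite: they scan both strands for ATG-initiated ORFs in all three frames and split a sequence into consecutive 100-bp segments, which would then be submitted to BlastP, BlastN, or BlastX (not shown). ORFs that wrap around the origin of a circular genome are ignored.

```python
# Hypothetical sketch of gene-finding steps (1) and (3); not GETORF or the authors' suite.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int]]:
    """Find ATG-initiated ORFs ending in a stop codon on both strands.

    Returns (strand, start, end) tuples; coordinates are 0-based and end-exclusive,
    measured on the forward sequence for '+' and on the reverse complement for '-'.
    The 30-codon default minimum is an arbitrary, assumed threshold.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in this frame opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG downstream
    return orfs


def segments(seq: str, size: int = 100) -> list[str]:
    """Split a sequence into consecutive, non-overlapping segments (the last may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]
```

For example, `find_orfs("ATGAAAAAAAAATAA", min_codons=1)` returns `[("+", 0, 15)]`, and `segments` applied to a 250-bp string yields segments of 100, 100, and 50 bp.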
Phylogenetic Reconstructions from cpDNA Sequences
A data set of 76 concatenated protein sequences derived from the chloroplast genomes of Chara (this study), Mesostigma (Lemieux, Otis, and Turmel 2000), Chlorokybus (M. Turmel, C. Otis, and C. Lemieux, unpublished data), Staurastrum (Turmel, Otis, and Lemieux 2005), Zygnema (Turmel, Otis, and Lemieux 2005), Chaetosphaeridium (Turmel, Otis, and Lemieux 2002b), Marchantia polymorpha (Ohyama et al. 1986), Anthoceros formosae (Kugita et al. 2003), and Physcomitrella patens (Sugiura et al. 2003) was prepared as described previously (Turmel, Otis, and Lemieux 2003). In addition, two nucleotide data sets containing the genes for these proteins and differing only by the presence/absence of third codon positions were prepared. To obtain the data set with all three codon positions, the multiple sequence alignment of each protein was converted into a codon alignment, the poorly aligned and divergent regions in each codon alignment were excluded using GBLOCKS 0.91b (Castresana 2000) with the -t=c option, and the individual codon alignments were concatenated. Third codon positions were excluded from the latter data set with PAUP* 4.0b10 (Swofford 2002). Using the same methods, we also generated amino acid and nucleotide data sets with a broader representation of land plants by including the 56 protein-coding genes shared by the nine above-mentioned streptophytes, the lycophyte Huperzia lucidula (Wolf et al. 2005), and the 23 other tracheophyte taxa examined by Leebens-Mack et al. (2005).
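The codon-alignment and concatenation steps can be sketched as follows, assuming each protein alignment is available as a gapped string and each coding sequence as an ungapped string without its stop codon. The function names are hypothetical, and the GBLOCKS filtering that was applied between conversion and concatenation is not reproduced here.

```python
# Hypothetical helpers; GBLOCKS filtering of the codon alignments is not reproduced.


def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Thread an ungapped coding sequence onto its aligned protein sequence.

    Each residue takes the next codon from `cds`; each alignment gap '-' becomes '---'.
    Assumes `cds` lacks the stop codon and encodes exactly the ungapped residues.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match the ungapped protein length")
    codons, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


def drop_third_positions(codon_seq: str) -> str:
    """Keep only the first and second position of every codon."""
    return "".join(base for i, base in enumerate(codon_seq) if i % 3 != 2)


def concatenate(gene_alignments: list[dict[str, str]]) -> dict[str, str]:
    """Concatenate per-gene alignments taxon by taxon; every taxon must occur in every gene."""
    taxa = gene_alignments[0].keys()
    return {taxon: "".join(aln[taxon] for aln in gene_alignments) for taxon in taxa}
```

For instance, threading `"ATGAAA"` onto the aligned protein `"M-K"` gives `"ATG---AAA"`, and dropping third positions from that string leaves `"AT--AA"`.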
The amino acid data sets were used to reconstruct phylogenies with ML, maximum parsimony (MP), and distance methods. ML trees were computed with PHYML 2.4.4 (Guindon and Gascuel 2003) under the cpREV45 + Γ + I model (Adachi et al. 2000). Bootstrap support for each node was calculated using 100 replicates. The confidence limits of the alternative tree topologies recovered in analyses of the 76-protein data set were evaluated under the cpREV45 + Γ + I model using the Shimodaira-Hasegawa test as implemented in CODEML 3.14 (Yang 1997). Support of the individual proteins for alternative topologies was estimated with CODEML 3.14 using the MGENE = 1 option. MP trees and ML-distance trees were inferred using PROTPARS and NEIGHBOR, respectively, in PHYLIP 3.63 (Felsenstein 1995). The ML distances were computed with PUZZLEBOOT 1.03 and Tree-Puzzle 5.2 (Strimmer and von Haeseler 1996) under the cpREV45 + Γ + I model. Robustness of MP and distance trees was assessed by bootstrap percentages after 100 replications.
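Bootstrap support as used here involves two steps: resampling alignment columns with replacement to build each pseudoreplicate, and counting how often a clade recurs among the trees inferred from the replicates. The sketch below is a generic nonparametric bootstrap, assuming trees are summarized as sets of clades (frozensets of taxon names); the tree-inference step itself (PHYML, PROTPARS, or NEIGHBOR in this study) is not shown, and the toy alignment is invented for illustration.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one nonparametric bootstrap pseudoreplicate by resampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same resampled columns are applied to every taxon.
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}


def bootstrap_percentage(replicate_clades: list[set[frozenset[str]]], clade: set[str]) -> float:
    """Percentage of replicate trees (each summarized as its set of clades) that contain `clade`."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(42)
toy = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "TCGTAC"}
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]  # 100 replicates, as in the text
# Each replicate would then be passed to the tree-building program (not shown).
```

If a clade appeared in three of four replicate trees, `bootstrap_percentage` would report 75.0.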
ML, MP, and ML-distance trees were inferred from each nucleotide data set using PAUP* 4.0b10. Trees were searched with the full heuristic option, and optimization was performed by branch swapping using tree bisection and reconnection. For ML analyses, starting trees were obtained by the stepwise addition of sequences. For MP and ML-distance analyses, starting trees were obtained by the stepwise random addition of sequences with one tree held per addition; 10 replications of the addition procedure were performed. ML and ML-distance trees were constructed under the GTR + Γ + I model, a model that was selected by Modeltest 3.6 (Posada and Crandall 1998) as the one best fitting our nucleotide data. Confidence of branch points was estimated by 1,000 bootstrap replications in all analyses. Shimodaira-Hasegawa tests of alternative tree topologies were carried out using PAUP* 4.0b10.
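For readers unfamiliar with the GTR model, the sketch below builds its rate matrix from six exchangeability parameters and four equilibrium base frequencies, scaled so that one unit of branch length corresponds to one expected substitution per site. It covers only the GTR part of GTR + Γ + I: the gamma-distributed rate categories and the proportion of invariant sites are not modeled, and the parameter values are arbitrary, not estimates from this study.

```python
from itertools import combinations

BASES = "ACGT"


def gtr_rate_matrix(exchangeabilities: dict[str, float], freqs: list[float]) -> list[list[float]]:
    """GTR instantaneous rate matrix Q with Q[i][j] = r_ij * pi_j for i != j.

    exchangeabilities: symmetric rates keyed "AC", "AG", "AT", "CG", "CT", "GT".
    freqs: equilibrium frequencies of A, C, G, T (summing to 1).
    The matrix is rescaled so the mean substitution rate, -sum_i pi_i Q[i][i], equals 1.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, j in combinations(range(4), 2):
        r = exchangeabilities[BASES[i] + BASES[j]]
        q[i][j] = r * freqs[j]
        q[j][i] = r * freqs[i]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)  # rows sum to zero
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]


# Arbitrary illustrative parameters, not estimates from the study.
rates = {"AC": 1.0, "AG": 2.0, "AT": 0.5, "CG": 1.0, "CT": 3.0, "GT": 1.0}
pi = [0.3, 0.2, 0.2, 0.3]
Q = gtr_rate_matrix(rates, pi)
```

The resulting matrix has rows summing to zero and satisfies detailed balance (π_i Q_ij = π_j Q_ji), which is what makes the GTR model time-reversible.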
LogDet-distance trees were computed using PAUP* 4.0b10 with the neighbor-joining search setting. The LogDet distances were calculated with LDDist (Thollesson 2004), and the proportion of invariant sites was estimated using the capture-recapture method of Steel, Huson, and Lockhart (2000). Confidence of branch points was estimated by 1,000 bootstrap replications.
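The LogDet idea can be made concrete with a short sketch of the paralinear form of the distance, d = -1/4 [ln det F - 1/2 ln(det Πx det Πy)], where F is the 4 × 4 matrix of relative site-pattern frequencies for two aligned sequences and Πx and Πy are diagonal matrices of their base frequencies. This is an illustrative implementation, not LDDist: it skips gapped sites and does not remove invariant sites by the capture-recapture method used in the study.

```python
import math

BASES = "ACGT"


def _det(m: list[list[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def logdet_distance(x: str, y: str) -> float:
    """Paralinear (LogDet) distance between two aligned DNA sequences, ignoring non-ACGT sites."""
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper()) if a in BASES and b in BASES]
    n = len(pairs)
    f = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        f[BASES.index(a)][BASES.index(b)] += 1.0 / n
    px = [sum(row) for row in f]                          # base frequencies of x
    py = [sum(f[i][j] for i in range(4)) for j in range(4)]  # base frequencies of y
    det_f = _det(f)
    if det_f <= 0 or min(px) == 0 or min(py) == 0:
        raise ValueError("LogDet distance is undefined for these sequences")
    log_pi = sum(map(math.log, px)) + sum(map(math.log, py))
    return -0.25 * (math.log(det_f) - 0.5 * log_pi)
```

For the toy pair AACCGGTTAC and AACCGGTTAG, the single C versus G mismatch gives d = 0.25 ln 1.5 ≈ 0.101, whereas identical sequences give a distance of zero.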
Relative rate tests were performed with RRTree (Robinson-Rechavi and Huchon 2000) using the 76-gene data set containing all three codon positions and the topology of the best ML tree. The three land plant sequences and the two zygnematalean sequences were treated as single lineages, and the Chlorokybus sequence served as an outgroup. The number of synonymous substitutions per synonymous site (Ks) and the number of nonsynonymous substitutions per nonsynonymous site (Ka) were compared.
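RRTree's lineage-based test, which compares Ks and Ka separately, is not reproduced here. As a simpler illustration of the relative rate idea, the sketch below implements a different but related method, Tajima's (1993) 1D relative rate test: using an outgroup, it counts the sites at which only one ingroup sequence carries a distinct state and evaluates the imbalance between the two ingroup lineages with a 1-degree-of-freedom chi-square statistic. It does not distinguish synonymous from nonsynonymous changes.

```python
def tajima_relative_rate(seq1: str, seq2: str, outgroup: str) -> tuple[int, int, float]:
    """Tajima's 1D relative rate test for three aligned sequences.

    m1: sites where seq1 differs while seq2 and the outgroup agree (change on lineage 1);
    m2: sites where seq2 differs while seq1 and the outgroup agree (change on lineage 2).
    Returns (m1, m2, chi2); chi2 > 3.841 rejects equal rates at the 5% level (1 df).
    """
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip gapped sites
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    chi2 = (m1 - m2) ** 2 / (m1 + m2) if m1 + m2 else 0.0
    return m1, m2, chi2
```

With five unique changes on one lineage and none on the other, the statistic is 5.0, above the 3.841 cutoff; with two versus zero it is 2.0 and equal rates would not be rejected.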
Analyses of Gene Order Data
The GRIMM Web server (Tesler 2002) was used to infer the number of gene permutations by inversions in pairwise comparisons of green algal cpDNAs. For comparisons involving two IR-containing genomes, genes within one of the two copies of the IR were excluded from the data set, and the SSC and LSC + IR regions were considered as two separate chromosomes. The SSC and LSC regions were assumed to be independent from one another because the conserved gene-partitioning pattern displayed by streptophyte genomes is not consistent with the occurrence of inversions spanning the SSC and LSC regions. For comparisons involving an IR-containing genome and an IR-lacking genome, genes within one copy of the IR were excluded, and no constraints were imposed on the inversion endpoints. Phylogenies based on inversion medians were inferred with GRAPPA 2.0 (http://www.cs.unm.edu/∼moret/GRAPPA/) using the algorithm of Caprara (2003) and a data set of 113 gene positions.
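GRIMM computes exact minimum inversion distances for signed gene orders; a much simpler quantity that conveys the underlying idea is the breakpoint count. Because a single inversion can remove at most two breakpoints, half the number of breakpoints (rounded up) is a lower bound on the inversion distance. The sketch below assumes that genes have already been renumbered 1..n by their order in the reference genome, with signs giving strand, and treats the sequence as a single linear chromosome; it does not handle the two-chromosome (SSC and LSC + IR) setup described above.

```python
def signed_breakpoints(order: list[int]) -> int:
    """Count breakpoints of a signed gene order relative to the identity 1..n.

    Caps 0 and n+1 are added at both ends; a pair of neighbors (a, b) is an
    adjacency when b - a == 1, which also holds for inverted runs such as -3, -2.
    """
    n = len(order)
    extended = [0] + list(order) + [n + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def inversion_lower_bound(order: list[int]) -> int:
    """ceil(breakpoints / 2): each inversion removes at most two breakpoints."""
    return (signed_breakpoints(order) + 1) // 2
```

For example, the order 1, -3, -2, 4 has two breakpoints, and a single inversion of genes 2 and 3 restores the identity, so the bound of one inversion is attained.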
Analyses of Other Structural Genomic Data
MacClade 4.06 (D. Maddison and W. Maddison 2000) was used to generate matrices of gene and intron contents, to trace the encoded characters on tree topologies, and to calculate tree lengths under the Dollo parsimony method (Farris 1977). The presence of a gene, the presence of a pseudogene, and the absence of a gene were coded with values of 2, 1, and 0, respectively. The presence/absence of each intron was coded as a binary character. Gene-coding regions carrying unambiguous insertions/deletions were identified by visual inspection of the nucleotide and amino acid alignments, and indels were coded as unordered, binary characters. The resulting binary data matrix was subjected to parsimony analysis using PAUP* 4.0b10.
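Under Dollo parsimony a character may be gained only once but lost any number of times. The sketch below scores one binary presence/absence character on a rooted tree written as nested tuples: the gain is placed at the smallest clade containing every taxon that has the character, and one loss is counted for each maximal subtree within that clade in which the character is absent. It handles binary characters only (not the three-state gene/pseudogene/absence coding), and counting the single gain as one step is an assumed convention that may differ from MacClade's tree-length report.

```python
def leaves(node) -> set[str]:
    """Taxon names below a node; a leaf is a string, an internal node a tuple of children."""
    return {node} if isinstance(node, str) else set().union(*(leaves(c) for c in node))


def _losses(node, present: set[str]) -> int:
    if not leaves(node) & present:
        return 1  # one loss on the branch leading to an all-absent subtree
    if isinstance(node, str):
        return 0
    return sum(_losses(c, present) for c in node)


def dollo_steps(tree, present) -> int:
    """Minimum steps (1 gain + losses) for one binary character under Dollo parsimony."""
    present = set(present)
    if not present:
        return 0
    node = tree
    while not isinstance(node, str):
        inner = [c for c in node if present <= leaves(c)]
        if not inner:
            break  # no single child holds all taxa with the character
        node = inner[0]
    return 1 + _losses(node, present)


def tree_length(tree, matrix: dict[str, set[str]]) -> int:
    """Sum Dollo steps over characters; `matrix` maps each character to the taxa that have it."""
    return sum(dollo_steps(tree, taxa) for taxa in matrix.values())


toy_tree = ((("A", "B"), "C"), "D")  # hypothetical taxa, not from the study
```

On this toy tree, a character present in A and C costs one gain plus one loss (on the branch to B), whereas one present only in A costs a single gain.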
Results
Features of Chara cpDNA
Chara cpDNA consists of a circular molecule of 184,933 bp with the typical quadripartite structure found in streptophyte cpDNAs (fig. 1). Table 1 compares the main features of this genome with those of its counterparts in Mesostigma, other charophycean green algae, and a representative of land plants, the bryophyte Marchantia (a liverwort). Most land plant chloroplast genomes that have been completely sequenced to date closely resemble Marchantia cpDNA in gene content, intron content, and gene order. Among all streptophyte chloroplast genome sequences determined thus far, that of Chara has the largest size, the longest single-copy regions, and the highest A + T content. Its increased size is mainly accounted for by expanded intergenic spacers and introns representing 38.8% and 13.4% of the total genome, respectively. The average size of intergenic spacers in Chara cpDNA is 521 bp versus 223 bp and 178 bp in Chaetosphaeridium and Marchantia cpDNAs, respectively, whereas the average intron size is 1,372 bp versus 686 bp and 650 bp in Chaetosphaeridium and Marchantia, respectively. Probing of Chara cpDNA with Vmatch and GenAlyzer revealed that dispersed repeats ≥25 bp are virtually absent. Both intergenic spacers and introns from Chara are richer in A + T (82.9% and 78.5%, respectively) than their homologs in Chaetosphaeridium (79.8% and 77.6%) and Marchantia (80.6% and 76.8%).
Table 1. Main features of Chara cpDNA and other streptophyte cpDNAs

| Feature | Mesostigma | Chlorokybus | Chara | Chaetosphaeridium | Zygnema | Staurastrum | Marchantia |
|---|---|---|---|---|---|---|---|
| Size (bp)^a | | | | | | | |
| IR | 6,057 | 7,640 | 10,919 | 12,431 | — | — | 10,058 |
| SSC | 22,619 | 27,876 | 27,280 | 17,639 | — | — | 19,813 |
| LSC | 83,627 | 109,098 | 135,815 | 88,682 | — | — | 81,095 |
| Total genome | 118,360 | 152,254 | 184,933 | 131,183 | 165,372 | 157,089 | 121,024 |
| A + T content (%) | 69.9 | 61.8 | 73.8 | 70.4 | 68.9 | 67.5 | 71.2 |
| Gene content^b | 137 | 138 | 127 | 125 | 125 | 121 | 120 |
| Introns | | | | | | | |
| Group I | 0 | 1 | 2 | 1 | 1 | 1 | 1 |
| Group II, cis-spliced | 0 | 0 | 15 | 16 | 11 | 6 | 18 |
| Group II, trans-spliced | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| Gene order relative to Marchantia | | | | | | | |
| No. of gene clusters | 22 | 19 | 14 | 12 | 20 | 22 | — |
| No. of genes in clusters | 95 | 87 | 113 | 116 | 82 | 101 | — |
| No. of inversions | 38 | 47 | 17 | 12 | 56 | 34 | — |
^a Because Staurastrum and Zygnema cpDNAs lack an IR, only the genome size is given for each of these cpDNAs.
^b Unique ORFs, intron ORFs, and pseudogenes were not taken into account. Note that Chaetosphaeridium tufA was considered to be a functional gene.
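Because a quadripartite genome contains two IR copies, each total should equal LSC + SSC + 2 × IR. The short check below transcribes the IR-containing entries of Table 1 (Staurastrum and Zygnema are omitted because they lack an IR) and confirms that, for all five genomes, the computed totals match the reported genome sizes exactly.

```python
# (IR, SSC, LSC, total genome) in bp, transcribed from Table 1.
TABLE1 = {
    "Mesostigma": (6057, 22619, 83627, 118360),
    "Chlorokybus": (7640, 27876, 109098, 152254),
    "Chara": (10919, 27280, 135815, 184933),
    "Chaetosphaeridium": (12431, 17639, 88682, 131183),
    "Marchantia": (10058, 19813, 81095, 121024),
}


def quadripartite_total(ir: int, ssc: int, lsc: int) -> int:
    """Genome size implied by two IR copies plus the two single-copy regions."""
    return lsc + ssc + 2 * ir


for taxon, (ir, ssc, lsc, total) in TABLE1.items():
    computed = quadripartite_total(ir, ssc, lsc)
    print(f"{taxon}: computed {computed:,} bp, table {total:,} bp")
```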
At both the gene content and intron composition levels, Chara cpDNA bears most similarity to Chaetosphaeridium cpDNA (table 1). With 127 genes, its gene repertoire is slightly larger than those of Chaetosphaeridium, Staurastrum, Zygnema, and Marchantia cpDNAs. These four charophycean genomes differ in the presence/absence of 13 genes (Supplementary Table S1, Supplementary Material online), and interestingly, Chara cpDNA features four genes (rpl12, trnL(gag), rpl19, and ycf20) that are entirely missing from the three other charophycean cpDNAs and from land plant cpDNAs. Considering that rpl12 is also absent from Mesostigma cpDNA, Chara is the first streptophyte in which this chloroplast gene has been reported. With regard to tufA, our results are consistent with previous reports indicating that this chloroplast gene is functional in Charales (Baldauf, Manhart, and Palmer 1990) but not in Coleochaetales (Baldauf, Manhart, and Palmer 1990; Turmel, Otis, and Lemieux 2002b). In contrast to its Chaetosphaeridium and Coleochaete counterparts, the tufA sequence in Chara cpDNA shows substantial conservation along its entire length with its Mesostigma and chlorophyte counterparts (data not shown). As in Chaetosphaeridium cpDNA, a total of 18 introns reside in Chara cpDNA; however, they are not all inserted at common sites in the two genomes (fig. 2). Chara cpDNA lacks three group II introns (in petD, rpoC1, and ycf66) found in Chaetosphaeridium, Zygnema, and land plant cpDNAs, and it carries one group II intron (in ycf3 at site 124) that is missing from its Chaetosphaeridium counterpart but present in zygnematalean and land plant cpDNAs. Moreover, in contrast to Chaetosphaeridium cpDNA, whose introns all have homologs in land plant cpDNAs, two of the Chara chloroplast introns, the group II intron in atpF at site 138 and the group I intron in rrl at site 2500, have not been previously identified in streptophyte cpDNAs.
The latter rrl intron, however, has a homolog at the same position within the large subunit rRNA gene of Chara mitochondrial DNA (Turmel, Otis, and Lemieux 2003) as well as in the cpDNAs and mitochondrial DNAs of several chlorophyte green algae (Côté et al. 1993; Turmel et al. 1993; Turmel, Mercier, and Côté 1993; Turmel et al. 1999).
At the level of gene organization, Chara cpDNA differs substantially from its land plant and Chaetosphaeridium counterparts, even though it is much less scrambled than zygnematalean cpDNAs. Relative to Marchantia cpDNA, it is more rearranged than is Chaetosphaeridium cpDNA. Using GRIMM, we estimated that 17 and 12 inversions would be required to convert the gene orders of Chara and Chaetosphaeridium cpDNAs, respectively, into that of Marchantia cpDNA (table 1). Because 23 inversions would be necessary to interconvert the gene orders of the Chara and Chaetosphaeridium cpDNAs, each of these two charophycean genomes more closely resembles Marchantia cpDNA than it resembles the other. By searching for all possible gene pairs with conserved polarities in Marchantia and charophycean genomes, we found that Chara and Marchantia share only three gene pairs (3′ndhF-5′trnN(guu), 3′psbA-5′trnH(gug), and 3′trnY(gua)-5′trnD(guc)) that are absent from the other charophycean cpDNAs, that Chaetosphaeridium and Chara specifically share six gene pairs (3′atpA-3′trnR(ucu), 3′cemA-5′petA, 3′chlB-5′trnK(uuu), 3′psaI-5′ycf4, 3′psbZ-5′trnG(gcc), and 3′ycf3-5′psaA), and that Marchantia and Staurastrum/Zygnema specifically share three gene pairs (3′trnV(uac)-5′ndhC, 5′trnS(gga)-5′ycf3, and 3′trnH(gug)-5′ftsH).
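The search for gene pairs with conserved polarities can be sketched by representing each genome as a circular list of (gene, strand) tuples and recording every adjacency as an unordered pair of gene ends, such as the 3′ end of ndhF next to the 5′ end of trnN(guu). Because this representation does not depend on which strand is read, shared pairs reduce to a set intersection; pairs specific to two genomes would be obtained by further subtracting the adjacencies found in the other genomes. The function names and the gene names in the example are hypothetical placeholders, not data from the study.

```python
def adjacencies(genome: list[tuple[str, int]], circular: bool = True) -> set[frozenset]:
    """Gene-end adjacencies of a signed gene order.

    genome: list of (gene, strand) with strand +1 or -1, read left to right.
    Each adjacency is an unordered pair of gene ends, e.g. {("ndhF", "3'"), ("trnN", "5'")},
    so the same pair is recovered whichever strand the genome is read from.
    """
    def tail(gene: str, strand: int) -> tuple[str, str]:
        return (gene, "3'") if strand > 0 else (gene, "5'")  # end reached last

    def head(gene: str, strand: int) -> tuple[str, str]:
        return (gene, "5'") if strand > 0 else (gene, "3'")  # end reached first

    pairs = list(zip(genome, genome[1:]))
    if circular and len(genome) > 1:
        pairs.append((genome[-1], genome[0]))  # close the circle
    return {frozenset((tail(*a), head(*b))) for a, b in pairs}


def shared_adjacencies(g1, g2, circular: bool = True) -> set[frozenset]:
    """Adjacencies with conserved polarity present in both genomes."""
    return adjacencies(g1, circular) & adjacencies(g2, circular)


g1 = [("a", 1), ("b", 1), ("c", -1)]
g2 = [("c", 1), ("b", -1), ("a", -1)]  # g1 read from the opposite strand
```

Here `g1` and `g2` yield identical adjacency sets, and comparing `g1` with `[("a", 1), ("b", 1), ("c", 1)]` leaves only the 3′a-5′b pair.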
Phylogenetic Analyses of Chloroplast Sequences
Various phylogenetic inference methods were used to analyze an amino acid data set (16,024 sites) and two nucleotide data sets (codons excluding third positions, 34,370 sites; all three codon positions, 51,555 sites) derived from the 76 protein-coding genes common to the cpDNAs of Mesostigma, Chlorokybus, Chara, Zygnema, Staurastrum, Chaetosphaeridium, and the bryophytes Marchantia (liverwort), Anthoceros (hornwort), and Physcomitrella (moss). In all analyses, the Mesostigma sequences served as an outgroup to root the trees. As shown in figure 3, all three best ML trees share the same branching order for the charophycean green algae: the clade formed by the two zygnematalean species is sister to the land plants, the representative of the Coleochaetales emerges just before the zygnematalean lineage, Chara is basal relative to the zygnematalean and coleochaetalean lineages, and Chlorokybus is the first lineage to emerge after the outgroup. All nodes describing these relationships receive 100% bootstrap support, except the node identifying the Zygnematales as sister to land plants, where bootstrap values range from 81% to 97%. Among the alternative topologies recovered from the analysis of the amino acid data set, the sister group to land plants is either Chaetosphaeridium or a clade consisting of Chaetosphaeridium and the two zygnematalean green algae. In the single alternative topology recovered from the analysis of the nucleotide data sets, Chaetosphaeridium is sister to land plants.
The distance and MP methods yielded protein and gene trees highly congruent with ML trees and also revealed that the alternative topologies for the charophycean green algae differ only in the relative positions of Chaetosphaeridium and the zygnematalean green algae (fig. 3). The most notable difference between the trees inferred from the three data sets concerns the relative positions of the bryophytes. The earliest diverging bryophyte is Anthoceros in the protein trees, either Physcomitrella or Anthoceros in the gene trees inferred from the first two codon positions, and Marchantia in the gene trees inferred from all three codon positions. This instability is not surprising considering that the relative branching order of the moss, hornwort, and liverwort lineages is currently a problematic issue in phylogenetic studies of land plants (Nishiyama et al. 2004; Shaw and Renzaglia 2004; Goremykin and Hellwig 2005). The addition of 32 RNA-coding genes (the three rRNA and 29 tRNA genes) to the data set containing all three codon positions slightly improved the resolution of charophyte lineages but had no significant effect on the resolution of bryophyte lineages (data not shown).
The two alternative topologies differing in the branching order of charophycean green algae were evaluated using the likelihood-based statistical test of Shimodaira-Hasegawa (Shimodaira and Hasegawa 1999). When this test is applied to the amino acid data set and to the nucleotide data set featuring the first two codon positions, it rejects the hypothesis that the Coleochaetales are sister to land plants (T2) with P values of 0.043 and 0.049, respectively; however, it does not reject this hypothesis when the nucleotide data set containing all three codon positions is used. The alternative hypothesis that a clade formed by the Coleochaetales and Zygnematales is sister to land plants (T3) is rejected by analyses of the two nucleotide data sets with P values of 0.003 (first two codon positions) and 0.011 (all three codon positions) but not by the analysis of the amino acid data set. We identified no conflict among the proteins in the amino acid data set with respect to their phylogenetic signals: 68 of these proteins favor a single topology among the three detected but do not significantly reject either of the other two (Supplementary Table S2, Supplementary Material online). A total of 27 individual proteins support the Zygnematales as being the sister to land plants (best topology or T1), 18 proteins support T2, and 23 proteins support T3. The remaining eight proteins equally support the three topologies.
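The Shimodaira-Hasegawa test can be sketched with the RELL (resampling estimated log-likelihoods) bootstrap, assuming per-site log-likelihoods for each candidate tree are already available from the ML program. This is a generic version of the Shimodaira and Hasegawa (1999) procedure, not the CODEML or PAUP* implementation; the number of replicates and the random seed are arbitrary choices.

```python
import random


def sh_test(site_lnl: list[list[float]], replicates: int = 1000, seed: int = 0) -> list[float]:
    """Shimodaira-Hasegawa test using RELL bootstrap resampling.

    site_lnl: one list per tree of per-site log-likelihoods (same sites, same order).
    Returns one p-value per tree; a small p means the tree fits significantly worse
    than the best tree in the candidate set.
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    totals = [sum(s) for s in site_lnl]
    best = max(totals)
    delta = [best - t for t in totals]  # observed lnL deficit of each tree
    boot = [[0.0] * replicates for _ in range(n_trees)]
    for b in range(replicates):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]  # same sites for all trees
        for t in range(n_trees):
            boot[t][b] = sum(site_lnl[t][i] for i in idx)
    for t in range(n_trees):  # centre each tree's bootstrap distribution on zero
        mean = sum(boot[t]) / replicates
        boot[t] = [x - mean for x in boot[t]]
    exceed = [0] * n_trees
    for b in range(replicates):
        top = max(boot[t][b] for t in range(n_trees))
        for t in range(n_trees):
            if top - boot[t][b] >= delta[t]:
                exceed[t] += 1
    return [e / replicates for e in exceed]
```

The best-scoring tree always receives p = 1; a tree whose site log-likelihoods are uniformly lower by a large margin receives p close to 0 and would be rejected at the 0.05 level.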
Because most conventional methods of phylogeny reconstruction yield the true topology only when the sequences evolved under a stationary Markov process (Gu and Li 1996), heterogeneity in base and amino acid composition may lead to incorrect topologies (Steel, Lockhart, and Penny 1993; Lockhart et al. 1994). To overcome this potential problem, and also to minimize reconstruction artifacts due to heterotachy (Lopez, Casane, and Philippe 2002) or among-lineage heterogeneity (Kolaczkowski and Thornton 2004; Spencer, Susko, and Roger 2005), we used LogDet distances (Lake 1994; Lockhart et al. 1994; Steel 1994; Gu and Li 1996). LogDet-distance analysis of each of the three data sets identified the same branching order for the charophycean green algae as did ML, MP, or ML-distance analyses (fig. 3).
The long-branch attraction phenomenon can also lead to erroneous phylogenies (Felsenstein 1978). This reconstruction artifact tends to cluster long-branched but otherwise unrelated taxa in the inferred phylogeny. Although there is no reliable way to determine whether a phylogenetic analysis is affected by this problem, two sets of results provide no indication that long-branch attraction caused errors in our phylogeny reconstructions. First, relative rate tests using the nucleotide data set containing all three codon positions and Chlorokybus as an outgroup detected no significant difference in substitution rates among streptophyte lineages at the 5% level (data not shown). Ks was found to be saturated in all six pairwise comparisons, and Ka did not vary significantly in any comparison. Second, removal of the nine fastest evolving proteins from our amino acid data set (i.e., those with a specific rate of evolution >3.50 relative to AtpA, see Supplementary Table S2, Supplementary Material online) had no effect on the best tree topology recovered in analyses with the ML, MP, ML-distance, and LogDet-distance methods (data not shown). The use of slowly evolving sequences is one of the three approaches proposed by Philippe, Lartillot, and Brinkmann (2005) to reduce the impact of potential long-branch attraction artifacts.
To explore whether the restricted representation of land plants in our phylogenies might have led to misleading topologies, we used ML and LogDet-distance methods to analyze an amino acid data set (11,025 sites) and a nucleotide data set (22,712 sites, first two codon positions) that were supplemented with 24 tracheophyte taxa (fig. 4). In all analyses, the positions of Chlorokybus and Chara remained identical to those shown in figure 3 and received robust support. The ML trees provided support for the Zygnematales being sister to all land plants (fig. 4A and C), whereas the LogDet-distance analyses favored a clade uniting Chaetosphaeridium and the Zygnematales as sister to all land plants (fig. 4B and D). In agreement with previously reported chloroplast phylogenies (Nishiyama et al. 2004; Goremykin and Hellwig 2005), the branching order of the three bryophyte lineages varied depending on the method of analysis and the data set used. Bryophytes were paraphyletic in the ML gene tree (fig. 4C) and monophyletic in the ML protein tree (fig. 4A) and the two LogDet trees (fig. 4B and D). Regarding the relationships among tracheophytes, the ML trees were consistent with recently published chloroplast phylogenies (Goremykin et al. 2005; Leebens-Mack et al. 2005; Wolf et al. 2005).
Testing Phylogenetic Hypotheses Using Structural Genomic Features
The availability of complete chloroplast genome sequences offers the opportunity to test phylogenetic hypotheses using structural genomic features. We analyzed four independent data sets of structural genomic features (gene order, gene content, intron content, and insertions/deletions in gene-coding regions) to evaluate the branching orders of charophycean green algae inferred from the 76 chloroplast proteins and genes analyzed in the present study and from the four genes encoded by the nuclear, chloroplast, and mitochondrial genomes in the study of Karol et al. (2001).
We used a gene order data set featuring all chloroplast genes common to Marchantia, Mesostigma, and the charophycean algae to infer a phylogeny based on inversion medians using GRAPPA and the algorithm of Caprara (2003). A single land plant taxon was selected because this type of analysis is computationally intensive and feasible with only a limited number of taxa, and also because the three bryophytes used in our phylogenetic analyses of cpDNA sequences exhibit very similar gene orders. Although GRAPPA can also compute trees from breakpoint medians, we chose inversion medians because this approach has been shown to greatly outperform breakpoint medians in phylogeny reconstruction (Moret et al. 2002). We recovered two best trees, each with a length of 144 inversions, that display the T1 and T3 topologies (fig. 5). A separate GRAPPA analysis constrained to the topology of the four-gene tree (T4) yielded a tree requiring nine extra inversions (fig. 5). The difference in length between this constrained tree and the tree with the T1 topology is more pronounced when only the portions featuring the crown taxa (i.e., Marchantia, the zygnematalean green algae, Chaetosphaeridium, and Chara) are considered: 106 inversions are then scored for the tree compatible with T1 versus 125 inversions for the tree compatible with T4.
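Computing inversion medians as GRAPPA does is well beyond a short example, but the two primitives that gene order comparisons rest on are simple. The Python sketch below, with hypothetical function names and integer gene labels, represents a gene order as a signed permutation (the sign encodes the coding strand), applies an inversion to a segment, and counts breakpoints relative to a reference order 1..n. It assumes identical gene content and a linear order framed by 0 and n + 1, whereas chloroplast genomes are circular.

```python
def invert(order, i, j):
    """Return a copy of a signed gene order with segment order[i:j+1] inverted.

    An inversion reverses the segment and flips each gene's strand (sign).
    """
    if not 0 <= i <= j < len(order):
        raise IndexError("invalid segment bounds")
    segment = [-g for g in reversed(order[i:j + 1])]
    return order[:i] + segment + order[j + 1:]


def breakpoints(order):
    """Count breakpoints of a signed linear order relative to the identity 1..n.

    The order is framed by 0 and n+1; an adjacency (a, b) is conserved only
    when b - a == 1, which also covers the inverted pair (-b, -a).
    """
    n = len(order)
    framed = [0] + list(order) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)
```

Because one inversion can remove at most two breakpoints, half the breakpoint count (rounded up) is a lower bound on the inversion distance; for example, inverting genes 2 to 4 of the identity order [1, 2, 3, 4, 5] gives [1, -4, -3, -2, 5], which has exactly two breakpoints.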
Thirty-six of the 144 chloroplast genes predicted to have been present in the common ancestor of Mesostigma and streptophytes sustained losses during the evolution of streptophytes. Mapping of these genes on the T1 and T4 topologies revealed that 27 were lost only once in T1 (fig. 6) and that 23 sustained single losses in T4, whereas all remaining genes were lost on two or more occasions. To determine which of these topologies is the more parsimonious in terms of predicted gene losses, we analyzed a gene content data set in which each gene was coded as a three-state character (gene present, pseudogene present, or both absent). The predicted scenario of gene losses inferred from T1 was found to involve 109 steps (fig. 6), whereas that inferred from T4 involves 19 additional steps. Five genes (tufA, rpl19, ycf20, trnL(gag), and rpl12) account for these extra steps (fig. 7). The trnL(gag), rpl19, and ycf20 genes are each lost once in T1 (two steps/gene) as compared to three times in T4 (six steps/gene), whereas rpl12 suffers three independent losses in T1 (six steps) and five losses in T4 (10 steps). Note that the highly divergent tufA sequence present in the Coleochaetales was considered to be a pseudogene in this analysis. In T1, conversion of the functional tufA gene present in Chara into a pseudogene (one step) supports the basal position of the Charales relative to the Coleochaetales, whereas the subsequent disappearance of this pseudogene before the emergence of the Zygnematales (one step) supports the latter lineage as being sister to land plants. In T4, by contrast, tufA is associated with two losses (four steps) and is independently converted into a pseudogene (one step).
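The step counts above are consistent with treating each gene as an ordered three-state character in which a complete loss (functional gene to absent) passes through the pseudogene state and therefore costs two steps; this coding (0 = absent, 1 = pseudogene, 2 = gene) is our reading of the text rather than a documented detail. Under that assumption, the minimum number of steps for one character on a given topology can be computed with Sankoff's small-parsimony algorithm using the cost |i - j|, as in this Python sketch. The tree encoding (nested tuples of taxon names) and all function and taxon names are hypothetical.

```python
import math

STATES = (0, 1, 2)  # assumed coding: 0 = absent, 1 = pseudogene, 2 = gene


def step_cost(i, j):
    """Ordered (Wagner) character: moving between adjacent states costs 1."""
    return abs(i - j)


def sankoff(tree, states):
    """Return the minimal cost of the subtree for each possible root state.

    `tree` is either a taxon name (leaf) or a tuple of child subtrees;
    `states` maps taxon names to observed character states. States double
    as list indices because STATES is (0, 1, 2).
    """
    if isinstance(tree, str):
        return [0 if s == states[tree] else math.inf for s in STATES]
    child_costs = [sankoff(child, states) for child in tree]
    return [
        sum(min(cost[t] + step_cost(s, t) for t in STATES) for cost in child_costs)
        for s in STATES
    ]


def tree_length(tree, states):
    """Minimum number of steps for one character on a rooted tree."""
    return min(sankoff(tree, states))
```

Summing tree_length over every gene-content character for two candidate topologies yields totals of the kind compared above (109 steps for T1 versus 128 for T4), although the actual values depend on the real data matrix.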
Figure 8 shows the scenarios of gains/losses of chloroplast group II introns predicted by T1 and T4. By coding the presence/absence of each group II intron as a binary character, we found that the pattern of intron gains/losses based on T1 involves 41 steps, whereas that based on T4 requires two additional steps. Compared to T4, T1 predicts that group II introns were gained more progressively by the chloroplast genome during the evolution of streptophytes and that fewer introns underwent more than one loss event. In each scenario, 15 of the 20 introns common to charophycean green algae and land plants (with five introns specific to each scenario) originated before the emergence of the Charales, Coleochaetales, and Zygnematales. T1 also predicts that, once acquired, chloroplast group II introns remained generally stable in all streptophyte lineages, with the exception of the zygnematalean lineages, in which losses of 16 introns were mapped to the common ancestor of Staurastrum and Zygnema and to the separate lineages leading to these green algae.
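For binary presence/absence characters such as group II introns, step counts on a fixed topology can be obtained with Fitch's small-parsimony algorithm. The Python sketch below compares two simplified five-terminal topologies in the spirit of T1 and T4, with each lineage collapsed to a single terminal; the T4 shape follows our reading of the four-gene tree (Chara sister to land plants, then Chaetosphaeridium, then the Zygnematales). Function names are hypothetical, and the intron states used with it are invented for illustration, not data from this study.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass on a rooted binary tree.

    Leaves are taxon names; internal nodes are 2-tuples of subtrees.
    Returns (state_set, steps) for the subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # Empty intersection: take the union and count one change
    return left_set | right_set, left_steps + right_steps + 1


def total_steps(tree, matrix):
    """Sum Fitch steps over all characters (one dict of taxon -> 0/1 each)."""
    return sum(fitch(tree, column)[1] for column in matrix)


T1 = ("Mesostigma", ("Chara", ("Chaetosphaeridium", ("Zygnematales", "land_plants"))))
T4 = ("Mesostigma", ("Zygnematales", ("Chaetosphaeridium", ("Chara", "land_plants"))))
```

For a hypothetical intron present in Chara, Chaetosphaeridium, and land plants but absent from Mesostigma and the Zygnematales, the T1-like tree needs two steps (one gain and one loss) and the T4-like tree only one; summing such counts over all intron characters gives totals comparable in kind to the 41 versus 43 steps reported above.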
Figure 9 presents our phylogenetic analysis of insertions/deletions in protein- and RNA-coding genes. We identified a total of 131 loci carrying unambiguous insertions/deletions in our alignments of proteins and genes. MP analysis of these indels, 52 of which are phylogenetically informative, yielded a majority-rule consensus tree that is congruent with the T3 topology (fig. 9). The basal divergence of the Chara lineage relative to the lineages of Chaetosphaeridium, the Zygnematales, and the bryophytes is supported by a bootstrap value of 90%.
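Of the 131 indel loci, only the 52 phylogenetically informative ones can discriminate among topologies under MP. A character is parsimony-informative when at least two of its states each occur in at least two taxa; an indel confined to a single taxon (an autapomorphy) adds the same single step to every tree. The Python sketch below applies this standard criterion to a presence/absence indel matrix; the matrix layout (one string per taxon), the use of '?' for missing data, and the function names are assumptions for illustration.

```python
from collections import Counter


def is_informative(column):
    """True if at least two states each occur in at least two taxa.

    `column` holds one character state per taxon; '?' (missing) is ignored.
    """
    counts = Counter(state for state in column if state != "?")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(matrix):
    """Count informative characters in a taxa x characters matrix (list of rows)."""
    # zip(*matrix) turns the list of taxon rows into character columns
    return sum(1 for column in zip(*matrix) if is_informative(column))
```

For the hypothetical four-taxon matrix ["1100", "1010", "0110", "0001"], the first three indel characters are informative and the fourth, present in only one taxon, is not.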
Discussion
Phylogenetic Relationships Among Charophycean Green Algae and Their Interrelationships with Land Plants
Although previous studies have provided robust support for the notion that the Charales, Coleochaetales, and Zygnematales diverged after the Klebsormidiales (Karol et al. 2001; Turmel et al. 2002), unequivocal resolution of the branching order of the former three charophycean lineages has been problematic. In the phylogenetic study reported here, which is based on 76 chloroplast genes/proteins from five charophycean green algae and three bryophytes and on 56 chloroplast genes/proteins from a broader taxon sampling including 24 tracheophytes, all methods of analysis yielded a best tree in which Chara is basal with respect to Chaetosphaeridium and the zygnematalean algae Staurastrum and Zygnema (figs. 3 and 4). Regardless of the method used, the zygnematalean lineage was sister to land plants (T1) in the best trees inferred from the 76-gene/protein data sets as well as in ML analyses of the 56-gene/protein data sets, whereas a clade consisting of Chaetosphaeridium and the two zygnematalean green algae showed a sister relationship with land plants (T3) in LogDet analyses of the latter data sets. In agreement with these results, best trees consistent with the T1 and T3 topologies were recovered in analyses of the chloroplast large and small subunit rRNA genes from 14 charophycean green algae and five bryophytes (Turmel et al. 2002); however, four alternative topologies not significantly different from the best tree, three of which showed the Charales as sister to land plants, were also recovered in these analyses. A phylogenetic study of GAPDH proteins also favors the notion that Chara is basal relative to the Coleochaetales and land plants, but no representative of the Zygnematales was included in this analysis (Petersen, Brinkmann, and Cerff 2003).
The multigene chloroplast phylogenies reported here are not congruent with the four-gene phylogeny inferred by Karol et al. (2001). Violations of the models used in these phylogenetic analyses may underlie the conflicting results. The processes of sequence evolution assumed by current phylogeny inference programs may deviate substantially from those occurring in nature; these deviations include lineage-specific departures from an assumed common distribution of across-site rate variation (covarion evolution) and lineage-specific departures from an assumed symmetric substitution model (compositional heterogeneity) (Delsuc, Brinkmann, and Philippe 2005; Martin et al. 2005). Considering that only a few chloroplast genome sequences are currently available for charophycean green algae, artifacts in phylogeny reconstruction due to limited taxon sampling of these algae may also explain the incongruence between the multigene chloroplast phylogenies and the four-gene tree. Genome-scale phylogenetic studies with sparse taxon sampling are particularly susceptible to long-branch attraction (Soltis et al. 2004; Leebens-Mack et al. 2005; Philippe, Lartillot, and Brinkmann 2005). We found no evidence, however, that our 76-gene chloroplast phylogeny is artifactual. A method that is insensitive to differences in base and amino acid composition (LogDet analyses) recovered the same best topology (fig. 3), no significant difference in the rate of nucleotide substitution was identified, and removal of rapidly evolving proteins from the amino acid data set, in an attempt to attenuate potential long-branch attraction problems, did not change the best topology. Selection of an outgroup more closely related to the taxa examined is another strategy that has been recommended to attenuate potential long-branch attraction problems (Malek et al. 1996; Philippe, Lartillot, and Brinkmann 2005), but unfortunately no taxon meeting this criterion is available. Considering that Mesostigma may represent a lineage that predates the split of the Streptophyta and Chlorophyta (Lemieux, Otis, and Turmel 2000; Turmel, Otis, and Lemieux 2002a; Martin et al. 2005), we tried using all currently available chlorophyte chloroplast genome sequences as the outgroup and found that Chara still remains basal relative to Chaetosphaeridium and the two zygnematalean green algae (data not shown). If sparse taxon sampling of charophycean green algae were the main cause of the conflict between our multigene chloroplast phylogenies and the four-gene tree of Karol et al. (2001), an analysis of the four genes examined by these authors with a reduced taxon sampling comparable to that of the 76-gene analysis would be expected to place Chara before the divergence of the Coleochaetales and Zygnematales; however, this analysis still recovered the sister relationship of Chara and land plants (Supplementary Fig. S1, Supplementary Material online). It will be necessary to expand taxon sampling by including sequence data from additional charophycean chloroplast genomes to resolve unambiguously the divergence order of the Charales, Coleochaetales, and Zygnematales.
In addition to the multigene chloroplast phylogenies reported here, our analyses of three independent sets of structural genomic data (gene order, gene content, and indels) provide robust support for the early emergence of Chara. Scenarios of genome rearrangements and gene losses inferred from T1 were found to be more parsimonious than those inferred from the topology of the four-gene tree (T4) (figs. 5–7). Phylogenetic inference from gene order data yielded two best trees with topologies corresponding to T1 and T3. Similarly, the gene content data did not allow us to distinguish unequivocally between the T1 and T3 hypotheses as the gene-loss scenarios inferred from these hypotheses are almost identical. In the gene-loss scenario compatible with T1, the tufA pseudogene is lost only once (in the common ancestor of land plants and the Zygnematales), whereas in the scenario compatible with T3, it is lost independently in the lineages leading to the Zygnematales and to the land plants. As observed for the gene content and gene order data, phylogenetic analysis of indels in coding regions favored the notion that Chara is basal relative to the Zygnematales and Chaetosphaeridium but failed to identify the precise branching orders of the Coleochaetales and Zygnematales relative to the Charales (fig. 9). In contrast, the intron content data provided weak support for the basal position of Chara (fig. 8) as the scenarios of intron gains/losses inferred from T1 and T4 are associated with about the same number of steps and invoke multiple intron losses due to intron instability in the Zygnematales.
The basal placement of Chara in our multigene chloroplast phylogenies is in apparent conflict with the mitochondrial genomic data currently available for Chara (Turmel, Otis, and Lemieux 2003) and Chaetosphaeridium (Turmel, Otis, and Lemieux 2002b). Chara mitochondrial DNA (mtDNA) more closely resembles its land plant counterparts at the levels of gene content, gene order, and intron composition than does Chaetosphaeridium mtDNA, and consistent with this structural comparison and the four-gene phylogeny, Chara affiliates robustly with land plants in phylogenetic analyses based on 23 mitochondrial genes and proteins (Turmel, Otis, and Lemieux 2003). Obviously, because no members of the Zygnematales were included in these analyses, it will be necessary to examine a representative of this lineage before drawing any firm conclusions about the branching order of charophycean lineages in the mitochondrial tree. Assuming that the divergence order of the Charales and Coleochaetales in this phylogeny was not affected by sparse taxon sampling, we propose the following explanation for the incongruence between our chloroplast phylogenies and both the mitochondrial and four-gene trees. Given that several cases of horizontal gene transfers have been recently documented for the mitochondria of land plants (Bergthorsson et al. 2003, 2004), we speculate that the difference in topology between our chloroplast and mitochondrial phylogenies might reflect the occurrence of such events in the charalean lineage. Similarly, if the mitochondrial nad5 gene experienced horizontal transfer, conflicting phylogenetic signals in the concatenated data set of chloroplast and mitochondrial gene sequences analyzed by Karol et al. (2001) might have generated an incorrect tree. 
Our ML analyses of two subsets of the four genes analyzed by these authors (i.e., nad5 alone and the atpB and rbcL gene pair) indicate that the mitochondrial nad5 gene is mainly responsible for the high level of support observed for the clade uniting the Charales and land plants (data not shown). We found 99% bootstrap support for this clade with the data set containing nad5 alone but only 24% support with the data set containing the two chloroplast genes. Note that, in agreement with the ML analyses of atpB and rbcL previously conducted by Delwiche et al. (2002) and by Cimino and Delwiche (2002), our analysis of the two-gene data set recovered a best tree whose topology is identical to that of the four-gene tree. In all three of these atpB and rbcL analyses, the sister relationship of the Charales and land plants received less than 50% bootstrap support.
Alternatively, the multigene chloroplast phylogenies and both the mitochondrial and four-gene phylogenies can be reconciled by assuming that chloroplast genes from an early-diverging charophycean green alga were transferred horizontally to the chloroplast genome of a charalean alga. This hypothesis, however, seems less plausible than that invoking the horizontal transfer of mitochondrial genes because it implies the almost complete replacement of the recipient genome via transfers of large segments of the donor chloroplast genome. Such massive gene transfers must be postulated to account for the numerous ancestral characters observed at the level of gene order throughout the Chara genome. Because no cases of horizontal transfer of chloroplast genes have been reported in green plants, it is hard to envision that the Charales acquired most or all of their chloroplast genome from a different charophycean lineage through this process. Perhaps replacement of the chloroplast genome occurred in the Charales by organellar introgression rather than by horizontal gene transfer per se. Considering that epiphytic and endophytic charophycean algae frequently colonize the Charales in nature (Cimino and Delwiche 2002), one can contemplate the idea that interactions of this sort might have led to the integration of foreign chloroplasts into the cytoplasm of an ancestral charalean alga and ultimately to the total replacement of host chloroplasts. Note that we have not considered the hypothesis that chloroplast genes from an ancestral land plant were transferred horizontally to the chloroplast genome of a zygnematalean alga because this hypothesis fails to explain the strong support for the early emergence of the Charales.
An accurate phylogeny encompassing streptophyte algae and basal land plants will be essential to understand the suite of molecular events that allowed green plants to adapt to and colonize land. If the position of Chara in the multigene chloroplast phylogenies reported here proves to be correct, it will significantly alter our current view of streptophyte evolution. The four-gene phylogeny reported by Karol et al. (2001) supports the notion that the evolution of charophycean green algae was accompanied by a gradual increase in cellular complexity (McCourt, Delwiche, and Karol 2004). Given the limited similarity in cellular organization between the Zygnematales, Coleochaetales, and land plants, we were very surprised to find that the zygnematalean lineage, or a clade uniting the Zygnematales and Coleochaetales, is sister to land plants in the best trees supported by our chloroplast genomic data. By placing the charophycean group with the most complex cellular organization (the Charales) at a basal position relative to the Coleochaetales and Zygnematales, our comparative analyses of chloroplast genomes can be interpreted as indicating that a number of cellular features displayed by land plants were acquired from charalean green algae earlier than anticipated and that some of these features were lost from extant members of the coleochaetalean and zygnematalean lineages. In other words, the common ancestor of the Charales, Coleochaetales, and Zygnematales might have displayed a relatively complex cellular organization that became reduced in more derived green algal lineages. Perhaps such reductive evolution became necessary because the ancestral green algae with a complex organization were unable to compete with land plants in the same habitat. Note that the idea that charophycean green algae became secondarily aquatic has been proposed previously (Stebbins and Hill 1980).
Alternatively, cellular features currently considered to be shared by the Charales and land plants might have emerged independently in these lineages.
Chloroplast Genome Evolution in the Streptophyta
One of the most important observations that emerged from our comparative analysis of structural cpDNA features is that the chloroplast genome has remained largely unchanged in terms of gene content, gene order, and intron composition during the transition from charophycean green algae to land plants. Our results indicate that during this evolutionary period, the chloroplast genome has lost only four genes, has gained a single intron, and has become resistant to gene rearrangements. As shown by the scenario of genome rearrangements inferred from the T1 topology, Marchantia has retained a chloroplast gene order identical to that of its common ancestor with the zygnematalean green algae (ancestor A2 in fig. 5), whereas the IR-lacking cpDNAs of Staurastrum and Zygnema have diverged considerably from this ancestral gene order (≥35 inversions). Like its Chaetosphaeridium counterpart, Marchantia cpDNA displays about 15 inversions relative to the genome of the common ancestor of the Charales, Coleochaetales, and Zygnematales (ancestor A1 in fig. 5).
Chara cpDNA has also retained a high degree of ancestral features, this genome being the most similar to the A1 ancestral genome in terms of primary sequence, gene content, and gene order. The most distinctive features of Chara cpDNA relative to other IR-containing streptophyte cpDNAs are its loosely packed genes and expanded introns. These features, accounting for the larger size of Chara cpDNA compared to its streptophyte counterparts, are likely to be shared by the chloroplast genomes of other charalean species, as Nitella translucens cpDNA has been estimated to be 400 kb in size (Palmer 1991).
Our analysis of intron content data revealed that 15 of the 21 group II introns found in land plant genomes originated early during the evolution of streptophyte algae, that is, during the evolutionary interval separating the Chlorokybales and Charales. Considering that the Coleochaetales and Charales are each represented by a single member, that occasional intron losses may have occurred in these lineages, and that chloroplast introns sharing identical insertion sites may have been lost independently in the Staurastrum and Zygnema lineages, this predicted number of introns in the common ancestor of these algae must be considered a minimum estimate. Because of their significant sequence similarity, introns at identical insertion sites in charophycean and land plant chloroplast genomes most likely share a common vertical ancestry. As none of these introns has been identified in the same gene context outside the Streptophyta, it appears that intragenomic proliferation mainly accounts for the numerous group II introns in streptophyte cpDNAs. These introns have remained generally stable in all streptophyte lineages examined, except in the zygnematalean green algae Staurastrum and Zygnema.
Charles Delwiche, Associate Editor
We thank Marc-André Bureau and Jean-François Rochette for their help in determining the Chara cpDNA sequence and Patrick Charlebois and Jules Gagnon for their assistance with the bioinformatic analyses. This work was supported by the Natural Sciences and Engineering Research Council of Canada (to C.L. and M.T.).
References
Adachi, J., P. J. Waddell, W. Martin, and M. Hasegawa.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman.
Baldauf, S. L., J. R. Manhart, and J. D. Palmer.
Bergthorsson, U., K. L. Adams, B. Thomason, and J. D. Palmer.
Bergthorsson, U., A. O. Richardson, G. J. Young, L. R. Goertzen, and J. D. Palmer.
Bhattacharya, D., K. Weber, S. S. An, and W. Berning-Koch.
Bremer, K., C. J. Humphries, B. D. Mishler, and S. P. Churchill.
Castresana, J.
Chapman, R. L., and D. A. Waters.
Choudhuri, J. V., C. Schleiermacher, S. Kurtz, and R. Giegerich.
Cimino, M. T., and C. F. Delwiche.
Côté, V., J.-P. Mercier, C. Lemieux, and M. Turmel.
Delsuc, F., H. Brinkmann, and H. Philippe.
Delwiche, C. F., K. G. Karol, M. T. Cimino, and K. J. Sytsma.
Felsenstein, J.
———.
Goremykin, V. V., and F. H. Hellwig.
Goremykin, V. V., B. Holland, K. I. Hirsch-Ernst, and F. H. Hellwig.
Graham, L. E., M. E. Cook, and J. S. Busse.
Gu, X., and W. H. Li.
Guindon, S., and O. Gascuel.
Karol, K. G., R. M. McCourt, M. T. Cimino, and C. F. Delwiche.
Kenrick, P., and P. R. Crane.
Kolaczkowski, B., and J. W. Thornton.
Kugita, M., A. Kaneko, Y. Yamamoto, Y. Takeya, T. Matsumoto, and K. Yoshinaga.
Lake, J. A.
Leebens-Mack, J., L. A. Raubeson, L. Cui, J. V. Kuehl, M. H. Fourcade, T. W. Chumley, J. L. Boore, R. K. Jansen, and C. W. Depamphilis.
Lemieux, C., C. Otis, and M. Turmel.
Lewis, L. A., and R. M. McCourt.
Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny.
Lopez, P., D. Casane, and H. Philippe.
Lowe, T. M., and S. R. Eddy.
Maddison, D., and W. Maddison.
Malek, O., K. Lattig, R. Hiesel, A. Brennicke, and V. Knoop.
Manhart, J. R., R. W. Hoshaw, and J. D. Palmer.
Marin, B., and M. Melkonian.
Martin, W., O. Deusch, N. Stawski, N. Grunheit, and V. Goremykin.
Martin, W., T. Rujan, E. Richly, A. Hansen, S. Cornelsen, T. Lins, D. Leister, B. Stoebe, M. Hasegawa, and D. Penny.
Mattox, K. R., and K. D. Stewart.
McCourt, R. M., C. F. Delwiche, and K. G. Karol.
Michel, F., K. Umesono, and H. Ozeki.
Michel, F., and E. Westhof.
Moret, B. M. E., A. C. Siepel, J. Tang, and T. Liu.
Nishiyama, T., P. G. Wolf, M. Kugita et al. (12 co-authors).
Ohyama, K., H. Fukuzawa, T. Kohchi et al. (13 co-authors).
Palmer, J. D.
Petersen, J., H. Brinkmann, and R. Cerff.
Philippe, H., N. Lartillot, and H. Brinkmann.
Posada, D., and K. A. Crandall.
Qiu, Y. L., and J. D. Palmer.
Rice, P., I. Longden, and A. Bleasby.
Robinson-Rechavi, M., and D. Huchon.
Sanderson, M. J., J. L. Thorne, N. Wikstrom, and K. Bremer.
Shaw, J., and K. Renzaglia.
Shimodaira, H., and M. Hasegawa.
Soltis, D. E., V. A. Albert, V. Savolainen et al. (11 co-authors).
Spencer, M., E. Susko, and A. J. Roger.
Stebbins, G. L., and G. J. C. Hill.
Steel, M.
Steel, M., D. Huson, and P. J. Lockhart.
Steel, M. A., P. J. Lockhart, and D. Penny.
Strimmer, K., and A. von Haeseler.
Sugiura, C., Y. Kobayashi, S. Aoki, C. Sugita, and M. Sugita.
Swofford, D. L.
Thollesson, M.
Turmel, M., M. Ehara, C. Otis, and C. Lemieux.
Turmel, M., R. R. Gutell, J. P. Mercier, C. Otis, and C. Lemieux.
Turmel, M., C. Lemieux, G. Burger, B. F. Lang, C. Otis, I. Plante, and M. W. Gray.
Turmel, M., J.-P. Mercier, and M.-J. Côté.
Turmel, M., C. Otis, and C. Lemieux.
———.
———.
———.
Wolf, P. G., K. G. Karol, D. F. Mandoli, J. Kuehl, K. Arumuganathan, M. W. Ellis, B. D. Mishler, D. G. Kelch, R. G. Olmstead, and J. L. Boore.