Main

Mesostigma is a scaly, green biflagellate that belongs to the Prasinophyceae8, a morphologically heterogeneous class that includes descendants of the earliest diverging green algae2,7,8,9. No unique character unites the Prasinophyceae to the exclusion of other green algal classes. In phylogenetic trees inferred from nuclear small subunit (SSU) ribosomal DNA1,2,3,7,9 and actin-coding sequences10, all prasinophytes examined so far, except Mesostigma, form independent lineages at the base of Chlorophyta. Mesostigma represents the earliest divergence within Streptophyta: in SSU rRNA trees1, this position is supported by low bootstrap values (52%), whereas in actin-gene trees10, it is more strongly supported (>81%). A specific affinity between Mesostigma and charophytes was previously predicted on the basis of the identical orientation of multilayered structures relative to flagellar roots1,7,11.

The 135 genes in Mesostigma chloroplast DNA (cpDNA) (Fig. 1) represent the largest gene repertoire ever reported among green algal and land plant cpDNAs. We analysed the concatenated protein sequences (10,629 amino-acid positions) from 53 genes that are common to the cpDNAs of Mesostigma, three land plants (Marchantia polymorpha, Pinus thunbergii and Nicotiana tabacum) and three chlorophyte green algae (Nephroselmis olivacea, Chlorella vulgaris and Pedinomonas minor). The homologous proteins from the glaucocystophyte Cyanophora paradoxa, the red alga Porphyra purpurea and the cyanobacterium Synechocystis sp. PCC6803 were used to root the green algal phylogeny in these analyses. It is generally accepted that all chloroplasts were derived from a single primary endosymbiotic event involving the capture of a cyanobacterium12. The chloroplasts of glaucocystophytes, red algae and green algae are thought to be direct products of this primary endosymbiotic event12. Phylogenetic evidence indicates that glaucocystophyte chloroplasts evolved before those of red and green algae13.

Figure 1: Mesostigma cpDNA.

Inverted repeat sequences (IRA, IRB) separate small (SSC) and large (LSC) single-copy regions. Genes outside the map are transcribed clockwise. Genes absent from Marchantia cpDNA are represented in beige. Gene clusters shared with Marchantia are shown as series of green and red boxes; those corresponding to brackets inside the map are shared exclusively with land plants. Genes present in Marchantia but located outside conserved clusters are shown in grey. Thick brackets indicate clusters that are shared with some non-green algae but are absent from Synechocystis. Thin brackets denote clusters that are shared specifically with both chlorophytes and streptophytes (asterisks) or only chlorophytes.

Trees constructed with distance, maximum parsimony and maximum likelihood inference methods using Cyanophora proteins as an outgroup were found to be congruent in showing a strongly supported topology, ‘T1’ (Fig. 2a), in which Mesostigma emerges before the divergence of Streptophyta and Chlorophyta. Only T1 was detected in quartet-puzzling and distance-based analyses under the amino-acid substitution models of Jones et al.14 (JTT-F) and Dayhoff et al.15, with either a uniform or a gamma-distributed rate of substitution across sites. In maximum-parsimony analysis, bootstrap support for T1 was 97%, and two alternative topologies, ‘T2’ and ‘T3’ (Fig. 2b), differing with respect to the position of Mesostigma, were each recovered in 1.5% of the bootstrap samples. T1, T2 and T3 were also the only topologies recovered in maximum-likelihood analysis under the JTT-F model with a uniform rate of substitution; T1 was found in 98.9% of bootstrap samples, and both T2 and T3 were significantly worse than T1 (T2, P < 0.05; T3, P < 0.01) in the Kishino-Hasegawa test16. Removal of constant sites from the data set had negligible effect on bootstrap support for T1 in distance and maximum-likelihood analyses, and on the confidence limit of tree topologies under the Kishino-Hasegawa test (see Supplementary Information). Considering that most constant sites (5,585 out of 5,628) were estimated to be invariable using SPLITSTREE, these observations eliminate the possibility that misleading effects of invariable sites17 contributed to the recovery of T1. Even when the Cyanophora sequences were excluded from the data set, or when other outgroup sequences (Porphyra proteins, or a combination of Cyanophora, Porphyra and Synechocystis proteins) were used, trees compatible with T1 remained the best topologies (see Supplementary Information).
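The logic of the Kishino-Hasegawa comparison used above can be illustrated with a minimal sketch. It assumes that per-site log likelihoods for two fixed topologies are already available (for example, exported from a likelihood program) and uses the normal approximation with a two-sided P value. The function name `kh_test` and its inputs are hypothetical, and this is not the PROTML implementation; whether the P values reported here are one- or two-sided is not stated in the text.

```python
import math


def kh_test(site_lnl_a, site_lnl_b):
    """Compare two topologies from their per-site log likelihoods.

    Returns (delta, se, p): the total log-likelihood difference (A minus B),
    its estimated standard error, and a two-sided P value under a normal
    approximation. Illustrative sketch only.
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both topologies must be scored on the same sites")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are required")

    # Per-site differences in log likelihood between the two topologies.
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n

    # Variance of the summed difference, estimated from the per-site spread.
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    se = math.sqrt(variance)

    if se == 0.0:
        # No site-to-site variation: the difference is either exactly zero
        # or perfectly consistent across sites.
        return delta, se, 1.0 if delta == 0.0 else 0.0

    z = delta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return delta, se, p
```

The returned `delta` and `se` correspond to the kind of ΔlnL and standard error reported for alternative topologies; a topology is rejected when the difference is large relative to its standard error.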

Figure 2: Phylogenetic position of Mesostigma as inferred from 53 chloroplast proteins.

a, Preferred tree topology found in all analyses. The log likelihood (ln L) value was calculated using PROTML and the JTT-F model. The numbers above the nodes indicate the bootstrap percentages and reliability percentages obtained in distance and quartet-puzzling analyses, respectively. The numbers below the nodes indicate the bootstrap percentages in maximum-parsimony analysis. b, Alternative tree topologies detected in maximum-likelihood, maximum-parsimony and LogDet analyses. The ΔlnL value indicates the difference in log likelihood relative to T1 and the standard error of that difference.

Systematic biases in amino-acid composition can give rise to incorrect relationships in reconstructed phylogenetic trees18. Chi-squared analysis of amino-acid composition at variable sites reveals that the 53 concatenated proteins of Mesostigma do not significantly differ (P < 0.05) from those of Cyanophora, Marchantia, Pedinomonas and Pinus. Our failure to detect a systematic bias in amino-acid composition thus suggests that the basal position of Mesostigma is the result of a genuine phylogenetic signal. In support of this conclusion, neighbour-joining analysis of LogDet distances calculated after removal of all constant sites retrieved the T1 topology in 93% of bootstrap samples, with T2 being the only alternative topology detected. LogDet distances allow the recovery of the correct tree when sequences differ markedly in amino-acid frequencies for cases when substitution processes are otherwise uniform across the underlying tree19.
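A composition comparison of this kind can be sketched as a chi-squared statistic on a 2 × k table of residue counts for two sequences. The exact form of the test used here is not specified in the text, so the following is an assumption-laden illustration: the function name `composition_chi2` is hypothetical, and the input sequences are assumed to be restricted to variable sites with gaps already removed.

```python
from collections import Counter


def composition_chi2(seq_a, seq_b):
    """Chi-squared statistic for homogeneity of residue composition.

    Builds a 2 x k contingency table of residue counts (k = number of
    distinct residues observed) and returns (chi2, degrees_of_freedom).
    Illustrative sketch only.
    """
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")

    counts_a = Counter(seq_a)
    counts_b = Counter(seq_b)
    residues = sorted(set(counts_a) | set(counts_b))
    n_a, n_b = len(seq_a), len(seq_b)
    total = n_a + n_b

    chi2 = 0.0
    for residue in residues:
        column_total = counts_a[residue] + counts_b[residue]
        for observed, row_total in ((counts_a[residue], n_a),
                                    (counts_b[residue], n_b)):
            # Expected count if both sequences shared one composition.
            expected = column_total * row_total / total
            chi2 += (observed - expected) ** 2 / expected

    return chi2, len(residues) - 1
```

With all 20 amino acids present there are 19 degrees of freedom, for which the 5% critical value is about 30.1; a statistic below that value gives no evidence of a compositional difference between the two sequences.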

We also carried out maximum-likelihood analyses of concatenated chloroplast SSU and large subunit (LSU) rDNA sequences (4,016 positions, of which 1,026 are variable) from the taxa shown in Fig. 2, using the HKY model of nucleotide substitution16 (for analyses with other models, see Supplementary Information). The reconstructed trees were found to be congruent with those inferred from chloroplast proteins in providing unequivocal support for T1. T1 and T3 accounted for 99.0% and 1.0% of bootstrap samples, respectively (T2 was not recovered), and T3 was significantly worse than T1 (P < 0.05) in the Kishino-Hasegawa test. Removal of all constant sites from the data had little effect on support for T1.
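Counting variable positions and removing constant ones, as described above, amounts to a simple column filter. The sketch below assumes the alignment is a list of equal-length strings; the function names are hypothetical. It identifies columns that are constant in the observed data, which is not the same as estimating which sites are truly invariable.

```python
def variable_site_indices(alignment):
    """Return indices of columns in which more than one state is observed."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    return [i for i in range(length)
            if len({seq[i] for seq in alignment}) > 1]


def keep_sites(alignment, indices):
    """Return a new alignment restricted to the given column indices."""
    return ["".join(seq[i] for i in indices) for seq in alignment]
```

For example, `keep_sites(aln, variable_site_indices(aln))` yields the data set with all constant columns removed, and the length of the index list gives the number of variable sites.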

In favouring the placement of Mesostigma within Streptophyta (a position corresponding to T3), actin-gene trees10 contrast with those that we inferred from chloroplast sequence data. We used the Kishino-Hasegawa test under the HKY model to assess the confidence limit of actin-tree topologies (see Supplementary Information) and found that the phylogenetic information in the actin data set (198 variable sites) is not sufficient to eliminate the hypothesis that Mesostigma branches at the base of Chlorophyta and Streptophyta (T1). The actin trees that support this hypothesis represented a meaningful proportion (12.7%) of bootstrap samples in maximum-likelihood analysis and were not significantly worse (P 0.30) than those compatible with T3 (best maximum-likelihood trees). From this result, we conclude that the discrepancy between the chloroplast and the actin tree is only apparent.

The positioning of Mesostigma as the earliest divergence in the phylogeny of green plants is supported by our finding that this prasinophyte has retained more ancestral cpDNA features than previously examined green plants. The most relevant of these ancestral features are (1) a quadripartite structure in which common genes reside in corresponding genomic regions, a feature that is also shared with the cpDNAs of some non-green algae, most land plants and Nephroselmis20; (2) the presence of five genes (trnA(ggc), odpB, ycf20, ycf61 and ycf65) that were previously identified only in Porphyra cpDNA; (3) the presence of all genes that were lost specifically in Chlorophyta or Streptophyta20, with the exception of one (rpl21) of the eight genes specifically missing from Chlorophyta and three (rne, rnpB and rpl12) of the sixteen genes missing from Streptophyta; (4) the presence of nine gene clusters that are found in some non-green algal cpDNAs but not in the Synechocystis genome (Fig. 1), of which only subsets have been retained in other green plants (Table 1); and (5) the absence of introns, a feature that is also shared with the cpDNAs of Nephroselmis20 and most non-green algae12.

Table 1 Distribution of the Mesostigma chloroplast gene clusters that are shown in Figure 1

Many genes in Mesostigma cpDNA form clusters that are shared exclusively with chlorophytes and/or streptophytes (Table 1), thus strengthening the notion that Mesostigma represents the most basal green plant lineage. These clusters provide clues into how the green plant chloroplast genome diverged from the ancestral pattern of gene organization during the evolution of Chlorophyta and Streptophyta.

The gene organization of Mesostigma cpDNA is highly similar to that of land plant cpDNAs, with 81% of its genes being found in clusters that are shared with land plant cpDNAs (Fig. 1). This observation indicates that the chloroplast genome from the common ancestor of chlorophytes and streptophytes has been highly preserved in structure and gene order during the long evolutionary period (800 Myr (ref. 6)) separating this ancestor from land plants. Such an exceptional conservation of gene order was unexpected, as Nephroselmis cpDNA, the green algal cpDNA previously known to display the most ancestral characters, shares a limited number of gene clusters with its land plant counterparts20. Our results predict that cpDNA rearrangements occurred relatively infrequently in Streptophyta and that analysis of these events will be useful not only to study how the chloroplast genome evolved during the transition from the most ancestral green flagellate to land plants, but also to clarify phylogenetic relationships among streptophytes.

The morphological characteristics8 of Mesostigma support the view that the most ancestral green flagellate was a biflagellated and asymmetric cell that had an underlayer of square scales (a character unique to green algae), an eye spot and a cruciate flagellar root system with multilayered structures1,7. The number of flagella is highly variable among prasinophytes and other green algae. Square-shaped scales are found in members of both Chlorophyta and Streptophyta; however, the presence of an eye spot has been noted only in chlorophytes. Multilayered structures are found in chlorophytes and streptophytes, and appear to be an ancestral character of algae, as similar structures have been described in other algal groups6.

Methods

DNA sequencing

Chloroplast DNA from Mesostigma viride (NIES-296) was isolated from total cellular DNA as an AT-rich fraction by CsCl-bisbenzimide isopycnic centrifugation20. This DNA preparation was sheared by nebulization, and 1,500–3,000-bp fragments were recovered by electroelution after agarose gel electrophoresis. These fragments were treated with Escherichia coli Klenow fragment and T7 DNA polymerase, and cloned into the SmaI site of Bluescript II KS+. After hybridization of the clones with the original DNA used for cloning, DNA templates from positive clones were prepared with the QIAprep 8 Miniprep kit (Qiagen). Nucleotide sequences were determined with the PRISM dye terminator cycle sequencing kit (Applied Biosystems) on a DNA sequencer (model 373; Applied Biosystems) using T3 and T7 primers. Sequences were assembled and analysed as described20. Short genomic regions not represented in the clones analysed were sequenced from PCR-amplified fragments.

Phylogenetic analysis

Genome sequences were retrieved from GenBank. Pedinomonas cpDNA sequences are from our unpublished data. Individual protein and rRNA gene sequences were aligned with CLUSTALW 1.74 (ref. 21), alignments were concatenated, and ambiguously aligned regions containing gaps were excluded. The alignments and data sets are available in Supplementary Information. The program packages MOLPHY 2.3b3 (ref. 16), PHYLIP 3.573c (ref. 22), PUZZLE 4.0.2 (ref. 23) and SPLITSTREE 2.4 (ref. 24) were used for phylogenetic analyses. Symmetric distance matrices were computed with PUZZLE and PROTDIST22, whereas LogDet distances were calculated with SPLITSTREE. Distance trees were constructed with NEIGHBOR22 and/or FITCH22, maximum-parsimony trees were obtained with PROTPARS22, and quartet-puzzling trees were generated with PUZZLE. The robustness of distance and maximum-parsimony trees was assessed by bootstrap percentages after 100 replications. In the case of quartet-puzzling trees, reliability percentages of the occurrence of the nodes were estimated after 10,000 puzzling steps. Maximum-likelihood analyses of protein and DNA sequences were carried out with PROTML16 and NUCML16, respectively, and local bootstrap probability was estimated by resampling of the estimated log likelihood16.
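The data-preparation steps described above (concatenating per-gene alignments, excluding gapped positions and drawing bootstrap replicates) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: each gene alignment is assumed to be a dictionary mapping taxon name to aligned sequence, every column containing a gap is dropped (a simplification of excluding ambiguously aligned regions), and all function names are hypothetical.

```python
import random


def concatenate(alignments):
    """Join per-gene alignments (dicts of taxon -> sequence) end to end."""
    taxa = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != taxa:
            raise ValueError("every gene must be present for every taxon")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def drop_gapped_columns(alignment, gap="-"):
    """Remove every column in which at least one taxon has a gap."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    keep = [i for i in range(length)
            if all(alignment[t][i] != gap for t in taxa)]
    return {t: "".join(alignment[t][i] for i in keep) for t in taxa}


def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement, keeping the original length."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {t: "".join(alignment[t][i] for i in columns) for t in taxa}
```

Calling `bootstrap_replicate` 100 times with a seeded `random.Random` instance would give a reproducible set of pseudo-replicates, each of which would then be passed to the tree-building program of choice; bootstrap support for a topology is the fraction of replicates in which it is recovered.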