Introduction

The Drosophila virilis group is one of the best-studied species groups in the subgenus Drosophila. The number of species included in the D. virilis group is object of disagreement among scientists, depending on the classification of D. americana americana and D. americana texana as two distinct species or alternatively as two chromosomal forms of the same species. Results obtained mainly from sequence polymorphism analysis support the hypothesis that D. a. americana and D. a. texana are two forms of the same species (McAllister and Charlesworth, 1999; McAllister and McVean, 2000; Vieira et al, 2001; McAllister, 2002; McAllister, 2003; Vieira et al, 2003). Phylogenetic analyses recognize the existence of 14 species in the D. virilis group, divided in the D. virilis and D. montana phylads (Spicer and Bell, 2002). Recently, the D. virilis group has been further subdivided into four lineages: the D. virilis phylad, and the D. montana, D. littoralis and D. kanekoi subphylads (Spicer, 1992; Spicer and Bell, 2002). The analyses of multiple individuals for nuclear genes, however, indicated considerable sequence divergence within species resulting in groupings not consistent with the presumed phylogeny (Hilton and Hey, 1996; Nurminsky et al, 1996; Hilton and Hey, 1997). It has been argued that this is a reflection of a high proportion of shared ancestral polymorphism in the D. virilis group.

Recently, microsatellite polymorphism analysis was successfully used to infer the phylogeny of the D. melanogaster group, which also shares ancestral polymorphism (Harr et al, 1998; Harr and Schlötterer, 2004). Interestingly, the microsatellite-based phylogeny of the D. melanogaster group is identical to the one inferred by sequence analysis of the Odysseus gene. As this gene is involved in sterility of hybrid males produced by crossing D. simulans and D. mauritiana (Ting et al, 1998, 2000), phylogenies of Odysseus and other genes involved in hybrid sterility are more likely to reflect the species phylogeny than other, randomly selected genes. In the absence of information about genes involved in hybrid sterility, a consensus phylogeny based on multiple neutrally evolving loci should also reflect the species phylogeny (Schlötterer, 2001).

We used 48 microsatellite loci isolated from D. virilis to infer the phylogenetic relationships in the D. virilis phylad. Special emphasis was given to the D. virilis subphylad encompassing D. virilis, D. a. americana, D. a. texana, D. novamexicana and D. lummei. While our analysis indicated that members of the D. virilis group are well separated from each other, only very low resolution was observed for the D. montana phylad.

Material and methods

The list of species used in this work is shown in Table 1 in the online supplementary material. A total of 48 microsatellite loci isolated from D. virilis (Schlötterer, 2000; Huttunen and Schlötterer, 2002) were used for phylogenetic analyses. DNA was extracted from a single individual from each strain using a high salt extraction protocol (Miller et al, 1988). Microsatellite analysis followed standard protocols (Schlötterer, 1998). In brief, PCR reactions were performed in a final volume of 10 μl containing 50–100 ng of genomic DNA, 1 μM of each primer (the forward one was end-labeled with 32P), 200 μM dNTPs, 1.5 mM MgCl2 and 1 U Taq Polymerase. In total, 30 PCR cycles were performed with the following profile: 1 min at 95°C, 30 s at 45–59°C (depending on the locus) and 30 s at 72°C. An initial denaturing step of 5 min at 95°C and a terminal extension of 30 min at 72°C were applied. PCR products were separated on a 7% denaturing polyacrylamide gel (32% formamide, 5.6 M urea) at 90 W and visualized by autoradiography after 12–24 h. Allele sizes were determined running a ‘PCR slippage ladder’ and a known size standard adjacent to the samples (Schlötterer and Zangerl, 1999).

Statistical analyses

For all analyses, we randomly selected one allele for each individual/locus to account for the inbreeding effect during the propagation of the isofemale lines. All genetic distances including the proportion of shared alleles, (δμ)2, and FST were calculated with the Microsatellite Analyser Program (MSA) version 3.12 (Dieringer and Schlötterer, 2003). The genetic distance based on the proportion of shared allele was calculated as –ln (proportion of shared allele). Standard statistical analyses were performed with the SPSS 10 program. UPGMA and Neighbor Joining (NJ) trees for species and populations were generated using PHYLIP 3.6 (Felsenstein, 1991). Phylogenetic trees were visualized using the Treeview 1.5 (Page, 1996).

The program STRUCTURE (Pritchard et al, 2000a) was used to assign individuals to homogenous clusters (species) without consideration of the original species assignment. This program uses a Bayesian model-based clustering method for multilocus genotypes to simultaneously determine the most probable number of homogenous groups in a given data set and assigns individuals to one or more of them. The number of clusters is inferred by calculating the probability P(XK) of the data, given a certain prior value of K (number of clusters) over a number of Monte Carlo Markov Chain (MCMC) iterations. The posterior probabilities P(KX) can be calculated following Bayes' rule. The clusters are characterized by different allele frequencies and individuals are probabilistically assigned to one or more clusters, according to their allele distribution. The scores of individuals in the clusters correspond to the probability of ancestry in any one of them. In this study, we assumed prior values of K from 1 to 5. All calculations shown in this report are based on 1 000 000 iterations of the MCMC, following a ‘burn-in’ period of 50 000 iterations (the burn-in period is the first set of iterations of the MCMC that is dependent on the start configuration – the iterations are not incorporated in the final calculation of the posterior probability. We ran the program without incorporation of prior species information and allele frequencies were treated as uncorrelated.

Results and discussion

Reflecting the public availability of strains, our data set was not balanced with respect to the number of individuals representing a given species. For some species, only a single line was used, while for others multiple lines were available. Therefore, the phylogenetic analysis was performed on the basis of individuals (taken from separate isofemale lines) rather than species. Both the UPGMA and the NJ trees based on the proportion of shared alleles (Bowcock et al, 1994) indicate that isofemale lines from the same species tend grouping together. The phylogenetic trees obtained with UPGMA (Figure 1, Figure 2 supplementary material) and NJ (Figure 1 supplementary material) were very similar, but the UPGMA tree provides a better clustering according to species status. In the D. virilis phylad, D. novamexicana and D. a. texana are more closely related than D. a. americana and D. a. texana. D. lummei is well separated from these three species.

Figure 1
figure 1

UPGMA tree of the whole set of individuals considered, using the proportion of shared alleles distance.

The phylogenetic relationship in the D. montana phylad is not resolved, despite that D. littoralis, D. montana and D. flavomontana isofemale lines group according to species status. The two D. eozana lines do not group together. One line clusters with D. montana while the other one groups with D. kanekoi. In the NJ tree, both individuals also group to different clades (Figure 2 supplementary material). While it would have been desirable to test whether this grouping is the consequence of insufficient statistical power, or has a biological meaning or is an artefact due to contamination of one of the lines, we could not address this question further as we had no alive flies after the analyses were completed.

One very interesting aspect of the microsatellite data is the low differentiation between species. In fact, the tree of individuals shown in Figure 1 resembles more an intraspecific tree of individuals rather than a tree containing different species. In the D. melanogaster group, microsatellite analysis based on the proportion of shared alleles separated the four species by long branches and high bootstrap (Harr et al, 1998; Harr and Schlötterer, 2004). The low differentiation among species in the D. virilis phylad may have two reasons. First, as D. virilis microsatellites are longer than D. melanogaster (Schlötterer and Harr, 2000a), their mutation rate may be higher. Thus, alleles may be frequently identical by state but not by descent (homoplasy), rendering the proportion of shared alleles a problematic genetic distance. Alternatively, the species in the D. virilis group may share substantial ancestral variation or they are exposed to gene flow, which results in a low genetic differentiation. In order to discriminate between high mutation rate and shared ancestral alleles, we calculated pairwise genetic distances among species. Two genetic distance measurements were used – the proportion of shared alleles and (δμ)2. As (δμ)2 is linear with time and mutation rate (Goldstein et al, 1995), genetic distances based on (δμ)2 should reflect true divergence times. The proportion of shared alleles, however, asymptotes for long divergence times and/or high mutation rates (Goldstein et al, 1995). If the low differentiation between species is caused by a high microsatellite mutation rate, then the pairwise distances based on (δμ)2 and proportion of shared alleles should not be (or only loosely) correlated. Nevertheless, we found a highly significant correlation between them (Kendall's τ=0.411, P< 0.001). Also the coefficient of variation of all pairwise distances was similar for (δμ)2 and proportion of shared alleles (0.42 vs 0.28). The slightly higher coefficient of variation observed for (δμ)2 may indicate that this distance measurement provides more differentiation among the species studied (Table 1). Nevertheless, it has to be noted that (δμ)2 is highly sensitive to indels occurring in the sequence flanking the microsatellite repeat. As such indels occur frequently (Colson and Goldstein, 1999), they may inflate (δμ)2 based distances. Given that we failed to observe obvious differences between two distance measurements, of which one is highly sensitive to a high mutation rate, but the other ((δμ)2) is not, we conclude that the low differentiation between the species in the D. virilis group is not primarily caused by a high microsatellite instability.

Table 1 Pairwise genetic distances among species, calculated for the proportion of shared alleles (lower triangular matrix) and (δμ)2 (upper triangular matrix)

Apart from homoplasy (allele sharing generated by mutation rather by shared descent), the low differentiation among the species in the D. virilis group could have two reasons. Either there is still ongoing gene flow among the species or they maintained a large effective population size, which resulted in a large number of shared ancestral alleles. For D. a. americana and D. a. texana, these two hypotheses have been intensively discussed. A fusion between element B and the X chromosome in D. a. americana differentiates D. a. americana and D. a. texana (Throckmorton, 1982). Nevertheless, sequence analysis of several genes provided ambiguous results (Hilton and Hey, 1996; Nurminsky et al, 1996; Hilton and Hey, 1997; Vieira and Charlesworth, 2000; O'Grady et al, 2001). Sequence analysis data of Alcohol dehydrogenase (Adh) (McAllister and Charlesworth, 1999) and a phylogenetic study performed using the Adh gene (Nurminsky et al, 1996) revealed the absence of divergence between D. a. americana and D. a. texana. Furthermore, the X-linked per gene (Hilton and Hey, 1996) and a population survey of RFLP variation on chromosome 4 failed to support genetic differentiation between D. a. americana and D. a. texana (McAllister, 2002). On the contrary, a highly significant differentiation between D. a. americana and D. a. texana was detected using microsatellites (Schlötterer, 2000) and the fused1 locus (Vieira et al, 2001).

More recently, the absence of differentiation between D. a. americana and D. a. texana was assessed through the study of sequence polymorphism of four different genes (McAllister, 2003; Vieira et al, 2003).

We evaluated the statistical significance of the differentiation in the D. virilis phylad by generating pseudoreplicas of the microsatellite data by bootstrapping. To reduce the number of nodes, all D. virilis individuals were grouped into a single OTU, but for the other species of the D. virilis phylad each isofemale line was treated as separate OTU (Figure 2).

Figure 2
figure 2

NJ tree based on the proportion of shared allele of the D. virilis phylad. All D. virilis specimens are collapsed in a single OTU. Bootstrap values above 50% are shown.

The monophyletic origin of D. lummei and D. novamexicana is strongly supported. This result is surprising as DNA sequence analysis of two genes in D. novamexicana indicated the presence of two diverged clusters clearly predating the speciation event (Hilton and Hey, 1996). The authors noted, however, that the assignment of individuals to one of the clades differed among the analyzed genes. As our data set is based on 48 microsatellite loci, such clusters formed by divergent ancestral alleles are very likely to be averaged out. Thus, the strong support for the monophyletic origin of D. novamexicana demonstrates the advantage of multilocus phylogenies.

The grouping of D. a. texana with D. novamexicana is only weakly supported. Furthermore, the (δμ)2 distance grouped D. a. texana and D. a. americana together and D. lummei grouped closer to D. a. texana and D. a. americana than D. novamexicana (data not shown), a result not consistent with previous analyses (Hilton and Hey, 1996; O'Grady et al, 2001; Spicer and Bell, 2002). The statistical support for these groupings was very weak in a tree of individuals. Furthermore, other distance measurements – Nei's chord distance (Nei et al, 1983) and Cavalli Sforza and Edwards chord distance (Cavalli-Sforza and Edwards, 1967) – also supported the tree topology obtained by the proportion of shared alleles (data not shown). The results were essentially the same when UPGMA clustering algorithm was used rather than NJ (data not shown). The strong separation we observed between D. a. americana and D. a. texana could be due to the presence of high number of chromosomal inversions in D. a. americana that are present only at low frequency in D. a. texana (Hsu, 1952). The different frequency of inversions likely suppress recombination between D. a. americana and D. a. texana in a 600 Kb region around each inversion breakpoint and at the base of the X chromosome (Vieira et al, 2003). Therefore, recombination between chromosomes is prevented in a considerable proportion of the euchromatic genome. Microsatellites located in these genomic regions of reduced recombination between D. a. americana and D. a. texana are expected to show a more pronounced level of differentiation. The FST values per locus, however, indicated that 11 out of 43 loci (three loci did not amplify, vir44, vir63, v71-38 and two were monomorphic, vir35cs, vir32) were significantly different (P<0.05) between D. a. americana and D. a. texana. This result demonstrates that the phylogenetic differentiation between D. a. americana and D. a. texana is genome wide and not attributable to a low number of markers falling into the nonrecombining part of the chromosomes.

For further evidence for the separation of D. a. texana and D. a. americana, we used a model-based clustering method for multilocus genotype data to infer population/species structure and to assign individuals to groups (Pritchard et al, 2000a). In combination with Baysian statistics, this method provides the most likely number of groups in the sample. Table 2 indicates that the most likely number of groups is three. Individuals were assigned to each group according to their taxonomic status (ie: D. a. americana, D. a. texana and D. novamexicana). Only a single individual (D. a. americana, 0951.0) had a relatively high coefficient of coancestry with D. a. texana specimens (≈60%). This clear separation between D. a. americana and D. a. texana is fully consistent with a recent report using a small subset of loci mapping to Muller's element B (Schlötterer, 2000). As our data set contained the same loci used by Schlötterer (2000), we wanted to rule out that the signal of differentiation between D. a. americana and D. a. texana is limited to loci mapping to the fused neo-sex chromosomes. We repeated the analysis excluding all previously analyzed loci and obtained similar results (Table 2). Significant pairwise FST values between these three groups further substantiated our observation of a genome-wide differentiation between D. a. americana, D. a. texana and D. novamexicana (Table 3).

Table 2 Inferring the number of distinct groups in a sample consisting of D. a. americana, D. a. texana and D. novamexicana individuals
Table 3 Pairwise FST values (upper triangular matrix) and the corresponding P-values after Bonferroni correction (lower triangular matrix)

Previous studies also indicated that D. a. texana and D. a. americana are not the closest species pair (Hilton and Hey, 1996, 1997; Spicer and Bell, 2002). Nevertheless, it is surprising that while D. a. americana and D. a. texana share the fusion between Muller's element X and Y, D. novamexicana has the more ancestral karyotype lacking this fusion. A better sampling of D. a. americana and D. a. texana covering the distribution of both species is required to determine whether the significant differentiation between these species reflects geographic separation rather than different species status.