Introduction

Arabidopsis lyrata is a close relative of Arabidopsis thaliana, the model organism for plant molecular biology, and is itself becoming a model for ecology and evolution (Clauss and Koch, 2006). It benefits from the wealth of molecular tools and information accumulated for its congener, and the genomic resources for A. lyrata are also rapidly improving (Kuittinen et al., 2002, 2004; http://www.jgi.doe.gov/sequencing/why/CSP2006/AlyrataCrubella.html). In contrast to A. thaliana, it is a perennial, outcrossing species that forms stable populations (Clauss and Koch, 2006). As such, it is a more tractable organism for population genetics, making it possible to address evolutionary questions and to test the validity of theoretical models (Riihimäki et al., 2005). For instance, A. lyrata, together with other species of the genus Arabidopsis, is intensively used to study the evolution of self-incompatibility (Mable et al., 2005; Schierup et al., 2006) and the genetic basis of adaptation at different geographic scales (for example, Kärkkäinen et al., 2004; Riihimäki et al., 2005; Kivimäki et al., 2007).

Despite this interest, the evolutionary history of the species is not well known. Three subspecies are distinguished (Al-Shehbaz and O’Kane, 2002): A. lyrata ssp. lyrata (North America), A. lyrata ssp. petraea (Europe and Siberia) and A. lyrata ssp. kamchatica (from Eastern Asia to Northwest America). Within Europe, ssp. petraea has a patchy distribution extending from Central Europe to Norway (Jalas and Suominen, 1994; Jonsell et al., 1995). It occurs under a variety of climatic and ecological conditions, but is most often cold-tolerant and grows in habitats with low competition (Jonsell et al., 1995; Clauss and Mitchell-Olds, 2006). Studying the large-scale population structure and history of this species is a prerequisite for studying potentially selected variation (Wright and Gaut, 2005). It is also of interest because A. lyrata is a good example of a species complex whose evolution was affected by the climatic oscillations of the Pleistocene (Abbott and Brochmann, 2003).

Indeed, the Pleistocene was a period of intense and recurrent climatic change that affected the distribution and genetic diversity of many species, leading to the evolution of differentiated genetic entities and, in some cases, to speciation (Hewitt, 2004). Cold periods displaced distribution ranges southwards, and European refugia have been described for a variety of organisms in Spain, Italy and the Balkans (Taberlet et al., 1998). During interglacial periods, species were able to expand northward; in Europe, different routes of colonization have contributed to the present-day distribution of many species (Taberlet et al., 1998). In addition to these North–South movements, the evolution of species with a ‘circumboreal’ distribution, such as A. lyrata, has been affected by East–West movements from Europe to America, both across the Atlantic and through Russia and Asia (Abbott and Brochmann, 2003). In plants, the vast majority of studies of glacial and postglacial history have concerned tree species (for example, Heuertz et al., 2004) and a few other circumboreal species (Alsos et al., 2005).

Previous studies of neutral variation, based on isozymes, microsatellites or nucleotide polymorphism, have revealed strong differentiation between A.l. ssp. petraea and A.l. ssp. lyrata and, generally but not always, a lower level of diversity in ssp. lyrata, potentially due to population bottlenecks (Van Treuren et al., 1997; Savolainen et al., 2000; Ramos-Onsins et al., 2004; Wright et al., 2006). Within Europe, genetic diversity appears to be strongly influenced by population history and demography (Kärkkäinen et al., 2004; Ramos-Onsins et al., 2004). A recent study (Clauss and Mitchell-Olds, 2006) analyzed the genetic diversity of ssp. petraea in Central Europe and suggested that the species may have persisted in this region during the last glacial maximum. The relationships of Northern European populations, especially Scandinavian ones, with this potential refugium, the importance of founder effects during recolonization, and the level of differentiation within Europe compared with the between-continent differentiation (that is, differentiation from ssp. lyrata) remain to be studied.

With their high level of diversity, ease of use and codominant inheritance, microsatellite markers have already proven useful in small-scale analyses of A. lyrata (Clauss et al., 2002; Clauss and Mitchell-Olds, 2006). In the present study, we surveyed microsatellite variation at 22 loci in a sample of seven European populations of ssp. petraea and one American population of ssp. lyrata. We asked the following main questions: How has postglacial colonization affected within-population diversity and the divergence of populations? Have the populations reached genetic equilibrium since colonization? What are the implications of this demographic history for studies of molecular ecology and functional genomics?

We show that populations of A. lyrata exhibit highly variable levels of diversity and are strongly differentiated, especially within Europe, as a consequence of the different routes of postglacial colonization. Further, we illustrate how the variable within-population diversity can impact the methods of analyzing population differentiation. Finally, our results can serve as baseline information for the study of functionally important variation.

Materials and methods

Sampling of populations and loci

We included seven European populations of A. lyrata ssp. petraea (Figure 1): Karhumäki (Russia, 62°55′N, 34°25′E), Stubbsand (Sweden, 63°13′N, 18°90′E), Spiterstulen (Norway, 61°38′N, 8°24′E), Lom (Norway, 61°50′N, 8°30′E; 20 km from Spiterstulen), Reykjavik (Iceland, 64°09′N, 21°58′W), Plech (Germany, 49°39′N, 11°29′E) and Bohemia (Czech Republic, 50°03′N, 14°06′E), and one American population of A. lyrata ssp. lyrata, Mayodan (North Carolina, USA, 36°41′N, 79°97′W). Sample sizes ranged from 11 to 30 individuals per population, for a total of 186 individuals.

Figure 1

Distribution map of A. lyrata ssp. petraea in Europe (Jalas and Suominen, 1994). The triangles show the location of the sampled European populations. Lom and Spiterstulen are indicated by the same symbol.

For all populations except Bohemia, we analyzed field-collected seeds sampled from different families. Bohemia seeds were a population sample propagated in the laboratory, kindly provided by Mark McNair. Reykjavik seeds were kindly provided by MH Schierup and Mayodan seeds by CH Langley.

We used 22 microsatellite loci, 21 of which had been previously described (Table 1). The primers used for SOC1 were 5′ TGTCAAATGTATTCGAGCAAGA and 5′ TGTAAGAGCAAGCACAAGAGGA. Some of these loci have been mapped (Kuittinen et al., 2004; J Leppäla, unpublished). Except for ELF3 and SOC1, all loci were derived from A. thaliana.

Table 1 Description of the loci used in the present study, summary of sequence information and overall level of diversity

Laboratory methods

DNA was extracted from about 200 mg of leaf tissue using the FastDNA kit and the FastPrep instrument (Qbiogene Inc, Carlsbad, CA, USA). The amplification reaction contained 10 ng DNA, 0.25 μM of each primer (one of which was fluorescently labelled with TET, HEX or FAM), 1 × reaction buffer (Promega, Madison, WI, USA), 2.5 mM MgCl2, 100 μM dNTP and 0.15 U Taq DNA polymerase (Promega) in a total volume of 15 μl. The amplification program was as follows: 94 °C for 3 min; 35 cycles of 94 °C for 20 s, 50 °C for 30 s and 72 °C for 10 s; and a final extension of 45 min at 72 °C. The final extension was long to ensure that the Taq polymerase added one extra non-templated nucleotide (A) to all fragments. Electrophoresis was performed on an ABI Prism 377 sequencer. GENESCAN 3.1 and GENOTYPER 2.0 software (Applied Biosystems, Foster City, CA, USA) was used to analyze the DNA fragments and score the genotypes. We reported all observed size classes.

As most of the microsatellite loci had been derived from A. thaliana, their molecular structure, and thus their mode of evolution in A. lyrata, could differ from that of typical microsatellites. We therefore sequenced, for 18 loci, one allele from each of the populations Karhumäki, Plech, Spiterstulen and Mayodan. For homozygotes, we sequenced the polymerase chain reaction (PCR) product directly. For heterozygotes, we cloned the PCR products into the TOPO TA PCR 2.1 cloning vector (Invitrogen, Carlsbad, CA, USA) and sequenced a single clone per individual. The sequencing reactions were run on the ABI PRISM 377 sequencer, and the contigs were read and aligned using the Staden package (http://staden.sourceforge.net/). For each locus, we scored whether it was a perfect microsatellite (that is, showing a single stretch of repeats), a compound microsatellite (showing more than one stretch of repeats) or not a microsatellite (showing no clear repeat), and whether we observed point mutations between alleles or indels outside the repeat region. Sequences are available under accession numbers AM411873–AM411876 and AM413048–AM413110.

Data analysis

The number of alleles and unbiased gene diversity (Nei, 1987) were calculated within populations and over the whole data set using the program GENETIX (Belkhir et al., 2001). The number of alleles per population was standardized to a sample size of 22 gene copies (11 diploid individuals) using the rarefaction method of Petit et al. (1998) implemented in the software CONTRIB; the resulting value is referred to as allelic richness. Rarefaction was also applied to the number of private alleles, using the software HP-RARE (Kalinowski, 2005), to estimate private allelic richness. Because the populations showed highly variable levels of allelic richness, and because these differences could by themselves affect the absolute value of private allelic richness, we also computed the ratio of private allelic richness to allelic richness.
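The diversity statistics above can be sketched in a few lines of Python. This is a minimal illustration, not the code of GENETIX, CONTRIB or HP-RARE: it assumes allele counts per locus as input, uses the standard rarefaction expectation (the probability that each allele appears in a subsample of g gene copies drawn without replacement), and computes private allelic richness following our reading of Kalinowski's approach. All function names are hypothetical.

```python
from math import comb


def gene_diversity(counts):
    """Nei's (1987) unbiased gene diversity at one locus.

    counts: allele counts (numbers of gene copies) in one sample.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_p2)


def rarefied_richness(counts, g):
    """Expected number of distinct alleles in a subsample of g gene copies
    drawn without replacement (rarefaction)."""
    n = sum(counts)
    if g > n:
        raise ValueError("g cannot exceed the number of gene copies")
    # comb(n - c, g) / comb(n, g) is the probability that an allele with
    # count c is absent from the subsample; math.comb returns 0 when g > n - c.
    return sum(1 - comb(n - c, g) / comb(n, g) for c in counts if c > 0)


def private_rarefied_richness(pop_counts, focal, g):
    """Rarefied private allelic richness of one population at one locus.

    pop_counts: dict population -> dict {allele: count}; every population
    must contain at least g gene copies. For each allele, the probability
    that it appears in a g-copy subsample of the focal population is
    multiplied by the probability that it is absent from g-copy subsamples
    of all other populations.
    """
    def p_absent(counts, allele):
        n = sum(counts.values())
        return comb(n - counts.get(allele, 0), g) / comb(n, g)

    alleles = set().union(*(c.keys() for c in pop_counts.values()))
    total = 0.0
    for allele in alleles:
        term = 1 - p_absent(pop_counts[focal], allele)
        for pop, counts in pop_counts.items():
            if pop != focal:
                term *= p_absent(counts, allele)
        total += term
    return total
```

For example, with two equally common alleles in 10 gene copies, `gene_diversity([5, 5])` is 5/9 and `rarefied_richness([5, 5], 2)` is 14/9. The ratio used in the text would be formed from per-population values with g = 22; how CONTRIB and HP-RARE aggregate over loci is not reproduced here.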

Multilocus analyses require sets of independent loci. We therefore tested genotypic disequilibrium for all pairs of loci in each population with exact tests in GENEPOP (Raymond and Rousset, 1995) and applied a sequential Bonferroni correction (Rice, 1989). We also applied a liberal threshold (ten times the sequential Bonferroni threshold), which flags more pairs as associated and is therefore conservative when concluding that loci are independent. We then used the results of these tests, together with the available linkage information, to construct all possible subsets of independent loci (20 subsets; see Results). All multilocus analyses described below were performed on each subset separately, as well as on the full set of 22 loci.
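The sequential Bonferroni procedure and its liberal variant can be sketched as follows. This minimal Python illustration assumes a flat list of P-values for one family of tests; the `factor` parameter is a hypothetical way of expressing the "ten times the correct threshold" rule, not an option of GENEPOP.

```python
def sequential_bonferroni(p_values, alpha=0.05, factor=1.0):
    """Holm's sequential Bonferroni procedure, as described by Rice (1989).

    Returns booleans (True = significant) in the input order.
    factor > 1 relaxes every threshold; factor=10 mimics the liberal
    threshold described in the text.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        # The smallest P-value is compared with alpha/m, the next with
        # alpha/(m - 1), and so on.
        threshold = factor * alpha / (m - rank)
        if p_values[i] > threshold:
            break  # stop at the first non-significant test
        significant[i] = True
    return significant
```

With `factor=10`, more tests are declared significant, so fewer pairs of loci are treated as independent when building the subsets.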

The significance of differences in genetic diversity parameters between pairs of populations was tested with Wilcoxon signed-rank tests, pairing values for the same loci in different populations, using SAS software (SAS Institute, Cary, NC, USA). Hardy–Weinberg equilibrium was tested for each locus and across loci within each population using the exact tests implemented in GENEPOP.
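The paired comparison can be sketched as an exact Wilcoxon signed-rank test in which per-locus values of a diversity statistic in two populations form the pairs. This minimal Python version assumes that loci with identical values in both populations are dropped and that tied absolute differences receive average ranks; SAS may differ in such details and in its use of large-sample approximations. The function name is hypothetical.

```python
def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired values.

    x, y: per-locus values of a statistic in two populations (same locus order).
    Returns (W+, p). Zero differences are dropped; tied |differences| get average ranks.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, averaging ranks within ties
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    # Exact null distribution: each rank is positive or negative with probability 1/2.
    # Ranks are doubled so that average ranks (x.5) become integer indices.
    doubled = [int(round(2 * r)) for r in ranks]
    total = sum(doubled)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    centre = total / 2
    observed = abs(2 * w_plus - centre)
    extreme = sum(c for s, c in enumerate(counts) if abs(s - centre) >= observed - 1e-9)
    return w_plus, min(1.0, extreme / 2 ** n)
```

For five loci that are all higher in one population, W+ = 15 and the exact two-sided P is 2/32 = 0.0625.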

Using the software BOTTLENECK (Piry et al., 1999), we tested for deviation from mutation-drift equilibrium within each population, using both a sign test and a Wilcoxon signed-rank test for heterozygosity excess or deficiency (Cornuet and Luikart, 1996). We performed the tests under both the infinite allele model (IAM) and the stepwise mutation model (SMM). Using BOTTLENECK, we also computed the T2 statistic, which is a function of the standardized deviations of the sample heterozygosities from the heterozygosity expected at equilibrium given the observed number of alleles. A positive value of T2 denotes a heterozygosity excess (Cornuet and Luikart, 1996).
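Given the per-locus expectations that BOTTLENECK obtains by simulation, the T2 statistic reduces to a scaled sum of standardized differences. The sketch below assumes those expectations (mean and standard deviation of equilibrium gene diversity for each locus, conditional on its number of alleles) are already available; it does not reproduce the simulations under IAM or SMM. Under the null hypothesis, T2 is approximately standard normal (Cornuet and Luikart, 1996). The function name is hypothetical.

```python
from math import sqrt


def standardized_differences_t2(he_obs, he_eq, he_sd):
    """T2 statistic of the standardized differences test.

    he_obs: observed gene diversity per locus.
    he_eq, he_sd: mean and standard deviation of the gene diversity expected at
        mutation-drift equilibrium given each locus's observed number of alleles
        (assumed to come from simulations; not computed here).
    Returns (T2, number of loci showing heterozygosity excess).
    """
    diffs = [(h - m) / s for h, m, s in zip(he_obs, he_eq, he_sd)]
    t2 = sum(diffs) / sqrt(len(diffs))  # positive T2 = heterozygosity excess
    n_excess = sum(1 for d in diffs if d > 0)
    return t2, n_excess
```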

Wright's FST was estimated with the estimator of Weir and Cockerham (Weir, 1996) using the program GENEPOP. The significance of population differentiation was assessed using exact tests, also in GENEPOP. These analyses were performed for individual loci and over loci, for pairwise population comparisons and across all populations.

The value of FST depends strongly on the average within-population gene diversity (HS; Hedrick, 1999), and HS was highly variable among loci. Hedrick (2005) derived the maximum value that Nei's estimator of FST (GST; Nei, 1987) can reach given HS, and proposed a standardized measure (GST’) obtained by dividing the estimate by this maximum (Hedrick, 2005). Because we used the estimator of Weir and Cockerham (Weir, 1996) rather than GST, we computed the maximum possible value of this estimator, given the observed levels of within-population diversity, ourselves. Namely, we recoded the alleles so that no two populations shared any allele. We then re-estimated FST on the recoded data with GENEPOP, which yields FSTmax values, and computed standardized values as FST′ = FST/FSTmax.
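The recoding approach can be sketched in Python. `wc_theta` below is our own single-locus implementation of the Weir and Cockerham estimator, written from its published formulas, not GENEPOP's code; it assumes complete diploid genotypes and a polymorphic locus. `make_private` recodes each allele as a (population, allele) pair so that no allele is shared, and FST′ is the ratio of the two estimates. All names are hypothetical. A multilocus estimate would sum the per-allele numerators and denominators over loci before dividing, which this sketch does not do.

```python
def wc_theta(pops):
    """Weir and Cockerham's estimator of FST for one diploid locus.

    pops: list of populations, each a list of genotypes (allele1, allele2).
    """
    r = len(pops)
    sizes = [len(p) for p in pops]
    n_bar = sum(sizes) / r
    n_c = (r * n_bar - sum(n * n for n in sizes) / (r * n_bar)) / (r - 1)
    alleles = {a for p in pops for g in p for a in g}
    num = den = 0.0
    for u in alleles:
        # Frequency of allele u and proportion of individuals heterozygous for u
        p_i = [sum(g.count(u) for g in p) / (2 * len(p)) for p in pops]
        h_i = [sum(1 for g in p if g.count(u) == 1) / len(p) for p in pops]
        p_bar = sum(n * p for n, p in zip(sizes, p_i)) / (r * n_bar)
        s2 = sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, p_i)) / ((r - 1) * n_bar)
        h_bar = sum(n * h for n, h in zip(sizes, h_i)) / (r * n_bar)
        pq = p_bar * (1 - p_bar)
        a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
        b = n_bar / (n_bar - 1) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
        c = h_bar / 2
        num += a
        den += a + b + c
    return num / den  # undefined (division by zero) for a monomorphic locus


def make_private(pops):
    """Recode alleles so that no two populations share an allele."""
    return [[((i, x), (i, y)) for x, y in p] for i, p in enumerate(pops)]


def fst_prime(pops):
    """Standardized FST' = FST / FSTmax, with FSTmax estimated on recoded data."""
    return wc_theta(pops) / wc_theta(make_private(pops))
```

Two populations fixed for different alleles give FST = FSTmax = 1, so FST′ = 1; identical populations give a negative estimate, as the estimator is unbiased rather than bounded at zero.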

To depict the relationships between populations, we built neighbor-joining trees (Saitou and Nei, 1987) from two genetic distance matrices, Nei's genetic distance (Nei, 1987) and Cavalli-Sforza's chord distance (Cavalli-Sforza and Edwards, 1967), using PHYLIP (Felsenstein, 2005). We assessed the robustness of the trees in two ways. First, we built 20 trees from the 20 subsets of loci. Second, we bootstrapped the data (for the full set of loci) 1000 times.
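The two distance measures and the resampling of loci can be sketched as follows. This sketch assumes that "Nei's genetic distance" refers to Nei's standard distance, and it uses one common form of the chord distance, averaged over loci; PHYLIP's gendist program may use a differently scaled chord measure, so absolute values need not match its output. Tree building itself (for example, with PHYLIP's neighbor program) is not reproduced. All function names are hypothetical.

```python
import math
import random


def nei_distance(x, y):
    """Nei's standard genetic distance between two populations.

    x, y: lists (one entry per locus) of dicts {allele: frequency}.
    """
    n_loci = len(x)
    jx = sum(sum(f * f for f in loc.values()) for loc in x) / n_loci
    jy = sum(sum(f * f for f in loc.values()) for loc in y) / n_loci
    jxy = sum(sum(f * ly.get(a, 0.0) for a, f in lx.items())
              for lx, ly in zip(x, y)) / n_loci
    if jxy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


def chord_distance(x, y):
    """Cavalli-Sforza and Edwards chord distance, averaged over loci, in the
    form (2 / pi) * sqrt(2 * (1 - sum_i sqrt(x_i * y_i))) per locus."""
    total = 0.0
    for lx, ly in zip(x, y):
        cos_theta = sum(math.sqrt(f * ly.get(a, 0.0)) for a, f in lx.items())
        total += 2 / math.pi * math.sqrt(max(0.0, 2 * (1 - cos_theta)))
    return total / len(x)


def bootstrap_loci(n_loci, n_reps, seed=1):
    """Resample locus indices with replacement, one list per bootstrap replicate."""
    rng = random.Random(seed)
    return [[rng.randrange(n_loci) for _ in range(n_loci)] for _ in range(n_reps)]
```

Each bootstrap replicate would recompute the distance matrix from the resampled loci and build one tree; branch support is then the proportion of replicate trees containing that branch.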

Results

General levels of diversity

Over the whole microsatellite data set, 190 alleles were scored, and the average number of alleles per locus was 9.63 (range 4–22). The average total gene diversity over loci (estimated for the whole data set) was 0.67 (range 0.38–0.87). When the analysis was restricted to Europe, these per-locus statistics were similar (9.23 and 0.65, respectively). The level of polymorphism was highly variable among loci (Table 1).

The observed size classes were rarely concordant with the repeat type described in A. thaliana or A. lyrata. Many loci displayed single base-pair differences among alleles, whereas differences of 2 or 3 bp would be expected for di- or trinucleotide repeats. These size differences were repeatable when the same samples were amplified or loaded twice, and were confirmed by the sequence information when available (see below).

Sequence variation at microsatellite loci

The information gathered from the sequence data is summarized in Table 1, and three representative alignments are shown in Figure 2. Where A. thaliana and A. lyrata sequences could be compared, large differences were observed. In addition to base substitutions and indels, the stretch of repeats was often interrupted (compound microsatellite) or absent in A. lyrata (Figure 2, locus AthDET1). Moreover, the number of repeats was usually lower in A. lyrata than in A. thaliana. Alleles often differed by point mutations or indels outside the main repeat region. Thus, mutation at these loci in A. lyrata does not conform to the mutation models usually invoked for microsatellites (for example, the two-phase model (TPM) or the SMM).

Figure 2

(a) Sequence alignment for three representative microsatellite loci. Only the central part of each locus is shown. Nucleotide substitutions and insertions between sequences are highlighted in bold. The size obtained through genotyping is indicated for each allele, as well as the population of origin. (1) AthDET1: compound microsatellite showing evidence of homoplasy. The variation between alleles is due both to indels and to point mutations. Homoplasy is observed for allele size 129. (2) F20D22: perfect microsatellite with sources of variation other than the repeat motif. Homoplasy probably occurs due to point mutations. A single-base insertion gives rise to two kinds of size classes (see distribution). Nucleotide substitutions within size classes suggest that homoplasy exists even if not directly observed. (3) ELF3: compound microsatellite in which the main source of variation is variation in repeat number in two distinct stretches of repeats. Homoplasy is observed for allele size 306. (b) Allele size distribution of locus F20D22.

Our sequences showed that both nucleotide substitutions and indels outside the repeat region created alleles of identical size but different sequence (that is, size homoplasy; Figure 2, AthDET1 and ELF3). Even where it was not directly observed, homoplasy could be inferred from the allele size distribution (F20D22).
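Size homoplasy of this kind can be screened for mechanically by grouping sequenced alleles by length. The sketch below is hypothetical: it treats ungapped sequence length as a proxy for the scored fragment size (real fragment sizes also include primers and any non-templated base) and reports size classes that contain more than one distinct sequence.

```python
from collections import defaultdict


def size_homoplasy(alleles):
    """Report size classes containing more than one distinct sequence.

    alleles: iterable of (label, sequence) pairs; alignment gaps '-' are
    removed before measuring length, so length stands in for allele size.
    """
    by_size = defaultdict(set)
    for _, seq in alleles:
        s = seq.replace("-", "").upper()
        by_size[len(s)].add(s)
    return {size: sorted(seqs) for size, seqs in by_size.items() if len(seqs) > 1}
```

For example, two 4-bp alleles ACAC and ACGC would be scored as the same size class but are flagged here as homoplasious.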

Linkage disequilibrium

Among 1208 tests for genotypic disequilibrium between pairs of loci, 80 (6.6%) were significant at P<0.05, 45 of which concerned population Bohemia. Overall, only one test remained significant after Bonferroni correction, and seven after applying the more liberal Bonferroni threshold (Table 2). These significant tests involved different pairs of loci, and only two involved loci that are genetically linked in A. lyrata (Table 2). Among loci whose linkage group is known in A. lyrata, 867 comparisons were possible: 6.4% were significant among unlinked loci and 7.7% among linked loci. Linkage thus played only a small role in the observed disequilibrium and, except in population Bohemia, the loci are largely independent.

Table 2 Summary of genotypic disequilibrium

Taking into account these results, and the available linkage information, we constructed 20 subsets of 14 or 15 independent loci; these subsets contained loci separated by more than 30 cM, and not showing any significant genotypic disequilibrium with each other (after the liberal Bonferroni correction) in any population.
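One way to enumerate such subsets is to treat loci as nodes of a "conflict" graph, joining two loci if they are linked within 30 cM or showed significant disequilibrium in any population, and to list its maximal independent sets. The Python sketch below does this with the Bron–Kerbosch algorithm applied to the complementary "compatibility" graph. It illustrates the idea and is not necessarily the procedure used here; the input format is an assumption.

```python
def independent_locus_subsets(loci, conflicts):
    """Enumerate all maximal subsets of loci containing no conflicting pair.

    loci: list of locus names.
    conflicts: set of frozensets {l1, l2} for pairs that must not co-occur.
    A maximal clique of the compatibility graph is a maximal independent
    set of the conflict graph; cliques are listed with basic Bron-Kerbosch.
    """
    compatible = {l: {m for m in loci if m != l and frozenset((l, m)) not in conflicts}
                  for l in loci}
    subsets = []

    def bron_kerbosch(r, p, x):
        if not p and not x:
            subsets.append(sorted(r))  # r cannot be extended: it is maximal
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & compatible[v], x & compatible[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(loci), set())
    return subsets
```

With 22 loci the enumeration is fast; subsets below a chosen minimum size could then be discarded.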

Variation in the levels of diversity across Europe

The levels of microsatellite diversity were highly variable across populations (Tables 3 and 4). Within Europe, we observed lower diversity statistics in the North, with significant differences between populations from Central Europe (average He=0.57) and Northern populations (average He=0.28; Table 4). The population diversity in Mayodan was among the lowest (He=0.21).

Table 3 Diversity statistics per microsatellite locus and per population
Table 4 Summary statistics of within population microsatellite diversity

Within Europe, alleles detected in the North were often a subset of those observed in the South. The populations from Central Europe (Plech and Bohemia) had the highest private allelic richness and ratio of private allelic richness to total allelic richness. The Scandinavian populations showed the lowest values of these statistics. Interestingly, among Northern European populations, Karhumäki showed the highest ratio of private allelic richness to allelic richness.

Among 138 tests of Hardy–Weinberg equilibrium, 23 were significant, revealing an excess of homozygotes. Only three remained significant after Bonferroni correction: one in Bohemia (F19K23) and two in Plech (F19G10 and ICE3). Over all loci, FIS was significant in the populations Bohemia, Plech, Reykjavik and Mayodan. The same results were obtained in all 20 subsets of loci (Table 4).

All tests of mutation-drift equilibrium (BOTTLENECK) detected highly significant heterozygosity excess in Bohemia, for the whole data set as well as for the 20 subsets, and under both the IAM and the SMM. In Mayodan and Reykjavik, no deviation from mutation-drift equilibrium was detected, and the T2 statistic was negative or positive depending on the subset of loci. In the other populations, the results were less consistent: under the IAM, significant heterozygosity excess was detected in Lom, Stubbsand and Plech for the whole data set, but among the subsets of loci, only five were significant for Lom and none for the other populations. Overall, these populations always had more loci with heterozygosity excess than with deficiency; accordingly, T2 values were positive (Table 4) and tests were often marginally significant. Under the SMM, no test was significant, and T2 was most often negative.

Population differentiation and relationship with diversity levels

Across populations, FST was highly significant (FST=0.458, averaged over loci; P<0.0001) and highly variable among loci. Among subsets of independent loci, the average FST was 0.471. Among European populations only, FST was 0.42.

The overall FST values for individual loci were significantly negatively correlated with the average within-population gene diversity (HS; Figure 3; r=−0.719, P=0.0002). One locus, ICE2, did not follow this relationship and had a lower FST value than expected from its diversity. When FST values were standardized by the maximum value they can reach given HS (FST′; Figure 3), the correlation with HS disappeared.

Figure 3

Plot of among-population FST and FST′ measures against average within-population heterozygosity HS for individual loci.

Pairwise FST values and population relationships

All populations of the data set were significantly differentiated from each other, with highly variable FST estimates over loci (Table 5). Again, a negative relationship was detected between average within-population diversity and FST (Figure 4). Some pairs of populations did not follow this relationship: comparisons between populations from the same region (that is, the Scandinavian populations, which all show a low level of diversity) and comparisons between Reykjavik and the Scandinavian populations (Figure 4). The negative relationship was still present for the standardized values of FST (FST′; Figure 4) and for other genetic distances (Nei's genetic distance and Cavalli-Sforza's chord distance; not shown).

Table 5 Matrix of pairwise genetic distances between populations
Figure 4

Plot of pairwise FST and FST′ values against average within-population heterozygosity HS. All plotted values are average over estimates obtained in the 20 subsets of independent loci.

Population relationships are depicted in Figure 5. For both distance matrices, the same topology was obtained for all 20 subsets of loci, although branch lengths varied. These trees showed that the Scandinavian populations were closely related to each other and, to some extent, to Reykjavik. The Russian population Karhumäki was more closely related to the North American population than to the Scandinavian populations. The same topology was obtained for trees built from pairwise FST or FST′ values.

Figure 5

Neighbor-joining trees based on the mean Nei's genetic distance (a) and on the mean Cavalli-Sforza's genetic distance (b), estimated for 20 subsets of loci. Percentages on each branch indicate the proportion of bootstrap replicates in which the two sets separated by that branch appear.

Discussion

Genome-wide and worldwide pattern of diversity in A. lyrata

We analyzed the genetic diversity and structure of A. lyrata in a broad-scale sample, based on microsatellite polymorphism. We found wide differences in the levels of diversity among populations and among loci, as well as high divergence between populations. This general pattern extends, and helps explain, partial and sometimes conflicting results reported previously (for example, on the extent of the difference in diversity between ssp. lyrata and ssp. petraea; Ramos-Onsins et al., 2004; Balana-Alcaides et al., 2006), and shows that in this species a wide sample is necessary to describe the genetic structure adequately.

Within Europe, intrapopulation diversity decreased from Central Europe toward Northern Europe, and populations were strongly differentiated. A recent microsatellite study found decreased diversity in Iceland relative to Central Europe (Clauss and Mitchell-Olds, 2006) and estimated He and FST values within Central Europe similar to ours. In Sweden, an isozyme-based analysis of population structure (Kärkkäinen et al., 2004) yielded FST values between Scandinavian populations similar to ours. Averaging over a large number of loci thus leads to robust conclusions, despite the wide variability among loci.

At the subspecies level, we also found high divergence between A.l. lyrata and A.l. petraea. Diversity in the A.l. lyrata population analyzed for microsatellites (Mayodan) was as low as in the least diverse European populations. However, earlier studies (Van Treuren et al., 1997; Balana-Alcaides et al., 2006) suggest that North American populations also have widely variable levels of polymorphism. Therefore, general conclusions about differences between A.l. ssp. lyrata and A.l. ssp. petraea cannot be drawn until diversity has been surveyed broadly across North America.

Evolutionary history of A. lyrata: postglacial colonization in Europe and worldwide differentiation

The patterns of diversity and differentiation detected within Europe suggest that Northern European populations probably derive from two distinct refugia. First, A. lyrata ssp. petraea may have persisted in Central Europe during the Last Glacial Maximum (Clauss and Mitchell-Olds, 2006). Expansion from this refugium would have given rise to the Icelandic and Scandinavian populations, as indicated by decreasing levels of diversity from South to North and by patterns of pairwise differentiation. In contrast, the Russian population may originate from the Eastern part of the distribution of ssp. petraea (Siberia; Jonsell et al., 1995). This hypothesis is supported by high pairwise FST values between Karhumäki and the other European populations and by a high relative amount of private microsatellite alleles. Karhumäki had already been shown to differ from other European populations in estimates of population differentiation based on markers or sequences (Van Treuren et al., 1997; Balana-Alcaides et al., 2006), in leaf morphology (Jonsell et al., 1995) and in segregation distortion in crosses with a Swedish population (Kuittinen et al., 2004).

The founder effects induced by postglacial colonization from Central Europe are very strong, with a twofold difference in diversity statistics. In tree species, by comparison, within-population diversity varies less (He varies between 0.70 and 0.85 for microsatellite loci in common ash; Heuertz et al., 2004). For Dryas octopetala, an Arctic–Alpine species widespread in Northern habitats, diversity is even lower in Southern than in Northern populations (Skrede et al., 2006). The fate of genetic diversity during colonization indeed depends strongly on the life-history traits of the species (Austerlitz et al., 2000) and on the fate of the different colonization routes from the refugia (Taberlet et al., 1998). Two routes of colonization of Fennoscandia have been recognized; in some cases they have met, resulting in admixture that can increase local population diversity, as in Northern Sweden for Cerastium alpinum (Nyberg Berglund and Westerbergh, 2001). For A.l. ssp. petraea, our results suggest that two separate routes of recolonization exist but have remained distinct. Moreover, the species has a fragmented distribution, with populations far from each other (Figure 1): A. lyrata has probably disappeared from the regions between Central Europe and its present locations in Northern Europe. Its seeds are dispersed only by gravity (Clauss and Mitchell-Olds, 2006). All these characteristics result in low levels of gene flow, which increases divergence between populations through genetic drift and prevents genetic homogenization between regions and populations. This contrasts, for instance, with the high levels of pollen flow and continuous populations described for trees (Austerlitz et al., 2000) or D. octopetala (Skrede et al., 2006).

Overall, only weak evidence of bottlenecks was detected in the North. As the species appears to be a cold-tolerant pioneer, and as it persisted in Central Europe during the Ice Ages, colonization of Northern Europe may have occurred rapidly after the retreat of the ice. The heterozygosity excess induced by bottlenecks and founder effects is detectable only during a time window of 0.5 to 5 times the effective population size of the bottlenecked population (Cornuet and Luikart, 1996). Consequently, given rapid recolonization and small population sizes, the microsatellite markers may have had time to return to a pattern resembling mutation-drift equilibrium.

At the whole species level, we showed a strong differentiation between three groups, North America, Russia and Western Europe, but the first two regions are only represented by one population each. The genealogical relationships between the three subspecies are not well resolved (Al-Shehbaz and O’Kane, 2002; Beck et al., 2007) and no information is available on the population structure of ssp. kamchatica and on ssp. petraea in the Eastern part of its distribution. The subspecies ssp. lyrata could be directly derived from Western European populations, through Trans-Atlantic dispersal (Abbott and Brochmann, 2003). It could also represent the extreme part of a worldwide distribution across Siberia and Alaska. These contrasting hypotheses have been debated for many circumboreal species (Abbott and Brochmann, 2003). Distinguishing them, as well as understanding the consequences of climatic oscillations within each region requires sampling extensively in Northwestern America, Alaska, Eastern Asia and Russia.

We observed or inferred homoplasy in the microsatellite polymorphism, yet we estimated statistics designed for markers mutating according to the IAM. Homoplasy generally reduces the power to detect population differentiation and leads to underestimation of genetic distances (Queney et al., 2001). Here, its effect would be conservative and should not alter our general conclusions. However, its impact is probably stronger in comparisons between continents, and other kinds of markers, such as nucleotide sequences, will be needed for a more precise description of genetic and genealogical relationships at the species level (Queney et al., 2001).

Local population diversity and structure

Northern European populations are at Hardy–Weinberg equilibrium, but significant positive FIS values were estimated in Central Europe and in North America. Balana-Alcaides et al. (2006) obtained similar results for two genes. A. lyrata is strictly self-incompatible, although a breakdown of the self-incompatibility system has been observed in some populations (for example, Mable et al., 2005). Gene flow can, however, be restricted to short distances within populations (Schierup et al., 2006), and high differentiation has been detected even within the continuous distribution of the species in Iceland (Schierup, 1998). Population substructure may therefore explain the significant FIS values. Indeed, in Plech, Clauss and Mitchell-Olds (2006) estimated an FST of 0.092 between different patches of the population, a value identical to the FIS we estimated for this population (in which the location of individuals is unknown). The Bohemia sample comes from laboratory propagation and serves only as a reference for allelic composition, without further demographic interpretation. Finally, in Mayodan, null alleles at two loci (ELF3 and nga 248) are probably mainly responsible for the significant deviation from Hardy–Weinberg equilibrium.

Markers with different mutation rates and populations with different sizes: proceed with caution!

The microsatellite loci surveyed showed very different levels of diversity and thus probably differ in mutation rate. A strong negative correlation was detected between average intrapopulation diversity (HS) and overall population differentiation estimated for individual loci (Figure 3). This relationship has been investigated by Hedrick (1999, 2005); Hedrick's graph (Figure 2 in Hedrick, 2005) shows that, in a model of divergence between populations, the negative correlation between FST and HS strengthens with the time since divergence. Hedrick showed that FST cannot exceed a maximum value that depends on HS, and proposed a standardized value (FST′) in which FST is divided by this maximum. However, FST′ is not independent of HS: in the same model of population divergence, there is a positive correlation between FST′ and HS; this correlation weakens with the time since divergence and progressively disappears as FST approaches its asymptotic value (Hedrick, 2005). In our case, the negative correlation was strong between overall FST and HS, and absent between FST′ and HS (Figure 3), suggesting that the divergence among the whole set of populations is so old that asymptotic FST values have almost been reached. Among Scandinavian populations, which have probably diverged more recently, per-locus FST values are not correlated with HS (not shown).
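Hedrick's (2005) ceiling can be written explicitly for k populations as GST(max) = (k − 1)(1 − HS)/(k − 1 + HS), with G′ST = GST/GST(max). The short Python sketch below evaluates this ceiling for the k = 8 populations of the present study, to show how steeply it falls as HS increases. Note that it applies to Nei's GST; the FST′ values reported here were instead standardized with a maximum computed empirically for the Weir and Cockerham estimator. Function names are hypothetical.

```python
def gst_max(hs, k):
    """Maximum GST attainable for k populations with mean
    within-population gene diversity hs (Hedrick, 2005)."""
    return (k - 1) * (1 - hs) / (k - 1 + hs)


def gst_prime(gst, hs, k):
    """Hedrick's standardized G'ST = GST / GST(max)."""
    return gst / gst_max(hs, k)


# The ceiling on GST drops as within-population diversity rises (k = 8 populations)
for hs in (0.2, 0.5, 0.8):
    print(hs, round(gst_max(hs, 8), 3))
```

A locus with high HS therefore cannot show a high GST even when populations share few alleles, which is the source of the negative FST–HS correlation discussed above.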

This effect of diversity on FST estimates is more complex for pairwise population comparisons. Indeed, populations of A. lyrata have very different levels of diversity, possibly reflecting different population sizes. We observed a negative correlation between pairwise FST and the average HS of the two populations compared (Figure 4). This relationship can be explained both by the inherent effect of HS on FST (see above) and by stronger genetic drift in smaller populations. Moreover, it was also observed for the standardized measure proposed by Hedrick; this is expected, as pairs of populations with smaller effective sizes reach their asymptotic FST value faster. These confounding effects of population size and divergence time raise the question of whether our pairwise FST estimates can be interpreted in terms of different migration rates or divergence times (Figure 5), or only in terms of different population sizes. Recall that other measures of genetic distance were also negatively correlated with HS. However, the Scandinavian populations did not follow the common correlation and were clearly less differentiated from each other than from the Russian population, which had a similarly low level of diversity. This suggests that Karhumäki's distinct status is not due only to a small population size.

Implication for the study of local adaptation in A. lyrata

Detecting the loci and nucleotide sites responsible for adaptive variation relies on detecting deviations from the standard neutral model (Wright and Gaut, 2005) or from a genome-wide pattern of polymorphism (for example, Beaumont and Balding, 2004). The demographic history of A. lyrata alone has produced large reductions in variation and high divergence, as shown by the genome-wide pattern detected with microsatellites in this study. Population structure effects may even be important within populations (Clauss and Mitchell-Olds, 2006; Schierup et al., 2006). Thus, any test of the adaptive significance of nucleotide variation will need to be based on deviations from this pattern. The results also imply that association tests will need to take this population structure into account.