Introduction

Arabidopsis thaliana (L.) Heynh. is an annual, weedy and mostly autogamous species that is native to Europe and central Asia and now naturalized worldwide (Al-Shehbaz and O’Kane, 2002). Arabidopsis thaliana (A. thaliana) has been adopted as a model organism for establishing an in-depth understanding of plant biology. The use of this species as a model organism has become increasingly important since the elucidation of its genome sequence (The Arabidopsis Genome Initiative, 2000). In addition to a few well-known ecotypes such as Columbia and Landsberg, which have been cultivated in laboratory conditions for plant physiological, developmental, molecular and functional genomic studies, more and more naturally distributed populations (or ecotypes) have been collected and characterized for their genetic variation (King et al., 1993; Innan et al., 1997; Kuittinen et al., 1997; Ullrich et al., 1997; Bergelson et al., 1998; Loridon et al., 1998; Vos, 1998; Breyne et al., 1999; Miyashita et al., 1999; Erschadi et al., 2000; Sharbel et al., 2000; Barth et al., 2002; Hoffmann et al., 2003; Jorgensen and Mauricio, 2004; Nordborg et al., 2005; Stenoien et al., 2005; Schmid et al., 2006). Understanding the amount and distribution of genetic variation within and among populations is not only crucial for ecological and evolutionary studies, but also serves as a base for functional genomic studies. More than 300 accessions collected from different locations around the world are available through Arabidopsis stock centers, such as the Nottingham Arabidopsis Stock Center (NASC) and the Arabidopsis Biological Resource Center (ABRC), which have become an important resource for the Arabidopsis research community. The analysis of variants found in nature has proven to be very successful in gaining insight into the control of important processes in plants (Koornneef et al., 2004; Mitchell-Olds and Schmitt, 2006).

Although A. thaliana is widely distributed in China, as recorded in the Chinese flora (Cheo et al., 1987, 2001), no accession from China is available in any of the international Arabidopsis stock centers. In China, the herbarium information on this species is fragmentary, and the genetic background of the naturally distributed A. thaliana is largely unknown. Therefore, it is important to examine the genetic differentiation among the populations in natural habitats in China to construct a broader picture of the genetic diversity of this species around the world. In this study, we collected plant samples from 21 populations of A. thaliana from three provinces in East China and one province in Northwest China, and used random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (ISSR) markers to investigate the genetic diversity in these populations. The correlation between geographic distance and genetic distance among these populations was also analyzed.

Materials and methods

Plant materials

All naturally distributed populations used in this study were collected by the authors. Nineteen populations were collected from three provinces, Anhui, Jiangxi and Zhejiang in East China, and two populations from Xinjiang Province in Northwest China (Figure 1). The Columbia (Col) ecotype was used for comparison. The locations and habitat descriptions for the naturally distributed populations are listed in Table 1, and each population is denoted by five letters: the first two uppercase letters represent the province and the following three lowercase letters represent the city or county where the population was collected. For example, AHngs represents the population collected in Ningguo City of Anhui Province. For most populations, at least 30 individual plants with mature seeds per population were collected. The whole plants were collected and stored individually in plastic bags with silica gel for immediate drying. However, for some small populations such as the two populations obtained from Xinjiang Province, which had fewer than 30 individuals per population, only 14–20 individuals were collected. One voucher specimen was collected for each population. Each individual plant was numbered and propagated in a growth room (16 h light, 8 h dark at 21±2°C).

Figure 1 Distribution map of the populations of Arabidopsis thaliana used in this study.

Table 1 Naturally distributed populations collected in China

DNA isolation

The total DNA was isolated from the rosette leaves of the silica-dried materials (for ISSR analysis) or from the plants propagated in a greenhouse (for RAPD analysis). For the fresh materials, the CTAB method was adopted (Qu et al., 2000); while for the silica-dried materials, the plant materials were pretreated with 1.5% PVP and proteinase K, and then treated by the CTAB method. For each population (including Col ecotype), the DNA was isolated from 14 to 30 individuals separately.

ISSR and RAPD analysis

Twelve ISSR primers were chosen from those previously used by Barth et al. (2002), while another five were designed based on the Arabidopsis genome sequence (The Arabidopsis Genome Initiative, 2000). In total, 17 primers were used for a two-step primer screening process. For the first step, the total DNA obtained from Col ecotype was used to identify the primers that produced clear and reproducible bands. For the second step, six individuals from each population were randomly selected for ISSR analysis with primers selected in the first step. Following the two-step primer screening, 13 primers with clear and reproducible bands were chosen for the amplification reactions for all individuals in the present study (Table 2). PCR amplification was carried out in 25 μl with 20 mM Tris–HCl, 10 mM (NH4)2SO4, 10 mM KCl, 0.1% Triton X-100 (pH 8.8), 2 mM MgCl2, 200 μM each of dATP, dGTP, dCTP and dTTP, 0.4 pmol of primer, 0.8 U Taq DNA polymerase (Dingguo Biotech, Beijing, China) and 20 ng template DNA. Thermal cycling (Thermocycler PTC-225, MJ Research, Watertown) started with 5 min at 94°C, followed by 45 cycles of 1 min at 94°C, 45 s at 50°C and 2 min at 72°C, and ended with an extension for 7 min at 72°C. The amplification products were separated on 1.4% agarose gels in TAE buffer for about 4 h (5–7 V/cm), stained with ethidium bromide and documented using an AmpGene Gel 100 system. The 100 bp ladder marker (Promega (Beijing) Biotech Co. Ltd, China) and λDNA/EcoRI+HindIII marker were used as size standards.

Table 2 The ISSR primers and the polymorphism of their PCR products

For RAPD analysis, 153 primers (S2001–S2100, S201–220, S401–S420, S480–S500, Sangon Co. Ltd, Shanghai, China) were screened using the same procedure as in the ISSR analysis. Eleven primers with clear and reproducible bands were selected (Table 3). The PCR amplification procedure and the documentation of its products for RAPD analysis were the same as those used in the ISSR analysis, with only a slight modification in the thermal cycles: a hot start for 7 min at 94°C, followed by 45 cycles of 45 s at 94°C, 1 min at 36°C and 2 min at 72°C, and ended with an extension of 7 min at 72°C.

Table 3 The RAPD primers and the polymorphism of their PCR products

Data analysis

The amplified RAPD and ISSR fragments were scored for the presence or absence of bands (1=present, 0=absent). Only clear and reproducible bands were scored. Each of the bands (DNA fragments) was considered a single, unique locus with Mendelian segregation. A locus was considered polymorphic if the relevant band was present in one or more, but not all, individuals of the population. A two-dimensional matrix was generated for each of the two molecular marker systems. The percentage of polymorphic bands (PPB) was calculated from the matrix as (number of polymorphic bands/total number of bands) × 100%. Shannon’s Information Index (Lewontin, 1972), $I = -\sum_i p_i \ln p_i$, was calculated for the genetic diversity in natural populations, where $p_i$ is the frequency of the $i$th band. Nei’s (1973) measures of genetic diversity among natural populations were also calculated, including the total genetic diversity (that is, expected heterozygosity) ($H_T$), the mean genetic diversity within populations ($H_S$), and the proportion of genetic diversity occurring among populations, $G_{ST} = (H_T - H_S)/H_T$ (Nei, 1973). All of these genetic diversity parameters were estimated using POPGENE version 1.32 (Yeh et al., 1999). The 21 natural populations were divided into four geographic groups: the northwestern group (two Xinjiang populations in the most northwestern part of China), which is geographically far from the other populations; the northern group (three populations north of the Yangtze River in the Anhui Province); the southwestern group (four populations south of the Yangtze River in the Jiangxi Province); and the southern group (the remaining populations south of the Yangtze River in the Anhui and Zhejiang Provinces). The southwestern group is separated from the southern group by Poyang Lake, the largest freshwater lake in China.
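
The band-scoring statistics described above can be illustrated with a minimal sketch. It assumes a 0/1 matrix with one row per individual and one column per band, and applies the Shannon index in its standard form, I = -sum(p_i ln p_i), directly to band frequencies. The function names and the toy matrix are hypothetical, and POPGENE may compute and average its reported values differently, so this is an illustration rather than a reproduction of the published numbers.

```python
import math

def percent_polymorphic(matrix):
    """PPB: share of bands present in at least one, but not all, individuals."""
    n_individuals = len(matrix)
    n_bands = len(matrix[0])
    polymorphic = 0
    for j in range(n_bands):
        present = sum(row[j] for row in matrix)
        if 0 < present < n_individuals:
            polymorphic += 1
    return 100.0 * polymorphic / n_bands

def shannon_index(matrix):
    """I = -sum(p_i * ln p_i), where p_i is the frequency of band i."""
    n_individuals = len(matrix)
    total = 0.0
    for j in range(len(matrix[0])):
        p = sum(row[j] for row in matrix) / n_individuals
        if p > 0:  # convention: 0 * ln(0) is treated as 0
            total -= p * math.log(p)
    return total

# Hypothetical toy population: 4 individuals (rows) scored for 4 bands (columns).
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]

print("PPB (%):", percent_polymorphic(toy))
print("Shannon index:", round(shannon_index(toy), 4))
```

In the toy matrix, the first band is fixed and the third is absent from this population, so only two of the four bands count as polymorphic under the definition given in the text.
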
The genetic variation among the natural populations was also estimated from the analysis of molecular variance (AMOVA) (Excoffier et al., 1992), in which the total genetic variance was partitioned into ‘among population’ and ‘within population’ components. The Hillis distance was calculated between each pair of individuals (Hillis, 1984). The genetic relationships among the natural populations were estimated using the unweighted pair-group method with arithmetic averages (UPGMA). Cophenetic values (rcp) based on the results of the UPGMA cluster analysis were calculated as a measure of the quality of clustering (Rohlf, 1982). The Mantel test (Mantel, 1967) was used to test the correspondence between RAPD and ISSR marker-based similarity matrices (Lapointe and Legendre, 1992), and to test the correlation between the geographical distance and genetic distance (Sharbel et al., 2000). The geographic distance was defined by GIS, and was calculated using the World Book Atlas module in IBM Book MAC Edition (version 2004). These calculations were carried out using NTSYS-pc version 2.11Q (Rohlf, 2002).
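
The Mantel comparison of two distance matrices can be sketched as a simple permutation test. This is a minimal illustration, not the NTSYS-pc implementation used in the study: the function names, the number of permutations, the one-sided p-value and the toy coordinates are all assumptions made for the example.

```python
import math
import random

def upper_triangle(m):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(a, b, permutations=999, seed=0):
    """Return (r, p): matrix correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper_triangle(a)
    r_obs = pearson(flat_a, upper_triangle(b))
    hits = 0
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of b together so it stays a valid distance matrix.
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical example: five populations along a transect, with genetic
# distance exactly proportional to geographic distance.
positions = [0, 1, 2, 3, 10]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[2.0 * d for d in row] for row in geo]

r, p = mantel(geo, gen)
print("r =", round(r, 4), "p =", p)
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations; shuffling individual cells would destroy that structure and overstate significance.
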

Results

Thirteen ISSR primers produced 172 scorable bands in 566 individuals of 21 natural populations and Col ecotype, with 13.24 bands per primer on average, ranging from 8 (by primer (AC)8) to 18 bands (by primer (GCT)4(CT)). Among the total number of bands, 95.93% (165) were polymorphic (Table 2), and 6.98% (12) were not significantly different among populations (P>0.05) while 85.47% (147) were significantly different (P<0.001). For the RAPD analysis, 11 primers yielded 165 reliable bands in the 560 individuals of 21 natural populations and Col ecotype, with 15 bands per primer on average, ranging from 10 (by primer S2002) to 18 bands (by primers S97, S2006 and S2083). Of the bands, 98.18% (162) were polymorphic (Table 3) and 7.78% (13) were not significantly different among populations (P>0.05) while 83.63% (138) were significantly different (P<0.001). In both cases, more than 50% of the polymorphic fragments occurred at frequencies less than 0.10. For the ISSR markers, the mean PPB within natural populations was 35.94%, ranging from 18.60% (JXnfx) to 56.40% (AHyix) (Table 4). Geographically, the four southwestern populations were found to have low PPBs, while the populations in Anhui Province had relatively high PPBs in both the north and south groups. Two individuals from JXjgs and JXnfx, respectively, were determined to have identical banding profiles, but no identical profile was shared by individuals from different populations. For the RAPD markers, the mean PPB was 35.38%, ranging from 21.21% (ZJdys, XJalt and XJqhx) to 53.33% (AHyix) (Table 4). Two populations from the northwest group and one from the south group (ZJdys) had the lowest PPB among the natural populations, while the populations from Anhui Province had relatively high PPBs in both the north and south groups. Two individuals from JXnfx, JXjjx and JXxjx, respectively, had identical banding profiles, but no identical profile was shared by individuals from different populations.
Compared to the natural populations, the Col ecotype has the lowest PPB, as revealed by both the ISSR and RAPD markers. The mean Shannon's Information index within natural populations was 0.1852, ranging from 0.0966 (JXnfx) to 0.2940 (AHyix) for ISSR markers; and 0.1733, ranging from 0.1109 (ZJdys) to 0.2669 (AHyix) for RAPD markers (Table 4).

Table 4 Genetic diversity of the natural populations from China based on ISSR and RAPD data sets

At population level, the genetic diversity of all populations is 0.4267 as detected by using ISSR markers, and 0.4946 by RAPD markers. At the group level, both ISSR and RAPD markers detected the lowest genetic diversity in the northwest group. However, ISSR markers detected the highest genetic diversity in the southwest group (Gst=0.3372), while RAPD detected the highest diversity in the south group (Gst=0.4768). The AMOVA analysis detected about 54 and 58% of the variation due to the genetic variation among populations based on the ISSR and RAPD data sets, respectively (Table 5). It is interesting to note that at group level, in contrast to population level, less genetic variation was found among populations, as revealed by both ISSR and RAPD markers, except for the southern group where more genetic variation was found among populations as detected by RAPD markers. For example, in the northern group, only about 41 and 39% of variation is found among populations revealed by ISSR and RAPD markers, respectively. Further analysis indicated that this resulted from a relatively high proportion of genetic variation among the groups (Table 5).
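
The Gst values reported here follow Nei's (1973) partition of total diversity, Gst = (HT - HS)/HT. The sketch below shows that arithmetic for two-allele loci, assuming allele frequencies per population are already known. It weights populations equally and applies no sample-size correction, and the function names and frequencies are hypothetical. For dominant markers such as RAPD and ISSR, allele frequencies must first be estimated from band absence; that step is handled by POPGENE in the study and is outside this sketch.

```python
def gene_diversity(p):
    """Expected heterozygosity at a two-allele locus with allele frequency p."""
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def nei_gst(freqs_by_locus):
    """Return (HT, HS, Gst).

    freqs_by_locus[l][k] is the frequency of one allele at locus l
    in population k (hypothetical input layout).
    """
    hs_sum = 0.0
    ht_sum = 0.0
    for freqs in freqs_by_locus:
        # HS: mean of the within-population diversities at this locus.
        hs_sum += sum(gene_diversity(p) for p in freqs) / len(freqs)
        # HT: diversity computed from the mean allele frequency over populations.
        ht_sum += gene_diversity(sum(freqs) / len(freqs))
    n_loci = len(freqs_by_locus)
    hs = hs_sum / n_loci
    ht = ht_sum / n_loci
    return ht, hs, (ht - hs) / ht

# Hypothetical single locus scored in two populations.
ht, hs, gst = nei_gst([[0.2, 0.8]])
print("HT =", round(ht, 4), "HS =", round(hs, 4), "Gst =", round(gst, 4))
```

A Gst near 0 means that almost all diversity is found within populations, while a value near 1 means that populations are fixed for different alleles.
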

Table 5 Genetic variations within and among populations revealed by AMOVA based on ISSR and RAPD data sets

UPGMA dendrograms based on the Hillis distance matrices generated from the raw data of the ISSR or RAPD markers did not group the populations exactly according to their geographic distribution. However, the four populations of the southwestern group were always clustered together in both ISSR and RAPD dendrograms as a monophyletic group (Figures 2 and 3). Although the two populations in the northwestern group were always clustered together, it is interesting to note that these two were also clustered with some populations in the southern group. The Col ecotype was the ‘basal group’ in both the ISSR and RAPD dendrograms. The cophenetic values (rcp) of the UPGMA clustering based on the ISSR and RAPD data sets were significantly correlated with the primary data matrices of ISSR and RAPD, respectively (rcp=0.80793 for ISSR data set and rcp=0.91622 for RAPD data set).
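
The cophenetic correlation quoted above measures how faithfully a dendrogram preserves the original pairwise distances. A minimal sketch of UPGMA clustering and the resulting correlation follows. It is a naive implementation written for clarity rather than the NTSYS-pc routine used in the study, and the function names and the toy distance matrix are hypothetical.

```python
import math

def upgma_cophenetic(dist):
    """Cluster by UPGMA and return the matrix of cophenetic distances."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean of all pairwise distances between members.
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        height, a, b = best
        # Members joined at this step share the merge height as cophenetic distance.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return coph

def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic distances."""
    coph = upgma_cophenetic(dist)
    n = len(dist)
    x = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    y = [coph[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical distances: two tight pairs of populations, far from each other.
toy_dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]
print("cophenetic r =", round(cophenetic_correlation(toy_dist), 4))
```

The toy matrix is perfectly tree-like, so the dendrogram reproduces it without distortion; real marker data usually give values below 1, as with the ISSR and RAPD data sets here.
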

Figure 2 Unweighted pair-group method with arithmetic averages clustering of Arabidopsis thaliana based on Hillis distance calculated from inter-simple sequence repeats data set.

Figure 3 Unweighted pair-group method with arithmetic averages clustering of Arabidopsis thaliana based on Hillis distance calculated from random amplified polymorphic DNA data set.

The Mantel test on the ISSR data set indicated that the geographical distance was significantly correlated with the genetic distance (rcp=0.58924, P<0.001) when using the complete data set (including Col). When Col alone, or Col together with the Xinjiang populations, was excluded from the data set, the correlations were still significant (rcp=0.37952, P<0.005 and rcp=0.49100, P<0.0001, respectively). Similar correlations were also found for the RAPD data set when the Mantel test was applied (rcp=0.71907, P<0.00001 for complete data set; rcp=0.32985, P<0.01 excluding Col; rcp=0.30710, P<0.02 excluding Col and the Xinjiang populations). The Mantel test also detected a significant correspondence between the genetic similarities based on the ISSR and RAPD data sets (rcp=0.6792, P<0.00001).

Discussion

The distribution and natural habitats of Arabidopsis thaliana in China

A. thaliana is widely distributed in China and has been recorded in Eastern, Central, Northwestern, Western and Southwestern China (Cheo et al., 1987, 2001), covering both temperate and subtropical zones. Although A. thaliana has a wide distribution in China, relatively few herbarium specimens are found in this country. The earliest dated specimens we examined were collected from Yixian County, Anhui Province, in 1910 and deposited in the herbarium of the Institute of Botany, Beijing (PE). This lack of herbarium specimens may partially be attributed to the fact that this species has very short flowering–fruiting times, usually in early spring and/or that it is an inconspicuous weedy species.

The habitats of the natural populations of A. thaliana collected for this study are diverse, including roadsides, farmland, slopes and abandoned fields, with altitudes ranging from 20 m (AHthx) to 1400 m (XJqhx). The populations are usually distributed in relatively moist areas. The two populations from Xinjiang Province (XJqhx and XJalt) were obtained from the Altai Mountain range. Among the samples collected for this study, the XJqhx and XJalt populations grow at the highest altitude and the lowest temperature during the growth season. When the authors collected samples from Qinghe (XJqhx) in early June, it was during a period of snowfall. The phenotype of the plants varies greatly in the field. For example, the average height of individuals of the northwestern group is 16–18 cm, while that of the southern group is 26–46 cm. When the individuals from different populations were planted in the greenhouse at 22±2°C, those in the northwest group tended to flower earlier than the populations from East China.

These natural populations are usually found in disturbed habitats strongly influenced by human activity. It seems that some newly found distributions could be attributed to human-aided dispersal. Since A. thaliana produces numerous tiny seeds, wind could also cause long-distance dispersal (Tackenberg et al., 2003; Jorgensen and Mauricio, 2004). The dynamic distribution pattern of A. thaliana in East China might reflect a history combining natural and human-aided dispersal.

Comparison of the results based on RAPD and ISSR data sets

Although there have been many debates on the reproducibility of RAPD markers, many studies have shown that RAPDs are useful molecular markers to detect genetic diversity at the population level in well-controlled experiments (Bartish et al., 2000; Diaz et al., 2001; Reisch et al., 2003; Fontaine et al., 2004; Nybom, 2004). Since the ISSR markers are generated by longer primers (15–24 bp), these were thought to be more stable than the RAPD markers (Yang et al., 1996; Nagaoka and Ogihara, 1997; Parsons et al., 1997; Esselman et al., 1999). In this study, the reproducibility of RAPD and ISSR was assured by a repeated PCR amplification for at least six individuals in each population. Both ISSR and RAPD amplifications produced stable and repeatable fragments with the selected primers. In general, the ISSR and RAPD markers generated similar results on the genetic diversity within and among the natural populations, as revealed by Nei’s measurement (Nei, 1973) of genetic diversity and AMOVA. Moreover, the UPGMA clusterings based on these two data sets were significantly correlated with their respective distance matrices, although the rcp value of ISSR (rcp=0.80793) is lower than that of RAPD (rcp=0.91622).

Genetic variation among and within populations

Although extensive studies have been conducted on the genetic diversity of A. thaliana around the world, none of the populations from China have ever been included. This study is focused on the genetic diversity of naturally distributed populations in China. The overall genetic diversity of the 21 natural populations, Gst=0.4946 (RAPD), Gst=0.4267 (ISSR), was approximately the same as that seen in other selfing species with gravity-dispersed seeds, whose average Gst=0.5 (Hamrick and Godt, 1996), less than that of the native North European populations (Fst=0.88; Stenoien et al., 2005) or native populations in France (Fst=0.59; Le Corre, 2005), but significantly greater than that of the North American populations (Gst=0.28) reported by Jorgensen and Mauricio (2004). When the northwestern populations were excluded, the genetic divergence of the 19 East China populations decreased only slightly (data not shown). Although A. thaliana was introduced into East China (Al-Shehbaz and O’Kane, 2002), its overall genetic diversity is much greater than that of the recently introduced populations in North America, but still less than that of the native populations in Europe. This finding might indicate that the populations in China have a longer ‘introduction’ history than those in North America, which could also be supported by the fact that no identical banding profile was found between/among individuals of different populations. When the genetic variation is dissected into ‘within’ and ‘among’ population variations, typically, selfing species have low levels of genetic diversity within populations, but a substantial differentiation among populations (Hamrick and Godt, 1996). In previous studies, the hierarchical AMOVA on the genetic variation of the natural populations of A. thaliana revealed different patterns for different populations.
In most cases, the observed genetic variation was consistent with its lifestyle: less variation existed within populations and more genetic variation was found among populations (Hanfstingl et al., 1994; Bergelson et al., 1998; Breyne et al., 1999; Miyashita et al., 1999; Jorgensen and Mauricio, 2004; Stenoien et al., 2005). For example, Stenoien et al. (2005) found that, on average, only 12% of the genetic variation occurred among individuals within 10 northern European populations of A. thaliana, as screened by microsatellite markers. In other cases, higher genetic variation was found within populations than among populations. For example, Jorgensen and Mauricio (2004) detected that approximately 77% of the genetic variation occurred among individuals within six North American populations, as revealed by AFLP markers; and Bakker et al. (2006) found 56.7% of genetic variation within populations by the sequences of six genes and five microsatellite loci over the species range. In this study, in the hierarchical AMOVA for the 21 Chinese populations, it was found that slightly more genetic variation occurred among populations than within populations, as revealed by ISSR (54.7%) and RAPD (58.2%) markers, respectively. When the two Xinjiang populations were excluded, the genetic variation within the 19 remaining Chinese populations was similar to that within all 21 populations (Table 5). Although it is difficult to compare directly the results of the studies mentioned above, because they differ in population size and methodology, the general trend can still be discerned. The relatively high amount of genetic variation found among groups of Chinese populations may indicate a geographic structure.

The correlation of genetic and geographic distance

The correlation between the genetic distance of populations and their geographic distance has been discussed extensively in previous studies using various molecular markers in A. thaliana. In most cases, no clear association between geographical origin and genetic similarity was detected in populations distributed in different regions of the world (King et al., 1993; Hanfstingl et al., 1994; Innan et al., 1997; Ullrich et al., 1997; Bergelson et al., 1998; Loridon et al., 1998; Breyne et al., 1999; Miyashita et al., 1999; Erschadi et al., 2000; Jorgensen and Mauricio, 2004; Stenoien et al., 2005; Bakker et al., 2006). However, Sharbel et al. (2000) detected significant, but weak, isolation by distance among populations sampled from the presumed native range of A. thaliana in Eurasia; and Barth et al. (2002) found some Asian accessions clustered separately from the central European plants. A recent study by Nordborg et al. (2005) on the fragments of 480 kb obtained from each of 96 individual genomes revealed a strong population structure, despite the fact that individual populations harbor much of the variation present species-wide. In this study, generally significant correlations between the geographical and genetic distance were detected using the Mantel test for Chinese populations, based on both ISSR and RAPD markers (P<0.005 for ISSR and P<0.01 for RAPD) and for the eastern Chinese populations (P<0.01 for ISSR and P<0.02 for RAPD).

Although A. thaliana in China is usually distributed in disturbed habitats strongly influenced by human activities, the correlation between its genetic and geographic distance at the population level suggests that some natural dispersal mechanism may also be involved in the distribution of this species. This distribution in China is unlike the population distributions in North America, where the populations possibly originated very recently from mixed sources (Jorgensen and Mauricio, 2004). However, it is impossible to trace the origin of the Chinese populations based only on the ISSR or RAPD data of these limited populations. Precise sequence data from more populations worldwide will be needed to explore the origin of Chinese populations and their phylogenetic relationships with those from other parts of the world.