Introduction

The short arm of rye (Secale cereale L.) chromosome 1 (1RS), besides being part of the rye genome, is present in triticale and in many hundreds of wheat cultivars as the 1BL.1RS or 1AL.1RS wheat-rye translocation (Baum and Appels 1991). 1RS carries a variety of agronomically and genetically important genes, such as the self-incompatibility locus S (Wricke and Wehling 1985) and genes responsible for resistance against several rust species and powdery mildew (McIntosh 1988). Depending on the wheat background, genes enhancing the yield of 1RS-translocation wheat varieties are assumed to be present on this chromosome arm (Carver and Rayburn 1995; Moreno-Sevilla et al. 1995; Villareal et al. 1998; Lelley et al. 2004). Several genetic maps have been constructed for 1RS, mainly based on RFLP, AFLP, RAPD and SSAP markers, but only 9 SSR markers have been developed (Börner and Korzun 1998; Korzun et al. 2001; Ma et al. 2001; Hackauf and Wehling 2002; Nagy and Lelley 2003; Khlestkina et al. 2004).

Microsatellites are arrays of short tandemly repeated sequence motifs (≤6 bp) which are, due to their special mutation mechanism called “replication slippage”, highly polymorphic with respect to the number of repeated units (for reviews see Schlötterer 2000; Ellegren 2004; Buschiazzo and Gemmell 2006). This polymorphism makes microsatellites versatile and easy-to-use genetic markers. However, the development of SSR markers is laborious and costly (Squirrell et al. 2003), especially for polyploid taxa and taxa containing large amounts of repetitive sequences (Temnykh et al. 2001; Squirrell et al. 2003; Tero et al. 2006). For such species, functional SSR markers are therefore especially valuable.

Considering the general lack of markers for the 1RS arm, we decided to develop a larger set of microsatellite markers for this important part of the rye genome. To target the development of markers specific for 1RS, we employed flow cytometry to obtain pure fractions of 1RS DNA. Chromosome flow sorting has been used previously to dissect complex plant genomes (for recent reviews see Doležel et al. 2004, 2007), including rye (Kubaláková et al. 2003). DNA amplified from flow-sorted 1RS arms was used to construct partial DNA libraries enriched for particular microsatellite motifs, and selected DNA clones were sequenced. As an alternative strategy, BAC-end-sequences derived from a 1RS specific BAC library (Šimková et al. 2008) were used to develop microsatellite markers. In total, 74 primer pairs producing 76 polymorphic loci on 1RS were developed and physically mapped to three 1RS bins. The results of this study demonstrate that flow-sorted chromosomes can be used for the development of microsatellite markers for specific chromosomes and chromosome arms. The newly developed markers will aid the cloning of agronomically and genetically important genes and the anchoring of fingerprinted BAC-contigs to the genetic map of 1RS, and will support marker assisted selection (MAS). Additionally, we demonstrate, using the sequence information of the complete rice genome, that the intra-genomic length variation of microsatellites correlates with polymorphism and can therefore be used as a measure to assess the usefulness of specific microsatellite motifs for SSR-marker development. Based on the assumption that a high intra-genomic length variation also entails a high polymorphism, we especially recommend the microsatellite motifs AG, AAG, AC, AAC, AT and AAT for marker development in cereals.

Materials and methods

Plant material

Wheat (Triticum aestivum cv. Chinese Spring: CS) ditelosomic addition lines carrying the short arm of rye chromosome 1 (21”+1”:CS+1RS”) were used for flow sorting of the 1RS arm. Newly identified SSR-markers were tested using genomic DNA of wheat (CS), rye (Secale cereale cv. Imperial: I) and the wheat-rye addition line (CS+1RS”). Physical (bin) mapping was carried out using DNA of 1RS deletion lines. These deletions were produced by Friebe et al. (2000) in the short arm of chromosome 1R of a Chinese Spring (CS)/Imperial (I) addition line. Deletion line S-6 does not carry the satellite and the NOR region of 1RS, whereas deletion line S-4 has lost approximately four-fifths of the distal region of 1RS (Friebe et al. 2000). SSR-marker polymorphism was tested using 10 diverse rye cultivars (Imperial, Dankowski, Jingzhose, EcoH Kurz, Purga, Canadian rye, Schlögler, Salzmünde, Turkish rye, Voronesh). Genomic DNA was isolated from the leaves of the plant material using the cetyltrimethylammonium bromide (CTAB) extraction method according to the protocol of Hoisington et al. (1994).

Sequences

From a 1RS-specific BAC library (Šimková et al. 2008), 2,778 BAC-end-sequences (BES) were obtained (Bartoš, in preparation). Based on similarity to the short arm of wheat homoeologous group 1 chromosomes A, B and D (http://wheat.pw.usda.gov/NSF/curator/wheat_est_index.html), 342 rye ESTs were identified with putative location on 1RS. Whole BAC sequences for Triticum aestivum and Hordeum vulgare were obtained from the NCBI home page (http://www.ncbi.nlm.nih.gov; GenBank accession numbers can be found in supplementary file S1). The BAC sequences of Zea mays and the whole genome of Oryza sativa were downloaded from the TIGR ftp server (ftp://ftp.tigr.org/pub/data/). Detailed information about the sequences used in this work can be found in Table 1.

Table 1 Total length (kbp) and number of sequences used in this work

Flow sorting and DNA amplification

Root tips of the ditelosomic wheat-rye addition line 1RS (CS+1RS”) were enriched for mitotic metaphases and used to prepare suspensions of intact chromosomes according to Vrána et al. (2000). The chromosome suspensions were stained with DAPI (4′,6-diamidino-2-phenylindole) and analyzed using a FACSVantage SE flow cytometer and sorter (Becton Dickinson, San José, USA). Aliquots of 30,000 1RS arms were sorted into 50 μl of deionized water, proteinase treated and purified as described in Šimková et al. (2007). The obtained DNA was amplified using the GenomiPhi DNA Amplification Kit (GE Healthcare, Chalfont St Giles, UK), yielding 3–4 μg of DNA (Šimková et al. 2007).

Microsatellite marker isolation

The microsatellite isolation procedure used in this work was based on the protocols of Connell et al. (1998) and Kumpatla et al. (2004). SSR-enriched clone libraries were constructed for the nucleotide motifs AC, AG, AAC and AAG. Three micrograms of 1RS-specific DNA were mixed with glycerol (50%) and nebulized (Invitrogen Corp., CA, USA) in an ice-cold water bath for 135 s at 33 psi (2.3 bar). This process yielded DNA fragments within a size range of 300–600 bp. After twofold ethanol precipitation, blunt-ended fragments were produced with Mung Bean Nuclease (MBNase) (New England Biolabs, MA, USA) according to the manufacturer’s instructions. The MBNase was removed with Millipore Micropure-EZ Enzyme Remover (Millipore, MA, USA) spin columns. Blunt-ended adaptors were ligated to the end-polished DNA fragments. The adaptors were prepared by mixing equal amounts of AP-11 oligo (5′-CTC TTG CTT AGA TCT GGA CTA-3′) and AP-12 oligo (5′-TAG TCC AGA TCT AAG CAA GAG CAddC-3′), heating them to 95°C for 5 min and slowly cooling them down to room temperature. The ddC nucleotide was added to the 3′-end of the AP-12 oligo to avoid self-ligation of the adaptors. The adaptor-ligated fragments were PCR amplified with the AP-11 oligo acting as primer. Biotinylated oligos [(GA)13, (CA)13, (GAA)8, (CAA)8] were hybridized to fragments containing a complementary SSR using a PCR cycler hybridization program [95°C − 5 min, 99 × (70–0.2°C –5 s), 50°C –10 min, 20 × (50–0.5°C –5 s)]. Fragments hybridized to an SSR-oligo were captured with streptavidin-coated magnetic beads (Dynal A.S, Oslo, Norway). Residual biotinylated SSR-oligos were removed with Millipore Microcon YM-100 columns (Millipore). Captured fragments were amplified with the AP-11 oligo and the PCR products were purified with a Wizard® PCR Clean-Up System (Promega, WI, USA). The amplified fragments were ligated into pSTBlue-1 AccepTor™ Vector and then transformed into NovaBlue Single™ Competent Cells (Novagen, Nottingham, GB, The Netherlands).
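The adaptor design described above can be checked with a few lines of code. The following is a minimal sketch, not part of the original protocol: it only verifies that the AP-11 sequence given in the text is the reverse complement of the first 21 bases of AP-12, so that annealing should give a duplex with one blunt end and a short unpaired 3′ tail on AP-12. The function names are illustrative, and the terminal ddC is left out of the string because it is a modification rather than a pairing base.

```python
# Adaptor oligo sequences as given in the text (spaces removed).
AP11 = "CTCTTGCTTAGATCTGGACTA"
# AP-12 without its terminal dideoxy-C (a 3' modification, not modelled here).
AP12_CORE = "TAGTCCAGATCTAAGCAAGAGCA"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_length(top: str, bottom: str) -> int:
    """Number of bases of `bottom` (from its 5' end) that pair with the whole
    `top` strand; 0 if `bottom` does not start with the reverse complement."""
    rc = reverse_complement(top)
    return len(rc) if bottom.startswith(rc) else 0


if __name__ == "__main__":
    paired = duplex_length(AP11, AP12_CORE)
    tail = AP12_CORE[paired:]
    print(f"paired bases: {paired}; unpaired 3' tail of AP-12: {tail} + ddC")
```

Under these assumptions all 21 bases of AP-11 pair with AP-12, and the two extra bases plus the ddC remain single-stranded at the 3′ end of AP-12.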
The competent cells were cultured on LB agar with X-gal (70 μg/ml), IPTG (80 μM), carbenicillin (50 μg/ml) and tetracycline (15 μg/ml) for 15–18 h. Colonies containing an insert were transferred to LB broth containing carbenicillin (50 μg/ml) and tetracycline (15 μg/ml). To lyse the bacteria and inactivate DNase, 50 μl of bacterial culture was incubated at 99°C for 5 min. The inserts were directly amplified from the bacterial lysate with the SP6 (5′-ATT TAG GTG ACA CTA TAG-3′) and T7 (5′-TAA TAC GAC TCA CTA TAG GG-3′) primers. The PCR products were run on a 10% acrylamide gel. Only inserts between 300 and 700 bp were selected for sequencing. To remove residual primers and dNTPs, PCR products were purified prior to sequencing with Exonuclease I (Fermentas Inc., ON, Canada) and Calf Intestine Alkaline Phosphatase (Fermentas) according to Werle et al. (1994). Sequencing was done using the DYEnamic ET Dye Terminator Kit (GE Healthcare) according to the manufacturer’s instructions and a two-step PCR program [35 × (90°C –20 s, 60°C –2 min)]. The sequencing reaction was purified with Sephadex™ G-50 (GE Healthcare) and Multiscreen HTS 96-well plates (Millipore) and subsequently run on a MegaBace™ 1000 DNA analysis system (GE Healthcare). Sequence quality was verified with Chromas (http://www.technelysium.com.au/) and high quality sequences were exported into a fasta file. The microsatellite search was done with the software SciRoKo (Kofler et al. 2007); only microsatellites with a length of at least 14 bp were considered for primer pair design. Primer pairs were designed using the Primer3 web-server (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) with default settings, with the exception of the desired fragment size, which was set to 110–170 bp. All sequences were searched against the complete Triticeae repeat database (TREP), Release 9 (http://wheat.pw.usda.gov/ITMI/Repeats/) using stand-alone BLAST 2.2.13.
Most sequences having a similarity (e ≤ 10⁻¹⁰) to a known transposable element were removed from the primer database and not tested further. Additionally, the BAC-end-sequences and the putative 1RS ESTs were used for primer design. To achieve 1RS specificity, 1RS ESTs were aligned with homoeologous wheat ESTs using ClustalX (Thompson et al. 1997) and PCR primers were preferentially designed for sites differing between wheat and rye.
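The 14 bp length threshold used for the microsatellite search can be illustrated with a short script. This is only a sketch under simplifying assumptions and is not the SciRoKo algorithm: it reports perfect tandem repeats made of complete motif units, ignores mismatches and compound repeats, and the function names are illustrative.

```python
import math
import re
from typing import List, Tuple

MIN_LENGTH_BP = 14  # minimum microsatellite length stated in the text
MAX_MOTIF_BP = 6    # microsatellite motifs are at most 6 bp long


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. 'AGAG')."""
    return not any(
        len(motif) % k == 0 and motif[:k] * (len(motif) // k) == motif
        for k in range(1, len(motif))
    )


def find_perfect_ssrs(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for perfect tandem repeats of complete
    motif units spanning at least MIN_LENGTH_BP; end is exclusive."""
    seq = seq.upper()
    hits = []
    for k in range(1, MAX_MOTIF_BP + 1):
        # Number of additional units needed after the first one.
        extra_units = math.ceil(MIN_LENGTH_BP / k) - 1
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, extra_units))
        for match in pattern.finditer(seq):
            if is_primitive(match.group(1)):
                hits.append((match.start(), match.end(), match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    demo = "TTGC" + "AG" * 8 + "CCTA" + "AAG" * 5 + "GT"  # invented sequence
    for start, end, motif in find_perfect_ssrs(demo):
        print(start, end, motif, end - start)
```

Motifs are reported in the reading frame in which they are first encountered; grouping equivalent motifs is a separate step.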

All nucleotide sequences derived from sequencing of clones from the SSR-enriched libraries were deposited in GenBank (Accession numbers: EU338536–EU339104).

Microsatellite marker testing

All SSR-marker primer pairs (pp) were tested with M13-oligos labeled with a fluorescent dye (5′-dye CCC AGT CAC GAC GTT G-3′, dyes: FAM, HEX; synthesized by MWG, Ebersberg, Germany) using the method described by Schülke (2000). The PCR products were run on a 12% acrylamide gel and the gels were scanned with the Typhoon Trio (GE Healthcare) variable mode imager.

Initially, all pp were tested for amplification of a discrete band from the DNA of the CS-1RS addition line with three different “Touchdown”-PCR programs (supplementary file S3) with varying annealing temperatures, in order to increase specificity. In a second step, the pp producing a discrete band in the CS-1RS addition line were tested for amplification of a rye-specific band using wheat (CS) and rye (I) DNA and the three PCR programs described above. The pp amplifying a rye-specific band were tested for polymorphism and physically (bin) mapped. Polymorphism testing was done with ten different rye cultivars. Physical or bin mapping was done with the CS/I deletion lines. For these two tests, only the “Touchdown”-PCR program that turned out to be the most efficient for the specific pp was used. The testing for rye specificity and the physical mapping were done in triplicate.

Statistics

The overall microsatellite statistics were calculated with SciRoKo 3.3 (Kofler et al. 2007) using default settings. For the microsatellite statistics, all microsatellite motifs were fully standardized according to Kofler et al. (2007), i.e. microsatellites having similar motifs were grouped together, e.g., AG, GA, CT and TC represent one group. The polymorphism information content (PIC) for the individual loci was calculated according to Anderson et al. (1993). For the analysis of microsatellite clusters, two adjacent microsatellites were considered clustered if the distance between them was at most 10 bp.
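The grouping of motifs such as AG, GA, CT and TC can be made concrete with a short sketch. It assumes that full standardization means choosing one representative among all cyclic rotations of a motif and of its reverse complement; the alphabetically first form is used here as that representative, which is a convention chosen for illustration rather than a description of the SciRoKo implementation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def standardize_motif(motif: str) -> str:
    """Return the alphabetically first form among all rotations of the
    motif and of its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    candidates = [
        strand[i:] + strand[:i]
        for strand in (motif, revcomp)
        for i in range(len(strand))
    ]
    return min(candidates)


if __name__ == "__main__":
    for raw in ("AG", "GA", "CT", "TC", "GAA", "CAA"):
        print(raw, "->", standardize_motif(raw))
```

With this convention the oligos (GA)13, (CA)13, (GAA)8 and (CAA)8 used for enrichment fall into the groups AG, AC, AAG and AAC, which matches the library names used in the text.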

Results

Microsatellite density in rye and comparison to other cereals

Table 2 shows the density of SSRs per Mbp for the most abundant microsatellite motifs of the 1RS BAC-end-sequences (BES) compared to the densities found in T. aestivum, H. vulgare, O. sativa and Z. mays. The five most abundant motifs of each cereal are shown in bold letters. The overall microsatellite frequency varies strongly between the five cereals, with O. sativa having the highest (314.6 SSR/Mbp) and rye (1RS BES) having the lowest (106.3 SSR/Mbp). Table 2 demonstrates that the microsatellite motif AAG is the most abundant in the 1RS BES followed by the motifs AG, CCG, C, and ATC. In wheat, which is the closest relative of rye, four of the five most frequent rye motifs are the most frequent ones too. The approximately 30% higher microsatellite density in wheat compared to rye can mainly be attributed to the higher densities of the motifs AG, CCG, C and AT. Generally the microsatellite motifs AAG, AG, CCG, C and AT are the most frequent ones in the five investigated cereals.

Table 2 Density (SSR/Mbp) of the most abundant microsatellite motifs found in the 1RS BAC-end-sequences (BES) compared to the four other cereals T. aestivum, H. vulgare, O. sativa and Z. mays

Microsatellite isolation and development

Flow-sorted 1RS arms were obtained with a purity of 86% (Šimková et al. 2008). SSR-enriched libraries with an insert size between 100 and 700 bp were constructed for the microsatellite motifs AG, AAG, AC and AAC. Each library contained between 1,200 and 2,000 clones. In total, 1,290 clones from the SSR-enriched libraries were sequenced: AG 463, AC 356, AAG 261, AAC 210 (supplementary table S1). Additionally, 2,778 BAC-end-sequences (BES) and 342 ESTs were used for microsatellite isolation (supplementary table S1). In the four SSR-enriched libraries 603 microsatellites were identified, in the BES 216 and in the ESTs 103. Assuming that the microsatellite density in the BES (106 SSR/Mbp) represents the overall microsatellite density of rye, an 18.1-fold enrichment of microsatellites was achieved in the four SSR-enriched libraries (1,914 SSR/Mbp).
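The enrichment factor quoted above is simply the ratio of two densities. As a minimal sketch, using the two densities reported in the text and illustrative function names:

```python
def ssr_density_per_mbp(n_ssr: int, total_bp: int) -> float:
    """Microsatellites per megabase pair of analysed sequence."""
    return n_ssr / (total_bp / 1_000_000)


def fold_enrichment(library_density: float, reference_density: float) -> float:
    """Ratio of the density in an enriched library to the reference density."""
    return library_density / reference_density


# Densities reported in the text (SSR/Mbp).
ENRICHED_LIBRARIES = 1914
BES_REFERENCE = 106

if __name__ == "__main__":
    print(f"{fold_enrichment(ENRICHED_LIBRARIES, BES_REFERENCE):.1f}-fold")
```

The same ratio, computed per motif and per library, presumably underlies the individual enrichment efficiencies, provided the BES densities are a fair reference for the rye genome.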

Table 3 shows the enrichment efficiencies of the individual SSR-enriched libraries relative to the BES. For individual motifs the enrichment efficiency ranged from 45 to 635. All SSR-enriched libraries were additionally enriched for microsatellite motifs other than the motif of the oligo used for the enrichment. This “co-enrichment” is especially striking in the AAG-library for the motifs AG and AC, in the AG-library for the motif AC and in the AAC-library for the motif ACC (Table 3). In three of the four SSR-enriched libraries, the highest enrichment was achieved for the microsatellite motif for which the actual enrichment was done. In the AAG-library, by contrast, the highest enrichment was achieved for the microsatellite motifs AG and AC.

Table 3 Results of the enrichment of microsatellites in the individual SSR libraries, calculated on the basis of the microsatellite densities in the BES

During the testing of microsatellite primer pairs (pp) a large number of them were “lost” for several reasons. This “loss” of pp during marker development was termed “attrition” by Squirrell et al. (2003). Figure 1 compares the attrition rates of the four SSR-enriched libraries with those of the BES. Detailed information about the attrition rates of the individual SSR-enriched libraries, the BES and the ESTs can be found in supplementary table S1. We identified six main sources of attrition, four of which are common in many published studies on SSR marker development. The two additional sources were the requirements of being rye specific in a wheat background and of being specific for 1RS. From 1,290 sequenced clones, 2,778 BES and 342 ESTs, containing 922 microsatellites in total, 724 pp were designed. (1) A BLAST search of all 724 pp sequences against the TREP database revealed a significant similarity (e ≤ 10⁻¹⁰) to known transposable elements for 244 pp sequences. Most of these pp sequences were removed. From the remaining pp, 478 were synthesized and tested. (2) A discrete band from the DNA of the wheat 1RS ditelosomic addition line (21”+1”) was amplified by 273 pp; this band was not necessarily amplified only from 1RS. PCR with the other 205 pp resulted either in a smear, a complex banding pattern or no band at all. (3) From the remaining 273 pp, 213 amplified a rye-specific band not present in wheat. (4, 5) The 213 pp were tested both for polymorphism with a rye tester set and for 1RS specificity. These two tests showed that 119 of them were 1RS specific and 117 were polymorphic. (6) From these two subsets of pp, only 74 amplified a band that was both polymorphic and 1RS-specific (Fig. 1, Table S1).

Fig. 1

Attrition during microsatellite development from the SSR enriched libraries and the BAC end sequences (BES). If not denoted otherwise all values are primer pairs (count). The data for the EST derived SSR markers are not included. The length of the bars is true to scale with respect to the number of designed primer pairs (except the sequenced clones)

Generally it can be seen in Fig. 1 that the attrition rates for the SSR-enriched libraries and BES were at each step very similar. In total 10.1 and 10.0% of all pp amplified a polymorphic and 1RS specific band from the BES and SSR-enriched libraries, respectively.
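The attrition at each step can be recomputed from the primer pair counts given in the text. The sketch below uses the counts for all sources combined (SSR-enriched libraries, BES and ESTs), so its percentages need not match the per-source values in Fig. 1, which excludes the ESTs. The step labels and function name are illustrative.

```python
from typing import List, Tuple

# Primer pair (pp) counts at successive steps, as reported in the text
# for all sources combined.
STEPS: List[Tuple[str, int]] = [
    ("designed", 724),
    ("synthesized and tested", 478),
    ("discrete band in CS+1RS", 273),
    ("rye-specific band", 213),
    ("polymorphic and 1RS-specific", 74),
]


def attrition(steps: List[Tuple[str, int]]) -> List[Tuple[str, float, float]]:
    """For each step after the first, return (label, % lost relative to the
    previous step, % retained relative to the first step)."""
    first = steps[0][1]
    rows = []
    for (_, previous), (label, count) in zip(steps, steps[1:]):
        lost = 100 * (previous - count) / previous
        retained = 100 * count / first
        rows.append((label, lost, retained))
    return rows


if __name__ == "__main__":
    for label, lost, retained in attrition(STEPS):
        print(f"{label:30s} lost {lost:5.1f}%  retained {retained:5.1f}%")
```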

Microsatellite marker and database

We developed 119 pp which yielded 129 1RS-specific loci (Fig. 2) since some pp amplified more than one discrete band. From these 119 pp, 74 amplified 76 polymorphic loci in a set of rye cultivars. In summary 117 pp amplified 124 polymorphic loci, 48 of those pp could not be assigned to the 1RS chromosome exclusively as they amplified a band of the same size also in wheat. The remaining 76 loci were specific for the 1RS arm in wheat background. From the 74 pp amplifying both polymorphic and 1RS specific bands, 21 were derived from the AG-library, 12 from AC, 14 from AAG, 10 from AAC, 14 from the BES and 3 from the ESTs (Table S1).

Fig. 2

The physical or bin location of the 129 1RS specific loci on the 1RS chromosome. The values on the left hand side indicate the total number of SSR markers physically mapped to a certain bin, whereas the values in brackets are the polymorphic ones. The individual SSR markers are shown on the right hand side and the value in brackets after each locus indicates the number of detected alleles. The loci are ordered descending by allele number. The contraction in the bar of 1RS represents the NOR. ‘S-6’ and ‘S-4’ are the breakpoints of the respective deletions

With the 1RS deletion lines it was possible to map the 129 1RS-specific loci into one of three bins on 1RS, in distal (D), intercalary (I) or proximal (P) position. We assigned 38 (29), 40 (30) and 51 (17) loci to the D-, I- and P-bin, respectively, where the values in brackets indicate the number of polymorphic loci. On average a polymorphic locus has a PIC of 0.55. Examining the loci of each bin separately, the average PIC values for the D-, I- and P-bins are 0.62, 0.56 and 0.45, respectively. Only loci with at least 2 alleles were considered for calculating the PIC. For the individual microsatellite motifs the average PIC is 0.57 for AG, 0.49 for AAG, 0.53 for AC and 0.46 for AAC. Compound motifs were not considered.
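For reference, the PIC of Anderson et al. (1993) is one minus the sum of the squared allele frequencies at a locus. The sketch below assumes, for simplicity, that one allele is scored per cultivar; the allele sizes in the example are invented and do not come from this study.

```python
from collections import Counter
from typing import Optional, Sequence


def pic(alleles: Sequence[str]) -> Optional[float]:
    """PIC = 1 - sum(p_i^2) over the allele frequencies p_i.
    Returns None for loci with fewer than 2 alleles, which were
    excluded from the averages in the text."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return None
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) scored in ten cultivars.
    scored = ["150"] * 5 + ["154"] * 3 + ["158"] * 2
    print(pic(scored))
```

For the invented example the allele frequencies are 0.5, 0.3 and 0.2, so the PIC is 1 − (0.25 + 0.09 + 0.04) = 0.62.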

All primer pairs and their respective loci were stored in a Microsoft Access database. The database was designed in third normal form, with the primer pairs and the amplified loci stored in separate tables. For convenient data retrieval a number of views (33) were created. These views, for example, allow the retrieval of bin-specific loci or facilitate the ordering of primer pairs having a certain quality score, with or without an M13-tail appended to the forward primer. Tables for primer pairs and loci are additionally available in tab-delimited format, which can be copied, e.g., into Microsoft Excel. The database is available as supplementary file S1. Sequential numbers with the prefix ‘TSM’ (Tulln Secale Microsatellite) were assigned to functional primer pairs. According to the ‘Guidelines for Nomenclature of Biochemical Molecular Loci in Wheat and Related Species’ (http://wheat.pw.usda.gov/ggpages/wgc/98/Intro.htm#Intro2) the loci were named with the sequential number of the primer pair and the prefix ‘Xtsm’. Where the chromosomal location on 1RS could be confirmed, the postfix ‘−1R’ was added to the locus name.
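The separation of primer pairs and loci into two tables, with views on top, can be sketched as follows. This is not the schema of the supplementary Access database: SQLite is used here only because it ships with Python, and all table, column and view names, as well as the demo rows, are hypothetical. The M13 sequence is the one given in the Methods, and placing the tail at the 5′ end of the forward primer is an assumption based on the usual tailed-primer design.

```python
import sqlite3

M13_TAIL = "CCCAGTCACGACGTTG"  # M13 oligo sequence given in the Methods

SCHEMA = f"""
CREATE TABLE primer_pair (
    pp_id       INTEGER PRIMARY KEY,
    pp_name     TEXT NOT NULL UNIQUE,
    forward_seq TEXT NOT NULL,
    reverse_seq TEXT NOT NULL
);
CREATE TABLE locus (
    locus_id    INTEGER PRIMARY KEY,
    pp_id       INTEGER NOT NULL REFERENCES primer_pair(pp_id),
    locus_name  TEXT NOT NULL,
    bin_name    TEXT CHECK (bin_name IN ('D', 'I', 'P')),
    n_alleles   INTEGER
);
-- View 1: polymorphic loci per bin, most alleles first.
CREATE VIEW polymorphic_loci_by_bin AS
    SELECT l.bin_name, l.locus_name, p.pp_name, l.n_alleles
    FROM locus AS l JOIN primer_pair AS p ON p.pp_id = l.pp_id
    WHERE l.n_alleles >= 2
    ORDER BY l.bin_name, l.n_alleles DESC;
-- View 2: ordering list with the M13 tail prepended to the forward primer.
CREATE VIEW order_list_m13 AS
    SELECT pp_name, '{M13_TAIL}' || forward_seq AS forward_tailed, reverse_seq
    FROM primer_pair;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with invented placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO primer_pair VALUES (1, 'demo-pp1', ?, ?)",
        ("ACGTACGTACGTACGTAC", "TGCATGCATGCATGCATG"),
    )
    conn.executemany(
        "INSERT INTO locus (pp_id, locus_name, bin_name, n_alleles) "
        "VALUES (?, ?, ?, ?)",
        [(1, "demo-locus-a", "D", 4), (1, "demo-locus-b", "P", 1)],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(db.execute("SELECT * FROM polymorphic_loci_by_bin").fetchall())
    print(db.execute("SELECT * FROM order_list_m13").fetchall())
```

Keeping loci in their own table means that a primer pair amplifying more than one locus is stored once, which is the situation described above for pp yielding several discrete bands.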

Estimating the value of microsatellite motifs for marker development by computing their intra-genomic length variation

In this paper we suggest that the intra-genomic length variation (\(\sigma_L\)) of a microsatellite motif may be related to the level of polymorphism. To test this hypothesis we used the available data for the whole rice genome. Zhang et al. (2007) calculated the degree of polymorphism for all mono-, di- and trinucleotide microsatellite motifs based on an in-silico comparison between Japonica and Indica rice. Due to the special role of the mononucleotide microsatellites (see Discussion), the two motifs A and C are not included in the regression and are marked with the symbol ×. Figure 3 demonstrates that there is a strong logarithmic correlation (r = 0.87) between the intra- and the inter-genomic microsatellite length variation. If the mononucleotide microsatellites were included, the correlation would deteriorate to r = 0.54.

Fig. 3

Relation between the intra-genomic length variation (\(\sigma_L\)) and the rate of polymorphism (Rate of pm) for all microsatellite motifs published by Zhang et al. (2007). The logarithmic least square fitting method was used to calculate the regression curve. The mononucleotide microsatellite motifs A and C (marked with ×) are not included in the correlation
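A logarithmic least-squares fit of this kind can be sketched as an ordinary linear regression of the polymorphism rate on the natural logarithm of the length variation. Whether the reported r was obtained in exactly this way is an assumption; the function name is illustrative and the example values are synthetic, not data from Zhang et al. (2007).

```python
import math
from typing import Sequence, Tuple


def log_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b * ln(x).
    Returns (a, b, r), where r is the Pearson correlation of ln(x) and y."""
    lx = [math.log(v) for v in x]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


if __name__ == "__main__":
    # Synthetic points lying exactly on y = 2 + 3 * ln(x).
    xs = [1.0, math.e, math.e ** 2, math.e ** 3]
    ys = [2.0, 5.0, 8.0, 11.0]
    print(log_fit(xs, ys))
```

All x values must be positive, since the fit is done on ln(x).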

Table 4 shows \(\sigma_L\) for all mono-, di- and trinucleotide microsatellite motifs. Additionally, the mean value of the microsatellite length \((\bar{x}_L)\) and the number of identified microsatellites are included in Table 4. Each identified microsatellite has a length of at least 14 bp. Table 4 demonstrates that in each species, microsatellite motifs with high and low \(\sigma_L\) can be identified. It can also be seen that the motifs with a high \(\sigma_L\) are nearly identical in all investigated cereals. Based on the hypothesis stated above, microsatellite markers should be preferentially developed for motifs having a high \(\sigma_L\); thus, for SSR-marker development in the cereals, the motifs AAG, AG, AT, AAT, AC and AAC (bold) should be preferentially used. Mononucleotide microsatellites might be an exception to this hypothesis.
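The per-motif quantities discussed here (number of microsatellites, mean length and length variation) can be computed as in the sketch below. It assumes that the length variation is the standard deviation of the microsatellite lengths within a motif group; the sample standard deviation is used, which is a choice made for illustration, and the input values are invented.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def length_stats(ssrs: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, float, float]]:
    """Group microsatellite lengths (bp) by standardized motif and return
    motif -> (count, mean length, sample standard deviation of length)."""
    by_motif = defaultdict(list)
    for motif, length in ssrs:
        by_motif[motif].append(length)
    return {
        motif: (len(v), mean(v), stdev(v) if len(v) > 1 else 0.0)
        for motif, v in by_motif.items()
    }


if __name__ == "__main__":
    # Invented (motif, length in bp) records.
    demo = [("AG", 14), ("AG", 18), ("AG", 22), ("CCG", 15), ("CCG", 15)]
    for motif, (n, avg, sd) in length_stats(demo).items():
        print(motif, n, avg, sd)
```

In the invented example the AG group varies in length while the CCG group does not, which is the kind of contrast the hypothesis relies on.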

Table 4 Mean value and standard deviation of the microsatellite length for mono-, di- and trinucleotide microsatellite motifs in five cereals. The microsatellite motifs with the highest length variation (\(\sigma_L\)) are shown in bold

Discussion

Microsatellite density in rye compared to other cereals

The overall microsatellite densities (SSR/Mbp) in the grass species in Table 2 show that O. sativa has the highest (314 SSR/Mbp) and S. cereale, as represented by the 1RS BAC-end-sequences, the lowest (106 SSR/Mbp) microsatellite density. The remaining grasses T. aestivum, H. vulgare and Z. mays all have similar microsatellite densities. Morgante et al. (2002) demonstrated that the microsatellite density is positively correlated with the fraction of single copy DNA in a genome. Since, in plants, the genome size is positively correlated with the amount of repetitive DNA (Feschotte et al. 2002), and therefore the fraction of single copy DNA is negatively correlated with the genome size, rye with a genome size of 7,900 Mbp should have a lower fraction of single copy DNA than most other cereals (e.g., Hordeum vulgare: 5,100 Mbp; Zea mays: 2,700 Mbp; Oryza sativa: 382 Mbp). Based on this assumption and on the theory of Morgante et al. (2002), rye with its relatively low fraction of single copy DNA should also have a low microsatellite density, as we indeed found in this study. This finding supports the assumption of Morgante et al. (2002) that microsatellites are preferentially found in regions pre-dating the recent genome size expansion in plants. We note, however, that this conclusion is only valid if the 1RS BES are representative of the rye genome.

Microsatellite marker development

From the 724 microsatellite primer pairs developed in this work, 76 functional loci were obtained. This is an unexpectedly high attrition rate for microsatellite marker development. Since the BAC-end-sequences (BES) represent a random sample of 1RS sequences which have not been enriched for microsatellites, we assume that the attrition rates of the BES can be seen as a nearly neutral sample with which the attrition rates for the pp derived from the SSR-enriched libraries can be compared. Figure 1 clearly shows that only minor differences exist between the attrition rates of the BES and the SSR-enriched libraries. This suggests that the complexity of the genome, rather than technical shortcomings, was responsible for the high level of attrition in 1RS. In Giemsa-stained cytological preparations 1RS shows the most C-heterochromatin of all rye chromosome arms (Baum and Appels 1991), indicating that 1RS may be especially challenging for targeted marker development.

We believe that most of the observed attrition can be attributed to two main reasons. The first concerns the nature of the rye genome, which consists of a vast number of transposable elements (Bartoš, in preparation). Our BLAST search of the 724 sequences against TREP identified 244 hits with known transposable elements. Since so far only a minor fraction of the rye genome has been sequenced, this number is likely to increase in the future. Previous work has already demonstrated negative effects of a high number of transposable elements on microsatellite marker development. Temnykh et al. (2001) found that only 31% of the microsatellites having the motif AT produced useful markers and attributed this to the fact that AT microsatellites in rice are frequently associated with the Micropon TE family. Tero et al. (2006) also found that TEs have a negative effect on the number of useful SSR-markers, and the review by Squirrell et al. (2003) suggested that SSR-marker development might be especially difficult for polyploid species and species having a large number of TEs, like wheat and rye. The data published by Torada et al. (2006) allow a comparison of the attrition rates between wheat and rye. Torada et al. (2006) showed that about 35% of 599 pp derived from wheat genomic SSR-enriched libraries produced no band or a smear, compared to 46% (398 pp → 214 pp) in the present work. This moderately increased attrition rate in our work seems reasonable when considering that the rye genome is larger than the monoploid genome of wheat, that 1RS in particular may contain more TEs than wheat (see above) and that we tested our primers on 1RS addition lines, with the whole wheat genome in the background.

The second reason for the elevated attrition rates is that two additional sources of attrition occur in this work which are usually not present in microsatellite development procedures. One was the required rye specificity, the other was the required specificity for chromosome arm 1RS. These reduced the number of viable primer pairs by 24% (273 pp → 208 pp) and 42% (208 pp → 120 pp), respectively. Since rye and wheat diverged only about 8 million years ago (Huang et al. 2002), a close relationship in terms of sequence similarity is to be expected. Additionally, the purity of the flow-sorted 1RS-specific DNA was about 86% (Šimková et al. 2008); thus, 14% contamination with wheat DNA has to be anticipated.

We demonstrated that the applied “Dynabead-fishing” procedure yields clone libraries having a considerable enrichment for microsatellites, ranging from 45 to 635 fold (Table 3). We also observed a remarkable “co-enrichment”, that is enrichment for microsatellite motifs other than the motif for which the actual enrichment was done. This co-enrichment may have two reasons. First, it is possible that microsatellites frequently occur in clusters and therefore enrichment for one will automatically entail enrichment for the other microsatellite. Second, it is also possible that the hybridization process of an SSR-oligo probe to the DNA template allows for sequence deviations at the hybridization site.

We therefore investigated the most abundant microsatellite clusters in all available sequences (enriched libraries and BES) and found that microsatellites having the motif AC are frequently found close to microsatellites having the motif AG (AC–AG clusters: 26×) and that AAC microsatellites are often found close to ACC microsatellites (AAC–ACC clusters: 8×). We further calculated, for all available sequences, that 10% (26 of 266) and 23% (26 of 110) of all AG and AC microsatellites, respectively, are part of an AC–AG cluster. For the AAC–ACC cluster these values are 8% (8 of 98) and 27% (8 of 29) for the motifs AAC and ACC, respectively. Obviously, the clustering of microsatellites can only partly explain the observed co-enrichment since, for example, the AAG-library is also enriched for the motif AG, which never occurred close to an AAG microsatellite. The extent of the co-enrichment also cannot be explained by clustered microsatellites alone. Assuming that 10% of all [AG]n microsatellites are part of an AC–AG cluster (see above), the co-enrichment for the motif AC should amount to 39× in the AG-enriched library, whereas we observed a 129× co-enrichment. This demonstrates that inaccurate hybridization of the SSR-oligo probe to the template DNA, rather than clustering of microsatellites, is the main reason for the observed co-enrichment.
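The cluster analysis can be sketched in two small steps: finding adjacent microsatellites separated by at most 10 bp (the threshold given in the Statistics section) and expressing the clustered microsatellites as a share of all microsatellites of a motif. The coordinates in the example are invented and the function names are illustrative; only the counts passed to `cluster_share` come from the text.

```python
from typing import List, Tuple

MAX_GAP_BP = 10  # maximum distance between adjacent microsatellites (from the text)

# (start, end, standardized motif) on one sequence; end is exclusive.
SSR = Tuple[int, int, str]


def find_clusters(ssrs: List[SSR]) -> List[Tuple[SSR, SSR]]:
    """Return pairs of neighbouring microsatellites whose gap is <= MAX_GAP_BP."""
    ordered = sorted(ssrs)
    return [
        (left, right)
        for left, right in zip(ordered, ordered[1:])
        if right[0] - left[1] <= MAX_GAP_BP
    ]


def cluster_share(n_clustered: int, n_total: int) -> float:
    """Percentage of microsatellites of one motif that are part of a cluster type."""
    return 100 * n_clustered / n_total


if __name__ == "__main__":
    demo = [(0, 20, "AC"), (25, 45, "AG"), (100, 120, "AAG")]  # invented
    print(find_clusters(demo))
    # Counts reported in the text for the AC-AG and AAC-ACC clusters.
    for label, k, n in [("AG", 26, 266), ("AC", 26, 110), ("AAC", 8, 98), ("ACC", 8, 29)]:
        print(label, f"{cluster_share(k, n):.1f}%")
```

The shares computed from the reported counts agree with the percentages quoted above to within about one percentage point.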

The fact that SSR-enrichment procedures also yield a high co-enrichment for other motifs through imperfect hybridization considerably increases their usefulness.

Microsatellite markers for 1RS

We succeeded in developing 76 microsatellite marker loci specific for 1RS in a wheat background using DNA isolated from flow-sorted chromosomes. Although we encountered unexpectedly high attrition rates during marker testing, we believe that this is not due to inaccurate flow-sorting; rather, we suspect that the reasons lie in the structure of the rye genome and in the particular requirements of 1RS specificity (see above). Previous studies demonstrated the usefulness of flow-sorted chromosomes for targeted development of DNA markers (Požárková et al. 2002; Román et al. 2004). The results of this work confirm these findings. Out of the 117 pp amplifying a polymorphic band in rye, 48 amplified a band of the same size in wheat. Preliminary sequence data suggest that the size overlap is not necessarily an indication of homologous loci in wheat. Mapping of the SSRs in rye is in progress and should confirm their presence on 1RS. It is interesting that the proximal bin of 1RS contains the most loci (Fig. 2) although this bin has the smallest size, about one fifth of the 1RS arm (Friebe et al. 2000). We therefore believe that some of these loci are not specific for this bin but rather specific for rye, that is, they are probably associated with rye-specific repetitive DNA. For example, Katto et al. (2004) reported a PCR-based marker specific for rye chromatin, allowing the detection of introgressed rye chromosome segments in the wheat genome. However, this will probably not affect the polymorphic loci of the proximal bin, since polymorphism suggests a single locus, which in turn can be unambiguously assigned to a certain bin.

We found that, in this work, the average PIC of the microsatellite markers, and thus the average degree of polymorphism, increases with the distance from the centromere. The same phenomenon was observed for RFLP markers in Aegilops (Dvorak et al. 1998). Akhunov et al. (2003) and Dvorak et al. (1998) showed that the recombination rate in the wheat genomes increases with the distance from the centromere.
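For readers unfamiliar with the PIC (polymorphism information content), the following Python sketch shows how it can be computed from allele frequencies. It assumes the widely used definition of Botstein et al. (1980); this section does not restate which PIC formula was applied to the 1RS markers, so the function pic and its input format are illustrative only.

```python
def pic(allele_frequencies, tolerance=1e-6):
    """Polymorphism information content of one locus.

    Assumed definition (Botstein et al. 1980):
        PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    where p_i is the frequency of allele i.
    """
    p = list(allele_frequencies)
    if any(f < 0 for f in p) or abs(sum(p) - 1.0) > tolerance:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    sum_of_squares = sum(f * f for f in p)
    pair_term = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - sum_of_squares - pair_term


if __name__ == "__main__":
    # A monomorphic locus carries no information; more, evenly
    # distributed alleles raise the PIC.
    for freqs in ([1.0], [0.5, 0.5], [0.25, 0.25, 0.25, 0.25]):
        print(freqs, round(pic(freqs), 4))
```

Averaging such per-locus values within each bin gives the kind of bin-wise PIC comparison described above.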

Although it is commonly believed that recombination has only a minor influence on microsatellite evolution (Ellegren 2004), it has clearly been demonstrated that recombination can influence microsatellite expansion and stability (Jakupciak and Wells 1999, 2000; Pearson et al. 2005). It is therefore possible that the increase in microsatellite PIC with distance from the centromere is partly due to the increased recombination rate towards the telomere.

Until now, nine microsatellite markers were available for the 1RS arm (Korzun et al. 2001; Hackauf and Wehling 2002; Khlestkina et al. 2004). With our work we added another 76 to these markers, thereby considerably increasing the marker density of this important chromosome arm. The newly developed microsatellite markers will aid the anchoring of fingerprinted 1RS BAC-contigs to the genetic map of this chromosome arm and will support the cloning of agronomically and genetically important genes as well as facilitate marker assisted selection (MAS) in rye and in translocation wheat. All markers have been stored in a database for convenient retrieval of marker-related information. A number of views have been created to address the most common database requests, such as the ordering of primer pairs.

Intra-genomic microsatellite length variation is a measure to predict microsatellite marker polymorphism

The last source of attrition in developing SSR markers is their lack of polymorphism. To reduce the probability of selecting non-polymorphic SSR-containing fragments for primer design, a method for predicting the probability of polymorphism in advance would be highly desirable.

To test the hypothesis that the length variation of particular microsatellite motifs within a genome can be used to predict the level of polymorphism for these motifs in the species, we used the software SciRoKo (Kofler et al. 2007) to analyze the complete sequence of the rice genome.
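To illustrate what such a genome scan collects, the following Python sketch records the lengths of perfect microsatellites per motif in a DNA sequence. It is not SciRoKo's algorithm: it is a simplified regular-expression search that finds only perfect repeats of whole motif units, and the function names, the threshold min_length and the grouping of motifs with their rotations and reverse complements (e.g. AG = GA = CT = TC) are assumptions made for illustration.

```python
import re
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# A motif of 1-6 bp (shortest candidate first) followed by at least two
# further perfect copies of itself, i.e. three or more repeat units.
_SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def canonical_motif(motif):
    """Alphabetically smallest rotation of the motif or its reverse complement."""
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    candidates = []
    for m in (motif, reverse_complement):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def scan_perfect_ssrs(sequence, min_length=12):
    """Return {canonical motif: [tract lengths in bp]} for perfect SSRs.

    Only whole repeat units are counted; a trailing partial unit is ignored.
    """
    lengths = defaultdict(list)
    for match in _SSR_PATTERN.finditer(sequence.upper()):
        length = match.end() - match.start()
        if length >= min_length:
            lengths[canonical_motif(match.group(1))].append(length)
    return dict(lengths)


if __name__ == "__main__":
    # Hypothetical toy sequence with one (AG)8 and one (CTT)5 tract.
    toy = "TTGC" + "AG" * 8 + "CCGA" + "CTT" * 5 + "GGCA"
    print(scan_perfect_ssrs(toy))
```

The per-motif length lists produced in this way are the raw material from which a length variation per motif can be calculated.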

The microsatellite length variation is expressed as the standard deviation (σL) of microsatellite length around its mean value. We hypothesize that microsatellite motifs having a large σL will generally yield markers with a high level of polymorphism, and vice versa. This hypothesis is based on the notion that microsatellites of certain motifs that do not vary in length within the genome are not likely to do so between different individuals of the same species either. Crucial for this hypothesis is that the microsatellite motif has a considerable influence on microsatellite characteristics and polymorphism. This was proposed by Schlötterer and Tautz (1992), who demonstrated that the microsatellite motif had a distinctive influence on “slippage synthesis”. Our own results (Table 4) showed that the average length of microsatellites and σL varied considerably between different motifs. To verify our hypothesis, a large amount of data was evaluated. We selected O. sativa because the whole genome sequence and extensive microsatellite marker polymorphism information are available. A recent genome-wide in-silico survey by Zhang et al. (2007) calculated the microsatellite polymorphism between Japonica and Indica rice. We plotted the polymorphism data of Zhang et al. (2007) against the microsatellite length variation (Fig. 3) and found a strong logarithmic correlation (r = 0.87) between polymorphism rate and σL. The logarithmic correlation is plausible since the rate of polymorphism cannot exceed 100%, whereas the length variation has no theoretical upper limit.
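The two quantities involved can be sketched in Python as follows. The sketch assumes that σL is the population standard deviation of the tract lengths of one motif and that the "logarithmic correlation" is a Pearson correlation between polymorphism rate and ln(σL); neither detail is spelled out in this section. All function names and all numbers in the example are hypothetical and are not the rice data of Fig. 3.

```python
import math
from statistics import pstdev


def sigma_l(lengths_by_motif):
    """Standard deviation of microsatellite length (bp) for each motif.

    Assumption: the population standard deviation is used, because all
    microsatellites found in the analysed sequence are included.
    """
    return {motif: pstdev(lengths) for motif, lengths in lengths_by_motif.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_correlation(sigma_values, polymorphism_rates):
    """Correlation between polymorphism rate and ln(sigma_L)."""
    return pearson_r([math.log(s) for s in sigma_values], polymorphism_rates)


if __name__ == "__main__":
    # Hypothetical tract lengths (bp) per motif, for illustration only.
    lengths = {"AG": [14, 22, 40, 18, 56], "ACG": [15, 15, 18, 15, 18]}
    sigmas = sigma_l(lengths)
    # Hypothetical polymorphism rates (%) for the same motifs plus a third one.
    rates = {"AG": 60.0, "ACG": 12.0, "AAT": 48.0}
    sigmas["AAT"] = 9.5
    motifs = sorted(rates)
    print(log_correlation([sigmas[m] for m in motifs], [rates[m] for m in motifs]))
```

Taking the logarithm of σL before correlating reflects the argument made above: the polymorphism rate is bounded at 100%, whereas σL is not.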

According to Zhang et al. (2007), mononucleotide microsatellites are the most polymorphic motifs in the rice genome. This conflicts with our hypothesis: because mononucleotide microsatellites have a small σL ([A]n: 5.1; [C]n: 3.9; Fig. 3), they should also have a low level of polymorphism. Two explanations are possible. First, the results of Zhang et al. (2007) imply that mononucleotide microsatellites ought to be the most useful for marker development in rice and related species, since they are not only highly polymorphic but also very frequent. This is, however, not confirmed by recent large-scale microsatellite marker development efforts. These studies report only a few successfully developed mononucleotide markers or none at all (Temnykh et al. 2001; Sharopová et al. 2002; Song et al. 2005; Torada et al. 2006). It is therefore possible that the polymorphism of mononucleotide microsatellites is lower than estimated by Zhang et al. (2007).

Second, the evolution of mononucleotide microsatellites might be subject to other mechanisms not relevant to di- to hexanucleotide microsatellites. In this case our model would fail for mononucleotide microsatellites. For example, poly-A tracts are frequently found at the end of LINE/SINE transposable elements (Schmidt 1999; Feschotte et al. 2002), where they act as binding sites for reverse transcriptase (Schmidt 1999). Dewannieux and Heidmann (2005) demonstrated that the length of poly-A tracts considerably influences the transposition rate of SINEs. LINE and SINE elements are abundant in rice (Noma et al. 1999) and generally in plants (Schmidt 1999; Feschotte et al. 2002). The functional constraints acting on the poly-A tracts of LINEs and SINEs can contribute to the low observed length variation of the [A]n microsatellites. Additionally, it can be speculated that the abundance and sequence similarity of LINEs/SINEs complicate the identification of homologous [A]n microsatellites between the Indica and Japonica varieties. Poly-C tracts, in turn, can act as binding sites of RNA-binding proteins (Makeyev and Liebhaber 2002), which may also influence the evolution of these microsatellites. Nevertheless, even when mononucleotide microsatellites are included in our model, the correlation is still evident (r = 0.54). In our opinion, however, it is more reasonable to assume that either Zhang et al. (2007) overestimated the polymorphism of mononucleotide microsatellites or mononucleotide microsatellites are an exception to our model.

We have demonstrated that the intra-genomic length variation (σL) can be used to predict the level of polymorphism for microsatellite motifs and thus their usefulness for marker development. We suggest that in species where a sufficient amount of sequence information is available, for example whole BAC sequences or genome survey sequences (GSS), an analysis of σL should precede microsatellite development. Such an analysis can easily be done with the software SciRoKo 3.3 (Kofler et al. 2007). It should be kept in mind that the results for mononucleotide microsatellites are conflicting.

Based on the frequency of the microsatellite motifs (Table 2) and their σL (Table 4), we especially recommend the motifs AG, AAG, AC, AAC, AT and AAT for marker development in the cereals. Our recommendation is supported by several large-scale efforts for microsatellite development in the cereals, which successfully used these motifs (McCouch et al. 2002; Song et al. 2005; Torada et al. 2006). In our work, the motifs AT and AAT, though highly polymorphic genome-wide, were not used for enrichment, because their low annealing temperatures or their self-complementarity rendered the enrichment procedure ineffective.