Introduction

Barley (Hordeum vulgare L.) is a self-pollinated diploid (2n=14) plant with numerous advantages for the development of genetic stocks and their subsequent analysis (Sato et al., 2003). Barley serves as a model genome for the polyploid (for example, tetraploid and hexaploid cultivated wheat) and outcrossing (for example, rye) members of the Triticeae (Bothmer et al., 2003). Owing to the large diploid genome sizes (ca. 5000 Mbp, Arumuganathan and Earle, 1991), whole genome sequencing in the Triticeae has been deferred and the focus has been on construction of increasingly dense genetic maps.

Historically, barley geneticists spent decades developing high-quality genetic maps, comprised of natural and induced mutants, by using classical three-point linkage tests (Lundqvist et al., 1996). Cytogenetics subsequently contributed to the genetic mapping by physically localizing genes (Künzel et al., 2000). Genetic mapping of barley accelerated with the application of molecular markers to doubled haploid (DH) populations (Chen and Hayes, 1989). These DH populations have the added advantage of allowing for simultaneous genetic map construction and repeated analysis of quantitative traits (Hayes et al., 1993). DH populations were first mapped with restriction fragment length polymorphisms (RFLPs) (Graner et al., 1991; Heun et al., 1991; Kleinhofs et al., 1993). Owing to their transferability across species, RFLPs were essential for the early synteny studies that identified evolutionary patterns and genome similarities among cereal crop species (Moore et al., 1995; Devos, 2005). Subsequently, amplified fragment length polymorphisms (AFLPs) were widely used for higher-density map development (Waugh et al., 1997; Hori et al., 2003; Takahashi et al., 2006), but this type of marker has issues with data quality. Sequence-based PCR markers—that is, sequence-tagged sites (STSs) (Mano et al., 1999) and simple sequence repeats (SSRs) (Ramsay et al., 2000)—proved to be more reliable and informative and continue to serve as useful anchor loci for maps based on expressed sequence tags (ESTs). Extensive, quality-controlled barley EST information is available in the HarvEST (http://harvest.ucr.edu/) database. These resources were also used to develop the Affymetrix Barley 1 GeneChip (Close et al., 2004), a platform for large-scale expression analysis (Druka et al., 2006).

Genetic mapping of ESTs establishes gene location and provides a framework for whole genome sequencing, as shown in rice (Chen et al., 2002; Wu et al., 2002). A high-density EST map is the foundation for map-based genome analysis as it provides a basis for selecting bacterial artificial chromosome (BAC) clones for sequencing (Varshney et al., 2006). Mapped ESTs also provide access to other genomes, particularly that of rice (Oryza sativa L.), based on sequence similarity. Stein et al. (2007) found significant micro-collinearity between the rice and barley genomes using a consensus barley map with 1032 EST-based loci assayed using a combination of marker assays (including 607 RFLPs). Comparative genomic analysis of barley ESTs and rice genome sequences will enable detection of undiscovered barley genome sequences. Methods that systematically generate markers and estimate candidate orthologs based on the rice genome have enabled isolation of genes controlling key agronomic characters in barley (Furukawa et al., 2007; Taketa et al., 2008).

Our approach was to develop a high-resolution barley EST map using a single DH mapping population, only 3′-end ESTs and only PCR-based assays. Two of the EST donors were parents of the DH-mapping population. ‘Haruna Nijo’ is an elite malting cultivar. ‘H602’ is an accession representing the ancestral wild form of the species. A key advantage to using a single population for mapping, as opposed to a consensus map based on multiple populations, is that every map position is based on segregation data. We used only 3′-end ESTs generated by our research group to ensure sequence quality, preclude redundancy in mapping transcripts and provide a comprehensive distribution of genes on the linkage map.

Our ESTs were derived from representative strains of barley preserved at the Barley Germplasm Center in Okayama University (http://www.shigen.nig.ac.jp/barley/). As these strains represent a broad sample of genetic diversity in terms of degree of domestication, geographic origin and end use, we expected to maximize the degree of polymorphism in EST sequences. We used only PCR-based assays for mapping to provide the research community with accessible tools (for example, STS, CAPS (cleaved amplified polymorphic sequence) and single nucleotide polymorphisms (SNPs)) for further research and application of the mapped EST information. Strategies for using the mapped ESTs to integrate genetic and genomic resources are discussed.

Materials and methods

Source of ESTs

Three barley strains were used to construct nine cDNA libraries. Haruna Nijo is a two-row malting cultivar grown in Japan. Akashinriki is a Japanese six-row hull-less landrace used mainly for human consumption. Both are cultivated types classified as Hordeum vulgare ssp. vulgare. H602 is an accession of Hordeum vulgare ssp. spontaneum, the wild progenitor of cultivated barley. The two subspecies are classified as Hordeum vulgare L. after Bothmer et al. (2003) as no reproductive barriers or major genome differentiation are observed. The nine cDNA libraries were constructed using standard procedures (see http://harvest.ucr.edu/) or by oligo capping (Maruyama and Sugano, 1994). Cycle sequencing from both the 3′ and 5′ ends of the cDNA clones was carried out on ABI3700 or ABI3100 DNA sequencers using BigDye-terminator (Applied Biosystems Japan Ltd, Tokyo, Japan). Each read was base-called using Phred (Ewing and Green, 1998). After trimming of vector sequence and poly A/poly T stretches, high-quality sequences (quality value (QV) ≥20) were selected from each read. These sequences were used for clustering and multiple alignment.
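The selection of a high-quality region from a base-called read can be illustrated with a minimal sketch. The text does not state how the high-quality region was delimited, so the rule used here (keep the longest contiguous run of bases whose Phred quality value is at least 20) is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple


def longest_high_quality_window(quals: List[int], min_qv: int = 20) -> Tuple[int, int]:
    """Return (start, end) of the longest run with every QV >= min_qv (end exclusive)."""
    best = (0, 0)
    start = None
    # A sentinel below any valid quality value closes a run that reaches the read end.
    for i, q in enumerate(list(quals) + [-1]):
        if q >= min_qv:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def trim_read(seq: str, quals: List[int], min_qv: int = 20) -> str:
    """Trim a read to its longest high-quality window (assumed selection rule)."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")
    start, end = longest_high_quality_window(quals, min_qv)
    return seq[start:end]
```

For example, a read `ACGTACGT` with quality values `[10, 25, 30, 22, 5, 40, 40, 12]` would be trimmed to `CGT`, the longest stretch in which every base has QV of at least 20.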

Primer development

Approximately 60 000 3′ESTs were assembled by Phrap (http://www.phrap.org/) and two non-redundant sets of sequences were derived (8753 contigs and 6686 singlets). Primers for each singlet and for the longest high-quality sequence in each contig were designed using Primer 3 (Rozen and Skaletsky, 2000). Criteria for developing primer sets were Tm=60 °C (with a difference of <2 °C between forward and reverse primers), GC content=50% (with a difference <20%), target size=400 bp (actual range 150–500 bp). Sequences that did not satisfy these conditions were rejected. A total of 10 366 primer sets were developed.
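The acceptance criteria above can be expressed as a simple filter. This is a sketch only, not the Primer 3 configuration that was used: it assumes that the stated differences (<2 °C for Tm, <20% for GC content) are measured between the forward and reverse primers, takes Tm values as already computed by the design tool, and uses hypothetical names throughout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PrimerPair:
    forward: str          # forward primer sequence
    reverse: str          # reverse primer sequence
    tm_forward: float     # melting temperature in deg C, as reported by the design tool
    tm_reverse: float
    product_size: int     # expected amplicon size in bp


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_criteria(
    pair: PrimerPair,
    max_tm_diff: float = 2.0,
    max_gc_diff: float = 20.0,
    size_range: Tuple[int, int] = (150, 500),
) -> bool:
    """Apply the pairwise Tm, GC and amplicon-size checks (assumed interpretation)."""
    tm_ok = abs(pair.tm_forward - pair.tm_reverse) < max_tm_diff
    gc_ok = abs(gc_percent(pair.forward) - gc_percent(pair.reverse)) < max_gc_diff
    size_ok = size_range[0] <= pair.product_size <= size_range[1]
    return tm_ok and gc_ok and size_ok
```

Primer pairs failing any one of the three checks would be rejected, mirroring the statement that sequences not satisfying the conditions were discarded.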

Mapping population development

A cross was made between Haruna Nijo and H602. The DH mapping population of 93 lines was derived from the F1 using microspore culture by Pajbjergfonden, Denmark (Ziauddin et al., 1990).

Polymorphism detection and mapping

Polymorphisms were identified using genomic DNA of Haruna Nijo, H602, and their F1. The PCR amplicons were electrophoresed on 1.5% agarose gels. Primer sets that did not amplify or that generated complex banding patterns were rejected. Presence or absence (dominant) and size difference (co-dominant) of polymorphisms were used directly as STS markers. All PCR amplicons were directly cycle-sequenced and aligned using CLUSTALW (Thompson et al., 1994). Restriction site polymorphisms for 50 different enzymes were used to develop CAPS markers. Where neither STS nor CAPS assays were possible, SNPs were genotyped directly by fluorescence polarization (Greene et al., 2002) using a 384-well plate format and an ARVOsx multilabel counter (PerkinElmer Japan Co., Ltd, Yokohama, Japan). SNP alleles were scored automatically using the software provided by the manufacturer. Polymorphic SSR and STS markers with previously reported map positions (Mano et al., 1999; Ramsay et al., 2000) were used as anchor markers. Linkage between these anchor markers and EST-derived STS and CAPS markers was calculated using MAPMAKER/EXP (Lander and Botstein, 1989) with the Kosambi map function (Kosambi, 1944) and a LOD threshold of 5.0. After the development of seven linkage groups, SNP-based markers were sequentially added to each linkage group at the same LOD threshold.
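The two quantities named in the linkage analysis can be sketched as follows. The Kosambi function converts a recombination fraction r into map distance, d = 25 ln[(1 + 2r)/(1 − 2r)] cM. The LOD score shown is the standard two-point form for a population in which each line is either recombinant or non-recombinant for a marker pair, as in DH lines. This is an illustration under those assumptions and does not reproduce the internal calculations of MAPMAKER/EXP.

```python
import math


def kosambi_cM(r: float) -> float:
    """Kosambi map distance in centimorgans for recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def two_point_lod(recombinants: int, n: int) -> float:
    """LOD for linkage: log10 likelihood at the estimated r versus r = 0.5."""
    if n <= 0 or not 0 <= recombinants <= n:
        raise ValueError("need 0 <= recombinants <= n and n > 0")
    r_hat = min(recombinants / n, 0.5)  # estimates above 0.5 carry no linkage evidence
    log_lik = 0.0
    if recombinants > 0:
        log_lik += recombinants * math.log10(r_hat)
    if recombinants < n:
        log_lik += (n - recombinants) * math.log10(1.0 - r_hat)
    return log_lik - n * math.log10(0.5)
```

With 93 lines, two markers showing 10 recombinants give a LOD well above the threshold of 5.0, whereas a pair with about half of the lines recombinant gives a LOD of zero.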

Sequence similarity with other genomic resources

We determined the sequence similarity of the mapped EST sequences with the unigene probe sets of the Affymetrix Barley 1 GeneChip (HarvEST Assembly 25), published full-length cDNA sequences (http://www.shigen.nig.ac.jp/barley/) and the rice genome (RAP2 rice representative sequences selected in each locus, http://rapdb.dna.affrc.go.jp/) by BlastN (Altschul et al., 1990) with a threshold of E<1E−30.
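A threshold of this kind can be applied to search output with a short filter. The sketch below assumes tab-delimited output with the 12 standard columns of the BLAST+ tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The output format used in this study is not stated, and the function name and the example identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-30) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query, the subject with the lowest E-value below max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical example rows (identifiers are placeholders, not real accessions).
example = [
    "est_0001\tsubject_A\t98.5\t400\t6\t0\t1\t400\t10\t409\t1e-120\t450",
    "est_0001\tsubject_B\t90.0\t300\t30\t0\t1\t300\t5\t304\t1e-40\t180",
    "est_0002\tsubject_C\t85.0\t120\t18\t0\t1\t120\t7\t126\t1e-10\t60",
]
hits = best_hits(example)
```

Here `est_0001` is retained with its best subject, while `est_0002` is dropped because its only hit does not meet E<1E−30.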

Results

Degree of polymorphism

Polymorphism rates for each class of marker are shown in Figure 1 and complete marker information on each locus is available (Supplementary Table 1). Of the 10 366 primer sets that were developed, 7700 generated useful amplicons. Of the 7700 primer sets, 3975 (52%) showed polymorphisms between the mapping parents. Of these, 2890 (28% of the total) were mapped (Supplementary Table 2). More than half of the mapped markers (1717; 59%) were mapped as SNPs; the next largest group (933) were mapped using CAPS and 240 showed INDELs (196 co-dominant and 44 dominant) that could be mapped directly using fragment mobility in agarose gels. Transition SNPs were more common than transversion SNPs (A/G: 34.6%, C/T: 25.0%). The decision to map by CAPS or fluorescence polarization was based on the relative cost of the restriction enzyme.
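The reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in this paragraph; the variable names are illustrative.

```python
# Counts as reported in the text.
DESIGNED = 10366      # primer sets developed
AMPLIFIED = 7700      # primer sets giving useful amplicons
POLYMORPHIC = 3975    # polymorphic between Haruna Nijo and H602
MAPPED_BY_ASSAY = {"SNP": 1717, "CAPS": 933, "INDEL": 240}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


mapped_total = sum(MAPPED_BY_ASSAY.values())  # 2890

summary = {
    "polymorphic / amplified": percent(POLYMORPHIC, AMPLIFIED),  # about 52%
    "mapped / designed": percent(mapped_total, DESIGNED),        # about 28%
}
for assay, count in MAPPED_BY_ASSAY.items():
    summary[f"{assay} / mapped"] = percent(count, mapped_total)  # SNP is about 59%

for label, value in summary.items():
    print(f"{label}: {value:.1f}%")
```

The three assay classes sum to the 2890 mapped markers, and SNP-based assays account for roughly 59% of them.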

Figure 1

Categories of polymorphism between the mapping parents Haruna Nijo and H602 for PCR amplicons generated with the primer sets developed from the non-redundant 3′ESTs at Okayama University. PCR(-) includes both PCR failure and multiplex amplification.

Mapping

Fifty-eight previously mapped markers (40 SSR and 18 STS) were used as anchor markers to define the short and long arms of each chromosome linkage group. Addition of the 2890 ESTs created a map with seven linkage groups, with a total map length of 2136 cM (Table 1, Figure 2 and online at http://map.lab.nig.ac.jp:8085/cmap/) (mapped 3′EST sequences are available from Supplementary Data 1). Each linkage group is densely populated with EST markers (a minimum of 321 on chromosome 6H and a maximum of 498 on chromosome 5H). Similar numbers of ESTs were mapped to each chromosome by the three mapping strategies (Table 1). As all EST-based markers are based on a non-redundant set of 3′EST sequences, the many clusters of co-segregating markers are attributed to the small size of the mapping population and not to ‘re-mapping’ the same locus using different primer sets. The average percentage of missing data points per locus was 4.7%, but in some cases was as high as 43%, which introduces some ambiguity into the mapping process. Clusters of markers showing segregation distortion were observed on all chromosomes except 3H. The number of missing data points per locus, as well as the segregation values for each locus, are shown in Supplementary Table 1. Replacing missing data points will slightly improve map quality, whereas the unidirectional segregation distortion of alleles at each cluster suggests a biological basis, perhaps related to gametophytic selection for performance in microspore culture, which tends to produce more lethal haploids compared with the haploid induction by Hordeum bulbosum (Chen and Hayes, 1989). Sayed et al. (2002) and Muñoz-Amatriaín et al. (2008) reported distorted marker segregation in anther culture-derived mapping populations. Overall, neither missing data nor segregation distortion is expected to have substantially affected map quality, as a stringent LOD threshold of 5.0 was used.
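Segregation distortion at a locus in a DH population is commonly assessed against the expected 1:1 ratio of parental alleles. The text does not state which test was applied, so the chi-square test below (1 degree of freedom) is offered as an assumed, conventional choice, with hypothetical function names. Missing data points are simply excluded from the counts.

```python
import math


def chi_square_1to1(n_a: int, n_b: int) -> float:
    """Chi-square statistic for deviation of two allele counts from a 1:1 ratio."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("no scored lines at this locus")
    return (n_a - n_b) ** 2 / total


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_distorted(n_a: int, n_b: int, alpha: float = 0.05) -> bool:
    """Flag a locus whose allele counts deviate from 1:1 at significance level alpha."""
    return p_value_df1(chi_square_1to1(n_a, n_b)) < alpha
```

For a locus scored in all 93 lines, counts of 60 and 33 would be flagged as distorted at the 5% level, whereas counts of 48 and 45 would not.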

Table 1 Length and marker information on the high-density barley EST map

Figure 2

cMAP view of the barley transcript (EST) map from the cross between Haruna Nijo and H602. SSR and STS markers were integrated as anchors for each chromosome. All EST markers were PCR-amplified using primer pairs developed specifically from representative 3′ESTs generated at Okayama University. The full set of EST markers is directly accessible from the electronic version of cMAP (http://map.lab.nig.ac.jp:8085/cmap/).

Sequence similarity

Statistics on the sequence similarity of the ESTs with the Affymetrix Barley 1 GeneChip, published full-length barley cDNA sequences, and the Triticum aestivum (Ta) gene index release 10 are shown in Table 2 and full data are available in Supplementary Table 3. Of the mapped ESTs, 2689 (93%) are represented on the Affymetrix Barley 1 GeneChip and full-length cDNA sequences are available for 1039 (36%). Mapped ESTs show high similarity with sequences in the Ta gene index (93%) and moderate similarity with rice loci (50%) (Supplementary Table 4).

Table 2 Numbers of accessions among the Affymetrix Barley 1 GeneChip probes, published barley full-length cDNA sequences (FLcDNA), Triticum aestivum gene index release 10 (TaGI10) and RAP2 rice nucleotide sequences for each locus (RAP2 nucleotide locus) that show sequence similarity, by blastN search, with ESTs on the map

Discussion

The present work establishes linkage map positions for an estimated 9% of the genes of barley. This estimate is based on the number of non-redundant barley transcripts (32 690) by CAP3 (Huang and Madan, 1999) assembly using 3′ end sequences of full-length cDNAs (Sato K, unpublished). These mapped genes, together with a systematically developed set of mapping populations, will make it easier to directly clone barley genes conferring phenotypes showing simple inheritance and to determine the genetic basis of complex traits. The information will contribute to the development of a framework for the physical mapping of barley, which is a necessary step towards genome sequencing. A high-density barley EST map will also be useful for all cereal species by way of providing additional markers and elucidating evolutionary relationships.

Contribution to the systematic development of genetic resources

Haruna Nijo and H602 are key hubs in a comprehensive set of genetic and germplasm resources. Hori et al. (2005) developed and used recombinant chromosome substitution lines (RCSLs) derived from these two accessions. Haruna Nijo is the recurrent parent in each of the 10 new RCSL populations developed at Okayama University (Sato, unpublished). The donor accessions for each of these new RCSL populations represent diverse key barley germplasm, including Morex, Barke, Harrington, Oregon Wolf Barley (dominant parent) and Golden Promise. These lines will facilitate access to qualitative and quantitative traits segregating between donor parents and Haruna Nijo. BAC libraries have been constructed from both Haruna Nijo (Saisho et al., 2007) and H602 (Sato, unpublished). Haruna Nijo is the basis of a new TILLING system comprised of 10 000 individuals (Sato, unpublished). Haruna Nijo and H602 are the sources of many of the ESTs (http://www.shigen.nig.ac.jp/barley/) that are foundation stones of contemporary barley genetic resources, including HarvEST (http://harvest.ucr.edu/), the US Barley Coordinated Agricultural Project (CAP) (http://www.barleycap.org/; Hayes and Szucs, 2006) and the Affymetrix Barley 1 GeneChip (Close et al., 2004). Haruna Nijo was used to develop the first comprehensive set of barley full-length cDNAs (5006 clones) (http://www.shigen.nig.ac.jp/barley/). The EST map described in this report will make Haruna Nijo one of the key haplotypes for barley research.

The cross-combination of a cultivated and a wild-type barley was chosen to allow mapping genes related to domestication, to facilitate future mining of new alleles from the wild-type parent and to maximize polymorphisms at marker loci. Domestication-related genes mapped to date include brittle rachis (btr1/btr2), dormancy, black lemma (Blp) and cleistogamy (cly1/Cly2) (Hori et al., 2005). H602 is a source of potentially new genes conferring resistance to fungal pathogens, including those inciting mildew (Blumeria graminis f. sp. hordei) and net blotch (Pyrenophora teres f. teres) (Sato and Takeda, 1997). The present map, together with the other genomics resources based on this germplasm (notably RCSLs and BAC libraries), can be used as a platform for cloning these genes.

Apart from the relatively small number of ESTs that we mapped directly based on INDELs, we mapped ESTs based on SNPs in 3′UTRs, exons and introns. We expected that our strategy of using 3′ESTs would provide a higher level of polymorphism than the one based on exon SNPs only (for example, Rostoks et al., 2006). This expectation was based on the report of higher SNP rates in non-coding regions (for example, UTRs, introns and intergenic regions) than in exons in humans (Venter et al., 2001) and soybean (Van et al., 2005). Our 3′ESTs would be expected to include exons, 3′UTRs and introns. We also expected that there could be a somewhat higher rate of exon SNPs in EST sequences from wild-type and cultivated barley than in those from only cultivated barley because of mutations associated with domestication. However, our SNP rate was not as high as expected compared with the rates of 17–45% reported for three mapping populations by Stein et al. (2007). This difference may be due to the higher detection rate of intron SNPs by Stein et al. (2007) and the higher expected similarity in coding sequences between these accessions of wild-type and cultivated barley.

The present map was developed using segregation data from 93 DH lines, a small population for mapping 2890 ESTs. Although local ordering would be optimized, and some co-segregating markers separated, with a larger population, the current map provides reliable general positions for a large number of ESTs. Our strategy of using a single population differs from the consensus map approach used by Stein et al. (2007) and Close et al. (unpublished Barley CAP SNP consensus map; http://www.harvest-web.org/hweb/bin/gmap.wc?wsize=1195x790). Ultimately, the individual and consensus maps should be available at a single site so that the user can view, align and compare depending on the need and interest.

The EST map as a tool for map-based cloning and dissection of complex traits

A high-density EST map will be useful for cloning genes when the mapped loci are also represented on an expression profiling array such as the Affymetrix Barley 1 GeneChip (Affymetrix Japan Co., Tokyo, Japan). As shown in Table 2, 2689 probes on the Barley 1 GeneChip are assigned genetic map positions. Addition of the 1032 ESTs reported by Stein et al. (2007) will further integrate the Barley 1 GeneChip with the genetic map. Furukawa et al. (2007) have shown that expressed probes on the Barley 1 GeneChip with genetic map positions in barley can be used to identify orthologous genes in rice. This strategy led to identifying the corresponding full-length cDNA in barley, and ultimately cloning and characterizing the aluminium-tolerance gene in barley. The availability of full-length cDNAs for 36% of the mapped ESTs will make this process even more efficient in future projects.

The present map will also be useful for mapping complex traits. Any marker can be used for mapping quantitative trait loci (QTLs), but a key advantage of ESTs is that they are based on a sequence, so that the information content will not change even if the detection platform changes. By providing aligned sequence data on the two parents used in this study (Supplementary Table 5), we provide a starting point for aligning alleles from multiple germplasm accessions and a resource for designing markers for other germplasm combinations. This approach was shown by Hori et al. (2007a), who synchronized the positions of QTLs in eight mapping populations by using the same EST markers, all of which were extracted from the present EST map.

The EST map as a framework for physical mapping of barley

The mapped ESTs can be assigned physical coordinates using cytogenetic stocks, such as barley–wheat addition lines. For example, Nasuda et al. (2005) and Ashida et al. (2007) mapped barley EST markers on the barley chromosome deletion stocks and estimated the physical location of these ESTs. Although the resolution in segmental allocation of markers was low in wheat deletion stock mapping (Sorrells et al., 2003), the approach is a useful first step in developing BAC contigs for genome sequencing. Best estimates of physical and genetic distances in the barley genome are necessary to determine the efficiency of using a genetic map as a framework for BAC clone allocation to the genome, as marker resolution is usually highest in distal regions and lowest in centromeric regions (Künzel et al., 2000).

The EST map provides an integrative tool for all cereal species

As shown by Moore et al. (1995) and Devos (2005), there is general collinearity in content of the barley and wheat genomes. Therefore, it is not surprising that 93% of the mapped barley ESTs showed a high level of sequence similarity with wheat genes (Table 2). Together with the mapped ESTs reported by Stein et al. (2007) and those on HarvEST (http://harvest.ucr.edu/), ca. 5000 wheat genes may show collinearity with mapped barley ESTs. This should be a significant marker resource for wheat genetic map enrichment.

As shown by Hori et al. (2007b), barley ESTs can be efficiently mapped on the diploid wheat genome. Homoeology is complete except for the reciprocal translocation between 4A and 5A (Figure 3, see online at http://map.lab.nig.ac.jp:8085/cmap/). Complementary use of wheat and barley genetic resources can benefit gene discovery in both species on the basis of sequence similarity, as shown for vernalization-requirement genes (Yan et al., 2004, 2006).

Figure 3

Comparison of barley EST markers between barley chromosome 3H and diploid wheat 3A (Hori et al., 2007a, 2007b). Complete chromosome map information and EST marker sequences are accessible from the electronic version of cMAP (http://map.lab.nig.ac.jp:8085/cmap/).

Barley-rice synteny is an important resource for identifying barley orthologs using rice genome information (Moore et al., 1995; Devos, 2005). The sequence similarity of mapped barley ESTs with rice nucleotide sequences was much lower (50%) than observed for the wheat gene index (93%). The overall genome comparison between mapped barley ESTs and rice genome sequences (IRGSP pseudomolecules Build4 (Supplementary Table 4)) identified many collinear regions between the two genomes. In particular, collinearity at the level of the entire chromosome was observed between barley chromosome 3H and rice chromosome 1 (Figure 4). Comparative genomic information can aid systematic marker generation even when the corresponding interval in the rice genome lacks a barley ortholog (Komatsuda et al., 2007; Sutton et al., 2007). This approach is particularly useful for the identification of functional homologs in both species (Inukai et al., 2006) and identification of candidate orthologs in barley. This has been shown in the isolation of genes controlling key agronomic characters in barley (Furukawa et al., 2007; Taketa et al., 2008). Even after the release of the rice genome (International Rice Genome Sequencing Project, 2005), some rice transcripts (http://rapdb.dna.affrc.go.jp/RAP2_statistics.html) were not assigned map positions: barley ESTs were used to identify rice transcript positions in the rice genome. The reverse will also be true. Rice sequence information will be useful for filling missing sequences in the barley genome.

Figure 4

Complete chromosome-level collinearity between mapped ESTs on barley chromosome 3H and the genome sequence of rice chromosome 1 (from IRGSP pseudomolecules Build4). A blastN (Altschul et al., 1990) threshold of E<1E-5 was used as the basis for the sequence similarity comparison.

Perspectives on EST mapping strategies

There are 500 000 barley ESTs in GenBank. These were the essential basis for detecting SNPs among barley EST donors and efficient genotyping using the Oligonucleotide Pooled Assay (OPA) mapping system (Rostoks et al., 2006). However, the system required redundant sets of ESTs, and the genes that are mapped may be biased to the same highly expressed genes in multiple genotypes. The present map, comprising 2890 ESTs, took 3 years to generate, which is slow compared with the OPA mapping system. However, the present map was effective for mapping transcripts with lower expression levels and may therefore represent a complementary set of mapped transcripts.

Redundancy and assay polymorphism rates will always be challenges for EST mapping projects. If we were to start EST mapping today, we would use full-length cDNAs to minimize redundancy. Polymorphism detection would also be improved with more available sequence. Polymorphism assay platforms are constantly improving in terms of cost, accuracy and throughput.