Organization and variability of the maize genome
Introduction
Maize is unusual among model genetic organisms. In addition to being, until the past decade, the most extensively studied plant species, it has a very high commercial value and serves as the main staple of the diet for millions of people in Africa and the Americas. It should, therefore, have ranked close to the top among species considered for early large-scale genome sequencing by the public sector. However, the high cost of sequencing its moderately large genome (>2.5 Gb) relegated maize to a position behind other plants with much smaller genomes, such as Arabidopsis, rice, poplar, and Medicago. That situation finally changed last year, when the same three US agencies that sponsored the Arabidopsis genome sequencing project (National Science Foundation [NSF], Department of Energy [DOE], and US Department of Agriculture [USDA]) announced a US$32 million program for sequencing the maize genome. Here, we review our knowledge of the organization and variability of the maize genome at the outset of this highly anticipated program.
The maize transcriptome
Given the maize genome's large size [1] and abundance of transposable elements [2], a quick way to assess its gene content is through the isolation and sequencing of cDNAs from different tissues. With a total of 482 892 sequences, maize currently ranks tenth overall in number of cDNA entries in GenBank and second, only behind wheat, among plants. In addition, an industry consortium has recently made a collection of 1 845 987 expressed sequence tags (ESTs) available via a user agreement (//www.maizeseq.org/
The early stages of genome sequencing: absence of a physical map
The initial sequence analysis of maize genomic regions, taken from different maize lines and scattered across the linkage map, was driven by the genetic interests of individual investigators (see Supplementary material). The first contiguous genomic regions to be sequenced were the z1C1 [15, 16] and adh1 loci [17] on chromosomes 4 and 1, respectively. The z1C1 locus of inbred BSSS53 was assembled first from overlapping cosmids and subsequently from overlapping bacterial artificial chromosome (BAC)
The present: sequencing genomic regions anchored to the physical map
As illustrated by the Arabidopsis and rice projects, the sequencing of large regions of the genome and the isolation of genes by map-based cloning benefit greatly from the availability of a physical map, which is generally constructed in two steps. First, the genome is broken down into small overlapping fragments of about 150 kb, and common restriction patterns are used to establish contiguous regions, called fingerprinted contigs (FPCs) [29]. Second, FPCs are aligned to the genetic map through
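The first step, grouping clones into contigs by shared restriction patterns, can be illustrated with a minimal sketch. It is not the method used by the maize physical mapping effort: it assumes each clone's fingerprint is a list of fragment sizes in arbitrary units, treats two bands as shared when their sizes differ by at most a tolerance, and joins clones that share at least a fixed number of bands, whereas fingerprinting software typically relies on a probability-based overlap score. The clone names, band sizes, and the `tolerance` and `min_shared` parameters are all hypothetical.

```python
from itertools import combinations


def shared_bands(a, b, tolerance=2):
    """Count bands in fingerprint `a` that match a distinct band in `b`.

    Fingerprints are lists of fragment sizes (arbitrary units). Two bands
    match when their sizes differ by at most `tolerance`.
    """
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def build_contigs(fingerprints, min_shared=3, tolerance=2):
    """Group clones into contigs by single-linkage on shared band counts.

    Simplifying assumption: a fixed shared-band threshold stands in for
    the statistical overlap score used by real fingerprinting tools.
    """
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if shared_bands(fingerprints[a], fingerprints[b], tolerance) >= min_shared:
            parent[find(a)] = find(b)

    contigs = {}
    for name in fingerprints:
        contigs.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical fingerprints: A overlaps B, B overlaps C, D stands alone.
clones = {
    "BAC_A": [100, 250, 400, 620, 900],
    "BAC_B": [250, 401, 620, 1100, 1500],
    "BAC_C": [620, 1100, 1499, 2000, 2300],
    "BAC_D": [130, 330, 770, 1800, 2600],
}
contigs = build_contigs(clones)
print(contigs)
```

In this toy data set, BAC_A and BAC_C share only one band directly but fall into the same contig because each overlaps BAC_B, which is the sense in which overlapping clones establish a contiguous region.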
Overall composition of the maize genome
A global view of the organization of repetitive DNA in the maize genome has been generated by fluorescence-based in situ hybridization, which facilitated the localization of centromeric, knob, and microsatellite repeat sequences for each of the ten chromosomes [37]. Recently, it has even been possible to position genetically mapped EST markers along the ten maize pachytene chromosomes on the basis of the distribution of recombination nodules along synaptonemal complexes [38]. Most EST markers
Intraspecific and interspecific comparisons
The availability of BAC-sized contiguous sequences has made it possible to examine the relationship of components of the maize genome to each other and to those of other species.
Some gene families in maize, such as storage protein and disease resistance genes, are found in multiple genomic regions. The α-zein storage protein genes consist of 42 copies in six different chromosomal locations [11, 42]. The Rp1 genes are located in two regions of 250 and 300 kb on chromosome 10; and the MP3 genes
Conclusions
A large-scale sequencing project of the B73 genome is about to begin. The sequence information derived from this project will be extremely useful to geneticists and breeders because it will be largely anchored to a genetic map. However, the repetitive DNA component is highly polymorphic and there might even be exceptions to the conservation of gene order among inbreds. Therefore, the picture of the distribution of retrotransposons and genes captured by the new genome sequencing project will be
Acknowledgements
We thank Galina Fuks for her help in assembling the tables. Part of the work presented here and the preparation of this article was supported by National Science Foundation (NSF) grants DBI 99-75618, MCB 99-04646, DBI 02-11851, and DBI 03-20683.
References (58)
- et al. Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann Bot (1998)
- et al. Uneven distribution of expressed sequence tag loci on maize pachytene chromosomes. Genome Res (2006)
- et al. Nuclear DNA content of some important plant species. Plant Mol Biol Reporter (1991)
- et al. Comparison of RNA expression profiles based on maize expressed sequence tag frequency analysis and micro-array hybridization. Plant Physiol (2002)
- et al. Regulation of leaf initiation by the terminal ear 1 gene of maize. Nature (1998)
- et al. Characterization of the maize endosperm transcriptome and its comparison to the rice genome. Genome Res (2004)
- The map-based sequence of the rice genome. Nature (2005)
- et al. Representative cDNA libraries from few plant cells. Plant J (1994)
- et al. Sperm cells of Zea mays have a complex complement of mRNAs. Plant J (2003)
- et al. Sequence composition and genome organization of maize. Proc Natl Acad Sci USA (2004)