Main

Centromeres present an enigma. One and only one is needed on every chromosome for segregation at mitosis and meiosis in all eukaryotes, yet there is no conservation of centromeric sequences: different organisms have markedly different centromeric DNAs1. In budding yeast, a 125-bp consensus sequence contains all the information needed for centromere function, whereas in nematodes, the holokinetic centromere spans the entire chromosome. In most multicellular eukaryotes, however, centromeres typically encompass megabases of DNA, often with limited or nonexistent sequence similarity for homologous centromeres between closely related species2.

Understanding the basis for centromere function in multicellular eukaryotes has been impeded by the sequence content of centromeres, which consist of highly repetitive 'satellite' sequences, interrupted by dispersed transposable elements. For example, the best-studied human centromere comprises a 2-Mb to 4-Mb core of homogeneous 171-bp alpha satellite DNA repeats flanked by 0.5 Mb of diverged alpha satellite that is densely populated with L1 transposable elements3. The enormous tracts of highly repetitive sequences at centromeres have precluded complete sequencing, not only for human centromeres, but also for those from model organisms4,5. Indeed, not a single centromere that has been mapped in a multicellular eukaryote has been completely sequenced.

Whereas centromeric sequences have proven nearly intractable, considerable progress has been made in defining the nucleosomal component of centromere-specific chromatin. A centromere-specific histone H3-like protein, referred to as CenH3 (ref. 6), has been found to underlie the kinetochore in species ranging from yeast (Cse4p) to vertebrates (CENP-A) and plants (CENH3; refs. 6–8). CenH3s are essential for chromosome segregation in all organisms tested, and ablation of CenH3 is accompanied by the inability of other kinetochore components to assemble2.

Plants, like humans, have centromeres comprising megabase-sized satellite sequences, and are similarly refractory to sequence analysis4,9,10,11. In rice, however, some chromosomes have little satellite repeat sequence12. The canonical rice centromeric satellite, CentO, lies in all 12 rice functional centromeres, as centromeric misdivisions yield telocentric chromosomes with reduced amounts of CentO12. Specifically, a misdivision derivative with a breakpoint in the very short CentO array on rice Chromosome 8 identifies this array as being within the functional Chromosome 8 centromere (Cen8). The limited amount of CentO on Cen8 allowed us to obtain a contiguous tiling of BAC clones that span Cen8, to cytologically map CentO arrays to a region in the BAC contig and to sequence the CentO-spanning segment. We also identified a putative gene encoding CENH3 from rice genomic sequence, characterized it molecularly and raised an antibody to the product. We used this antibody to CENH3 in chromatin immunoprecipitation (ChIP) analysis to delimit a 750-kb region that underlies the kinetochore. Notably, this region contains several active genes.

Results

A BAC contig spans Cen8

We previously showed that the centers of rice centromeres contain CentO arrays and centromere-specific retrotransposon (CRR) sequences12,13,14. We used CentO and CRR probes to identify BACs from centromeric regions in the rice genome. We screened 36,864 rice BAC clones, which are part of the rice BAC libraries constructed from Oryza sativa ssp. japonica rice variety Nipponbare12. Approximately 3% of the BAC clones contain CentO, CRR or both. We then used all CentO- and CRR-positive clones to search the CUGI rice FPC Database to identify centromere-associated BAC contigs. We identified 58 contigs, each containing at least 50 clones, and selected clones from these contigs for fluorescent in situ hybridization (FISH) mapping to anchor them to the 12 rice chromosomes. One contig, containing more than 400 BAC clones, was associated with Chromosome 8, of which a subset of clones in the center of the contig spanned the CentO centromeric satellite repeat.

We mapped clones on either side of the CentO-containing BACs in this contig by FISH to the short-arm or long-arm side of CentO in rice pachytene chromosomes (Fig. 1a–c). We constructed a minimal tiling path of BAC clones that extends 5–6 BACs to either side of the CentO-containing BAC a0038J12 (Fig. 1a). We used pachytene FISH and fiber FISH to confirm the cytological locations and overlap between the 12 BAC clones (Fig. 1d–f and data not shown). Fiber FISH data with all 12 BAC clones indicated concordance between the BAC clones and Chromosome 8, suggesting that this tiling path is a faithful representation of the corresponding chromosomal region. In a more detailed analysis of BAC a0038J12, we compared the relative lengths of the simultaneous hybridization signals from CentO and a0038J12 probes on five Nipponbare genomic fibers and five a0038J12 molecules. The length of the CentO region as a percentage of the a0038J12 hybridization signal was 43.4 ± 1.4% for Nipponbare fibers and 41.5 ± 1.8% for a0038J12 molecules (mean ± s.e.), suggesting that CentO sequences in this clone were stable (Supplementary Fig. 1 online). Other BAC molecules had similar correspondences to genomic fibers.
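The comparison of CentO signal length with BAC signal length can be illustrated with a minimal sketch. The fiber measurements below are hypothetical placeholders, not data from this study, and the function names are illustrative assumptions; only the summary statistic (mean ± s.e. of the CentO percentage) follows the text.

```python
from math import sqrt
from statistics import mean, stdev


def centO_percentages(centO_lengths, bac_lengths):
    """CentO signal length as a percentage of the BAC signal length, per fiber."""
    return [100.0 * c / b for c, b in zip(centO_lengths, bac_lengths)]


def mean_and_se(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical signal lengths in micrometres for five fibers (NOT study data).
centO_um = [20.0, 22.0, 21.0, 23.0, 19.0]
bac_um = [50.0, 50.0, 50.0, 50.0, 50.0]

pcts = centO_percentages(centO_um, bac_um)
m, se = mean_and_se(pcts)
print(f"CentO fraction: {m:.1f} ± {se:.1f}% (mean ± s.e., n = {len(pcts)})")
```

Similar means for genomic fibers and BAC molecules, within their standard errors, would be the kind of agreement taken here as evidence that the cloned CentO array is stable.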

Figure 1: Cytological characterization of a BAC contig spanning rice Cen8.

(a) BAC contig including 12 clones (shown in black) that represents a minimal tiling path centered on CentO and spanning the Chromosome 8 centromere. The sequences derived from the four BACs marked in red were obtained from GenBank. (b) BAC a0096K16 (red) is mapped to the short-arm side of the centromeric satellite CentO (green) on rice pachytene Chromosome 8. (c) BAC a0095C12 (red) on the contig is mapped to the long-arm side of CentO (green) on Chromosome 8. (d) A fiber FISH signal derived from BAC a0043P08 and CentO. (e) Reprobing of the fiber shown in d using a probe derived from BAC a0038J12 and CentO, showing that BAC a0038J12 spans the CentO signal. (f) A fiber FISH signal derived from BAC a0024A05 and CentO. All BACs used in FISH and fiber FISH are included in the minimal tiling path. DAPI staining is shown in blue. Scale bars, 10 μm.

We shotgun-sequenced the 12 BACs in the minimal tiling path and then closed gaps and resolved misassemblies. The eight remaining small gaps are attributable to sequencing or assembly difficulties (Supplementary Table 1 online). We used the sequences of these BACs together with the sequences (including three additional gaps) of four contiguous BACs available from GenBank (Fig. 1a) to construct a 1.65-Mb virtual contig, which we annotated for gene models15.

We identified repetitive sequences by BLAST-searching16 the TIGR/Oryza/Repeat Database and the public rice genome sequence (a total of 3,760 BAC/PAC clones representing 515.9 Mb). Approximately 58% (954 kb) of the virtual contig is represented by known repetitive sequences: 41 kb of CentO, 10 En/Spm-like DNA transposons17,18 and 162 retrotransposons. An additional 14% of the virtual contig is derived from miniature inverted-repeat transposable elements, unknown repeats and highly truncated DNA transposons and retrotransposons. Thus, repetitive sequences comprise 72% of the 1.65-Mb region. Single-copy sequences include 47 gene models (Fig. 2a), of which 19 are similar in sequence to known genes and 28 are predicted solely by ab initio gene finders. The loci and their chromatin status are summarized in Fig. 2a–f.

Figure 2: Map of Cen8.

(a) Location of putative genes in the Cen8 virtual contig. RT-PCR-positive genes are indicated by magenta arrows and RT-PCR-negative genes by blue arrows. Orange bars mark the positions of the remaining sequencing gaps. (b) Distributions of the CentO satellite and the CRR elements in the Cen8 virtual contig. Autonomous CRRs, nonautonomous CRRs, solo LTRs from autonomous CRRs, solo LTRs from nonautonomous CRRs and CentO are indicated by orange, purple, dark green, light green and red arrowheads, respectively. (c–e) ChIP-PCR analysis of Cen8 using (c) antibody to CENH3, (d) antibody to dimethylated H3-Lys9 (H3K9me) and (e) antibody to dimethylated H3-Lys4 (H3K4me). Mean (n = 3) relative enrichment levels are shown as histogram bars with standard error. P values calculated based on Student's t-test are shown as percentages. Red bars indicate significantly greater relative enrichment and blue bars indicate lack of significant enrichment. The red dotted line represents relative enrichment of 1 in c and d, but is set at 0.4 in e because primer pair 15 was used as a baseline rather than the noncentromeric reference gene Rictpi, which is enriched in dimethylated H3-Lys4 (cf. Fig. 4a). (f) A scale bar of the 1.65-Mb Cen8 virtual contig. Pink shading marks the kinetochore region enriched in CENH3 binding.

The CentO satellite sequences in the virtual contig are organized into two clusters of three tracts each (Fig. 2b). Tract D (18,338 bp) and tract E (7,620 bp) are separated by a truncated CRR element (CRR-5), and tracts E and F (12,244 bp) are separated by a complete CRR element (CRR-6). Tract F is inverted relative to tracts D and E. We found three shorter CentO tracts of 1,386 bp (tract A), 1,236 bp (tract B) and 147 bp (tract C) on the long-arm side of the virtual contig. Cluster ABC (2.8 kb) has only 7% of the amount of CentO found in cluster DEF (38.2 kb) and is not detectable by FISH, as the single CentO signal corresponds to cluster DEF on the short-arm side of BAC a0095C12 (Fig. 1c).
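The cluster sizes and their ratio can be cross-checked against the individual tract lengths. The following sketch uses only the tract lengths given in the text; the variable names are illustrative.

```python
# CentO tract lengths in base pairs, as listed in the text.
tracts_bp = {"A": 1386, "B": 1236, "C": 147, "D": 18338, "E": 7620, "F": 12244}

# Grouping of tracts into the two clusters described in the text.
clusters = {"ABC": ["A", "B", "C"], "DEF": ["D", "E", "F"]}

# Total length of each cluster.
cluster_bp = {
    name: sum(tracts_bp[t] for t in members) for name, members in clusters.items()
}

# Amount of CentO in cluster ABC relative to cluster DEF.
ratio = cluster_bp["ABC"] / cluster_bp["DEF"]
total_bp = sum(tracts_bp.values())

for name, size in cluster_bp.items():
    print(f"cluster {name}: {size / 1000:.1f} kb")
print(f"ABC/DEF: {100 * ratio:.0f}%")
print(f"total CentO: {total_bp / 1000:.0f} kb")
```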

Mapping Cen8 with CENH3

Our BAC mapping and DNA sequencing of the virtual contig identify the molecular location of the centromere, which had previously been shown to include CentO repeats based on cytology and the occurrence of centromere misdivisions in this region12. But the boundaries of the centromere remained ambiguous, especially given that there are two CentO arrays in the contig that are several hundred kilobases apart. To delimit the entire centromere, we needed a criterion for deciding which sequences are in the bona fide centromere and which lie outside. Mapping strategies that have been applied to centromeres include recombination and rearrangement breakpoint mapping to separate a functional centromere from either arm19,20 and construction of artificial chromosomes using centromeric sequences3. Each of these methods for defining centromeres has limitations, because subregions of centromeres with repetitive sequences and redundant functions can lead to ambiguities in mapping.

An alternative centromere mapping method takes advantage of the perfect correspondence between centromeres and CenH3s. High-resolution mapping is achieved using ChIP with antibodies against CenH3s (refs. 8,21). Owing to sequence repetitiveness, the application of ChIP to native satellite-containing centromeres has been limited to identification of the repeat family22,23,24. But ChIP using antibodies against the human CenH3, CENP-A, has been used to map neocentromeres, which are rare functional centromeres that sometimes arise spontaneously in alphoid-free regions of chromosome arms25,26,27. Therefore, in selected cases, ChIP using an organism's CenH3 can delimit the DNA region along the chromosome on which the kinetochore forms, which we refer to as the kinetochore region.

We identified a single gene encoding CENH3 in the rice genome. CenH3s are distinguishable from canonical H3 histones by bioinformatic criteria28, and a single predicted open reading frame from genomic sequence conformed to the CenH3 profile. We identified an EST corresponding to part of this open reading frame and sequenced the apparently full-length cDNA corresponding to the EST. We raised a rabbit polyclonal antibody to the most N-terminal portion of the putative CENH3 protein. This antibody to CENH3 specifically stained the sister kinetochores at the primary constriction of each of the 2n = 24 rice chromosomes in mitotic figures (Fig. 3a–c), providing a suitable reagent for molecular mapping of Cen8.

Figure 3: Analysis of CENH3 binding in the rice genome.

(a) A somatic metaphase cell prepared from root tips. The primary constriction is visible on at least 10 of the 24 chromosomes. (b) Staining with antibody to CENH3 in the same cell. (c) Merge of a and b. Staining with antibody to CENH3 (green) is present on both chromatids (red) of all visible primary constrictions. Scale bar, 5 μm. (d) ChIP using antibody to CENH3 on selected repeat sequences. The vertical axis shows the difference between antibody to CENH3 treatment and mock controls in the percent of the total hybridization signal that was found in the immunoprecipitated chromatin fraction (% IP) when probed with each repetitive sequence class.

We first used ChIP to address whether CentO and CRR repeats underlie the kinetochore. Both sequences were enriched in the CENH3-bound fraction in ChIP assays (Fig. 3d). As is the case for maize centromeres23, centromeric satellite is more enriched than centromere-specific retrotransposons. On average, 16.5 ± 3.8% (mean ± s.e., n = 3) of the CentO repeat and 7.8 ± 2.2% (mean ± s.e., n = 3) of the CRR element were precipitated by the antibody to CENH3. All other tested repeats (5S and 45S rDNA, RIRE3 retrotransposons and Os48 tandem repeats29) were not substantially enriched in the precipitated fractions (Fig. 3d). We conclude that CentO and CRR are main components of rice centromeres, but because they are repetitive, we cannot determine if they are bound by CENH3 specifically on Cen8.

Next, we used ChIP-PCR to identify the kinetochore region within the 1.65-Mb region. We used 42 single-copy primer sets (Supplementary Table 2 online) from throughout the Cen8 virtual contig to determine which sequences are bound by CENH3. The amplified products of 11 sets of primers (10, 12–15, 18, 22–24, 29 and 31) designed between 250 kb and 1,000 kb of the Cen8 virtual contig had significantly greater relative enrichment in ChIP assays (P < 0.05; Figs. 2c and 4a and Supplementary Fig. 2 online). These positive primer sets surround CentO tracts D, E and F, indicating that these CentO tracts are included in the kinetochore region.

Figure 4: Examples of ChIP targets and gene expression in the CENH3-binding region of Cen8.

(a) The C-kinase substrate gene (a noncentromeric control) and primer set 15 were each amplified and separated by electrophoresis in three replicate experiments together with the noncentromeric gene Rictpi, which is used as a reference for calculating relative enrichment. The ChIP-PCR results are negative (relative enrichment not significantly different from 1) for the C-kinase substrate gene and positive for primer set 15 for precipitation with antibody to CENH3 and antibody to dimethylated H3-Lys9 (H3K9me). Both Rictpi and C-kinase substrate genes are enriched in fractions precipitated with antibody to dimethylated H3-Lys4 (H3K4me). (b) RT-PCR using a cDNA of gene 6729.t00009 from shoot (S) and root (R). M, 100-bp ladder marker; N, negative control (without template); G, genomic DNA template. (c) The same RT-PCR product was used to probe a Southern blot of rice genomic DNA digested with HindIII (H), BamHI (B) and EcoRI (E).

Primer sets 32–42 to the right of the CentO tract F did not have greater relative enrichment (Fig. 2c and Supplementary Fig. 2 online), placing the short-arm boundary of the kinetochore region between primer sets 31 and 32. Primer sets 1–9 also did not have greater relative enrichment (Fig. 2c and Supplementary Fig. 2 online), placing the long-arm boundary between primer sets 9 and 10. Because the short CentO tracts A, B and C are between primer sets 7 and 8, they do not appear to be included in the CENH3-binding region. Thus, the kinetochore region is 750 kb, based on the distance between primer intervals 9–10 and 31–32. The relatively low significance of CENH3-binding at the edges of the kinetochore region (primer sets 10 and 31) might reflect a decrease in CENH3 density. Our mapping of the kinetochore region is consistent with the location of the DEF cluster of CentO tracts at the short-arm side of the primary constriction of Chromosome 8 in pachytene FISH (Supplementary Fig. 1 online).
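The boundary assignment can be expressed as a short procedure. This is a sketch under the assumption that primer sets are numbered in order along the virtual contig, from the long-arm side (set 1) to the short-arm side (set 42); the function name is hypothetical. The list of CENH3-positive primer sets is taken from the text.

```python
def delimit_region(positive_sets, n_sets):
    """Return the primer-set intervals that flank the outermost positive sets.

    Each boundary is reported as a pair (negative set, positive set) or
    (positive set, negative set); None means the region reaches the contig end.
    """
    first, last = min(positive_sets), max(positive_sets)
    left = (first - 1, first) if first > 1 else None
    right = (last, last + 1) if last < n_sets else None
    return left, right


# CENH3-positive primer sets reported in the text (P < 0.05).
positive = {10, 12, 13, 14, 15, 18, 22, 23, 24, 29, 31}

left, right = delimit_region(positive, n_sets=42)
print("long-arm boundary between primer sets", left)
print("short-arm boundary between primer sets", right)

# Primer sets inside the delimited region that were not significantly enriched.
negatives_inside = sorted(set(range(min(positive), max(positive) + 1)) - positive)
print("non-enriched sets inside the region:", negatives_inside)
```

The non-enriched sets that fall inside the delimited region are the ones that raise the question of whether CENH3 binding is continuous.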

Distribution of H3 in Cen8

Eleven primer sets (11, 16, 17, 19–21, 25–28 and 30) in the kinetochore region did not have significantly greater relative enrichment (Fig. 2c), suggesting that CENH3 binding is discontinuous. To ascertain whether these discontinuities represent interspersion of CENH3- and H3-containing nucleosomes, we carried out ChIP-PCR analysis with the same primer sets and nucleosome fractions using antibodies to dimethylated histone H3 at Lys4 (H3-Lys4) and dimethylated H3-Lys9. Levels of dimethylated H3-Lys4 were low throughout the kinetochore region except at primer set 10 at the long-arm edge (Figs. 2e and 4a and Supplementary Fig. 2 online), whereas levels of dimethylated H3-Lys9 were high throughout the kinetochore region and beyond (Figs. 2d and 4a and Supplementary Fig. 2 online). Nine of the eleven primer sets with low relative enrichment for CENH3 also had insignificant relative enrichments for dimethylated H3-Lys9. Therefore, these primer sets cannot be used to detect ChIP signals, owing either to inaccessibility of the epitopes or to their absence. We conclude that high levels of CENH3 binding are contained within regions of high dimethylated H3-Lys9 binding.

The nine primer sets that are positive for both CENH3 and dimethylated H3-Lys9 (12–15, 18, 23, 24, 29 and 31) are probably not detecting a uniform population of Cen8s with mixed nucleosome arrays. ChIP analysis was done with the mono-, di- and trinucleosome fraction using probes that ranged from 100 bp to 350 bp, so mixed arrays would require an almost exact alternation of CENH3- and H3-containing nucleosomes at all nine sites. Rather, we interpret this result as reflecting heterogeneity among different Cen8s in the population. Thus, the alternating arrays of CENH3- and H3-containing nucleosomes expected based on fiber analysis for fruit fly and human cells30 would average out in our ChIP-PCR study to show similar distributions of the two histone epitopes.

Mobile elements in Cen8

Of the 162 retroelements in the 1.65-Mb region, 28 are CRR elements (Fig. 2b and Supplementary Fig. 3 online). Six of the CRRs are full-sized autonomous elements (7.6–7.8 kb), five are full-sized nonautonomous elements (4.4 kb) and the others are truncated elements or solo long terminal repeats (LTRs). Within the virtual contig, CRRs are most abundant in the Cen8 kinetochore region, which contains 21 of the 28 CRRs, including 5 of the 6 autonomous CRRs and all 3 CRR solo LTRs. In contrast, we found no difference in the density of other elements between the kinetochore region and the rest of the virtual contig (48% of elements in 45% of length).
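The enrichment of CRRs in the kinetochore region can be summarized as the share of elements found in the region divided by the region's share of contig length. The sketch below is one simple way to express this; the fold-enrichment measure and the function name are assumptions for illustration, not a statistic reported in the study.

```python
CONTIG_KB = 1650.0       # length of the Cen8 virtual contig
KINETOCHORE_KB = 750.0   # length of the CENH3-binding region


def fold_enrichment(in_region, total, region_kb=KINETOCHORE_KB, contig_kb=CONTIG_KB):
    """Share of elements in the region divided by the region's share of length."""
    if total <= 0 or region_kb <= 0 or contig_kb <= 0:
        raise ValueError("counts and lengths must be positive")
    return (in_region / total) / (region_kb / contig_kb)


# 21 of the 28 CRR elements lie in the kinetochore region (from the text).
crr_fold = fold_enrichment(21, 28)
print(f"CRR share: {21 / 28:.0%} of elements in {KINETOCHORE_KB / CONTIG_KB:.0%} of length")
print(f"fold enrichment: {crr_fold:.2f}")
```

A value near 1 corresponds to no difference in density, which is the situation described for the other elements (48% of elements in 45% of length).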

Retrotransposons in rice Cen8 are relatively young, based on age estimates from the number of substitutions per nucleotide site in the originally identical LTRs11,31. Using an average substitution rate of 6.5 × 10⁻⁹ per synonymous site per year32, we estimate ages of 0–9.4 million years (Supplementary Table 3 and Supplementary Fig. 3 online). Only 6 of the 48 retrotransposons analyzed had transposed more than five million years ago, and 71% of the elements had transposed within the last three million years. We did not observe any differences in the transposition timing of the retrotransposons located in the kinetochore region from those located in the rest of the virtual contig.
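As an illustration of how LTR divergence translates into an insertion age, the following sketch assumes the commonly used relation T = K / (2r), where K is the number of substitutions per site between the two LTRs of one element and r is the substitution rate per site per year. The text gives the rate but does not state the formula, so the relation, the function names and the example divergence are assumptions.

```python
RATE_PER_SITE_PER_YEAR = 6.5e-9  # substitution rate quoted in the text


def ltr_age_years(k, rate=RATE_PER_SITE_PER_YEAR):
    """Estimated insertion age from LTR divergence k (substitutions per site).

    Assumes T = k / (2 * rate): both LTRs accumulate substitutions
    independently after insertion, hence the factor of two.
    """
    if k < 0 or rate <= 0:
        raise ValueError("k must be non-negative and rate must be positive")
    return k / (2.0 * rate)


def divergence_for_age(age_years, rate=RATE_PER_SITE_PER_YEAR):
    """Inverse relation: LTR divergence expected after a given number of years."""
    return 2.0 * rate * age_years


# Hypothetical divergence value, not taken from the study.
k_example = 0.039
print(f"K = {k_example} -> {ltr_age_years(k_example) / 1e6:.1f} million years")

# Divergence that would correspond to the oldest age reported in the text.
print(f"9.4 million years -> K = {divergence_for_age(9.4e6):.4f}")
```

Under this assumed relation, identical LTRs (K = 0) give an age of zero, consistent with the lower bound of the reported range.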

Cen8 contains active genes

Of the 47 putative genes in the 1.65-Mb contig, 14 are in the kinetochore region (Fig. 2a). This is notable because centromeres of other multicellular eukaryotes are known to be devoid of genes3,4,33. To ascertain whether the putative centromeric and flanking genes are expressed, we carried out RT-PCR analysis of the 39 intron-containing apparently unique genes (Table 1). Twelve of these genes are expressed in leaf and root tissues (Fig. 4b, Table 1 and Supplementary Fig. 4 online), including four genes (6729.t00010, 6729.t00009, 6730.t00011 and 6827.t00018) located in the CENH3-binding region. The annotations for these genes did not suggest any notable difference from randomly sampled rice genes.

Table 1 Expression of genes located in the Cen8 virtual contig

We carried out Southern-blot hybridizations using the RT-PCR products as probes to confirm that the active genes are unique in the rice genome. Seven of the twelve active genes showed only a single band of the expected size; the other five showed a strong band of the expected size and a second weak band, indicating the presence of a related gene elsewhere in the genome (Fig. 4c and Supplementary Fig. 4 online). We conclude that at least 12 genes on the virtual contig are expressed and that at least 4 of these active genes lie exclusively in the Cen8 kinetochore region.

Discussion

The presence of long tracts of satellite repeats has made centromeres the last frontiers of higher eukaryotic genomes, and no native centromere had previously been mapped, cloned and sequenced. A fully functional minichromosome centromere from Drosophila melanogaster has been cloned and partially sequenced5,34. In this case, however, the precise relationship between this centromere and the native X-chromosome centromere from which it was derived is unknown, and a satellite block that is essential for minichromosome function is dispensable for X-chromosome segregation35 and is absent from some wild-type chromosomes36. In addition, fully functional human neocentromeres have been cloned and sequenced. These are not native, however, but are extremely rare in the species, with no evidence for their continued propagation over more than a few generations2. In contrast, rice Cen8 is native to the species, found at the same position and containing small CentO arrays in both indica and japonica cultivars that were independently domesticated 10,000 years ago.

Except for its low satellite DNA content, Cen8 is a typical rice centromere. Other rice centromeres are also satellite-poor relative to centromeres of many other eukaryotes. For example, FISH measurements show that rice Cen4 contains a quantity of CentO that is not significantly higher than that of Cen8 (ref. 12). Nevertheless, Cen1 and Cen11 have megabase-sized arrays of CentO, and other rice centromeres have array sizes that lie in between those of Cen4 or Cen8 and Cen1 or Cen11. Therefore, the low level of CentO in Cen8 places it at one extreme of a continuum of array sizes in this species. Large CentO arrays fall in the size range of satellite DNA arrays found in other multicellular organisms; for example, the human Y-chromosome alpha satellite array ranges from 400 kb to several megabases in normal men37.

The 750-kb rice Cen8 kinetochore region is similar in size and chromatin composition to centromeres of other eukaryotes. The D. melanogaster minichromosome centromere is 420 kb33,34, a maize B chromosome centromere lies in a 500-kb region38 and two human neocentromeres are 330 kb and 460 kb26,27. Like other centromeres, rice Cen8 incorporates both CENH3- and H3-containing nucleosomes. This is expected from models of the centromere, in which the external face of the chromatid consists of CENH3-containing nucleosomes that interact with microtubules and the heterochromatic core promotes sister chromatid cohesion30,39. In support of this model, we find that dimethylated H3-Lys9, which is abundant in heterochromatin, is strongly enriched throughout and beyond the kinetochore region, and dimethylated H3-Lys4, which is abundant in euchromatin, is low in the kinetochore region relative to flanking regions. To account for the overlap between CENH3 and dimethylated H3-Lys9 bound to the same DNA segments in a population of Cen8s, we propose that there is variation in the distribution of nucleosome types between Cen8s. As a result, DNA segments throughout the kinetochore region will be high in both CENH3 and dimethylated H3-Lys9 (compare Fig. 2c with 2d). Variation in the location of CENH3-containing nucleosomes relative to H3-containing nucleosomes is consistent with the interspersion and plasticity of long arrays of CENH3 and H3 observed for individual fibers in flies and humans30.

The presence of active genes within the kinetochore region was unexpected, because centromeres are embedded in heterochromatin. But many fruit fly genes have been found to inhabit pericentric heterochromatin at low density40, where they are expressed despite association with heterochromatin proteins41. The same appears to be true for Arabidopsis, in which one study mapped genes to a transposon-rich region thought at the time to be within centromeric DNA20. More recent work has shown that all five Arabidopsis centromeres lie within 178-bp satellite repeat arrays that extend for 3 Mb4,24, and there are no unique sequences or active gene candidates in any Arabidopsis centromere. Therefore, the contrast between Arabidopsis centromeres that lack genes and rice centromeres that have them is attributable to the abundance of satellite arrays.

Incompatibility between genes and centromeres has also been suggested from the disruption of a centromere by transcription in budding yeast42. But the yeast centromere is only 125 bp in length with a single CenH3 nucleosome8, and so it might be easily disrupted, whereas the relatively enormous centromeres of plants and animals might not be noticeably affected by transcription. In addition, kinetochores function only during mitosis and meiosis, whereas transcription occurs only during the remainder of the cell cycle, and so cohabitation of genes and centromeres might be compatible with their disparate functions.

The phenomenon of active genes in rice Cen8 resembles human neocentromeres that form de novo in chromosome arms. Neocentromeres arise in rather ordinary regions that contain genes26,27. Active genes were recently found in the functional neocentromere of mardel(10), a human marker chromosome derived from Chromosome 10 (ref. 43). As this marker chromosome is of recent origin and exists in a genome with two normal copies of each gene on the two normal Chromosome 10s, the active genes at this neocentromere are untested by evolution, whereas each gene in rice Cen8 is fixed in the species. Nevertheless, the parallels between human neocentromeres and Cen8 suggest that they represent different stages in an evolutionary sequence.

The emergence of centromeres from genic regions is a rare event in karyotype evolution. For example, in most primates, the X-chromosome centromere is monophyletic, consisting of an X-specific subfamily of alpha satellite that is present in multi-megabase arrays3. During the evolution of the X chromosome in lemurs, however, new centromeres have emerged twice in ancestrally distal locations44. It is unclear whether these rare events represent successful transpositions of other native centromeres or arose de novo from genic regions and acquired centromere-specific sequences later45. The lack of intermediate stages underscores our ignorance of the process of centromere evolution.

We propose that the earliest stage in centromere evolution is represented by human neocentromeres, in which a kinetochore forms in a genic region with loss or inactivation of the native centromere37. In most cases, this neocentromere-containing chromosome would become extinct. In the rare case of fixation, however, sequences would adapt to their new roles in mitosis by acquiring blocks of satellite DNAs and newly inserted centromere-specific transposons44. Rice Cen8 would thus resemble an intermediate stage in the progression from neocentromeres to mature centromeres. During later stages, centromere meiotic drive would favor the expansion of satellite DNA arrays to multi-megabase lengths1. Meanwhile, transposons would accumulate in the resulting pericentromeric regions where meiotic recombination is suppressed40, and the resident genes would gradually adapt to a heterochromatic environment4,20,46,47. The CENH3 binding domain of Cen8 shows no structural, compositional or age differences from its flanking regions, except for the presence of CentO satellite and the enrichment of CRR elements. These observations suggest that Cen8 may be at an early stage of centromere evolution.

A strength of this evolutionary scenario is that it unites observations made in plants and animals, ancient lineages in which centromeres must have evolved de novo many times. This inevitable progression from a diverse genic region to a monotonous block of centromeric repeat arrays is analogous to common ecological processes that result in the dominance of a climax species. For instance, in temperate zones, a new forest evolves into a climax forest dominated by a single species that continually replaces itself. In the case of centromere progression from a genic region, expansion of satellite DNA would lead to the climax state, in which continual homogenization of satellite repeats would maintain the kinetochore over long evolutionary periods.

Methods

FISH and fiber FISH.

We used O. sativa ssp. japonica rice var. Nipponbare for all experiments. We prepared meiotic pachytene chromosomes and carried out chromosomal FISH essentially as described29. We carried out fiber FISH on extended genomic DNA fibers and BAC molecules as described29. We labeled DNA probes with biotin-dUTP or digoxigenin-dUTP (Boehringer Mannheim) and counterstained chromosomes with propidium iodide or DAPI in an antifade solution, Vectashield (Vector Laboratories). We used plasmid pRCS2 (ref. 14) as a probe to the CentO satellite repeat in FISH and fiber FISH analyses.

Sequencing.

We carried out standard high-throughput sequencing as described15. Briefly, we constructed a small- and large-insert shotgun library for each BAC clone and end-sequenced clones using dye terminator chemistry on ABI 3700 sequencers (Applied Biosystems). We assembled random sequences for each BAC clone using TIGR Assembler48 and identified clone linkages between the assemblies using the BAMBUS scaffolding software. We filled sequencing gaps using a combination of alternative chemistries, primer walking, resequencing of PCR products and transposon-mediated sequencing. We assembled highly repetitive areas, such as the CentO repeat region in a0038J12, using transposon-based sequencing with large-insert shotgun clones. Comparison of experimental with predicted restriction enzyme digestion patterns derived by agarose gel electrophoresis with a minimum of one restriction enzyme confirmed the final assembly of the finished BACs, except for an anomalous band in BAC a0070J19, possibly attributable to incomplete digestion.

BAC a0043P08 was only partially sequenced to confirm the overlap of BACs a0017M13 and a0003M24 in the tiling path. Five of the eight sequencing gaps, representing an estimated 3–11.4 kb, are located in the retrotransposons in BAC a0079E14, and this BAC remains unfinished. Two of the other gaps (in a0096K16 and a0003M24) resulted from GC hard stops in the sequence and are estimated to be less than 500 nucleotides each. The final gap (in a0038J12) resulted from assembly difficulties caused by the CentO repeat and was estimated at 7 kb. Examination of overlapping sequence between finished BAC clones that were sequenced independently identified 5 nucleotide differences in 261,119 bases of overlapping sequence (1 error per 52,223 bases). These five differences were a single nucleotide substitution (G→T) and an insertion of four bases (ATAT) in a local AT dinucleotide repeat.
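The sequence accuracy estimate follows directly from the overlap comparison. A minimal check, using only the two figures given in the text (the function name is illustrative):

```python
def bases_per_difference(overlap_bases, differences):
    """Average number of overlapping bases per observed nucleotide difference."""
    if differences <= 0:
        raise ValueError("differences must be positive")
    return overlap_bases / differences


# 5 nucleotide differences in 261,119 bases of independently sequenced overlap.
rate = bases_per_difference(261_119, 5)
print(f"1 difference per {rate:,.1f} bases")
```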

CenH3 cloning and immunocytochemistry.

We detected clone E30313, consisting of cDNA encoding rice CENH3 inserted into pBluescript II SK+, by TBLASTN searching of the MAFF EST database. This clone was provided in plasmid form by T. Sasaki of the MAFF DNA Bank49. A peptide was synthesized to represent the 19 most N-terminal amino acids of the predicted protein followed by a cysteine for conjugation (ARTKHPAVRKSKAEPKKKLC-amide). The peptide was conjugated to keyhole limpet hemocyanin and injected into two rabbits (Biosource International). One of the resulting antisera was affinity purified versus the peptide for immunostaining and ChIP. We used commercial antibodies to detect dimethylated H3-Lys4 and dimethylated H3-Lys9 (Upstate Biochemicals).

We fixed rice roots in PHEMES (60 mM PIPES buffer, 25 mM HEPES buffer, 10 mM EGTA, 2 mM MgCl2, 0.35 M Sorbitol, pH 6.7) containing 3% paraformaldehyde and 0.2% Triton X-100 for 20 min and washed them three times in phosphate-buffered saline (PBS)50. Root tips were digested for 30 min at 37 °C with 1% cellulase Onozuka RS (Yakult Honsha) and 0.5% pectinase (Kikkoman) dissolved in PHEMES, washed three times in PBS and squashed onto slides coated with poly-L-lysine (Sigma).

We diluted antiserum to CENH3 1:100 in blocking solution and applied it to tissues on slides. Slides were covered with coverslips, sealed with rubber cement and incubated at 4 °C overnight. We then removed the coverslips and washed the slides three times with PBS. We detected the antibody by applying goat antibodies to rabbit conjugated to fluorescein isothiocyanate (Jackson ImmunoResearch) diluted 1:100 in blocking solution, incubating for 2 h and washing three times in PBS. We mounted the slides in Vectashield (Vector Laboratories) containing 1 μg ml−1 DAPI. All images were captured digitally using a SenSys charge-coupled device camera (Roper Scientific) attached to an Olympus BX60 epifluorescence microscope. Camera control and image analysis were done using IPLab Spectrum v3.1 software (Signal Analytics).

ChIP.

We carried out ChIP analysis as described24 using two-week-old etiolated rice seedlings. We used normal rabbit serum as a mock treatment. For ChIP analysis of repeat sequences, we separated the immunoprecipitated samples into Sup (unbound) and Pel (bound) fractions. We purified DNA from both bound and unbound fractions and slot-blotted it onto a HybondN+ membrane. We hybridized the membrane with PCR-amplified centromeric and noncentromeric repeats (Supplementary Table 2 online) or with the plasmids pTa794 (5S rDNA, 410 bp) or pOs48 (Os48, 355 bp). In each case, we subtracted the percent immunoprecipitation (defined as Pel/(Pel + Sup)) of the mock experiments from the percent immunoprecipitation of the antibody to CENH3 treatments. Each experiment was replicated in three independent tubes.
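The mock-corrected percent immunoprecipitation defined above can be written out as a short sketch. The signal values are hypothetical placeholders, not data from this study, and the function names are introduced only for illustration.

```python
from statistics import mean


def percent_ip(pel, sup):
    """Percent immunoprecipitation: Pel / (Pel + Sup), expressed in percent."""
    total = pel + sup
    if total <= 0:
        raise ValueError("total hybridization signal must be positive")
    return 100.0 * pel / total


def mock_corrected_percent_ip(antibody, mock):
    """Subtract the mock percent IP from the antibody percent IP.

    `antibody` and `mock` are (Pel, Sup) signal pairs for one probe.
    """
    return percent_ip(*antibody) - percent_ip(*mock)


# Three hypothetical replicates: ((Pel, Sup) antibody, (Pel, Sup) mock).
replicates = [
    ((30.0, 70.0), (5.0, 95.0)),
    ((40.0, 60.0), (10.0, 90.0)),
    ((20.0, 80.0), (5.0, 95.0)),
]
corrected = [mock_corrected_percent_ip(ab, mk) for ab, mk in replicates]
print(corrected, mean(corrected))
```

Averaging the corrected values across replicates, as in the last line, is one plausible way to summarize the three tubes; the text does not state how the replicates were combined.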

For ChIP-PCR analysis, we designed primers exclusively to single-copy regions of Cen8 (Supplementary Table 2 online). Negative controls were the O. sativa triosephosphate isomerase gene (Rictpi, at the 12.2-cM region of Chromosome 1) and the myristoylated alanine-rich C-kinase substrate-like gene (at the 99.1-cM region of Chromosome 8). We carried out PCR using the primers with the bound fractions from treatments with the antibodies to CENH3, to dimethylated H3-Lys9 and to dimethylated H3-Lys4, and from mock treatments, as templates. PCR conditions were 30 cycles of 94 °C for 30 s, annealing at the specific temperature for each primer set for 30 s, and 72 °C for 1 min. We separated the products by electrophoresis and blotted them on HybondN+ membrane (Amersham). We carried out hybridization, washes and detection as in the RT-PCR procedure (below). We calculated the relative enrichment by comparing antibody-associated PCR product ratios to product ratios from mock experiments using the following formula: relative enrichment = (cen8/Rictpi)_antibody / (cen8/Rictpi)_mock, where the subscripts denote the antibody and mock bound fractions. The probability (P) that the mock fractions and antibody fractions belong to the same group was determined by t-test.
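The relative enrichment calculation can be sketched as follows. The signal values are hypothetical. The text says only that a t-test was used, so the choice of Welch's unequal-variance statistic here is an assumption, and converting the statistic to P requires a t-distribution table or a statistics package, which this sketch omits.

```python
from math import sqrt
from statistics import mean, variance


def relative_enrichment(cen8_ab, rictpi_ab, cen8_mock, rictpi_mock):
    """(cen8/Rictpi) in the antibody fraction over (cen8/Rictpi) in the mock."""
    return (cen8_ab / rictpi_ab) / (cen8_mock / rictpi_mock)


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples.

    Assumed variant: the source does not say which t-test was applied.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical band intensities for one Cen8 primer pair.
enrichment = relative_enrichment(cen8_ab=12.0, rictpi_ab=3.0,
                                 cen8_mock=2.0, rictpi_mock=2.0)

# Hypothetical cen8/Rictpi ratios from three antibody and three mock replicates.
antibody_ratios = [4.0, 5.0, 6.0]
mock_ratios = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(antibody_ratios, mock_ratios)
print(enrichment, t_stat, dof)
```

Normalizing to Rictpi within each fraction before taking the antibody-to-mock ratio cancels differences in the total DNA recovered, so a value near 1 indicates no enrichment over the mock.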

RT-PCR.

We germinated sterilized rice seeds in a sterile glass bottle and grew them for ten days in a cycle of 8 h light and 16 h dark at 20–25 °C. We extracted RNA from shoots and roots of the seedlings with AquaPure RNA isolation kits (BIO-RAD Laboratories). We synthesized cDNA from the RNA using the ProtoScript First Strand cDNA Synthesis kit (New England Biolabs).

We determined copy numbers of the products in the rice genome by Southern-blot hybridization. Rice genomic DNA was digested with BamHI, HindIII and EcoRI and blotted on HybondN+ membrane. In all hybridization experiments, we incubated membranes at 65 °C overnight and washed them sequentially with 2× saline sodium citrate (SSC) + 0.1% SDS, 0.5× SSC + 0.1% SDS, and 0.1× SSC + 0.1% SDS. Signals were detected by phosphorimaging.

URLs.

CUGI rice FPC Database, http://www.genome.clemson.edu/projects/rice/fpc/; TIGR/Oryza/Repeat Database, http://www.tigr.org/tdb/e2k1/plant.repeats/; BAMBUS, http://www.tigr.org/software/bambus/.

GenBank accession numbers.

BACs, AY360384–AY360394; CenH3, AY438639.

Note: Supplementary information is available on the Nature Genetics website.