Main

Outbreaks of rice blast disease are a serious and recurrent problem in all rice-growing regions of the world, and the disease is extremely difficult to control1,2. Rice blast, caused by the fungus Magnaporthe grisea, is therefore a significant economic and humanitarian problem. It is estimated that each year enough rice is destroyed by rice blast disease to feed 60 million people3. The life cycle of the rice blast fungus is shown in Fig. 1. Infections occur when fungal spores land and attach themselves to leaves using a special adhesive released from the tip of each spore4. The germinating spore develops an appressorium—a specialized infection cell—which generates enormous turgor pressure (up to 8 MPa) that ruptures the leaf cuticle, allowing invasion of the underlying leaf tissue5,6. Subsequent colonization of the leaf produces disease lesions from which the fungus sporulates and spreads to new plants. When rice blast infects young rice seedlings, whole plants often die, whereas spread of the disease to the stems, nodes or panicle of older plants results in nearly total loss of the rice grain2. Different host-limited forms of M. grisea also infect a broad range of grass species including wheat, barley and millet. Recent reports have shown that the fungus has the capacity to infect plant roots7.

Figure 1: Life history of Magnaporthe grisea.

a, Asexual spores called conidia germinate and develop a specialized infection structure, the appressorium. Invasive growth within and between cells culminates with sporulation and lesion formation. Sexual reproduction occurs when two strains of opposite mating type meet and form a perithecium in which ascospores develop. Once released, ascospores can develop appressoria and infect host cells. b, Rice blast lesions on leaves. c, Node blast symptoms on rice, which cause lodging and yield loss. d, Scanning electron micrograph of an appressorium (AP) forming after germination of a conidium (CO) on the rice leaf cuticle. The appressorium generates enormous turgor to provide mechanical force to breach the leaf surface. Scale bar, 10 µm. e, Transmission electron micrograph of an appressorium, showing a penetration hypha entering the leaf and developing invasive hyphae (IH). Scale bar, 5 µm.

Here we present our preliminary analysis of the draft genome sequence of M. grisea, which has emerged as a model system for understanding plant–microbe interactions because of both its economic significance and genetic tractability1,2.

Acquisition of the M. grisea genome sequence

The genome of a rice pathogenic strain of M. grisea, 70-15, was sequenced through a whole-genome shotgun approach. In all, greater than sevenfold sequence coverage was produced, and a summary of the principal genome sequence data is provided in Table 1 and Supplementary Table S1. The draft genome sequence consists of 2,273 sequence contigs longer than 2 kilobases (kb), ordered and orientated within 159 scaffolds. The total length of all sequence contigs is 38.8 megabases (Mb), and the total length of the scaffolds, including estimated sizes for the gaps, is 40.3 Mb. The genome assembly has high sequence accuracy—96% of the bases have quality scores of greater than 40—and long-range continuity, with 50% of all bases residing in scaffolds longer than 1.6 Mb.
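
The long-range continuity figure quoted above (50% of all bases in scaffolds longer than 1.6 Mb) is what is commonly called a scaffold N50. The following is a minimal sketch of how such a statistic can be computed; the scaffold lengths are hypothetical toy values, not the lengths from this assembly, and the function name is illustrative.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


# Hypothetical scaffold lengths in Mb (illustration only).
toy_scaffolds = [6.0, 4.0, 3.0, 2.0, 1.6, 1.0, 0.9, 0.5]
toy_n50 = n50(toy_scaffolds)
```

Applied to the real list of 159 scaffold lengths, the same calculation would be expected to give a value close to the 1.6 Mb reported here.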

Table 1 Magnaporthe grisea assembly features

Reconstruction of the M. grisea genome was aided by the availability of genome maps8,9 (Supplementary Methods S1). Thirty-three scaffolds, representing 32.8 Mb or 85% of the draft assembly, were ordered on the genetic map and assigned to each of the seven chromosomes by virtue of containing an anchored genetic marker. In addition, 19 scaffolds (65% of genome assembly) contained more than one marker and could thus be oriented on the map. The ends of chromosomes were identified by the telomere repeat motif (TTAGGG)n. Thirteen telomeric sequences were placed at the ends of scaffolds, of which six could be placed at the ends of chromosomes, whereas the remainder were associated with unanchored scaffolds (Supplementary Table S2). Genome coverage was estimated by aligning 28,682 M. grisea expressed sequence tags (ESTs), representing genes expressed during a range of developmental stages and environmental conditions10,11. Approximately 94% of the ESTs were aligned to the genome assembly, despite many of these ESTs being from different strains.

The gene content of a plant pathogenic fungus

Within the M. grisea genome, 11,109 genes were predicted with protein products of longer than 100 amino acids (Supplementary Methods S2). These predicted genes comprise 48% of the assembly (1 gene per 3.5 kb). For comparison, 10,082 genes were predicted in Neurospora crassa, a related pyrenomycete, and 9,457 were predicted in the more distantly related plectomycete Aspergillus nidulans (also known as Emericella nidulans; http://www.broad.mit.edu/annotation/fungi/aspergillus/). Neither of these species causes plant disease. At the amino acid level, M. grisea orthologues in N. crassa and A. nidulans show an average identity of 47% and 46%, respectively.

Given that the M. grisea genome possesses more genes than the non-plant pathogens N. crassa and A. nidulans, and that these fungi are well-studied model organisms, we compared genes between these species to identify potential cases of gene family expansion that might be associated with the evolution of M. grisea as a plant pathogen.

Single linkage clustering of protein sequences resulted in 348 families containing five or more genes. Overall, more proteins from M. grisea (1,266, P < 0.001) and A. nidulans (1,424, P < 0.001) were classified into families as compared with those from N. crassa (950). An important factor in this comparison is that N. crassa has a process called repeat-induced point mutation (RIP), a mechanism that eliminates duplicated genes during meiosis and therefore limits the ability of N. crassa to undergo paralogous gene duplication and consequent gene family expansion12. Twenty-eight families were found with significant differences (P < 0.05) in gene content between the three species, nine of which were larger in M. grisea than they were in the genome of either N. crassa or A. nidulans (Table 2; see also Supplementary Table S3). Several gene families expanded in M. grisea exhibited sequence similarity to proteins that are involved in fungal pathogenicity13.
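
Single linkage clustering places two proteins in the same family whenever a chain of pairwise similarities connects them. The sketch below illustrates the idea with a union-find structure, using the thresholds given in the Methods (30% identity and 80% length overlap) as defaults. It is not the Blastclust implementation; the input format, function name and toy hits are assumptions made for illustration.

```python
def single_linkage(proteins, hits, min_identity=30.0, min_overlap=0.8):
    """Group proteins into families by single linkage.

    hits: iterable of (protein_a, protein_b, percent_identity, overlap_fraction).
    Two proteins share a family if any chain of passing hits connects them.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent pointers to the family representative,
        # shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, overlap in hits:
        if identity >= min_identity and overlap >= min_overlap:
            parent[find(a)] = find(b)

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    # Largest families first; ties broken by member names.
    return sorted(families.values(), key=lambda f: (-len(f), sorted(f)))


# Hypothetical pairwise hits (illustration only).
toy_proteins = ["A", "B", "C", "D", "E"]
toy_hits = [
    ("A", "B", 45.0, 0.90),
    ("B", "C", 35.0, 0.85),  # links C to A through B
    ("C", "D", 25.0, 0.95),  # fails the identity threshold
    ("D", "E", 60.0, 0.50),  # fails the overlap threshold
]
toy_families = single_linkage(toy_proteins, toy_hits)
```

In the toy data, A and C never match directly but still end up in one family, which is the defining behaviour of single linkage and the reason large, loosely related families can form.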

Table 2 Gene family content differences between M. grisea, N. crassa and A. nidulans

Phylogenetic trees (Supplementary Fig. S1) of the expanded gene families in M. grisea indicated that none of the families showed evidence for recent lineage-specific expansion. These gene families in M. grisea may therefore be the result of ancient gene duplication events followed by loss of gene family members in the N. crassa and A. nidulans lineages.

Genome architecture and co-linearity

Co-linearity of chromosome segments (synteny) is widely reported for plant and animal genomes14, but little is known about this for filamentous ascomycete fungi. Analysis of orthologous pairs of genes in M. grisea and N. crassa revealed no evidence for extensive regions of conserved synteny, although linkage group assignments were often conserved between the two species, notably chromosome 7 in M. grisea with linkage group I of N. crassa (Supplementary Table S4). Only 113 regions containing four or more genes were found to be co-linear between M. grisea and N. crassa. One example, also conserved in several other filamentous fungi, is the quinate/shikimate (Qa) metabolic pathway gene cluster (Supplementary Fig. S2). This seven-gene cluster, spanning 20 kb, which participates in quinate use and aromatic amino acid catabolism, is found on chromosome 3 in M. grisea, and is syntenic in N. crassa, A. nidulans and other filamentous ascomycetes, but is not present in Saccharomyces cerevisiae, Schizosaccharomyces pombe or other yeasts.

M. grisea has a family of novel G-protein-coupled receptors

To be a successful plant pathogen, a fungus must undergo a series of morphological and physiological programmes5. During these developmental transitions, the pathogen also overcomes (or suppresses) the plant's innate immune system and perturbs host metabolism and cell signalling to favour fungal growth. We have attempted to define the most important characteristics of the genome of M. grisea that are associated with its pathogenic lifestyle.

G-protein-coupled receptors with seven transmembrane helices (GPCRs) transduce environmental signals, by means of heterotrimeric G proteins, to activate secondary messengers and regulate gene expression15,16. The M. grisea genome contains a large repertoire of GPCR-like genes, including 61 not previously described. Twelve of these genes form a subfamily (Supplementary Figs S3 and S4), and contain a conserved fungal-specific extracellular membrane-spanning domain (CFEM) at the amino terminus17 that resembles the epidermal growth factor (EGF) module present in certain human GPCRs15. Notably, one of the CFEM-GPCRs, Pth11, is required for appressorium formation and pathogenesis2,18, and M. grisea has the largest number of CFEM-GPCR proteins among sequenced fungi. In contrast, only one was detected in N. crassa and two in A. nidulans, and this type of GPCR is completely absent in the non-filamentous ascomycetes S. cerevisiae, S. pombe, Candida albicans and Pneumocystis carinii, or the basidiomycete fungi Cryptococcus neoformans, Ustilago maydis and Phanerochaete chrysosporium (Fig. 2). Moreover, whole-genome microarray analysis confirmed that the identified CFEM-GPCR-like proteins are expressed during infection-related development, and two CFEM-GPCR genes are specifically upregulated when the fungus is undergoing appressorium formation, as shown in Fig. 2. Together, these data suggest that M. grisea has greater flexibility to react to extracellular signals compared with saprobic fungi. The pathogenic lifestyle, which involves transitions from plant surface to colonization of distinct plant tissues, may therefore require the ability to respond to a wider range of physical and environmental stimuli.

Figure 2: Differential expression of selected M. grisea genes during infection-related development.

a, b, Appressorium formation (arrows) at 7 h (a; cessation of polar growth and tip hooking) and 12 h (b; tip swelling and melanization) after germination. c, Expression profiles of selected genes from conidia germinated on hydrophobic appressorium-inducing (I) and hydrophilic non-appressorium-inducing (NI) surfaces. The colour scale indicates transcript abundance relative to ungerminated conidia: red, increase in transcript abundance; green, decrease in transcript abundance. d, Fold change in transcript abundance (cutinases (CUT), red; polyketide synthases (PKS), blue; and Pth11 homologues with CFEM domain, green) at 7 and 12 h after germination on a hydrophobic surface compared with a hydrophilic surface. Only genes exhibiting a twofold or greater (≤ −1 or ≥ 1) change in expression are shown.

Virulence-associated signalling pathways in M. grisea

The M. grisea genome contains three mitogen-activated protein kinase (MAPK) cascades that regulate appressorium development, penetration peg formation and adaptation to hyper-osmotic stress (Fig. 3). Although there are similar numbers of MAPK pathways in other fungi (N. crassa has three, A. nidulans four and S. cerevisiae five), the processes regulated by the pathways in different organisms are distinct. Two of the three MAPK pathways in M. grisea control virulence-associated development. The core of the Pmk1 MAPK (pathogenicity MAPK) pathway, which regulates appressorium formation in M. grisea, is clearly related to both the pheromone signalling and filamentation pathways from S. cerevisiae, and Pmk1 is able to function in place of either the Fus3 or Kss1 MAPK in yeast19. In M. grisea, however, Pmk1 pathway components such as the MAPK are involved in mating and pathogenesis, whereas Mst12, the Ste12-related transcription factor, is dispensable for mating altogether. These observations demonstrate the functional divergence of these signalling pathways. Furthermore, homologues of Pmk1 are required for fungal virulence in all plant pathogens in which they have been investigated20. The Pmk1 signalling pathway revealed from the genome sequence is therefore likely to be of generic importance for fungal pathogenesis.

Figure 3: Comparison of major signalling pathways between S. cerevisiae and M. grisea.

The core components of MAP kinase-, cAMP-dependent and calcium-signalling pathways are conserved; however, receptors and downstream targets are less conserved. Protein names in red indicate previously identified M. grisea homologues; names in black are S. cerevisiae protein names. The colour of the boxes around the protein names indicates the degree of conservation between M. grisea and S. cerevisiae proteins based on BLASTP: blue, e-value < 1 × 10⁻¹⁰; orange, e-value > 1 × 10⁻¹⁰. Ovals indicate instances where the M. grisea homologue was shown to be required for pathogenicity. Ovals with black borders are M. grisea proteins not found in S. cerevisiae or not implicated in the S. cerevisiae pathway. InsP3, inositol triphosphate; DAG, diacylglycerol.

A principal difference in the operation of MAPK signalling in M. grisea compared with S. cerevisiae is the absence of a clear Ste5 homologue. A homologue also seems to be absent in other filamentous ascomycetes, including A. nidulans and N. crassa. In yeast, Ste5 is the scaffold protein that conditions specificity in MAPK signalling21. The absence of an identifiable scaffold implies that either a hitherto uncharacterized protein tethers the MAPK signalling module together in M. grisea and provides specificity in signal transmission, or that MAPK specificity is governed by a different mechanism in this fungus.

Cyclic AMP signalling is required for the induction of appressorium formation and for the turgor-driven process that leads to plant infection. Until now it has not been completely clear how this cAMP signal is transmitted. A known cAMP-dependent protein kinase A (PKA) catalytic subunit (CPKA) is required for appressorium maturation, but is dispensable for early appressorium development19,22. The genome has revealed a gene that putatively encodes a second PKA catalytic subunit (MG02832.4). Evaluation of this gene may shed light on how cAMP-mediated signalling operates both at initiation and later stages of appressorium formation. The Pth11 GPCR also operates upstream of the cAMP signalling pathway in M. grisea18, suggesting that the CFEM-GPCR family could provide a number of distinct and unforeseen inputs into this pathway.

Turgor-driven plant infection by M. grisea

Appressoria of M. grisea generate the enormous turgor pressure needed to breach the plant cuticle through accumulation of up to 3 M concentrations of glycerol. Analysis of the M. grisea genome suggests that germinating spores possess considerable versatility in their capacity to synthesize glycerol in the appressorium from storage products. Notably, in contrast to S. cerevisiae, where fatty acid β-oxidation occurs solely in peroxisomes, M. grisea has several genes that putatively encode acyl-CoA dehydrogenases, but does not appear to possess a gene encoding acyl-CoA oxidase. Thus, β-oxidation in M. grisea may occur both in mitochondria and in catalase-free glyoxysome-like bodies, as in N. crassa, which would allow use of a wide range of fatty acid substrates, including branched-chain fatty acids23. M. grisea also seems to have the capacity to synthesize glycerol from the glycolytic intermediates dihydroxyacetone phosphate and dihydroxyacetone. Activity of both NADH-dependent glycerol-3-phosphate dehydrogenase and NADPH-dependent glycerol dehydrogenase has been reported in developing appressoria of M. grisea24. Taken together, the apparent flexibility in lipid metabolism and ability to divert intermediates from glycolysis may be important for rapid glycerol accumulation during appressorium development.
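
As a rough consistency check, the glycerol concentration quoted above can be related to the turgor of up to 8 MPa mentioned in the introduction through the van 't Hoff relation for osmotic pressure. This is only an order-of-magnitude sketch: it assumes ideal-solution behaviour (which a 3 M solution does not strictly obey), a temperature of 298 K chosen for illustration, and it ignores the osmotic potential of the surrounding medium.

```latex
\[
\Pi = cRT \approx (3000\ \mathrm{mol\,m^{-3}})\,(8.314\ \mathrm{J\,mol^{-1}\,K^{-1}})\,(298\ \mathrm{K})
\approx 7.4 \times 10^{6}\ \mathrm{Pa} = 7.4\ \mathrm{MPa}
\]
```

Under these assumptions, 3 M glycerol alone would account for a pressure of the same order as the reported maximum turgor.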

M. grisea possesses a complex secreted proteome

The secreted proteome is a crucial component of the ability of fungi to perceive and respond to the environment. We identified the presence of 739 proteins that are predicted to be secreted by M. grisea, approximately twice the corresponding number for N. crassa. Part of this difference reflects an expansion in protein families in M. grisea. For example, 163 putatively secreted proteins occur in families containing at least twice as many members as the corresponding family in N. crassa (Supplementary Table S5). Several of these expanded families are predicted to encode enzymes for degradation of the plant cell wall and cuticle. For example, eight genes in M. grisea putatively encode cutinases—methyl esterases that degrade cutin, the waxy polymer that forms the leaf cuticle. Several of these genes are significantly upregulated during infection-related development (Fig. 2). Previous experiments involving deletion of the cutinase CUT1 (MG01943.4) led to the conclusion that enzymatic degradation of the cuticle was dispensable for plant infection25. However, CUT1 is not among those genes differentially regulated during appressorium formation. Any of the remaining seven cutinases may be capable of complementing the activity of Cut1 in the Δcut1 mutant. Coupled with the absence of cutinase-encoding genes in N. crassa—which colonizes dead plant tissues—these data strongly suggest that cutinases have a significant role in M. grisea.

Among the secreted proteins predicted for M. grisea, many contained consensus carbohydrate substrate-binding domains, consistent with a role in attachment and colonization of plant tissue (Supplementary Table S6). Specifically, a role for chitin-binding proteins in plant–fungus interactions has been proposed from studies of the avirulence protein Avr4 of the tomato pathogen Cladosporium fulvum26. Avr4 is a chitin-binding protein that may act to protect the fungal cell wall from chitinases produced by plants' innate immune response. Inspection of proteins containing the cysteine pattern found in Avr4 and other variant motifs revealed a novel pattern that is highly abundant in M. grisea. The novel variant cysteine pattern CX7CCX5C is present in 36 copies across 21 predicted proteins. In contrast, this pattern occurs just eight times in A. nidulans, four times in N. crassa, and not at all in S. cerevisiae open reading frames (Supplementary Fig. S5). This motif probably represents a variant chitin-binding motif whose abundance may be diagnostic for plant-associated filamentous ascomycetes.
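
A pattern such as CX7CCX5C can be located in predicted protein sequences with a regular expression. The sketch below reads the pattern as cysteine, any seven residues, two cysteines, any five residues, cysteine; whether X was allowed to be any residue in the original search is an assumption, and the sequences and function names are hypothetical.

```python
import re

# CX7CCX5C: Cys, any 7 residues, Cys-Cys, any 5 residues, Cys.
# The lookahead wrapper lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=(C.{7}CC.{5}C))")


def find_motif(sequence):
    """Return (start_index, matched_text) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


def summarize(proteins):
    """Return (total motif copies, number of proteins with at least one copy)."""
    counts = [len(find_motif(seq)) for seq in proteins.values()]
    return sum(counts), sum(1 for c in counts if c > 0)


# Hypothetical sequences (illustration only).
toy_seq = "MKCAAAAAAACCGGGGGCLL"
toy_proteins = {"p1": toy_seq, "p2": toy_seq + toy_seq, "p3": "MKLV"}
toy_hits = find_motif(toy_seq)
toy_summary = summarize(toy_proteins)
```

Counting copies and proteins separately mirrors the way the result is reported in the text, where the number of motif copies exceeds the number of proteins carrying them.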

Fungal effectors and PAMPs

Pathogenic microorganisms of plants are known to secrete proteins directly into host plant cells to perturb host cell signalling or suppress the plant innate immune system27,28. The plant innate immune system has, in turn, evolved to recognize pathogen effector proteins (often termed pathogen-associated molecular patterns, PAMPs). In the M. grisea–rice interaction, this immunity is governed by a gene-for-gene system (one gene in the host conditions resistance to a pathogen effector protein encoded by a single pathogen gene).

Interrogation of the M. grisea genome for putative effector proteins revealed three families of putatively secreted, cysteine-rich polypeptides (clusters 180, 360 and 641) and a protein family with similarity to the necrosis-inducing peptide of Phytophthora infestans (NPP1, pfam05630)29. As described above, cysteine-rich polypeptides are recognized as PAMPs, as exemplified by the C. fulvum–tomato interaction30. In addition, a novel family of proteins with similarity to the N-terminal half (~150 amino acids) of the enterotoxin A chain (pfam01375) was noted. This region of the enterotoxin A chain contains ADP-ribosylation activity31, which suggests that the M. grisea protein may interact with rice GTP-binding proteins. The genome of the sequenced M. grisea strain 70-15 contains four known M. grisea avirulence genes: AVR-Pita, ACE1, PWL2 and PWL3. No orthologues were found to M. grisea PWL1, PWL4 or AVR1-CO39, or to well-characterized AVR genes from other pathogenic fungi including Avr2, Avr4, Avr9, ECP2, ECP3 and ECP5 from C. fulvum30, and NIP1 from Rhynchosporium secalis32. This highlights the diversity and lack of sequence similarity or conservation in the fungal avirulence gene products identified so far.

Secondary metabolic pathways of M. grisea

Filamentous fungi are well known producers of secondary metabolites, which in nature fulfil a variety of functions probably to allow for niche exploitation. Plant pathogenic fungi produce diverse secondary metabolites that aid in pathogenicity, such as host-selective toxins33. The M. grisea genome displays a considerable capacity for production of secondary metabolites. There are 23 genes predicted to encode polyketide synthases (PKS), compared with seven PKS genes in Neurospora, and three of the PKS-encoding genes in particular are upregulated during infection-related development (Fig. 2). Beyond containing the ketosynthase domain, the structure of these proteins is highly divergent (Supplementary Fig. S6). Most of the 23 PKS genes appear to occur in gene clusters with neighbouring genes that are predicted to encode enzymes such as cytochrome P450 and mono-oxygenases, which typically modify or customize the polyketide backbone to form a functional secondary metabolite (Supplementary Table S7). The diversity of PKS genes in filamentous ascomycete fungi and the poor conservation of clearly orthologous PKS genes, even in related species34, indicates that considerable variability is likely to exist in the polyketide metabolites generated by pathogenic fungi.

Non-ribosomal peptide synthetases (NRPS), which catalyse production of cyclic peptides including numerous toxins, also seem to be well represented in the M. grisea genome. Two distinct subclasses exist: those that exist separately and those that are fused to a PKS (PKS–NRPS). Overall, there are six likely NRPS genes and eight PKS–NRPS genes in the M. grisea genome: six full-length PKS–NRPS genes and two PKS-NRPS genes with a truncated NRPS domain. This contrasts with two predicted NRPS genes and one NRPS-related gene reported in the N. crassa genome sequence. One of the hybrid PKS-NRPS proteins is encoded by the ACE1 gene, which has recently been shown to act as an avirulence gene35. The large number, and expression profile, of PKS and NRPS genes in M. grisea is consistent with the requirements of a fungal pathogen in adapting to diverse environments, perturbing host metabolism and ultimately causing plant cell death.

Repetitive DNA and repeat-induced point mutations

M. grisea exhibits a high degree of genetic variability, and novel pathogenic variants capable of infecting formerly resistant host plants arise with alarming frequency during rice cultivation36. Such gains of virulence are often associated with transposon-mediated inactivation or deletion of PAMP-encoding genes whose products trigger the plant innate immune system37,38. Thus, an understanding of the natural history of repetitive elements in M. grisea not only provides an insight into their impact on genome evolution but also sheds light on mechanisms of pathogenic variation. Approximately 9.7% of the M. grisea genome assembly comprises repetitive DNA sequences longer than 200 base pairs (bp) and with greater than 65% similarity. Four previously unknown repeats were discovered, as were alternative forms for three previously described transposons. The genome sequence also revealed full-length sequences of two elements for which only incomplete sequences were previously available.

Most repetitive sequences in the assembly belong to eight major families of transposable elements (Table 3): five are retroelement families and three are DNA transposons. These repetitive elements are not uniformly distributed in the genome assembly, but form discrete clusters (Supplementary Fig. S7 and Supplementary Note S1)39. Further examination revealed many examples of elements inserted into copies of themselves or other elements. Examination of integration events occurring within other transposons provides evidence for extensive past recombination in the genome (Supplementary Discussion S1). Given the prevalence of repetitive elements and their ability to participate in recombination, it is perhaps surprising that an organism could tolerate such substantial genomic change. However, in nature, rice pathogenic strains of M. grisea propagate asexually and, as such, genome organization is rarely, if ever, subject to the potentially catastrophic effects of meiotic recombination involving homologous chromosomes with radically different structures40. Thus, rearrangements that would normally have been purged by meiosis appear to have been maintained in the absence of deleterious effects on vegetative fitness. Some rearrangements are expected to have positive fitness benefits, especially those that result in loss of genes whose products would normally trigger defence responses in potential hosts. Many host-specificity genes in M. grisea are situated in transposon-rich regions of the genome; this arrangement provides ample opportunity for host-range expansion through gene loss37,41.

Table 3 Characterization of repeat elements in the M. grisea genome

The prevalence of intact and essentially identical repeated DNA elements in the genome suggests that M. grisea has been unable to stop their proliferation. This is of interest, because RIP has previously been reported to occur in at least one strain (Br48) of M. grisea42. Inspection of M. grisea repeats reveals evidence for RIP (Supplementary Methods S3). However, for Pyret, the repeat family showing the most extensive RIP-like mutations, only three-quarters exhibit signs of RIP and show an average of only 11.4% sequence divergence from the reference sequence. Other elements such as Pot2 are present in over 100 apparently intact copies, displaying an average of 99.4% nucleotide identity. In contrast, all repeat elements in N. crassa show heavy RIP mutation, typically to greater than 20% divergence12. The presence of many intact, highly similar elements despite evidence of RIP can be explained by a number of factors. First, it is possible that RIP was lost in the sequenced strain. However, within the genome sequence there are numerous repetitive elements that show only transition mutations. This, combined with the presence of a gene in M. grisea predicted to encode a DNA methyltransferase homologous to the RID gene of N. crassa, which is required for RIP, indicates recent and possibly ongoing RIP43. Second, the experimental evidence for RIP in M. grisea also demonstrated that the process is both less efficient at recognizing repeats and less severe in the number of mutations induced than in N. crassa42. Therefore, it is possible that many transposable elements in M. grisea simply escape damage by RIP. Finally, because RIP is only known to operate during the sexual cycle, the lack of RIP-like mutations in most repetitive elements may reflect the predominance of vegetative propagation during the recent evolution of rice pathogenic strains of M. grisea.
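
The argument above rests on two per-copy measurements: how far each repeat copy has diverged from a reference sequence, and whether the differences are transitions (the C:G to T:A changes characteristic of RIP in N. crassa) rather than transversions. The following sketch shows one way to tabulate both for a pre-aligned pair of sequences. It assumes equal-length aligned input with gaps written as "-", and it is not the procedure described in Supplementary Methods S3; the sequences and names are hypothetical.

```python
TRANSITIONS = {("C", "T"), ("T", "C"), ("G", "A"), ("A", "G")}


def compare_to_reference(reference, copy):
    """Return (transitions, transversions, percent_divergence)
    for two aligned sequences of equal length."""
    if len(reference) != len(copy):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for ref_base, base in zip(reference.upper(), copy.upper()):
        if ref_base not in "ACGT" or base not in "ACGT":
            continue  # skip gap columns and ambiguous bases
        compared += 1
        if ref_base == base:
            continue
        if (ref_base, base) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    mismatches = transitions + transversions
    divergence = 100.0 * mismatches / compared if compared else 0.0
    return transitions, transversions, divergence


# Hypothetical aligned sequences (illustration only): two C-to-T changes.
toy_reference = "ACGTACGTAC"
toy_copy = "ATGTATGTAC"
toy_result = compare_to_reference(toy_reference, toy_copy)
```

A copy whose differences are exclusively transitions, as in the toy pair, is the kind of element the text interprets as evidence of recent RIP.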

Discussion

This is the first analysis of the genome of a plant pathogenic fungus. Our analysis of the M. grisea genome has allowed a much greater appreciation of the likely attributes required by a plant pathogenic fungus to invade and colonize a living host plant. M. grisea has a considerably diverse set of proteins involved in extracellular perception and signal transduction, an extensive array of secreted proteins and secondary metabolites, specifically adapted regulatory pathways controlling infection-related development, and a genome capable of generating considerable genetic variation even in the absence of sexual reproduction. New opportunities for disease control and novel targets—such as unique families of CFEM-GPCR surface receptors and secreted cysteine-rich proteins—for development of durable fungicides are already apparent from our analyses. Acquisition of the genome sequence of M. grisea and that of its host, rice44,45, coupled with the genetic and experimental tractability of this pathosystem, will enable a systems biology approach to the study of a plant–fungal interaction. Large-scale, insertional mutagenesis projects, transcriptional profiling and proteomic analysis are already in progress and offer the possibility of an enhanced understanding of the processes by which a fungus causes plant disease.

The genome sequence of M. grisea illustrates an extraordinary feature of the fungal kingdom, namely the extent of sequence diversity between what are thought of as closely related species from a taxonomic perspective. For example, although M. grisea and N. crassa are both pyrenomycetes and are often studied using similar methods in the same laboratory, they exhibit a degree of sequence diversity similar to that found between human and Xenopus46.

Methods

Strain and growth conditions

Rice-infecting M. grisea strain 70-15 (ref. 47) (Fungal Genetics Stock Center 8958) was grown in liquid complete medium, and DNA was extracted as previously described48.

Sequencing and assembly

Plasmid (4-kb inserts) and fosmid (40-kb inserts) libraries were generated and end-sequenced as described (http://www.broad.mit.edu/annotation/fungi/magnaporthe/assembly.html#clones). Bacterial artificial chromosome (BAC) libraries were generated and end-sequenced as described49. The draft genome sequence was assembled using Arachne (http://www.broad.mit.edu/wga/). Mapped genetic markers were associated with sequence scaffolds through hybridization to end-sequenced BACs (Supplementary Methods S1). ESTs were aligned to the genome using a BLAST e-value cutoff of ≤10⁻²⁰. Assembly version 2 was used for all subsequent analyses.

Annotation and analysis

Automated gene predictions and annotations were performed using Calhoun (http://www.broad.mit.edu/annotation/fungi/magnaporthe/gene_finding.html). Gene predictions were performed using FGENESH/FGENESH+ trained on M. grisea sequences (SoftBerry) and GENEWISE (Sanger Center) and validated against 65 characterized M. grisea genes. Additional information and gene identification numbers for genes featured in this article are presented in Supplementary Table S8.

To perform co-linearity analyses, amino acid identity between M. grisea and N. crassa was first determined by comparing the predicted proteins from each fungus using BLASTP. Homologues with the best hit were aligned using ClustalW and the amino acid per cent identity for each pair was calculated. M. grisea and N. crassa genes were considered to belong to a conserved cluster if there was less than 10 kb between any two genes in the cluster. Homologous genes in clusters between species were accepted if alignments spanned ≥60% of both genes and the alignment score was within 80% of the top score for either of the pair of genes. In this analysis, a gene may be placed in more than one cluster. No attempt was made to identify or resolve these cases.
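
The distance criterion above can be sketched as a simple chaining procedure over orthologous gene pairs. The version below is a simplification made under stated assumptions: each gene is reduced to a single coordinate, the 10 kb limit is applied between neighbouring genes in both genomes, gene order in the second genome is not checked, and each pair is assigned to only one block. The input format, names and coordinates are hypothetical, and the minimum of four genes is taken from the main text.

```python
MAX_GAP = 10_000  # bp between neighbouring genes, from the Methods
MIN_GENES = 4     # smallest block reported in the main text


def colinear_blocks(pairs, max_gap=MAX_GAP, min_genes=MIN_GENES):
    """Chain orthologous pairs into blocks.

    pairs: list of (chrom_a, pos_a, chrom_b, pos_b), one per orthologous pair.
    Neighbouring pairs are linked when they lie on the same chromosomes
    and are closer than max_gap in both genomes.
    """
    ordered = sorted(pairs, key=lambda p: (p[0], p[1]))
    blocks, current = [], []
    for pair in ordered:
        if current:
            prev = current[-1]
            linked = (
                pair[0] == prev[0]
                and pair[2] == prev[2]
                and abs(pair[1] - prev[1]) < max_gap
                and abs(pair[3] - prev[3]) < max_gap
            )
            if not linked:
                if len(current) >= min_genes:
                    blocks.append(current)
                current = []
        current.append(pair)
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical coordinates (illustration only).
toy_pairs = [
    ("chr3", 1_000, "lgA", 50_000),
    ("chr3", 4_000, "lgA", 53_000),
    ("chr3", 7_000, "lgA", 56_000),
    ("chr3", 10_000, "lgA", 59_000),
    ("chr3", 40_000, "lgA", 62_000),  # too far away in the first genome
    ("chr5", 1_000, "lgB", 1_000),    # isolated pair
]
toy_blocks = colinear_blocks(toy_pairs)
```

A fuller treatment would also use gene start and end coordinates and test for consistent gene order, which this sketch deliberately omits.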

Blastclust (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/) was used to cluster predicted peptide sequences from M. grisea, N. crassa and A. nidulans into families using threshold limits of 30% identity and 80% length overlap. Functional annotations were manually curated by examining blast searches to the GenBank NR protein database, SwissProt and a database of M. grisea repetitive sequences. A heterogeneity G-test was performed to identify families with significant content differences between species. Pair-wise post-hoc tests were performed using the Bonferroni correction for multiple comparisons. Families with significant similarity to proteins encoded by transposable elements were removed.
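
To illustrate the kind of comparison involved, the sketch below computes a goodness-of-fit G statistic for one family across the three species, followed by a Bonferroni adjustment. Several points are assumptions: expected counts are taken to be proportional to the predicted gene totals reported above (11,109, 10,082 and 9,457), the goodness-of-fit form may differ from the exact heterogeneity test used here, and the family counts are hypothetical. With three species there are two degrees of freedom, for which the chi-square tail probability has the closed form exp(-G/2).

```python
import math

# Predicted gene totals for M. grisea, N. crassa and A. nidulans (from the text).
GENOME_GENE_COUNTS = (11109, 10082, 9457)


def g_test_three_species(observed, gene_counts=GENOME_GENE_COUNTS):
    """Return (G, p) for family sizes in three species.

    Expected sizes are assumed proportional to each genome's gene count.
    p uses the chi-square distribution with 2 degrees of freedom.
    """
    if len(observed) != 3 or len(gene_counts) != 3:
        raise ValueError("this sketch handles exactly three species")
    total_obs = sum(observed)
    total_genes = sum(gene_counts)
    expected = [total_obs * n / total_genes for n in gene_counts]
    # Terms with an observed count of zero contribute nothing to G.
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    p = math.exp(-g / 2.0)
    return g, p


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical family sizes (illustration only).
toy_g, toy_p = g_test_three_species((20, 5, 5))
toy_p_adjusted = bonferroni(toy_p, 3)
```

The adjustment factor of three in the usage lines is arbitrary; in practice it would be the number of comparisons actually performed.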

To identify G-protein-coupled receptor-like proteins, known GPCR sequences, including ones present in GPCRDB (www.gpcr.org/7tm/), were blasted (e-value limit of ≤10⁻⁹) against the M. grisea predicted proteins. These, in addition to candidates identified from an Interpro scan, were confirmed to have seven transmembrane spans by TMPRED (http://www.ch.embnet.org/software/TMPRED_form.html), Phobius (http://phobius.cgb.ki.se/) and TMHMM (http://www.cbs.dtu.dk/services/TMHMM/). Default settings were used. N. crassa and A. nidulans CFEM-containing GPCRs were identified by BLASTP.

Putative polyketide synthases and non-ribosomal peptide synthases were identified by comparison of the genome sequence to known sequences in GenBank from A. nidulans, Leptosphaeria maculans, Nectria haematococca, Gibberella moniliformis and Cochliobolus heterostrophus.

Microarray experiments were performed using conidia germinated in water on either the hydrophobic or hydrophilic side of GelBond. At 7 and 12 h after germination, samples were flash frozen with liquid nitrogen, scraped from the support and ground, and RNA was extracted using Sigma Trizol reagent. RNA was similarly extracted from ungerminated conidia. RNA from two biological replications of each treatment was pooled in equal amounts and labelled with Cy3 and Cy5 dyes using Agilent Technologies low input linear amplification kit. Samples were hybridized to the Agilent M. grisea whole genome oligo array (product G4137A) using manufacturer protocols and reagents. Hybridizations were performed in an interlaced loop where each treatment was paired with every other. A total of ten hybridizations were performed; each treatment was used in four hybridizations (two with Cy3 and two with Cy5). Spot fluorescence was normalized using Lowess within and between slides, and gene expression profiles were analysed in GeneSpring. Microarray data may be accessed through NCBI GEO (http://www.ncbi.nlm.nih.gov/projects/geo/) accession GSE1945.
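
The hybridization counts given above are consistent with five treatments (ungerminated conidia plus two surfaces at two time points), since five treatments yield ten distinct pairs and each treatment appears in four of them. The sketch below generates one dye-balanced arrangement with those properties. The five-treatment reading is an inference from the text, and the specific dye assignments produced here are illustrative, not the assignments actually used.

```python
TREATMENTS = [
    "ungerminated conidia",
    "7 h hydrophobic",
    "7 h hydrophilic",
    "12 h hydrophobic",
    "12 h hydrophilic",
]


def interlaced_loop(treatments):
    """Pair every treatment with every other exactly once.

    Returns a list of (cy3_sample, cy5_sample) tuples in which each
    treatment is labelled with each dye the same number of times.
    Requires an odd number of treatments for the dye balance to work.
    """
    n = len(treatments)
    if n % 2 == 0:
        raise ValueError("dye balance needs an odd number of treatments")
    design = []
    for i in range(n):
        # Treatment i is the Cy3 sample against its next (n - 1) / 2
        # neighbours around the loop, and the Cy5 sample against the rest.
        for step in range(1, (n - 1) // 2 + 1):
            design.append((treatments[i], treatments[(i + step) % n]))
    return design


toy_design = interlaced_loop(TREATMENTS)
```

Balancing the dyes in this way means that any dye-specific bias affects every treatment equally, which is a common motivation for loop designs in two-colour microarray experiments.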

Additional information concerning genome sequencing and analysis can be found at http://www.broad.mit.edu/annotation/fungi/magnaporthe/.