Main

The genome of A. fumigatus Af293 was sequenced by the whole-genome random sequencing method9 augmented by optical mapping10. Genome closure and attainment of quality standards were accomplished by directed sequencing and manual editing. (See Table 1 and Supplementary Fig. S1 for genome features.) Sequenced chromosomal arms extend from putative centromeres to the telomeres and end in 7–21 tandem repeats of the sequence TTAGGG. The copy number of the mitochondrial genome relative to the nuclear genome is estimated to be 12 based on the redundancy in the assembled sequence. The protein-coding genes and other genome features were identified by an automated annotation pipeline coupled with manual review.
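As an illustration of how the number of terminal TTAGGG repeats on a chromosome arm might be counted, the following minimal Python sketch uses a regular expression anchored at the end of a sequence. The function name and the example sequence are hypothetical and are not taken from the sequencing or annotation pipeline used here.

```python
import re


def count_terminal_repeats(sequence, unit="TTAGGG"):
    """Count consecutive copies of `unit` at the 3' end of `sequence`.

    Hypothetical helper for illustration only; real assemblies may need to
    tolerate sequencing errors or the reverse-complement repeat.
    """
    # Match one or more exact copies of the unit that run up to the end.
    match = re.search(f"(?:{re.escape(unit)})+$", sequence.upper())
    if match is None:
        return 0
    return len(match.group(0)) // len(unit)


# Made-up chromosome end carrying seven tandem repeats.
example_end = "ACGT" + "TTAGGG" * 7
print(count_terminal_repeats(example_end))
```

The sketch counts only exact, uninterrupted repeats; a degenerate or interrupted telomeric tract would need a more permissive pattern.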

Table 1 Properties of the Aspergillus fumigatus Af293 genome

Several candidate pathogenicity genes have been previously identified by assaying mutants in cultured macrophages or in animal models of invasive aspergillosis. These genes encode proteins involved in central metabolic pathways, signalling, cell wall biosynthesis, pigment biosynthesis and regulation of secondary metabolite production (Supplementary Table S1). This scope of functions suggests that the genomic infrastructure for pathogenicity is complex and integrated with a range of metabolic capabilities. Thus, computational analysis of the genome sequence alone cannot be expected to identify directly the functions critical for pathogenicity.

A. fumigatus thermotolerance is a trait critical to its ability to thrive in mammalian and avian infections and in the even-higher temperature ranges characteristic of composts (that is, up to 70 °C). To investigate the metabolic adaptation of this fungus to higher temperatures, gene expression was examined throughout a time course upon shift of growth temperatures from 30 °C (representing environments of tropical soil) to 37 °C and 48 °C (representing temperatures in the human body and compost, respectively). Gene expression patterns revealed that comparable numbers of genes were differentially expressed at each temperature, many of them with similar patterns (Fig. 1a). We identified 323 genes (clusters 1 and 2) that showed a higher expression level at 48 °C than at 37 °C, and 135 genes (cluster 3) that were expressed at a higher level at 37 °C than at 48 °C (Fig. 1b, see also Supplementary Table S2). Many of these 323 genes, especially those in cluster 1, which is enriched with heat shock-responsive genes, may have a role in thermotolerance of A. fumigatus. These include only 11 (four in cluster 1 and seven in cluster 2) of the 551 homologues of the Saccharomyces cerevisiae general stress-response genes, which were shown to be differentially expressed under all stress conditions tested11. Cluster 3 also includes a small number of such genes (five), and three of them have the opposite expression patterns from yeast (Supplementary Table S2). These data indicate that high temperature responses in A. fumigatus differ from the general stress response in yeast. Except for catalase B, no known genes implicated in pathogenicity showed higher expression at 37 °C than at 48 °C, suggesting that host temperature alone (37 °C) is insufficient to turn on many virulence-related genes.

Figure 1: Gene expression profiles for the temperature shift-responsive genes.

The colour bar indicates the range of the expression ratios in the heat map-type figures. a, Genes with significantly differentiated expression are shown (see Methods). b, The same set of genes grouped into ten clusters. Three clusters of interest (that is, clusters 1 and 2 with genes expressed at a higher level at 48 °C than at 37 °C, and cluster 3 with the opposite pattern) are shown with centroid graphs and heat map-type figures. Cluster 1 is enriched in heat shock genes, which are highlighted in yellow. Homologues to the yeast general stress-response genes are indicated with pink boxes. Vertical bars on graphs represent the data range, and the points in the middle represent the average.

More allergens (defined by IgE binding) have been characterized from A. fumigatus than from all other fungal species combined (n = 58)12. We identified nine additional predicted allergens in the genome based on similarity with other fungal allergens (Supplementary Table S3), including secreted proteases, glucanases and cellulases. Only A. fumigatus encodes the major allergen ribotoxin (Asp f1), which cleaves a single phosphodiester bond of the 28S ribosomal RNA of eukaryotic ribosomes. None of the nine allergens is a spore surface protein, despite a hydrophobin in Cladosporium herbarum being allergenic13. The allergen Asp f16 has immunoprotective properties14.

Identification of essential genes may reveal potential targets for drug development. Putative essential genes in the A. fumigatus genome were identified by BLASTp search against 131 single-member KOGs (eukaryotic orthologous groups) representing a conserved core of largely essential eukaryotic genes compiled by analysis of seven diverse, completely sequenced eukaryotic genomes15 (Supplementary Table S4). Only one of the 131 KOGs, KOG3214/DUF701, containing putative Zn ribbon RNA binding proteins, was not found in A. fumigatus or other aspergilli, suggesting a lineage-specific gene loss.

A. fumigatus virulence may be augmented by its numerous secondary metabolites, including fumagillin, gliotoxin, fumitremorgin, verruculogen, fumigaclavine, helvolic acid and sphingofungins4. Genes controlling fungal secondary metabolites are generally organized in clusters, many of which are species-specific. The A. fumigatus genome contains 26 such clusters with polyketide synthase, non-ribosomal peptide synthase and/or dimethylallyl tryptophan synthase genes. Only 13 of the 26 clusters have orthologues in A. oryzae and/or A. nidulans, and ten of these orthologous clusters are missing many or most of the genes present in the A. fumigatus clusters (see ‘Selfish Cluster Hypothesis’ in Supplementary Information). The unique clusters of A. fumigatus are dispersed in the genome with a bias towards telomeric locations. Many of these clusters contain regulatory genes, genes associated with resistance such as transporters involved in efflux16, and genes with no obvious role in production of the metabolite (Supplementary Table S5). Fifteen of the clusters contain 22 transcriptional regulators, which are probably specific to their cluster because they do not have strong similarity to other proteins in the databases. In contrast to these regulators within the clusters, other global regulators of secondary metabolite synthesis are dispersed in the A. fumigatus genome. The genome also contains one copy of laeA encoding a global regulator of Aspergillus secondary metabolites17.

Table 2 summarizes the numbers of different classes of secondary metabolite genes for A. fumigatus, A. nidulans (ref. 18) and A. oryzae (ref. 19). (See ‘Secondary Metabolites’ in Supplementary Information for further discussion.)

Table 2 Secondary metabolite gene types in A. fumigatus, A. nidulans and A. oryzae

Stimulation of the programmed cell death pathway, as reported for A. fumigatus and A. nidulans during stationary phase and oxidative death20, presents an opportunity for antifungal drug development. As with other fungi, A. fumigatus lacks homologues of the metazoan upstream apoptotic machinery, whereas the downstream effectors and regulators, both caspase-dependent and caspase-independent, seem to be shared (Supplementary Table S6). A. fumigatus possesses a homologue of the key participant of caspase-independent apoptosis in mammals, PARP, which is absent in S. cerevisiae. PARP activity was demonstrated previously in A. nidulans during sporulation-induced apoptosis21. The presence of these proteins in Aspergillus is indicative of the recently identified PARP-dependent programmed cell death pathway and makes these filamentous fungi attractive models in which to study the mechanism and origin of programmed cell death.

As the hyphal cell wall is essential for A. fumigatus to penetrate solid nutrient substrates and to resist host cell defence reactions, an understanding of cell wall biosynthesis pathways is important. The A. fumigatus cell wall is composed of a fibrillar branched β1,3-glucan core bound to chitin, galactomannan and β1,3-1,4-glucan, embedded in an amorphous cement composed of α1,3-glucan, galactomannan and polygalactosamine22. β1,6-glucan and peptidomannan, both present in yeast cell walls, are missing in A. fumigatus. The types and numbers of A. fumigatus Af293 cell wall-related proteins as compared to other eukaryotes are provided in Supplementary Table S7. Specificity of the polymer organization of the A. fumigatus cell wall is reflected at the genomic level in the specificity of the cell wall biosynthetic gene inventory.

In S. cerevisiae, certain proteins initially anchored by a glycosyl phosphatidylinositol (GPI) moiety to the plasma membrane, and subsequently cross-linked to β1,3-glucans through β1,6-glucans, are thought to be major participants in yeast cell wall organization23. Among 82 putative GPI-anchored proteins identified in A. fumigatus, no homologues of these yeast GPI-anchored proteins were found (Supplementary Fig. S2). A. fumigatus also lacked homologues of the yeast PIR proteins that are putatively bound to the β1,3-glucans through an alkali-labile bond. It has been hypothesized in yeast that the linkage of proteins to cell wall polysaccharides is important in establishing the three-dimensional polysaccharide network that constitutes the skeleton of all fungal cell walls. On the basis of the comparative analysis reported here, it is more likely that binding to polysaccharides in yeasts is merely a way for certain proteins to remain at the surface of the cell wall to fulfil their biological functions in adhesion and flocculation—events absent in mould biology—and in mating. Hydrophobins, proteins not found in S. cerevisiae, are the only cell wall-linked GPI proteins detected in the A. fumigatus genome sequence. Hydrophobins have a major role in mould biology, because they are required for attachment to hydrophobic surfaces, formation of aerial structures, air dispersion and survival of conidia.

More than 500 putative A. fumigatus-specific genes having no detectable A. nidulans or A. oryzae homologues were found, mostly annotated as hypothetical proteins. A. fumigatus-specific proteins that have functional annotations other than hypothetical are listed in Supplementary Table S8. Most of these seem to have unusual phyletic patterns and are clustered in synteny break locations relative to A. oryzae and A. nidulans (Fig. 2 in ref. 18). About one-third of the A. fumigatus-specific proteins showed significant similarity to other fungal gene products. Furthermore, many seem to be involved in secondary metabolite biosynthesis, such as the developmentally regulated cluster involved in conidial pigment biosynthesis in A. fumigatus.

Figure 2: Spatial distribution of A. fumigatus Af293 genes not present or diverged as compared with various (unsequenced) strains.

On the basis of the microarray CGH data, A. fumigatus Af293 genes (reference) with log2 ratios equal to or greater than 2 as compared to signals from the query strains are scored as absent or diverged in the query strains. The five query strains are denoted with the numbers 1–5. The locations of A. fumigatus genes for which the orthologues are diverged or missing in the query strain are arranged in the order that they appear along the chromosome.
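The scoring rule in this legend can be sketched in a few lines of Python. This is a minimal illustration that assumes each gene has one reference (Af293) signal and one query-strain signal that are already normalized and non-zero; the function names, gene labels and signal values are hypothetical and do not come from the actual data set.

```python
import math


def log2_ratio(reference_signal, query_signal):
    """log2 of the reference (Af293) signal over the query-strain signal."""
    return math.log2(reference_signal / query_signal)


def absent_or_diverged(signals, threshold=2.0):
    """Return genes whose log2(reference/query) is >= threshold.

    `signals` maps a gene label to a (reference, query) pair of positive,
    already-normalized intensities (an assumption of this sketch).
    """
    return [
        gene
        for gene, (reference, query) in signals.items()
        if log2_ratio(reference, query) >= threshold
    ]


# Made-up intensities for three hypothetical genes.
example_signals = {
    "geneA": (4000.0, 1000.0),  # log2 ratio = 2, meets the cut-off
    "geneB": (1000.0, 900.0),   # ratio near 0, present in the query strain
    "geneC": (8000.0, 500.0),   # log2 ratio = 4, scored absent or diverged
}
print(absent_or_diverged(example_signals))
```

A log2 ratio of 2 corresponds to a fourfold stronger hybridization signal from the reference strain than from the query strain.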

Several of the A. fumigatus-specific genes apparently have only bacterial or archaeal homologues, and may confer selective advantage in adapting to environments as diverse as human bodies, compost piles and arsenic-contaminated soil. The most striking finding involves two A. fumigatus-specific proteins that show high sequence similarity with the pI258 ArsC superfamily of arsenate reductases, responsible for detoxification of arsenate by reduction to arsenite in bacteria24. These two proteins are unrelated to Acr2p of S. cerevisiae and are the first instances of the pI258 ArsC-type arsenate reductase in eukaryotes. The corresponding A. fumigatus genes are in a duplicated cluster on chromosomes 1 and 5, along with genes encoding an arsenite exporter, an arsenic resistance protein and an arsenic methyltransferase (Supplementary Table S9). It is of particular note that the cluster members seem to have different phyletic patterns. Although all of the significant BLASTp hits for the arsenate reductase and arsenic resistance protein are actinobacterial and proteobacterial proteins, the arsenite exporter appears to be closely related to yeast Acr3p and the methyltransferase has significant similarity to Neurospora crassa and archaeal proteins as well as mammalian S-adenosyl-l-methionine:AsIII methyltransferase25. The selective benefits of the assembly and retention of this cluster may involve the co-regulation of these arsenic resistance genes26. Elsewhere in the A. fumigatus genome, genes for an arsenite efflux pump and an arsenite translocating ATPase as well as additional copies of the arsenate exporter and arsenic resistance genes have been identified (Supplementary Table S9). This gene complement supports the classification of A. fumigatus among the once notorious ‘arsenic fungi’, organisms that produce the volatile trimethylarsine when grown in arsenate-contaminated environments27.

The genome sequence of A. fumigatus revealed several genes associated with mating processes and sexual development. This topic is discussed further in an accompanying paper18.

Azoles and allylamines block two sequential steps in the 20-step cascade of ergosterol synthesis. Comparative analysis of the ergosterol synthesis pathway genes revealed variable copy numbers of several genes, including ERG3 and ERG11 (Supplementary Table S10). Duplicated genes in the Aspergillus ERG pathway may reflect an adaptation strategy modulating the composition and fluidity of the cell membrane.

The comparative analysis of the A. fumigatus genome has made good use of the sequences of the A. nidulans and A. oryzae genomes to study gene and genome evolution among these species (see the accompanying paper18). However, within the genus Aspergillus, A. nidulans and A. oryzae are only distantly related to A. fumigatus. To explore the association between gene content and phenotype (that is, pathogenicity and related subphenotypes) the much closer taxonomic relationship of Neosartorya fischeri and Neosartorya fennelliae to A. fumigatus provides a more powerful comparative set. N. fennelliae is not known to be pathogenic to humans and possesses a sexual cycle. Another closely related species is Aspergillus clavatus, a mycotoxin producer that has been implicated in neurotoxicosis in beef cattle as well as respiratory disease in maltworkers28. We have used genomic DNA from N. fischeri, N. fennelliae and A. clavatus as well as from two additional strains of A. fumigatus, Af294 and Af71, to perform comparative genomic hybridization (CGH) with our Af293 polymerase chain reaction (PCR) amplicon coding sequence (CDS) microarray. The analysis revealed 2,557 total A. fumigatus Af293 genes to be absent or diverged in the analysed species. Of these, 1,382 are assigned gene names, including 70 coding for enzymes involved in transcriptional regulation, at least 22 in production of secondary metabolites, and 6 encoding proteins for drug resistance transporters. Both of the arsC genes were missing or diverged in most of the analysed strains, including A. fumigatus strains Af294 and Af71. Figure 2 shows the chromosomal locations of the missing or diverged genes, demonstrating a bias towards subtelomeric locations consistent with the higher density of synteny breaks observed in subtelomeric locations between A. fumigatus, A. nidulans and A. oryzae (Fig. 2 of ref. 18) and suggesting greater genome instability in these regions.
The most relevant CGH analysis for phenotypic comparisons, that with N. fischeri, revealed 700 genes to be absent or diverged relative to A. fumigatus Af293 (Supplementary Table S11). These include at least 13 genes coding for enzymes involved in the production of secondary metabolites, 28 coding for transcriptional regulators and protein kinases, 21 coding for transporters, 199 coding for metabolic and other proteins, and 400 coding for hypothetical proteins. This number of genes is a manageable set to begin the effort of correlating phenotypic differences between these species to gene content.

Methods

Strain isolates

Af293 was isolated from a patient who ultimately died from invasive aspergillosis29. Af71 (NCPF 7098) and Af294 (NCPF 7102) are also clinical isolates. The type strains of N. fischeri (NRRL 181), N. fennelliae (NRRL 5534) and A. clavatus (NRRL 1) were used for CGH.

Sequencing and assembly

The genome of A. fumigatus Af293 was sequenced and assembled using the random shotgun method. Closure (finishing) was accomplished by directed sequencing and manual editing of the genome sequence9. Sequencing and assembly statistics are provided in Supplementary Methods.

Coding sequence prediction and gene identification

The assembled genomic sequence was processed through the TIGR annotation pipeline, a collection of software known as Eukaryotic Genome Control (EGC) that serves as the central data management system. This pipeline is described in detail in Supplementary Methods.

Microarray methods

The DNA amplicon microarray for A. fumigatus Af293 was constructed by designing primers for 9,516 genes (96%) then amplifying these target gene regions from genomic DNA (see Supplementary Methods). The resulting PCR products were purified and spotted in triplicate at high density on Corning UltraGAPS aminosilane-coated microscope slides using a robotic spotter built by Intelligent Automatic Systems and cross-linked by ultraviolet illumination.

For CGH analyses, genomic DNA was prepared from each isolate using the DNeasy Tissue kit (Qiagen). Purified genomic DNA was labelled and hybridized as described30. For temperature-shift experiments, conidia (5 × 10⁶ ml⁻¹) from Af293 were incubated in Complete medium for germination (17 h) at 30 °C. Cultures were then transferred to a water bath of 37 °C or 48 °C for continued growth. Total RNA samples before (that is, 0 min) and after (that is, 15, 39, 60, 120 and 180 min) two temperature shifts (that is, 30 to 37 °C and 30 to 48 °C) were used to profile gene expression. The cultures and sampling were repeated as a biological replicate. Labelling reactions with RNA and hybridizations were conducted as described in the TIGR standard operating procedures found at http://atarrays.tigr.org. The sample from 0 min in each temperature-shift set served as a reference in all hybridizations with samples from later time points within the set. All of the hybridizations with the two biological replicates were repeated in dye-swap sets.

Hybridized slides were scanned and analysed to obtain relative transcript levels (see Supplementary Methods). Normalized data were averaged over replications, and differentially expressed genes at the 95% confidence level were determined using intensity-dependent Z-scores (with Z = 1.96). The resulting data were organized and visualized with TIGR MEV (http://www.tigr.org/software), using hierarchical clustering with Euclidean distance and average linkage to view the whole data set (Fig. 1a) and k-means clustering to group the genes into ten clusters (Fig. 1b).
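The idea behind an intensity-dependent Z-score cut-off can be sketched as follows. This is a minimal illustration rather than the procedure used here (which is described in Supplementary Methods): it assumes that each spot is summarized by a log intensity and a log expression ratio, and that the local mean and standard deviation are taken over a sliding window of spots with similar intensity. The window size, function names and example data are hypothetical.

```python
import statistics


def intensity_dependent_z(spots, window=50):
    """Z-score of each spot's log ratio relative to spots of similar intensity.

    `spots` is a list of (log_intensity, log_ratio) pairs. The sliding
    window over intensity-ranked spots is an assumption of this sketch.
    """
    order = sorted(range(len(spots)), key=lambda i: spots[i][0])
    z_scores = [0.0] * len(spots)
    half = window // 2
    for rank, index in enumerate(order):
        start = max(0, rank - half)
        stop = min(len(order), rank + half + 1)
        local = [spots[j][1] for j in order[start:stop]]
        mean = statistics.mean(local)
        sd = statistics.pstdev(local)
        # A window with no spread gives no evidence of differential expression.
        z_scores[index] = (spots[index][1] - mean) / sd if sd > 0 else 0.0
    return z_scores


def differentially_expressed(spots, z_cutoff=1.96, window=50):
    """Indices of spots whose |Z| exceeds the cut-off (1.96 for 95%)."""
    z_scores = intensity_dependent_z(spots, window)
    return [i for i, z in enumerate(z_scores) if abs(z) > z_cutoff]


# Made-up data: 41 spots with small alternating ratios and one clear outlier.
example_spots = [(float(i), 0.1 if i % 2 == 0 else -0.1) for i in range(41)]
example_spots[20] = (20.0, 3.0)
print(differentially_expressed(example_spots))
```

Computing the standard deviation locally, rather than over the whole slide, allows for the larger scatter of log ratios that is typically seen for low-intensity spots.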