From signatures to models: understanding cancer using microarrays

Segal, Eran; Friedman, Nir; Kaminski, Naftali; Regev, Aviv; Koller, Daphne

doi:10.1038/ng1561

Perspective
Published: June 2005


Nature Genetics volume 37, pages S38–S45 (2005)


Abstract

Genomics has the potential to revolutionize the diagnosis and management of cancer by offering an unprecedentedly comprehensive view of the molecular underpinnings of pathology. Computational analysis is essential to transform the masses of generated data into a mechanistic understanding of disease. Here we review current research aimed at uncovering the modular organization and function of transcriptional networks and responses in cancer. We first describe how methods that analyze biological processes in terms of higher-level modules can identify robust signatures of disease mechanisms. We then discuss methods that aim to identify the regulatory mechanisms underlying these modules and processes. Finally, we show how comparative analysis, combining human data with model organisms, can lead to more robust findings. We conclude by discussing the challenges of generalizing these methods from cells to tissues and the opportunities they offer to improve cancer diagnosis and management.


Main

Genomics provides powerful tools with which to probe the components and behavior of biological systems. Microarrays, high-throughput chromatin immunoprecipitation^1,2 (ChIP) and tissue microarrays³ offer different perspectives on the molecular mechanisms underlying cellular functions. The staggering volume of molecular data resulting from the rapid adoption of such techniques has underscored the importance of computational analysis as a key link between data generation and the formulation of new hypotheses. It is widely believed that genomics will transform our understanding of the mechanisms underlying the function of cells and organisms, and revolutionize the diagnosis and management of disease by offering an unprecedentedly comprehensive view of the molecular underpinnings of pathology^4,5. Gene-expression profiling has been applied extensively in cancer research. Gene-expression microarrays have been analyzed using clustering algorithms, which group genes and samples on the basis of their expression profiles, and statistical methods, which score genes on the basis of their relevance to various clinical attributes (Supplementary Note online). Using these methods, investigators have identified new classes of hematological malignancies, predicted prognosis in lung cancer and breast cancer and made many mechanistic observations (Supplementary Fig. 1 online). Despite the natural caution associated with the implementation of new technologies in the clinical arena, the utility of the results of microarray analysis as an effective diagnostic tool at the point of care is already being assessed⁶.
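As a concrete illustration of scoring genes by their relevance to a clinical attribute, the following minimal Python sketch computes a per-gene Welch t-statistic between two groups of samples. The choice of statistic, the helper name `welch_t` and the simulated data are assumptions for illustration; the studies cited above used a variety of scoring methods.

```python
import numpy as np


# Hypothetical helper and toy data, for illustration only.
def welch_t(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-gene Welch t-statistic between two groups of samples.

    expr:   genes x samples matrix of log-scale expression values.
    labels: boolean array over samples; True marks the group of interest
            (e.g., tumors with a given clinical attribute).
    """
    a, b = expr[:, labels], expr[:, ~labels]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    # Standard error built from the unbiased variance of each group (ddof=1).
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se


# Toy data: 100 genes, 10 vs 10 samples; genes 0-4 are shifted in the first group.
rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 20))
labels = np.array([True] * 10 + [False] * 10)
expr[:5, labels] += 2.0
t = welch_t(expr, labels)
top_genes = np.argsort(-np.abs(t))[:5]  # indices of the five genes with largest |t|
```

Ranking genes by |t| and applying a fixed cutoff produces the kind of gene list discussed in the next paragraph: a gene whose shift falls just short of the cutoff is discarded, however coherent its behavior with other genes.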

Approaches such as clustering and identification of gene signatures, though successful, tend to ignore much of the signal in the data, both in genes whose activity changes but does not pass the threshold for differential expression and in genes that are differentially expressed but unfamiliar to the researcher analyzing the list. Furthermore, because these analyses are done at the level of individual genes, they are susceptible to the noise inherent both in the sample population and in the successive steps of measuring gene expression. Moreover, simply listing genes associated with a certain tumor type is far from identifying the biological processes in which these genes are involved. Finally, clustering genes with similar expression patterns does not identify the causal molecular mechanisms that regulate them. Therefore, developing analysis methods that can extract a more biologically meaningful understanding of the processes giving rise to cancer is a key challenge. Here, we focus on ongoing research that attempts to achieve this goal, discuss challenges in its application to complex multicellular tissues and conclude with some opportunities for using these methods to improve cancer diagnosis and treatment.

A module-level view

To move beyond individual genes to biological processes, several recent methods^7,8,9,10 use gene modules as the basic building blocks for analysis. These methods aim to distill a higher-order, more interpretable characterization of transcriptional changes. Moreover, by considering coherent changes in expression across larger modules, we can identify patterns that are too subtle to discern when the expression profiles of individual genes are considered in isolation.
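The statistical gain from pooling can be made explicit in an idealized setting. Assume (an assumption for illustration, not a property of real data) that each of the n genes in a module shifts by the same amount δ between two conditions and carries independent noise with standard deviation σ. Averaging over the module then shrinks the noise by a factor of √n:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n}, \qquad
z_{\text{module}} = \frac{\delta}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\delta}{\sigma} = \sqrt{n}\; z_{\text{gene}}
```

For example, a shift of half a standard deviation per gene (δ/σ = 0.5) in a 16-gene module gives z_module = 4 × 0.5 = 2. Correlated noise among module genes, which is common in practice, reduces this gain.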

Mootha et al.⁸ (Fig. 1a) tested biologically coherent sets of genes (e.g., pathways) for association with disease phenotypes. They applied their method to a data set of human diabetic muscle, with the goal of identifying processes that were systematically altered in diabetic muscle. Their analysis showed that, by examining the joint behavior of a set of genes, they could detect significant changes even in cases where the expression of individual genes was not significantly different. It was only in the coherent signal associated with a higher-level entity that the pattern was evident.
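The idea of testing a gene set as a unit can be sketched in a few lines. This is a simplified stand-in, not the published method: it scores a set by the average difference in group means over its genes and assesses significance by shuffling sample labels, whereas the cited study used its own enrichment statistic. The function names, the one-sided direction (an increase in the labeled group) and the toy data are assumptions.

```python
import numpy as np


# Hypothetical helpers and toy data, for illustration only.
def gene_set_score(expr, labels, gene_set):
    """Average over the set's genes of the difference in group means (labeled minus unlabeled)."""
    diff = expr[:, labels].mean(axis=1) - expr[:, ~labels].mean(axis=1)
    return diff[gene_set].mean()


def permutation_pvalue(expr, labels, gene_set, n_perm=1000, seed=0):
    """One-sided p-value for a set-level increase, estimated by shuffling sample labels."""
    rng = np.random.default_rng(seed)
    observed = gene_set_score(expr, labels, gene_set)
    exceed = sum(
        gene_set_score(expr, rng.permutation(labels), gene_set) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction: the observed labeling counts as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


# Toy data: 200 genes, 8 vs 8 samples; the first 50 genes shift by only 0.5 SD.
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 16))
labels = np.array([True] * 8 + [False] * 8)
gene_set = np.arange(50)
expr[np.ix_(gene_set, labels)] += 0.5
p_set = permutation_pvalue(expr, labels, gene_set)
```

In the simulated data, each gene's shift is only about one standard error of its own group-mean difference, so few genes stand out individually, yet the set as a whole yields a small permutation p-value. To test for coordinated repression, as in the diabetic muscle example, the comparison would be flipped.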

Segal et al.⁷ applied a module-level analysis to obtain a global view of the shared and unique molecular modules underlying human cancer. They compiled a 'cancer compendium' from multiple studies and a large collection of biologically meaningful gene sets from experimental studies and human-curated annotations. They identified gene sets with similar behavior across arrays, combined them into modules and used these modules to characterize a variety of clinical conditions (e.g., tumor stage and type) by the combination of activated and deactivated modules. In the resulting 'cancer module map'⁷ (Fig. 1b), the activation or repression of some modules (e.g., cell cycle) was shared across multiple tumor types and could be related to general tumorigenic processes, whereas others (e.g., growth-regulatory modules; Fig. 1c) were more specific to the tissue origin or progression of particular tumors. Conversely, the module map characterized each condition by a particular combination of module activity, providing insight into the mechanisms underlying specific malignancies. For example, the Growth Inhibitory Module (Fig. 1d) consisted primarily of growth suppressors coordinately repressed in a subset of acute leukemia arrays and suggested a possible explanation for the uncontrolled proliferation and reduced cell death in these tumors (Fig. 1e). Other modules were shared across a diverse set of clinical conditions, suggestive of common tumor-progression mechanisms. For example, a bone osteoblastic module, spanning various tumor types, included both secreted growth factors and their receptors, suggesting a single mechanism for both primary tumor proliferation and metastasis to bone.
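One way to ask whether a gene set is active in a given array is to test whether it is over-represented among the genes called induced (or repressed) in that array. The sketch below assumes a hypergeometric upper-tail test for that overlap, a standard choice for such questions. How genes are called induced, the function name and the example counts are all assumptions, and a full analysis would also correct for testing many gene sets across many arrays.

```python
from math import comb


# Hypothetical helper, for illustration only.
def hypergeom_enrichment_p(n_total, n_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_total:    genes measured on the array.
    n_set:      genes in the gene set.
    n_selected: genes called induced (or repressed) in this array.
    n_overlap:  genes that are both in the set and selected.
    """
    denom = comb(n_total, n_selected)
    # math.comb returns 0 when k > n, so impossible terms drop out automatically.
    tail = sum(
        comb(n_set, k) * comb(n_total - n_set, n_selected - k)
        for k in range(n_overlap, min(n_set, n_selected) + 1)
    )
    return tail / denom


# 1,000 genes, a 50-gene set, 100 induced genes, 20 of them in the set
# (about 5 would be expected by chance).
p = hypergeom_enrichment_p(1000, 50, 100, 20)
```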

These results and others^8,9,11,12,13 illustrate the value of analyzing complex processes such as tumorigenesis in terms of higher-level gene modules and biological processes. This type of analysis increases our ability to identify the signal in microarray data and provides results that are more interpretable than gene lists. In particular, when grouped together into a coherent module, the functional and clinical effects of pleiotropic genes might become more apparent, as might the complexity of the mechanism that has to be addressed therapeutically (Fig. 1e). Finally, a modular approach can be applied uniformly to multiple data sources from different tumor types, thereby uncovering the commonalities and differences of multiple clinical conditions.

From modules to regulatory mechanisms

The characterization of cancer processes in terms of transcriptional changes in genes or modules is only a step towards the goal of obtaining a detailed mechanistic model of the processes leading to malignancy. Recent work attempts to use gene expression and other genomic data to understand regulatory interactions between genes and how these might result in tumorigenesis.

Cellular processes are regulated by a variety of mechanisms, occurring at every step in the process of going from DNA to functional proteins. Transcriptional regulation, directly observed in gene-expression data, controls the production of mRNA transcripts. Important components in this process are cis-regulatory elements in a target gene's promoter region, trans-acting factors that bind to these DNA motifs and signaling molecules that modulate this process based on exogenous and endogenous signals. Genomic data sets offer (noisy) views of different facets of this process. Protein-DNA binding events are directly observed in ChIP-chip assays^14,15. We can computationally detect cis elements in promoter sequences, on the basis of experimentally determined sites¹⁶, de novo identification or evolutionary conservation^17,18. Finally, similar expression profiles allow us to identify target genes that are controlled by a shared regulatory mechanism.
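Detecting cis elements from experimentally determined sites is often done with a position weight matrix. The sketch below is a minimal version under stated assumptions: log-odds scores against a uniform background, a pseudocount of 0.5, scanning of the forward strand only and a user-chosen score threshold. The binding sites are toy strings resembling the CACGTG E-box, not real data, and the helper names are hypothetical.

```python
import math

BASES = "ACGT"


# Hypothetical helpers and toy sites, for illustration only.
def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Turn per-position base counts from known sites into log2-odds scores (bits)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, score) for every forward-strand window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


sites = ["CACGTG", "CACGTG", "CATGTG"]  # aligned toy binding sites
counts = [{b: sum(site[j] == b for site in sites) for b in BASES}
          for j in range(len(sites[0]))]
pwm = log_odds_matrix(counts)
hits = scan("TTGACACGTGAAT", pwm, threshold=6.0)
```

With these toy sites, the only window in the example promoter scoring at least 6 bits is the CACGTG starting at position 4.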

Most attempts to identify regulatory relationships from genomic data have focused on the unicellular yeast Saccharomyces cerevisiae. One line of work aims to reconstruct cis-regulatory circuits¹⁹, including identifying new cis elements, detecting their targets and identifying combinations of elements that modulate expression of a target gene. Because signal at the level of individual genes is often hard to detect, most approaches focus on regulatory modules, whose member genes are expected to be controlled by similar regulators in a similar way^20,21. Early approaches identified new individual cis elements that are enriched in clusters of coexpressed genes¹⁹, or pairs of elements that act in synergy under specific conditions²². Recent extensions increase accuracy by using other sources of data, such as regulator binding (from ChIP-chip assays)²¹ or evolutionary conservation^23,24.

More recently, several studies^25,26,27,28 have attempted to identify how the set of cis-regulatory elements in a gene's promoter governs its behavior and explains the observed expression pattern. Segal et al.²⁶ proposed a model of regulatory modules in which module genes shared both a similar expression profile and a similar profile of cis elements. Thus, a gene's cis element profile determined its module assignment and hence its expression profile. Beer and Tavazoie²⁷ subsequently proposed a similar approach, which also included a finer-grained model of promoter configuration. Both groups showed that a substantial fraction of the signal in gene-expression data could be explained in terms of cis-element profiles, and that these profiles exhibited an interesting combinatorial organization of elements into various logic gates (OR, AND) and spatial configurations (Fig. 2a). This general framework can also accommodate transcription factor–binding data instead of (or in addition to) cis elements. For example, Bar-Joseph et al.²⁸ identified gene modules whose expression could be explained by a shared transcription factor–binding profile (Fig. 2b), and Segal et al.²⁵ combined expression, sequence and transcription factor–binding to identify combinations of transcription factors, their target modules and the cis elements that mediated this regulation.
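The AND/OR logic described above can be illustrated with a toy comparison: given the genes whose promoters contain motif A and motif B, which rule best matches the membership of a coexpressed module? The sketch assumes that motif content is already summarized as gene sets and uses Jaccard overlap as a hypothetical score; the cited models are probabilistic and far richer.

```python
# Hypothetical helpers and gene names, for illustration only.
def jaccard(predicted: set, members: set) -> float:
    """Overlap between genes predicted by a rule and the module's members."""
    union = predicted | members
    return len(predicted & members) / len(union) if union else 0.0


def best_motif_rule(has_a: set, has_b: set, members: set):
    """Compare single motifs and their AND/OR combinations as predictors of membership."""
    rules = {
        "A": has_a,
        "B": has_b,
        "A AND B": has_a & has_b,
        "A OR B": has_a | has_b,
    }
    scores = {name: jaccard(genes, members) for name, genes in rules.items()}
    return max(scores, key=scores.get), scores


has_a = {"g1", "g2", "g3", "g4"}
has_b = {"g3", "g4", "g5", "g6"}
rule_and, _ = best_motif_rule(has_a, has_b, members={"g3", "g4"})
rule_or, _ = best_motif_rule(has_a, has_b, members={"g1", "g2", "g3", "g4", "g5", "g6"})
```

A module whose members carry both motifs favors the AND rule, while a module whose members carry either motif favors the OR rule.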

**Figure 2: Computational prediction of *cis*-regulatory networks.**

Despite these successes in model organisms, this approach has yet to be broadly applied in multicellular organisms. In particular, most current methods for detecting cis elements are not well suited to the large, complex genomes and long intergenic regions typical of mammals. Nevertheless, several researchers have identified regulatory circuits in expression data from synchronized HeLa cells^26,29, both finding known cell cycle–regulatory elements and targets, and suggesting new ones (Fig. 2c). Some of the more successful approaches rely on additional signals, such as evolutionary conservation³⁰, spatial clustering of cis elements in the DNA sequence^30,31,32,33,34 or a global model of cis regulation and gene expression²⁶, to improve the detection of reliable biological signals.
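Spatial clustering of cis elements can be approximated by looking for stretches of sequence that contain many motif hits close together. A minimal sketch, assuming that hit positions come from a scanner such as a weight-matrix scan and that the window length and minimum hit count are user-chosen parameters (both hypothetical, not values from the cited studies):

```python
from bisect import bisect_left


# Hypothetical helper and hit positions, for illustration only.
def dense_windows(hit_positions, window, min_hits):
    """Windows [start, start + window) that begin at a motif hit and contain
    at least min_hits hits. Assumes hit positions are distinct."""
    hits = sorted(hit_positions)
    result = []
    for i, start in enumerate(hits):
        end = start + window
        count = bisect_left(hits, end) - i  # hits i, i+1, ... lying before `end`
        if count >= min_hits:
            result.append((start, end, count))
    return result


# Hit positions in bp from the start of a promoter region.
clusters = dense_windows([10, 40, 75, 500, 900, 930], window=100, min_hits=3)
```

Overlapping windows from a long run of hits are reported separately here; merging them into maximal clusters would be a natural next step.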

A complementary approach focuses on the transcription factors and signaling molecules that modulate gene expression either directly or indirectly. Although regulator activity is not observed directly, if a regulator is itself transcriptionally regulated, its expression level can serve as a proxy for its activity, allowing us to infer regulatory interactions from expression profiles. Motivated by this insight, several studies^35,36,37 propose algorithms that construct a Bayesian network describing the probabilistic dependencies between the expression levels of genes. These methods can detect both direct and indirect regulatory relations (e.g., between a MAP kinase and its downstream targets). Recent work^38,39 extends this approach by using more realistic models of binding affinity between transcription factors and binding sites, in accordance with biochemical principles⁴⁰.
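Learning a Bayesian network from expression data is often organized around scoring each gene's candidate parent set (its putative regulators). The sketch below scores one such family, assuming a linear-Gaussian model and the BIC penalty. The cited studies used their own model classes and scoring functions, and a full learner would also search over structures while keeping the graph acyclic. Names and simulated data are illustrative.

```python
import numpy as np


# Hypothetical helper and simulated data, for illustration only.
def bic_local_score(child, parents):
    """BIC score (higher is better) of a linear-Gaussian model child ~ parents.

    child:   (m,) expression of one gene over m arrays.
    parents: (m, k) expression of k candidate regulators; k may be 0.
    """
    m = child.shape[0]
    X = np.column_stack([np.ones(m), parents])  # intercept plus parent columns
    coef, *_ = np.linalg.lstsq(X, child, rcond=None)
    residuals = child - X @ coef
    sigma2 = max(float(residuals @ residuals) / m, 1e-12)  # maximum-likelihood variance
    log_lik = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1.0)
    n_params = X.shape[1] + 1  # regression coefficients plus the noise variance
    return log_lik - 0.5 * n_params * np.log(m)


rng = np.random.default_rng(0)
m = 50
regulator = rng.normal(size=m)
unrelated = rng.normal(size=m)
target = 2.0 * regulator + rng.normal(scale=0.5, size=m)
scores = {
    "no parents": bic_local_score(target, np.empty((m, 0))),
    "regulator": bic_local_score(target, regulator[:, None]),
    "unrelated": bic_local_score(target, unrelated[:, None]),
}
```

The BIC penalty charges each extra parameter, so a candidate regulator is preferred only if it improves the fit by more than it costs.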

A recent extension is based on the observation that many regulatory interactions are shared by all members of a gene module^20,39. Segal et al.⁴¹ proposed the module-network approach for identifying modules of coregulated genes and their shared regulation program, which specified the expression profile of a module's genes as a function of the expression of the module's regulators. As with the identification of cancer modules and cis elements, this higher-level analysis improved both statistical robustness and biological interpretability. This approach was successfully applied to a yeast expression data set, identifying functionally coherent modules and known regulatory relations (Fig. 3a). It also suggested testable hypotheses regarding the role of transcription factors and signaling molecules, three of which were tested and validated experimentally.
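A stripped-down version of one step in learning a module's regulation program is to choose the regulator and expression threshold that best split the arrays into two groups with homogeneous module expression. The sketch assumes a single split scored by squared error, with one shared mean for all module genes in each group. The published approach learns deeper regression trees within a probabilistic model and alternates between assigning genes to modules and learning programs. Names and data are hypothetical.

```python
import numpy as np


# Hypothetical helper and toy data, for illustration only.
def best_split(module_expr, regulators):
    """Choose the regulator and threshold whose split of the arrays best explains a module.

    module_expr: (genes, arrays) expression of the module's genes.
    regulators:  dict name -> (arrays,) expression of each candidate regulator.
    Returns (name, threshold, sse), where sse is the within-group sum of squared
    errors when each group of arrays is modeled by one mean for all module genes.
    """
    best = (None, None, np.inf)
    for name, reg in regulators.items():
        # Excluding the largest value keeps both groups non-empty.
        for threshold in np.unique(reg)[:-1]:
            low = reg <= threshold
            sse = 0.0
            for mask in (low, ~low):
                block = module_expr[:, mask]
                sse += float(((block - block.mean()) ** 2).sum())
            if sse < best[2]:
                best = (name, float(threshold), sse)
    return best


R = np.arange(10.0)                                              # regulator R
Q = np.array([3.0, 7.0, 1.0, 9.0, 0.0, 5.0, 2.0, 8.0, 4.0, 6.0])  # scrambled regulator Q
module = np.tile(np.where(R > 4, 2.0, -2.0), (5, 1))             # 5 genes, high when R > 4
split = best_split(module, {"Q": Q, "R": R})
```

Here the module is high exactly in the arrays where `R` exceeds 4, so the sketch selects `R` with threshold 4.0; the scrambled regulator `Q` cannot reproduce that split.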

**Figure 3: Computational prediction of *trans*-regulatory networks.**

A key limitation of such approaches is that many regulators are regulated post-transcriptionally, and their activity is not directly detectable in gene-expression data. Nevertheless, in the context of tumorigenic processes, there is reason for optimism. Tumorigenesis often arises from changes in a cell's DNA, which in turn perturb the expression of certain key regulators. For example, the Myc oncogene is amplified in many tumors, resulting in a concomitant change in the expression of its targets⁴². Thus, even regulators that, under normal conditions, are regulated post-transcriptionally may undergo transcriptional regulation in tumor cells, making the regulatory processes more apparent in expression data.

Encouraged by this observation, we applied the module network procedure to a data set of lung cancer arrays⁴³, focusing on regulation by transcription factors. In addition to the usual cancer-related functional categories (cell cycle, DNA and RNA repair, and metabolism), we found multiple modules enriched for genes associated with extracellular inflammation, immunity and extracellular matrix, processes that are increasingly recognized to be important in tumor generation and progression^44,45. An in-depth analysis of one of the modules (Fig. 3b) suggested that extracellular matrix–related genes, whose expression is often increased in tumors, were not mere representatives of stromal activity but were related to tumor clinical biology and were tightly regulated by cancer-relevant transcription factors. This example illustrates the potential of this approach for identifying transcriptional regulation in complex tissues; it also shows how an unbiased discovery approach can lead the observer to unexpected conclusions (such as the possible role of fibrotic and inflammatory modules in cancer).

Comparative analysis

Taking a more global view, we can extend our analysis to encompass multiple studies across diverse organisms and conditions. In such comparative analysis, conserved patterns can help to identify true biological signals and key mechanisms, and highlight commonalities and differences. This approach is particularly compelling when applied to the available data from an increasing number of mammalian species and of animal models of cancer.

Several works have explored the conservation of coexpression relationships and gene modules across a diverse range of organisms^46,47,48. These works showed that conserved coexpression relationships were more likely to correspond to true functional interactions (Fig. 4a) and allowed us to study the change in the role of functional modules over evolution (Fig. 4b). This analysis can highlight functional modules that have a key role in a process of interest. For example, McCarroll et al.⁴⁸ identified a common expression signature in aging between flies and worms, which included genes involved in mitochondrial metabolism, DNA repair and cellular transport.
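The core test behind conserved coexpression can be sketched as follows: keep a gene pair only if it is coexpressed in one species and its orthologs are also coexpressed in the other. The sketch assumes a one-to-one ortholog map, Pearson correlation and a fixed threshold of 0.7, all simplifications chosen for illustration; the cited works used more careful statistics across many species. Gene names and values are hypothetical.

```python
import numpy as np


# Hypothetical helper and toy data, for illustration only.
def conserved_pairs(expr_a, expr_b, orthologs, threshold=0.7):
    """Species-A gene pairs whose Pearson correlation exceeds `threshold` in
    species A and whose orthologs' correlation also exceeds it in species B.

    expr_a, expr_b: dict gene -> expression vector in species A / species B.
    orthologs:      dict mapping species-A gene -> species-B ortholog.
    """
    genes = [g for g in orthologs if g in expr_a and orthologs[g] in expr_b]
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r_a = np.corrcoef(expr_a[g1], expr_a[g2])[0, 1]
            r_b = np.corrcoef(expr_b[orthologs[g1]], expr_b[orthologs[g2]])[0, 1]
            if r_a > threshold and r_b > threshold:
                pairs.append((g1, g2))
    return pairs


expr_human = {"x1": [1, 2, 3, 4], "x2": [2, 4, 6, 8], "x3": [4, 1, 3, 2], "x4": [1, 2, 3, 5]}
expr_mouse = {"y1": [1, 2, 3, 4, 5], "y2": [1, 2, 3, 4, 6], "y3": [2, 5, 1, 4, 3],
              "y4": [5, 4, 3, 2, 1]}
orthologs = {"x1": "y1", "x2": "y2", "x3": "y3", "x4": "y4"}
pairs = conserved_pairs(expr_human, expr_mouse, orthologs)
```

In the toy data, `x4` is coexpressed with `x1` and `x2` in the first species, but its ortholog is anticorrelated with theirs in the second, so only the pair (`x1`, `x2`) survives.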

**Figure 4: Multispecies analysis of gene expression data.**

Applying a similar approach to cancer data from mouse and human can shed light on the mechanisms underlying tumorigenesis. For example, Sweet-Cordero et al.⁴⁹ used three different mouse models of lung cancer to identify signatures of specific genetic alterations that lead to tumorigenesis. They projected the genes in each signature to their human orthologs and used a gene-set–based method⁸ to test for activity of these signatures in different human lung tumors. This design used changes observed in controlled manipulations in mouse disease models to draw insights about disease manifestations in humans.
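The projection step can be sketched as mapping mouse signature genes to their human orthologs and then scoring each human sample. As a simple stand-in for the gene-set method used in the study, the sketch below averages per-gene z-scores of the mapped genes; the gene names, the ortholog map and the data are hypothetical.

```python
import numpy as np


# Hypothetical helper, gene names and data, for illustration only.
def signature_scores(signature_mouse, mouse_to_human, human_expr, human_genes):
    """Score each human sample for a mouse-derived up-regulated signature.

    signature_mouse: mouse genes up-regulated in the model.
    mouse_to_human:  dict mapping mouse gene -> human ortholog.
    human_expr:      (genes, samples) array; human_genes lists its row names.
    Returns the mean z-score of the mapped signature genes in each sample.
    """
    row_of = {g: i for i, g in enumerate(human_genes)}
    rows = [row_of[mouse_to_human[g]] for g in signature_mouse
            if g in mouse_to_human and mouse_to_human[g] in row_of]
    if not rows:
        raise ValueError("no signature genes map to the human data")
    sub = human_expr[rows]
    sd = sub.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes contribute z = 0 instead of dividing by zero
    z = (sub - sub.mean(axis=1, keepdims=True)) / sd
    return z.mean(axis=0)


human_genes = ["H1", "H2", "H3"]
human_expr = np.array([[0.0, 0.0, 3.0], [1.0, 1.0, 4.0], [5.0, 2.0, 0.0]])
mouse_to_human = {"m1": "H1", "m2": "H2", "m9": "H3"}
# "m7" has no ortholog in the map and is skipped.
scores = signature_scores(["m1", "m2", "m7"], mouse_to_human, human_expr, human_genes)
```

Genes without a mapped ortholog are silently dropped, so in practice it is worth reporting how many signature genes survive the projection.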

This approach transfers results of an analysis done in mouse to inform a subsequent analysis in human; we can also carry out a joint analysis that explicitly searches for patterns conserved across multiple species. Along these lines, we analyzed a human-mouse data set of normal and tumor brain tissue using an extension⁵⁰ of the module network approach⁴¹. This analysis suggested regulatory modules that were conserved across human and mouse, and proposed new hypotheses regarding regulation in medulloblastoma (Fig. 4c).

Challenges and opportunities

The reconstruction of the molecular mechanisms that underlie a complex process, such as tumorigenesis, is a formidable challenge. This challenge arises in part from difficulties associated with microarray assays, including noise in the data and limited reproducibility across platforms and researchers^51,52. Moreover, most analyses implicitly treat mRNA expression as a surrogate for protein activity level, an assumption that does not account for processes such as mRNA stability, protein degradation and post-translational modifications. In addition, when we attempt to find complex patterns in data, we invariably encounter multiple alternative explanations of the data (e.g., different sets of clusters or regulatory modules). Therefore, the results of such analysis are sensitive to the choice of model and parameters and the specific data used, and must be interpreted with care.

Nevertheless, the successes obtained by combining genomic techniques and computational algorithms to reconstruct networks (albeit primarily in model organisms) are encouraging. Three recurring themes form the basis for this success. The first is the analysis of data at the level of biological modules, rather than individual genes, an approach that produces results that are biologically interpretable and statistically robust. The second is the use of biological knowledge in developing analytic techniques, either directly (e.g., to define functionally coherent gene sets) or indirectly (e.g., to construct biologically realistic models). As we create more realistic biological models, we can hope for better biological understanding and more focused predictions to inform further experiments. The third theme is the integration of multiple sources of data in the analysis. By putting together different partial (and noisy) views of a single complex process (gene expression, promoter sequences, protein-DNA binding, protein-protein interactions and more), we can often obtain a much more accurate and complete picture. In addition, by considering data from different conditions or cell types, we can obtain a more global understanding of the function of the same set of building blocks in different contexts. Finally, the integration of data across organisms allows us to identify functional components based on their conservation and, conversely, to recognize the mechanisms that are the basis for biological diversity.

Although genomic approaches are prevalent in cancer research, we are still far from reconstructing molecular mechanisms in human cancer. In fact, the methods we describe do not always scale easily to mammalian systems. Unlike yeast genomes, mammalian genomes are less compact, and enhancers are more dispersed and remote. Both regulatory and signaling networks are larger and more elaborate, and the control of many genes and processes involves undefined epigenetic mechanisms, a higher degree of combinatorial regulation and multiple signaling pathways. Furthermore, many interactions are context-specific, as different components of the molecular network are active in different cellular states and phenotypes.

Much of the added complexity in applying genomic analysis to cancer is related to multicellularity, which can confound the analysis of data from tissue samples that contain heterogeneous populations of cells. Most genomic techniques measure an average signal over the population of cells in a sample. This is a concern even when studying unicellular organisms or cell cultures, as the averaging process tends to obscure variation between cells^53,54,55. When analyzing a heterogeneous tissue, this problem is more pronounced because the signals from different cell types are mixed together; differential regulation of genes associated with changes in cell state can be hard to distinguish or can even disappear entirely. Moreover, the averaging effect introduces an additional source of noise, as the proportions of the different cell types typically differ across samples. This variability may swamp the variability resulting from other, perhaps more relevant, differences between the samples. Another, more challenging issue is raised by intercellular signaling in tissues. Interactions between cells often lead to complex behaviors, which are hard to distinguish from the regulatory processes in the cell itself and cannot be emulated in in vitro cell-culture assays. Finally, this epigenetic variability is further confounded in tumor samples, where considerable genetic variability occurs between and within samples.

In light of these challenges, is there hope for systematic mechanistic insights from genomic and computational studies? We believe that a positive answer lies in the combination of computational and experimental insights. Computational methods should be developed to tackle cell and tissue heterogeneity^56,57. For example, Stuart et al.⁵⁶ used histological evaluation of tissue heterogeneity to deconvolve expression profiles and identify cell type–specific expression responses. Experimentally, most cancer genomic studies have focused on tumor samples from the human population and have therefore suffered from inevitable confounding genetic and environmental factors, tissue heterogeneity, lack of time courses for disease progression and unavailability of perturbations instrumental in identifying regulatory events. Recent studies^9,49,58 suggest that careful design can greatly improve the utility of such studies, by combining the study of human tissue samples with tissue culture and animal models, to obtain a more controlled and comprehensive view. For example, Lamb et al.⁹ used expression profiling in a cell-culture model with genetic perturbations to identify a 'cyclin D signature', followed by computational analysis of a compendium of human tumor expression profiles to find the transcription factor that mediated this response in tumors. Similarly, Kang et al.⁵⁹ found a set of genes involved in osteoclastic metastasis by combining expression profiling on human cell cultures with phenotypic effects in animal models. Finally, new biological assays, such as in situ gene-expression signatures using laser capture microdissection^60,61 or fluorescence microscopy data, can provide more refined observations about gene activation in individual cancer cells^54,55.
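The averaging problem and its computational remedy can be written as a linear mixture model: each bulk measurement is assumed to be a proportion-weighted sum of cell type–specific expression levels. Given per-sample proportions (for example, from histological evaluation), the cell type–specific profiles can then be estimated by least squares. This is a minimal sketch in the spirit of the deconvolution described above, not the cited method; it assumes linear mixing and known proportions, does not enforce non-negativity, and uses hypothetical names and noise-free toy data.

```python
import numpy as np


# Hypothetical helper and toy data, for illustration only.
def deconvolve(bulk, proportions):
    """Estimate cell type-specific expression from mixed samples.

    bulk:        (genes, samples) measured expression, assumed to be a linear mix.
    proportions: (samples, cell_types) fraction of each cell type per sample.
    Solves bulk[g, s] ~ sum_k proportions[s, k] * specific[g, k] by least squares
    and returns specific with shape (genes, cell_types).
    """
    # lstsq solves proportions @ X ~ bulk.T for all genes at once.
    solution, *_ = np.linalg.lstsq(proportions, bulk.T, rcond=None)
    return solution.T


true_specific = np.array([[5.0, 1.0], [0.0, 4.0], [2.0, 2.0]])  # 3 genes x 2 cell types
proportions = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])  # 4 samples
bulk = true_specific @ proportions.T  # genes x samples, noise-free mixture
estimated = deconvolve(bulk, proportions)
```

In the example, gene 0 varies across samples only because the cell-type proportions vary, which is the averaging effect described earlier, while gene 2, expressed equally in both cell types, does not vary at all. With real, noisy data and more cell types than samples can resolve, the estimates become unstable.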

Successful identification of mechanisms from genomic data will also require more sophisticated computational methods. Much progress can be made along the themes of modularity, incorporation of biological knowledge and data integration across techniques, conditions or organisms. It is important to develop methods that combine data across experimental systems that address the same phenomenon (e.g., different cancers in humans or the same cancer type in human and mouse) and isolate key mechanisms and root causes of the disease. The development of such computational methods should go hand in hand with that of multipronged experiments combining cell culture, animal models and human tumor samples.

A major challenge for analysis is the identification of the correct context and functional importance of different events and mechanisms. This issue is particularly pronounced in cancer, in which aberrant and normal processes are intertwined. A cancer cell has a mixture of different processes: processes that are the source for tumorigenesis (e.g., a constitutively active Ras mutant); processes that are normal on their own but are suborned by tumors and support their proliferation (e.g., cell division or angiogenesis); processes that may represent the normal host response to the tumor and may even be protective (e.g., immune response and inflammatory-cell infiltration); and perhaps processes that are simply a by-product of cancer and have no functional role. Although some of the modular approaches outlined above enhance our ability to analyze disease process–relevant signatures, we are still far from understanding the role that these signatures have in cancer. We may be able to derive a more comprehensive perspective on cancer processes by integrating existing assays with histopathologic, clinical and environmental information on the one hand, and with measurements of genetic variation, such as SNPs or DNA copy-number changes, on the other.

Finally, when considering the analysis of cancer data, we must keep in mind that our ultimate goal is to improve diagnosis and treatment of the disease. How can the methods we described above help in achieving this goal? Understanding cancer processes and identifying new drug targets is one contribution, but many of the key regulators and basic pathways of carcinogenesis were identified long before the introduction of high-throughput methods, through the careful hypothesis-based work of molecular and cell biologists. Modular analysis can place the complex interactions of these pathways in the biological context of the tumor microenvironment. Previous analyses may tell us that abnormal WNT–β-catenin pathway activation is important in certain solid tumors and increased activation of EGF receptors is important in others. The results of modular analyses can uncover a certain tumor's use of bone-survival machinery (that promotes bone metastasis) or information about its ability to create a proangiogenic microenvironment or evade immune surveillance; any one of these characteristics is potentially crucial to the disease mechanism and the final outcome for an affected individual.

An understanding of the complexity of the pathways that create and sustain tumors can enable a better use of available therapies by using rational combinations in accordance with the pathways that characterize a certain cancer. Furthermore, a detailed view of the tumor's microenvironment could lead to better design of therapeutic interventions that would help to reverse or contain the carcinogenic process. The availability of multiple secreted and membrane proteins that characterize tumors should allow the identification of combinatorial markers for early detection and noninvasive disease classification, whereas the functional and regulatory characterization of tumors should allow personalized treatment of cancer that is based not on histological appearance but on a global and detailed mechanistic understanding of an individual's disease.

Note: Supplementary information is available on the Nature Genetics website.

References

Ren, B. et al. Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309 (2000).
Article CAS PubMed Google Scholar
Iyer, V.R. et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538 (2001).
Article CAS PubMed Google Scholar
Kononen, J. et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat. Med. 4, 844–847 (1998).
Article CAS PubMed Google Scholar
Lander, E.S. Array of hope. Nat. Genet. 21, 3–4 (1999).
Article CAS PubMed Google Scholar
Khan, J. et al. Expression profiling in cancer using cDNA microarrays. Electrophoresis 20, 223–239 (1999).
Article CAS PubMed Google Scholar
Garber, K. Genomic medicine. Gene expression tests foretell breast cancer's future. Science 303, 1754–1755 (2004).
Article PubMed Google Scholar
Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090–1098 (2004).
Article CAS PubMed Google Scholar
Mootha, V.K. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
Article CAS PubMed Google Scholar
Lamb, J. et al. A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer. Cell 114, 323–334 (2003).
Article CAS PubMed Google Scholar
Huang, E. et al. Gene expression phenotypic models that predict the activity of oncogenic pathways. Nat. Genet. 34, 226–230 (2003).
Article CAS PubMed Google Scholar
Rhodes, D.R. et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl. Acad. Sci. USA 101, 9309–9314 (2004).
Article CAS PubMed PubMed Central Google Scholar
Chang, C.F., Wai, K.M. & Patterton, H.G. Calculating the statistical significance of physical clusters of co-regulated genes in the genome: the role of chromatin in domain-wide gene regulation. Nucleic Acids Res. 32, 1798–1807 (2004).
Article CAS PubMed PubMed Central Google Scholar
Desai, K.V. et al. Initiating oncogenic event determines gene-expression patterns of human breast cancer models. Proc. Natl. Acad. Sci. USA 99, 6967–6972 (2002).
Article CAS PubMed PubMed Central Google Scholar
Odom, D.T. et al. Control of pancreas and liver gene expression by HNF transcription factors. Science 303, 1378–1381 (2004).
Article CAS PubMed PubMed Central Google Scholar
Li, Z. et al. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. USA 100, 8164–8169 (2003).
Article CAS PubMed PubMed Central Google Scholar
Wingender, E. et al. The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29, 281–283 (2001).
Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E.S. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003).
Cliften, P. et al. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301, 71–76 (2003).
Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J. & Church, G.M. Systematic determination of genetic network architecture. Nat. Genet. 22, 281–285 (1999).
Shen-Orr, S.S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68 (2002).
Lee, T.I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002).
Pilpel, Y., Sudarsanam, P. & Church, G.M. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat. Genet. 29, 153–159 (2001).
Harbison, C.T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004).
Pritsker, M., Liu, Y.C., Beer, M.A. & Tavazoie, S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 14, 99–108 (2004).
Segal, E., Barash, Y., Simon, I., Friedman, N. & Koller, D. From promoter sequence to expression: a probabilistic framework. Proceedings of the 6th International Conference on Research in Computational Molecular Biology 263–272 (ACM Press, Washington, DC, 2002).
Segal, E., Yelensky, R. & Koller, D. Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 19 Suppl. 1, i273–i282 (2003).
Beer, M.A. & Tavazoie, S. Predicting gene expression from sequence. Cell 117, 185–198 (2004).
Bar-Joseph, Z. et al. Computational discovery of gene modules and regulatory networks. Nat. Biotechnol. 21, 1337–1342 (2003).
Elkon, R., Linhart, C., Sharan, R., Shamir, R. & Shiloh, Y. Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res. 13, 773–780 (2003).
Sharan, R., Ben-Hur, A., Loots, G.G. & Ovcharenko, I. CREME: cis-regulatory module explorer for the human genome. Nucleic Acids Res. 32, W253–W256 (2004).
Schroeder, M.D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol. 2, E271 (2004).
Segal, E. & Sharan, R. A discriminative model for identifying spatial cis-regulatory modules. Research in Computational Molecular Biology 141–149 (ACM Press, San Diego, 2004).
Sinha, S., van Nimwegen, E. & Siggia, E.D. A probabilistic method to detect regulatory modules. Bioinformatics 19 Suppl. 1, i292–i301 (2003).
Berman, B.P. et al. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. USA 99, 757–762 (2002).
Pe'er, D., Regev, A. & Tanay, A. Minreg: Inferring an active regulator set. Bioinformatics 18 Suppl. 1, S258–S267 (2002).
Friedman, N., Linial, M., Nachman, I. & Pe'er, D. Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620 (2000).
Hartemink, A.J., Gifford, D.K., Jaakkola, T.S. & Young, R.A. Combining location and expression data for principled discovery of genetic regulatory networks. Pacific Symposium on Biocomputing, 437–439 (World Scientific, Lihue, Hawaii, 2002).
Nachman, I., Regev, A. & Friedman, N. Inferring quantitative models of regulatory networks from expression data. Bioinformatics 20 Suppl. 1, i248–i256 (2004).
Kalir, S. & Alon, U. Using a quantitative blueprint to reprogram the dynamics of the flagella gene network. Cell 117, 713–720 (2004).
Ronen, M., Rosenberg, R., Shraiman, B.I. & Alon, U. Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc. Natl. Acad. Sci. USA 99, 10555–10560 (2002).
Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176 (2003).
Lossos, I.S. et al. Transformation of follicular lymphoma to diffuse large-cell lymphoma: alternative patterns with increased or decreased expression of c-myc and its regulated genes. Proc. Natl. Acad. Sci. USA 99, 8886–8891 (2002).
Beer, D.G. et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 8, 816–824 (2002).
Wiseman, B.S. & Werb, Z. Stromal effects on mammary gland development and breast cancer. Science 296, 1046–1049 (2002).
Chang, H.Y. et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl. Acad. Sci. USA 102, 3738–3743 (2005).
Stuart, J.M., Segal, E., Koller, D. & Kim, S.K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
Bergmann, S., Ihmels, J. & Barkai, N. Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2, E9 (2004).
McCarroll, S.A. et al. Comparing genomic expression patterns across species identifies shared transcriptional profile in aging. Nat. Genet. 36, 197–204 (2004).
Sweet-Cordero, A. et al. An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nat. Genet. 37, 48–55 (2005).
Segal, E. Rich Probabilistic Models for Genomic Data. PhD thesis, Stanford Univ. (2004).
Mecham, B.H. et al. Increased measurement accuracy for sequence-verified microarray probes. Physiol. Genomics 18, 308–315 (2004).
Michiels, S., Koscielny, S. & Hill, C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488–492 (2005).
Cluzel, P., Surette, M. & Leibler, S. An ultrasensitive bacterial motor revealed by monitoring signaling proteins in single cells. Science 287, 1652–1655 (2000).
Lahav, G. et al. Dynamics of the p53-Mdm2 feedback loop in individual cells. Nat. Genet. 36, 147–150 (2004).
Irish, J.M. et al. Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell 118, 217–228 (2004).
Stuart, R.O. et al. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc. Natl. Acad. Sci. USA 101, 615–620 (2004).
Lu, P., Nakorchevskiy, A. & Marcotte, E.M. Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations. Proc. Natl. Acad. Sci. USA 100, 10370–10375 (2003).
Chang, H.Y. et al. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2, E7 (2004).
Kang, Y. et al. A multigenic program mediating breast cancer metastasis to bone. Cancer Cell 3, 537–549 (2003).
Fuller, A.P., Palmer-Toy, D., Erlander, M.G. & Sgroi, D.C. Laser capture microdissection and advanced molecular analysis of human breast cancer. J. Mammary Gland Biol. Neoplasia 8, 335–345 (2003).
Kobayashi, K. et al. Identification of genes whose expression is upregulated in lung adenocarcinoma cells in comparison with type II alveolar cells and bronchiolar epithelial cells in vivo. Oncogene 23, 3089–3096 (2004).
Whitfield, M.L. et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell 13, 1977–2000 (2002).
Caetano, M.S. et al. NFATC2 transcription factor regulates cell cycle progression during lymphocyte activation: evidence of its involvement in the control of cyclin gene expression. FASEB J. 16, 1940–1942 (2002).
Baksh, S. et al. NFATc2-mediated repression of cyclin-dependent kinase 4 expression. Mol. Cell 10, 1071–1081 (2002).
Behrens, J. & Lustig, B. The Wnt connection to tumorigenesis. Int. J. Dev. Biol. 48, 477–487 (2004).
Hulboy, D.L., Matrisian, L.M. & Crawford, H.C. Loss of JunB activity enhances stromelysin 1 expression in a model of the epithelial-to-mesenchymal transition of mouse skin tumors. Mol. Cell. Biol. 21, 5478–5487 (2001).
Pomeroy, S.L. et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415, 436–442 (2002).
Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004).
Rostomily, R.C. et al. Expression of neurogenic basic helix-loop-helix genes in primitive neuroectodermal tumors. Cancer Res. 57, 3526–3531 (1997).

Acknowledgements

All authors contributed equally to this work. We thank M. Scott and T. Raveh for making their mouse brain microarrays available to us for the multispecies module network analysis. E.S. was supported by a fellowship from the Center for Studies in Physics and Biology at Rockefeller University. N.F. was supported by the Harry & Abe Sherman Senior Lectureship in Computer Science, by a United States–Israel Binational Science Foundation grant and by grants from the US National Institutes of Health. N.K. was partly supported by grants from the US National Institutes of Health, by the Tel-Aviv Chapter of the Israeli Lung Association and by a donation from the Simmons family. A.R. was supported by a grant from the US National Institutes of Health and by the Bauer Center. D.K. was supported by a grant from the US National Science Foundation and by a BioX Center grant.

Author information

Authors and Affiliations

the Center for Studies in Physics and Biology, Rockefeller University, New York, USA
Eran Segal
the School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
Nir Friedman
the Dorothy P. and Richard P. Simmons Center for Interstitial Lung Diseases, Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Naftali Kaminski
the Bauer Center for Genomics Research, Harvard University, Cambridge, Massachusetts, USA
Aviv Regev
the Computer Science Department, Stanford University, Stanford, California, USA
Daphne Koller

Corresponding author

Correspondence to Daphne Koller.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1 (PDF 723 kb)

Supplementary Note (PDF 723 kb)

About this article

Cite this article

Segal, E., Friedman, N., Kaminski, N. et al. From signatures to models: understanding cancer using microarrays. Nat Genet 37 (Suppl 6), S38–S45 (2005). https://doi.org/10.1038/ng1561

Issue Date: June 2005
DOI: https://doi.org/10.1038/ng1561
