Comparative biology: beyond sequence analysis

https://doi.org/10.1016/j.copbio.2007.07.003

Comparative analysis is a fundamental tool in biology. Conservation among species greatly assists the detection and characterization of functional elements, whereas inter-species differences are probably the best indicators of biological adaptation. Traditionally, comparative approaches were applied to the analysis of genomic sequences. With the growing availability of functional genomic data, comparative paradigms are now also being extended to the study of other functional attributes, most notably gene expression. Here we review recent studies that apply comparative analysis to large-scale gene expression datasets and discuss the central principles and challenges of such approaches.

Introduction

One of the surprising discoveries of modern biology is the strong conservation of protein sequences, as well as cellular mechanisms, across evolution. Some of our metabolic genes, for example, display strong sequence and functional similarities to their bacterial counterparts. Moreover, central features of core cellular processes such as cell cycle progression, DNA replication and transcription are conserved from yeast to human. Indeed, this extensive conservation has motivated the use of model organisms as a means of studying conserved processes that are more difficult to assay in complex systems. Sequence similarity, in particular, has emerged as a key tool for predicting functional properties. In fact, most annotations of newly sequenced genomes, including gene prediction, gene function and regulatory elements, are based on similarity with other sequences whose functions have been described (e.g. [1, 2, 3]).

Whereas most comparative studies focus on conserved properties as a means of characterizing functional elements, inter-species differences are also of interest. These differences are arguably the best indicators of evolutionary history and provide much information about species-specific adaptations. Identifying such differences is thus central to our search for what makes us human and how biological diversity is generated.

Technological advances over the past decade have led to the accumulation of genome-scale data describing not only gene sequence but also functional properties, including gene expression, protein–protein interactions, and the binding of transcription factors to DNA. Such data are now available for multiple organisms, and their complexity presents a new set of challenges to comparative analysis. For example, unlike sequence information, most functional properties are condition-dependent, which must be accounted for in inter-species comparisons. Furthermore, sequence analysis is typically gene-specific and compares specific sets of homologs. Functional properties, in contrast, often reflect the integrated function of multiple genes, calling for novel methods that allow network-centered rather than gene-centered comparisons. Finally, functional genomic data generated with current technologies suffer from high levels of noise and must therefore be filtered before valid conclusions can be drawn.

In this review we focus on recent studies that employ comparative methods to analyze gene expression data. We describe the different approaches employed in such an analysis and highlight the remaining challenges.

Comparative analysis of condition-specific gene expression

An important element of a gene's function is the spatial and temporal pattern in which it is expressed. For example, diverse cell types are generated by extensive modifications in the expression of the same set of genes. In recent years, microarray technology has enabled thousands of experiments that characterize genome-wide expression levels under a wide variety of conditions. Such data are now available for diverse organisms, providing a rich resource for comparative studies.

The initial

Comparative analysis of co-expression

A major difficulty in comparing expression data between organisms is that gene expression is not static but changes with the external conditions. As described above, this can be accounted for by considering only compatible gene expression data. This approach, however, severely limits the data that can be used for comparative analysis, as only a small fraction of the available data in different species is comparable. Moreover, even if conditions seem equivalent, evolutionarily distant
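One way to compare expression without matched conditions is to compute gene-gene correlations within each species separately and then ask, for each gene, whether its correlations with all other genes are preserved in the other species. The following is a minimal sketch of that idea, not the method of any particular study reviewed here. The function names and toy matrices are hypothetical, and row i of both matrices is assumed to hold the two members of the same one-to-one ortholog pair.

```python
import numpy as np

def coexpression_matrix(expr):
    """Pearson correlation between every pair of genes (rows) across conditions (columns)."""
    return np.corrcoef(expr)

def coexpression_conservation(expr_a, expr_b):
    """Score, per gene, how well its co-expression pattern is preserved.

    expr_a, expr_b: arrays of shape (n_genes, n_conditions); the number of
    conditions may differ between species. Row i in both arrays is ASSUMED
    to be the same one-to-one ortholog pair.
    """
    corr_a = coexpression_matrix(expr_a)
    corr_b = coexpression_matrix(expr_b)
    n_genes = corr_a.shape[0]
    scores = np.empty(n_genes)
    for i in range(n_genes):
        others = np.arange(n_genes) != i  # drop the trivial self-correlation
        scores[i] = np.corrcoef(corr_a[i, others], corr_b[i, others])[0, 1]
    return scores

# Toy data (hypothetical): genes 0-2 form one module, genes 3-5 another,
# measured under 4 conditions in species A and 5 unrelated conditions in species B.
expr_a = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [1.0, 3.0, 5.0, 7.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.5],
    [7.0, 5.0, 3.0, 1.0],
])
expr_b = np.array([
    [0.0, 1.0, 0.0, 1.0, 0.5],
    [0.0, 2.0, 0.0, 2.0, 1.0],
    [1.0, 3.0, 1.0, 3.0, 2.5],
    [1.0, 0.0, 1.0, 0.0, 0.5],
    [2.0, 0.0, 2.0, 0.0, 1.0],
    [3.0, 1.0, 3.0, 1.0, 1.5],
])

scores = coexpression_conservation(expr_a, expr_b)
print(np.round(scores, 3))
```

Because the module structure is the same in both toy species, every gene receives a score close to 1, even though the two datasets share no conditions. A gene that had switched modules in one species would receive a low or negative score.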

Duplicated genes and orthology relationships

For any comparison to be meaningful, one must first decide what is being compared and identify the common elements of the compared objects (Figure 2). In comparing data from different organisms, the basic elements are typically pairs of one-to-one orthologous genes, whose identification relies on an orthology mapping. Most comparative studies employ a strict definition of one-to-one orthology by considering, for example, syntenic or reciprocal best sequence matches and excluding ambiguities that arise from
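The reciprocal-best-match criterion mentioned above can be illustrated with a short sketch. It assumes that pairwise similarity scores between the genes of two species have already been computed (for example, from a sequence-alignment tool). The gene names, scores and function names are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    scores: dict mapping (query, target) -> similarity score.
    Ties are resolved in favor of the pair seen first (an arbitrary choice).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best match."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical scores: b1_dup is a duplicate of b1, and a3 has no true partner.
a_vs_b = {
    ("a1", "b1"): 95.0,
    ("a1", "b1_dup"): 80.0,
    ("a2", "b2"): 90.0,
    ("a3", "b2"): 40.0,
}
b_vs_a = {
    ("b1", "a1"): 95.0,
    ("b1_dup", "a1"): 80.0,
    ("b2", "a2"): 90.0,
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
```

In this toy example the duplicate `b1_dup` and the unmatched gene `a3` are excluded, because their best match does not point back to them. This is the strictness that a one-to-one definition imposes.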

Concluding remarks

We described different approaches for comparative analysis of functional genomics data (Figure 1). The most straightforward approach is to consider a single feature characterizing each gene (e.g. expression levels under a specific condition) and compare it among orthologous genes. This approach is easy to interpret and the simplest to apply. It can also be easily extended to profiles of multiple features (e.g. expression levels under several conditions) to capture a broader scope of gene
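The single-feature comparison and its extension to multi-condition profiles can be sketched as follows. This is only an illustration under stated assumptions: the ortholog pairs, the expression values, the pseudocount and the function names are all hypothetical, and the conditions are assumed to be matched between the two species.

```python
import math

def log2_ratio(level_a, level_b, pseudocount=1.0):
    """Single-feature comparison: log2 ratio of expression under one condition.

    The pseudocount is an assumption added here to avoid division by zero.
    """
    return math.log2((level_a + pseudocount) / (level_b + pseudocount))

def pearson(xs, ys):
    """Profile comparison: Pearson correlation across matched conditions."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression levels under three matched conditions.
orthologs = [("a1", "b1"), ("a2", "b2")]
expr_a = {"a1": [3.0, 7.0, 15.0], "a2": [8.0, 4.0, 2.0]}
expr_b = {"b1": [1.0, 3.0, 7.0], "b2": [1.0, 2.0, 4.0]}

for gene_a, gene_b in orthologs:
    ratio = log2_ratio(expr_a[gene_a][0], expr_b[gene_b][0])  # first condition only
    profile_similarity = pearson(expr_a[gene_a], expr_b[gene_b])
    print(gene_a, gene_b, round(ratio, 2), round(profile_similarity, 2))
```

Here the first pair differs in level under the first condition but has the same profile shape across conditions, whereas the second pair shows opposite trends. The single-feature and profile views can therefore lead to different conclusions about divergence.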

References and recommended reading

Papers of particular interest, published within the annual period of the review, have been highlighted as:

  • of special interest

  •• of outstanding interest

Acknowledgements

This work was supported by grants from the Kahn fund for Systems Biology at the Weizmann Institute of Science, the Tauber fund and the Israeli Ministry of Science. This work was partially funded by the UniNet EC NEST consortium contract number 12990 to YB.

References (51)

  • S.A. Rifkin et al. Evolution of gene expression in the Drosophila melanogaster subgroup. Nat Genet (2003)
  • M. Kellis et al. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature (2004)
  • P. Cliften et al. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science (2003)
  • M. Kellis et al. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature (2003)
  • Insights into social insects from the genome of the honeybee Apis mellifera. Nature 2006,...
  • S.A. McCarroll et al. Comparing genomic expression patterns across species identifies shared transcriptional profile in aging. Nat Genet (2004)
  • J.C. Fay et al. Population genetic variation in gene expression is associated with phenotypic variation in Saccharomyces cerevisiae. Genome Biol (2004)
  • Y. Gilad et al. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature (2006)
  • P. Khaitovich et al. Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science (2005)
  • D.J. Kliebenstein et al. Genomic survey of gene expression diversity in Arabidopsis thaliana. Genetics (2006)
  • C.R. Landry et al. Genome-wide scan reveals that genetic variation for transcriptional plasticity in yeast is biased towards multicopy and dispensable genes. Gene (2006)
  • J.M. Ranz et al. Sex-dependent gene expression and evolution of the Drosophila transcriptome. Science (2003)
  • M.A. Sartor et al. A new method to remove hybridization bias for interspecies comparison of global gene expression profiles uncovers an association between mRNA sequence divergence and differential gene expression in Xenopus. Nucl Acids Res (2006)
  • I. Tirosh et al. A genetic signature of interspecies variations in gene expression. Nat Genet (2006)
  • G. Yvert et al. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet (2003)
  • B. Lemos et al. Regulatory evolution across the protein interaction network. Nat Genet (2004)
  • Z. Gu et al. Duplicate genes increase gene expression diversity within and between species. Nat Genet (2004)
  • J. Berg et al. Cross-species analysis of biological networks by Bayesian alignment. Proc Natl Acad Sci USA (2006)
  • S. Bergmann et al. Similarities and differences in genome-wide expression data of six organisms. PLoS Biol (2004)
  • J.M. Stuart et al. A gene-coexpression network for global discovery of conserved genetic modules. Science (2003)
  • I. Tirosh et al. Computational verification of protein–protein interactions by orthologous co-expression. BMC Bioinformatics (2005)
  • B. Snel et al. Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes. Nucl Acids Res (2004)
  • R. Sharan et al. Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA (2005)
  • J. Ihmels et al. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet (2005)
  • I. Lozada-Chavez et al. Bacterial regulatory networks are extremely flexible in evolution. Nucl Acids Res (2006)