Comparative biology: beyond sequence analysis
Introduction
One of the surprising discoveries of modern biology is the strong conservation of protein sequences, as well as cellular mechanisms, across evolution. Some of our metabolic genes, for example, display strong sequence and functional similarities to their bacterial counterparts. Moreover, central features of core cellular processes such as cell cycle progression, DNA replication or transcription are conserved from yeast to human. Indeed, this extensive conservation has motivated the use of model organisms as a means of studying conserved processes that are more difficult to assay in complex systems. Sequence similarity, in particular, has emerged as a key tool in predicting functional properties. In fact, most annotations of newly sequenced genomes, including the prediction of genes, gene functions and regulatory elements, are based on similarity with other sequences whose functions have been described (e.g. [1, 2, 3]).
Whereas most comparative studies focus on conserved properties as a means of characterizing functional elements, inter-species differences are also of interest. These differences are arguably the best indicators of evolutionary history and provide much information about species-specific adaptations. Identifying such differences is thus central to our search for what makes us human and how biological diversity is generated.
Technological advances over the past decade have led to the accumulation of genome-scale data describing not only gene sequence but also functional properties including gene expression, protein–protein interactions, and the binding of transcription factors to DNA. Such data are now available for multiple organisms, and their complexity presents a new set of challenges to comparative analysis. For example, unlike sequence information, most functional properties are condition-dependent, and this dependence needs to be accounted for in inter-species comparisons. Furthermore, sequence analysis is typically gene-specific and compares specific sets of homologs. Functional properties, by contrast, often reflect the integrated function of multiple genes, calling for novel methods that allow network-centered rather than gene-centered comparisons. Finally, functional genomic data generated with current technologies suffer from high levels of noise and therefore need to be filtered before valid conclusions can be drawn.
In this review we focus on recent studies that employ comparative methods to analyze gene expression data. We describe the different approaches employed in such an analysis and highlight the remaining challenges.
Section snippets
Comparative analysis of condition-specific gene expression
An important element in a gene's function is the spatial and temporal pattern by which it is expressed. For example, diverse cell types are generated by extensive modifications in the expression of the same set of genes. In recent years, microarray technology has facilitated thousands of experiments that characterized genome-wide expression levels under a wide variety of conditions. Such data are now available for diverse organisms, providing a rich resource for comparative studies.
The initial
Comparative analysis of co-expression
A major difficulty in comparing expression data between organisms is that gene expression is not static and changes depending on the external conditions. As described above, this can be accounted for by considering compatible gene expression data. This approach, however, severely limits the data that can be used for comparative analysis, as only a small fraction of the available data in different species is comparable. Moreover, even if conditions seem equivalent, evolutionarily distant
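One way to picture a co-expression comparison is the minimal Python sketch below. It is an illustration under stated assumptions rather than the method of any study cited here: the gene names, the profiles and the function `coexpression_conservation` are hypothetical, and Pearson correlation is assumed as the co-expression measure. Correlations are computed within each species over that species' own conditions, so the two datasets do not need matching experiments; only the orthology mapping links them.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # a constant profile has no defined correlation
    return cov / (sd_x * sd_y)


def coexpression_conservation(profiles_a, profiles_b, orthologs):
    """Pair up within-species correlations for orthologous gene pairs.

    Returns {(gene1, gene2): (r_in_species_a, r_in_species_b)}, where the
    genes are named as in species A and r_in_species_b is computed for
    their orthologs in species B.
    """
    result = {}
    for g1, g2 in combinations(sorted(orthologs), 2):
        r_a = pearson(profiles_a[g1], profiles_a[g2])
        r_b = pearson(profiles_b[orthologs[g1]], profiles_b[orthologs[g2]])
        result[(g1, g2)] = (r_a, r_b)
    return result


# Hypothetical toy data: species A was measured under four conditions,
# species B under three unrelated ones.
profiles_a = {"a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8], "a3": [4, 3, 2, 1]}
profiles_b = {"b1": [0, 1, 2], "b2": [1, 3, 5], "b3": [0, 1, 2]}
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}

conservation = coexpression_conservation(profiles_a, profiles_b, orthologs)
for pair, (r_a, r_b) in conservation.items():
    print(pair, round(r_a, 2), round(r_b, 2))
```

In this toy example the pair (a1, a2) is co-expressed in both species, whereas a3 is anti-correlated with a1 in species A while its ortholog is positively correlated with b1 in species B. That second case is the kind of divergence such a comparison is meant to flag.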
Duplicated genes and orthology relationships
For any comparison to be meaningful, one must first decide what is being compared and identify the common elements of the compared objects (Figure 2). In comparing data of different organisms, the basic elements are typically pairs of one-to-one orthologous genes, which rely on a mapping of orthology. Most comparative studies employ a strict definition for one-to-one orthology by considering, for example, syntenic or reciprocal best sequence matches and excluding ambiguities that arise from
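The reciprocal-best-match criterion mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline of any particular study: the similarity scores are invented, the function names are hypothetical, and real analyses would start from alignment output and may add synteny or score thresholds. Ties for the top score are treated as ambiguous and dropped, in the spirit of a strict one-to-one definition.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring target gene.

    `scores` maps (query, target) pairs to a similarity score
    (hypothetical values, e.g. alignment bit scores). A query whose top
    score is shared by several targets is dropped as ambiguous.
    """
    top = {}  # query -> (best score so far, target, or None when tied)
    for (query, target), score in scores.items():
        if query not in top or score > top[query][0]:
            top[query] = (score, target)
        elif score == top[query][0] and target != top[query][1]:
            top[query] = (score, None)
    return {q: t for q, (_, t) in top.items() if t is not None}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}


# Hypothetical similarity scores between genes of species A and species B.
a_vs_b = {
    ("a1", "b1"): 250, ("a1", "b2"): 90,
    ("a2", "b2"): 180, ("a2", "b3"): 175,
    ("a3", "b3"): 60,
}
b_vs_a = {
    ("b1", "a1"): 245, ("b2", "a1"): 88,
    ("b2", "a2"): 170, ("b3", "a2"): 176,
}

orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(orthologs)
```

Here a3 is excluded: its best match is b3, but b3's best match is a2, so the relationship is not reciprocal.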
Concluding remarks
We have described different approaches for the comparative analysis of functional genomics data (Figure 1). The most straightforward approach is to consider a single feature characterizing each gene (e.g. expression levels under a specific condition) and compare it among orthologous genes. This approach is easy to interpret and the simplest to apply. It can also be easily extended to profiles of multiple features (e.g. expression levels under several conditions) to capture a broader scope of gene
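The single-feature comparison and its extension to multi-condition profiles can be illustrated with the short Python sketch below. It rests on assumptions not stated in the text: expression values are taken to be log ratios measured under conditions that are equivalent in both species, Pearson correlation is assumed as the profile similarity measure, and all gene names, values and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # a constant profile has no defined correlation
    return cov / (sd_x * sd_y)


def single_feature_difference(expr_a, expr_b, orthologs, condition):
    """Difference (species B minus species A) in one condition, per ortholog pair."""
    return {
        (a, b): expr_b[b][condition] - expr_a[a][condition]
        for a, b in orthologs.items()
    }


def profile_similarity(expr_a, expr_b, orthologs, conditions):
    """Correlation of ortholog expression profiles over shared conditions."""
    return {
        (a, b): pearson(
            [expr_a[a][c] for c in conditions],
            [expr_b[b][c] for c in conditions],
        )
        for a, b in orthologs.items()
    }


# Hypothetical log-ratio expression values under three matched conditions.
conditions = ["heat", "starvation", "osmotic"]
expr_a = {
    "a1": {"heat": 2.0, "starvation": 1.0, "osmotic": 0.0},
    "a2": {"heat": -1.0, "starvation": 0.0, "osmotic": 1.0},
}
expr_b = {
    "b1": {"heat": 1.5, "starvation": 0.5, "osmotic": -0.5},
    "b2": {"heat": 1.0, "starvation": 0.0, "osmotic": -1.0},
}
orthologs = {"a1": "b1", "a2": "b2"}

heat_diff = single_feature_difference(expr_a, expr_b, orthologs, "heat")
similarity = profile_similarity(expr_a, expr_b, orthologs, conditions)
print(heat_diff)
print(similarity)
```

The profile comparison separates two cases that a single condition can blur: a1 and b1 differ by a constant offset yet respond identically across conditions, whereas a2 and b2 respond in opposite directions.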
References and recommended reading
Papers of particular interest, published within the annual period of the review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
This work was supported by grants from the Kahn fund for Systems Biology at the Weizmann Institute of Science, the Tauber fund and the Israeli Ministry of Science. This work was partially funded by the UniNet EC NEST consortium contract number 12990 to YB.
References (51)
- et al. Evolution of gene expression in the Drosophila melanogaster subgroup. Nat Genet (2003)
- et al. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature (2004)
- et al. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science (2003)
- et al. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature (2003)
- Insights into social insects from the genome of the honeybee Apis mellifera. Nature 2006,...
- et al. Comparing genomic expression patterns across species identifies shared transcriptional profile in aging. Nat Genet (2004)
- et al. Population genetic variation in gene expression is associated with phenotypic variation in Saccharomyces cerevisiae. Genome Biol (2004)
- et al. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature (2006)
- et al. Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science (2005)
- et al. Genomic survey of gene expression diversity in Arabidopsis thaliana. Genetics (2006)
- Genome-wide scan reveals that genetic variation for transcriptional plasticity in yeast is biased towards multicopy and dispensable genes. Gene
- Sex-dependent gene expression and evolution of the Drosophila transcriptome. Science
- A new method to remove hybridization bias for interspecies comparison of global gene expression profiles uncovers an association between mRNA sequence divergence and differential gene expression in Xenopus. Nucl Acids Res
- A genetic signature of interspecies variations in gene expression. Nat Genet
- Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet
- Regulatory evolution across the protein interaction network. Nat Genet
- Duplicate genes increase gene expression diversity within and between species. Nat Genet
- Cross-species analysis of biological networks by Bayesian alignment. Proc Natl Acad Sci USA
- Similarities and differences in genome-wide expression data of six organisms. PLoS Biol
- A gene-coexpression network for global discovery of conserved genetic modules. Science
- Computational verification of protein–protein interactions by orthologous co-expression. BMC Bioinformatics
- Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes. Nucl Acids Res
- Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA
- Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet
- Bacterial regulatory networks are extremely flexible in evolution. Nucl Acids Res