Evolutionary systems biology: links between gene evolution and function

https://doi.org/10.1016/j.copbio.2006.08.003

The recent accumulation of genome-wide data on various facets of gene expression, function and evolution has stimulated the emergence of a new field, evolutionary systems biology. Many significant correlations have been detected between variables that characterize the functioning of a gene, such as expression level, knockout effect, and connectivity of genetic and protein–protein interaction networks, and variables that describe gene evolution, such as sequence evolution rate and propensity for gene loss. The first attempts at multidimensional analysis of genomic data yielded composite variables that describe the ‘status’ of a gene in the genomic community. However, it remains uncertain whether different functional variables affect gene evolution synergistically or whether there is a single, dominant factor. The number of translation events, linked to selection for translational robustness, has been proposed as a candidate for such a major determinant of protein evolution. These developments show that, although the methodological basis of evolutionary systems biology is not yet fully solidified, this area of research is already starting to yield fundamental biological insights.

Introduction

The ultimate goal of systems biology is to achieve an integrated understanding of life forms with all their characteristic complexity of interactions at multiple levels. Clearly, such an understanding is unimaginable without an essential evolutionary component, that is, deciphering the ways in which biological systems change over time, which changes are affected by selection and which are neutral, and how changes at one level of a system affect evolution at other levels [1, 2, 3, 4, 5]. From a more general perspective, evolutionary systems biology focuses on one of the core problems of all biology, the evolutionary interplay between the genotype and the phenotype [6].

The most basic level of analysis in evolutionary systems biology involves identification of correlations between diverse genome-wide variables, and many such correlations have been described [7•, 8, 9, 10]. Examples include the negative correlation between the sequence evolution rate and expression level of a gene [11], the positive correlation between a gene's centrality in interaction networks and knockout effect [12], the positive correlation between a gene's dispensability and sequence evolution rate [13, 14], and the negative correlation between node degree in gene co-expression networks and rate of evolution in both synonymous and non-synonymous positions of the coding sequence [15]. More often than not, however, the interpretation of these observations remains problematic for at least two reasons. First, although statistically significant thanks to the typically huge number of data points, the correlations are usually relatively weak. Second, the existence of multiple weak correlations makes it hard to distinguish the primary, functionally meaningful connections from secondary, induced ones.
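The two difficulties named above can be illustrated with a toy simulation. The sketch below is not taken from any of the cited studies: the data are synthetic, the effect sizes (-0.4) are arbitrary, and the assumption that expression level drives both evolution rate and dispensability is made purely for illustration. It shows a weak correlation that is nonetheless highly significant because of the large number of genes, and a first-order partial correlation that exposes it as induced by a third variable.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def t_statistic(r, n):
    """t statistic for testing r = 0 with n data points (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical toy model: expression weakly drives both other variables,
# which have no direct link to each other.
rng = random.Random(0)
n_genes = 5000
expression = [rng.gauss(0, 1) for _ in range(n_genes)]
evol_rate = [-0.4 * e + rng.gauss(0, 1) for e in expression]
dispensability = [-0.4 * e + rng.gauss(0, 1) for e in expression]

r_raw = pearson(evol_rate, dispensability)
t_raw = t_statistic(r_raw, n_genes)
r_partial = partial_corr(evol_rate, dispensability, expression)

print(f"raw r = {r_raw:.3f} (t = {t_raw:.1f}, n = {n_genes})")
print(f"partial r given expression = {r_partial:.3f}")
```

In this toy setting the raw correlation is weak but its t statistic is large, whereas the partial correlation is close to zero. With real genomic data the picture is rarely this clean, since the controlling variable is itself measured with noise.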

In this brief review, we discuss recent studies that addressed these problems by devising complementary statistical approaches aimed at identifying genuinely relevant relationships between variables that characterize genome function and evolution. We also cover recent work that attempts integrative approaches to the analysis of multiple genome-related variables and discuss glimpses of biological meaning that are starting to emerge from these quantitative studies.

Dissection and interpretation of correlations between multiple genome-related variables

Table 1 summarizes the most prominent correlations that have been reported to exist between variables that characterize evolution, expression and function of genes. Notably, the validity and significance of most of these correlations have been questioned at one point or another. A good example is the fundamental relationship between a gene's dispensability and its rate of evolution. The intuitively plausible prediction that genes making a large contribution to the organism's fitness would

The multidimensional space of quantitative genomics

Given the multiple, relatively weak correlations between the variables that characterize genome function and evolution, and the coherent pattern formed by the two classes of these variables (Figure 1), multivariate statistics methods appear to be the approach of choice for dissecting the multidimensional space of quantitative genomics. Essentially, these methods attempt to replace multiple, correlated variables such as those shown in Table 1 and Figure 1 by a smaller number of composite,
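As a rough illustration of replacing several correlated variables with one composite variable, the sketch below extracts the first principal component from a correlation matrix by power iteration. It is an assumption-laden toy, not the procedure used in the studies discussed here: the four variable names, the latent `status` variable and its weights are all hypothetical, and the data are synthetic.

```python
import math
import random


def standardize(col):
    """Rescale a column to zero mean and unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def correlation_matrix(columns):
    """Pairwise Pearson correlations between the given columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(corr, n_iter=500):
    """Leading eigenvector and eigenvalue of corr via power iteration."""
    k = len(corr)
    v = [1.0] + [0.0] * (k - 1)
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    if v[0] < 0:  # fix the arbitrary sign so the first loading is positive
        v = [-x for x in v]
    return v, eigenvalue


# Hypothetical toy model: one latent 'status' variable weakly drives
# four observed variables, two positively and two negatively.
rng = random.Random(1)
n_genes = 3000
status = [rng.gauss(0, 1) for _ in range(n_genes)]
names = ["expression", "interaction_degree", "evolution_rate", "loss_propensity"]
weights = [0.6, 0.5, -0.6, -0.5]
columns = [[w * s + rng.gauss(0, 1) for s in status] for w in weights]

corr = correlation_matrix(columns)
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(names)  # trace of a correlation matrix is k

for name, loading in zip(names, loadings):
    print(f"{name:20s} PC1 loading = {loading:+.2f}")
print(f"fraction of variance on PC1 = {explained:.2f}")
```

In this toy example the first component groups the two 'functional' variables with one sign and the two 'evolutionary' variables with the opposite sign, even though every pairwise correlation is weak.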

Gleaning the biology behind the correlations

The links between different variables characterizing genome function and evolution are intriguing, and the observations that composite variables seem to better describe biological systems are encouraging. However, these findings do not immediately reveal the biological underpinnings of the observed relationships or the evolutionary rationales for the emergence of these connections. Several recent studies have sought to address the biology behind the deluge of genomic data in a more direct manner.

A

Conclusions

In this review, we cover only one research direction in evolutionary systems biology. Because of space constraints, we could not give any attention to other, equally dynamic, subfields, in particular, the burgeoning area of biological network analysis [44]. However, even this limited perspective seems to reveal the salient features of today's evolutionary systems biology. Above all, this is still a nascent field where the methodology remains in constant flux, the data are, obviously, noisy, the

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

We apologize to all researchers whose relevant contributions could not be cited because of space limitations. This work was supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health/DHHS.

References (44)

  • Y.I. Wolf. Coping with the quantitative genomics ‘elephant’: the correlation between the gene dispensability and evolution rate. Trends Genet (2006)
  • C. Pal et al. Highly expressed genes in yeast evolve slowly. Genetics (2001)
  • H. Jeong et al. Lethality and centrality in protein networks. Nature (2001)
  • A.E. Hirsh et al. Protein dispensability and rate of evolution. Nature (2001)
  • I.K. Jordan et al. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res (2002)
  • I.K. Jordan et al. Conservation and co-evolution in the scale-free human gene co-expression network. Mol Biol Evol (2004)
  • M. Kimura et al. On some principles governing molecular evolution. Proc Natl Acad Sci USA (1974)
  • E. Zuckerkandl. Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J Mol Evol (1976)
  • A.C. Wilson et al. Biochemical evolution. Annu Rev Biochem (1977)
  • L.D. Hurst et al. Do essential genes evolve slowly? Curr Biol (1999)
  • C. Pal et al. Genomic function: Rate of evolution and gene dispensability. Nature (2003)
  • E.P. Rocha et al. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol (2004)