Evolutionary systems biology: links between gene evolution and function
Introduction
The ultimate goal of systems biology is to achieve an integrated understanding of life forms with all their characteristic complexity of interactions at multiple levels. Clearly, such an understanding is unimaginable without an essential evolutionary component, that is, deciphering the ways in which biological systems change over time, which changes are affected by selection and which are neutral, and how changes at one level of a system are reflected in evolution at other levels [1, 2, 3, 4, 5]. From a more general perspective, evolutionary systems biology focuses on one of the core problems of all biology, the evolutionary interplay between genotype and phenotype [6].
The most basic level of analysis in evolutionary systems biology involves identification of correlations between diverse genome-wide variables, and many such correlations have been described [7•, 8, 9, 10]. Examples include the negative correlation between the sequence evolution rate and expression level of a gene [11], the positive correlation between a gene's centrality in interaction networks and knockout effect [12], the positive correlation between a gene's dispensability and sequence evolution rate [13, 14], the negative correlation between node degree in gene co-expression networks and rate of evolution in both synonymous and non-synonymous positions of the coding sequence [15], and others. More often than not, however, the interpretation of these observations remains problematic for at least two reasons. First, although the correlations are statistically significant owing to the typically huge number of data points, they are usually relatively weak. Second, the existence of multiple weak correlations makes it hard to distinguish the primary, functionally meaningful connections from secondary, induced ones.
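Both problems can be illustrated with a minimal sketch, assuming simulated data in which a single hidden variable z (hypothetical; it merely stands in for a shared driver such as expression level) influences two variables x and y that have no direct link to each other. With 10,000 "genes", the x-y correlation is weak yet highly significant, and it largely disappears once z is controlled for with a first-order partial correlation. Pearson correlation is used here for simplicity; genome-wide studies often use rank-based (Spearman) correlation instead, and all names below are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether r differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical simulation: z drives both x and y; x and y share no direct link.
rng = random.Random(42)
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 2) for zi in z]  # noise sd 2 gives a population r(x, y) of 0.2
y = [zi + rng.gauss(0, 2) for zi in z]

r_xy = pearson(x, y)            # weak correlation
t_xy = t_statistic(r_xy, n)     # large t, because n is large
pc_xy = partial_corr(x, y, z)   # close to 0 once z is controlled for

print(f"r(x,y) = {r_xy:.3f}, t = {t_xy:.1f}, partial r(x,y | z) = {pc_xy:.3f}")
```

In real data the confounding variable is rarely known or measured without error, so a vanishing partial correlation is suggestive rather than proof that a connection is induced.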
In this brief review, we discuss recent studies that addressed these problems by devising complementary statistical approaches aimed at identifying genuinely relevant relationships between variables that characterize genome function and evolution. We also cover recent work that attempts integrative approaches to the analysis of multiple genome-related variables and discuss glimpses of biological meaning that are starting to emerge from these quantitative studies.
Dissection and interpretation of correlations between multiple genome-related variables
Table 1 summarizes the most prominent correlations that have been reported to exist between variables that characterize evolution, expression and function of genes. Notably, the validity and significance of most of these correlations have been questioned at one point or another. A good example is the fundamental relationship between a gene's dispensability and its rate of evolution. The intuitively plausible prediction that genes making a large contribution to the organism's fitness would
The multidimensional space of quantitative genomics
Given the multiple, relatively weak correlations between the variables that characterize genome function and evolution, and the coherent pattern formed by the two classes of these variables (Figure 1), multivariate statistical methods appear to be the approach of choice for dissecting the multidimensional space of quantitative genomics. Essentially, these methods attempt to replace multiple, correlated variables such as those shown in Table 1 and Figure 1 with a smaller number of composite,
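The idea of replacing several correlated variables with a composite one can be sketched with principal component analysis (PCA), one standard multivariate method; this excerpt is truncated, so it does not state which specific methods the reviewed studies used. The sketch assumes simulated data: four hypothetical gene-level variables driven by one latent factor with arbitrarily chosen signs. It builds the correlation matrix, finds the first principal component by power iteration (which requires a start vector not orthogonal to the leading eigenvector), and projects each gene onto it to obtain a single composite score.

```python
import math
import random


def standardize(col):
    """Center a variable and scale it to unit sample variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def correlation_matrix(cols):
    """k x k Pearson correlation matrix for k variables (each a list of n values)."""
    zs = [standardize(c) for c in cols]
    n, k = len(zs[0]), len(zs)
    return [[sum(a * b for a, b in zip(zs[i], zs[j])) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_component(mat, iters=1000, tol=1e-12):
    """Power iteration: largest eigenvalue and its unit eigenvector of a
    symmetric positive semidefinite matrix."""
    k = len(mat)
    v = [1.0 / math.sqrt(k)] * k  # uniform start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
        # Rayleigh quotient w^T M w estimates the eigenvalue.
        lam_new = sum(w[i] * mat[i][j] * w[j] for i in range(k) for j in range(k))
        v = w
        if abs(lam_new - lam) < tol:
            return v, lam_new
        lam = lam_new
    return v, lam


def composite_scores(cols, loadings):
    """Project standardized variables onto the loadings: one composite value per gene."""
    zs = [standardize(c) for c in cols]
    return [sum(a * z[g] for a, z in zip(loadings, zs)) for g in range(len(zs[0]))]


# Hypothetical data: four variables driven by one latent factor f
# (signs chosen for illustration only, not taken from any study).
rng = random.Random(7)
n_genes = 5000
signs = [1, 1, -1, 1]
f = [rng.gauss(0, 1) for _ in range(n_genes)]
cols = [[s * fi + rng.gauss(0, 1.5) for fi in f] for s in signs]

R = correlation_matrix(cols)
loadings, eigval = leading_component(R)
explained = eigval / len(cols)  # the trace of a correlation matrix equals k
scores = composite_scores(cols, loadings)
print("loadings:", [round(a, 2) for a in loadings], "explained:", round(explained, 2))
```

Here each pairwise correlation is only about 0.3 in magnitude, yet the first component captures roughly half of the total variance; the sign of the eigenvector is arbitrary, so only the relative signs of the loadings are meaningful.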
Gleaning the biology behind the correlations
The links between different variables characterizing genome function and evolution are intriguing, and the observations that composite variables seem to better describe biological systems are encouraging. However, these findings do not immediately reveal the biological underpinnings of the observed relationships or the evolutionary rationales for the emergence of these connections. Several recent studies have attempted to address the biology behind the deluge of genomic data more directly.
Conclusions
In this review, we cover only one research direction in evolutionary systems biology. Because of space constraints, we could not discuss other, equally dynamic subfields, in particular the burgeoning area of biological network analysis [44]. However, even this limited perspective seems to reveal the salient features of today's evolutionary systems biology. Above all, this is still a nascent field where the methodology remains in constant flux, the data are, obviously, noisy, the
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
We apologize to all researchers whose relevant contributions could not be cited because of space limitations. This work was supported by the Intramural Research Program of the National Library of Medicine at National Institutes of Health/DHHS.
References (44)
- et al. Integrating ‘omic’ information: a bridge between genomics and systems biology. Trends Genet (2003)
- et al. Converging on a general model of protein evolution. Trends Biotechnol (2005)
- Koonin EV. Systemic determinants of gene evolution and function. Mol Syst Biol 2005, 1:2005...
- et al. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res (2003)
- Back to the biology in systems biology: what can we learn from biomolecular networks? Brief Funct Genomic Proteomic (2004)
- et al. The evolution of molecular genetic pathways and networks. Bioessays (2004)
- Genomes, phylogeny, and evolutionary systems biology. Proc Natl Acad Sci USA (2005)
- How will big pictures emerge from a sea of biological data? Science (2005)
- et al. From phenotype to genotype. Evol Dev (2000)
- Wolf YI, Carmel L, Koonin EV. Correlations between quantitative measures of genome evolution, expression and function....
- Coping with the quantitative genomics ‘elephant’: the correlation between the gene dispensability and evolution rate. Trends Genet
- Highly expressed genes in yeast evolve slowly. Genetics
- Lethality and centrality in protein networks. Nature
- Protein dispensability and rate of evolution. Nature
- Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res
- Conservation and co-evolution in the scale-free human gene co-expression network. Mol Biol Evol
- On some principles governing molecular evolution. Proc Natl Acad Sci USA
- Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J Mol Evol
- Biochemical evolution. Annu Rev Biochem
- Do essential genes evolve slowly? Curr Biol
- Genomic function: Rate of evolution and gene dispensability. Nature
- An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol