Integration of gene expression data into genome-scale metabolic models

doi:10.1016/j.ymben.2003.12.002

Metabolic Engineering

Volume 6, Issue 4, October 2004, Pages 285-293

https://doi.org/10.1016/j.ymben.2003.12.002

Abstract

A framework for integration of transcriptome data into stoichiometric metabolic models to obtain improved flux predictions is presented. The key idea is to exploit the regulatory information in the expression data to give additional constraints on the metabolic fluxes in the model. Measurements of gene expression from chemostat and batch cultures of Saccharomyces cerevisiae were combined with a recently developed genome-scale model, and the computed metabolic flux distributions were compared to experimental values from carbon labeling experiments and metabolic network analysis. The integration of expression data resulted in improved predictions of metabolic behavior in batch cultures, enabling quantitative predictions of exchange fluxes as well as qualitative estimations of changes in intracellular fluxes. A critical discussion of correlation between gene expression and metabolic fluxes is given.

Introduction

Following the developments in genomics there has been an increased focus on the behavior of complete biological systems. In such integrative analysis, also referred to as systems biology (Kitano, 2000; Ideker et al., 2001; Nielsen and Olsson, 2002), biological data from all levels of metabolism, from genome to metabolome, are combined in order to view the studied organism as a whole rather than investigating the single components of the system. Mathematical models play an important role in integrating this wealth of information, and systems biology is therefore often associated with quantitative investigation of the biological system under study.

Several mathematical modeling frameworks have been developed to describe and to analyze the metabolic behavior of an organism or a living cell (Gombert and Nielsen, 2000; Arkin, 2001). One of these approaches is stoichiometric modeling, which relies on mass balances over intracellular metabolites and the assumption of pseudo-steady-state conditions to determine intracellular metabolic fluxes. The information contained in a stoichiometric model itself results in an underdetermined linear equation system, which is not enough to calculate a unique flux distribution, and the models are therefore combined with additional experimental data or assumptions to yield a well-defined flux map. Applications include calculation of metabolic fluxes for a specific experiment (Aiba and Matsuoka, 1979; Christensen and Nielsen, 2000) and prediction of how phenotypic behavior is affected by genetic or environmental changes (Varma et al., 1993; Edwards and Palsson, 2000; Stuckrath et al., 2002; Segre et al., 2002).
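To make the underdetermination concrete, the following minimal sketch checks the steady-state mass balance for a four-reaction toy network. The network, the flux values, and the helper names `mass_balance` and `is_steady_state` are hypothetical and chosen only for illustration; they are not taken from the genome-scale model discussed here.

```python
# Toy network (hypothetical, for illustration only):
#   v1: -> A        (substrate uptake)
#   v2: A -> B
#   v3: A ->        (by-product secretion)
#   v4: B ->        (drain to biomass)
# Rows are the intracellular metabolites A and B; columns are v1..v4.
S = [
    [1, -1, -1, 0],   # mass balance for A
    [0, 1, 0, -1],    # mass balance for B
]


def mass_balance(S, v):
    """Return S*v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if every intracellular metabolite is balanced (S*v = 0)."""
    return all(abs(rate) <= tol for rate in mass_balance(S, v))


# Two different flux distributions with the same uptake rate.
v_a = [10.0, 10.0, 0.0, 10.0]  # all substrate is routed to biomass
v_b = [10.0, 4.0, 6.0, 4.0]    # most substrate leaves as by-product

print(is_steady_state(S, v_a), is_steady_state(S, v_b))
```

Both vectors satisfy the two balance equations, so the stoichiometry alone cannot decide between them. With four unknown fluxes and two independent equations, two further measurements or assumptions would be needed to fix a unique flux map in this toy case.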

An advantage of stoichiometric modeling is that it is based on well-known stoichiometric coefficients and that it does not require determination of parameters like kinetic constants. With the increasing amount of biological knowledge in public databases, it is therefore relatively straightforward to construct detailed metabolic models, and in recent years large-scale models primarily based on genome sequence information have been developed. The modeled organisms include the prokaryotes Haemophilus influenzae, Escherichia coli, and Helicobacter pylori (Edwards and Palsson, 1999, 2000; Schilling et al., 2002), and most recently the eukaryote Saccharomyces cerevisiae (Forster et al., 2003). These models have a few hundred to over a thousand reactions and are typically used for computational studies, for instance systematic insertion or deletion of heterologous reactions to obtain improved metabolic properties (Burgard and Maranas, 2001). A common approach is the so-called flux balance analysis, where metabolic behavior is simulated under the assumption that the cells exhibit optimal growth (Varma and Palsson, 1994). For prokaryotes this assumption seems to hold true in many cases (Edwards et al., 2001), and recently it was demonstrated experimentally that sub-optimally growing cells could be evolved to the predicted optimal phenotype (Ibarra et al., 2002).
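Flux balance analysis is commonly written as a linear program. The formulation below is a generic sketch rather than the exact problem solved in this work; the symbols are assumptions introduced here: S is the stoichiometric matrix, v the vector of fluxes, c a weight vector that selects the objective (typically the growth reaction), and v^min, v^max the lower and upper flux bounds.

```latex
\[
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for all reactions } j .
\end{aligned}
\]
```

The equality constraint expresses the pseudo-steady-state mass balances, while the bounds encode measured uptake rates, irreversibility, and any other physiological information available.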

The price to pay for the simplicity of stoichiometric models is that no information on metabolic regulation is included. For instance, the S. cerevisiae model readily provides a good prediction of phenotypic behavior in glucose-limited aerobic and anaerobic chemostats (Famili et al., 2003). However, unless further information is supplied it is difficult to describe batch cultivations where the glucose levels are high and regulatory phenomena, referred to as glucose repression (Ronne, 1995; Gancedo, 1998; Johnston, 1999), drastically decrease respiration, biomass yield, etc. To improve the flux estimates, one can provide additional physiological information, such as experimentally measured uptake rates or knowledge about enzyme activities, to constrain the range of possible flux distributions.

Covert et al. (2001) suggested how the stoichiometric modeling framework could be extended with an overlaid transcriptional regulatory network using a Boolean logic formalism. This was later applied to a moderate size model for the central carbon metabolism in E. coli (Covert and Palsson, 2002). Apart from the fact that many regulatory phenomena cannot be accurately described by Boolean logic, this approach is at present primarily limited by the available knowledge on regulatory processes. As an alternative, we here investigate the possibilities to benefit from genome-wide measurements of transcription, e.g., using DNA or oligonucleotide microarrays. We discuss how such measurements can be combined with genome-scale stoichiometric models, thereby incorporating information on transcriptional regulation and hence improving prediction performance. Examples where gene expression data, from batch and chemostat cultivations of S. cerevisiae, are combined with the recently developed yeast model (Forster et al., 2003) are given, and the results are compared to experimental values obtained in Gombert et al. (2001) using ¹³C-labeling experiments.

Section snippets

Relating gene expression to fluxes

With the aim of obtaining improved flux predictions by extracting information from gene expression data, an important question is whether, and when, gene expression, via translation, enzyme regulation, etc., correlates at all with a given metabolic flux. As has been shown, there is not necessarily a correlation between gene expression and protein concentration (Gygi et al., 1999) or between enzyme activity and metabolic flux (ter Kuile and Westerhoff, 2001). Obviously, the same applies for the relation

From transcriptome data to flux constraints: a simple approach

As discussed above, it is difficult to correlate gene expression with metabolic flux. To generate constraints on metabolic fluxes, however, one can exploit the fact that if a gene is not expressed, the corresponding protein and its related activity will, at steady state, be absent. Accordingly, one may use expression data to detect enzyme-coding genes that are not expressed and then constrain the corresponding metabolic fluxes to zero in the model simulation, thus reducing the feasible
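The idea of switching off reactions whose genes are not expressed can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `constrain_by_expression`, the reaction and gene identifiers, and the bounds are all hypothetical. Two simplifying assumptions are made that the text here does not specify: genes listed for a reaction are treated as isoenzymes (the reaction is closed only if none of them is expressed), and genes without a measurement are treated as expressed.

```python
# All identifiers and numbers below are hypothetical, for illustration only.

def constrain_by_expression(bounds, gene_reactions, expressed):
    """Return a copy of `bounds` with unexpressed reactions fixed to zero.

    bounds:         {reaction: (lower, upper)} flux bounds
    gene_reactions: {reaction: [genes encoding enzymes for that reaction]}
    expressed:      {gene: True/False}, e.g. a present/absent call
    """
    new_bounds = dict(bounds)
    for rxn, genes in gene_reactions.items():
        if rxn not in new_bounds:
            continue
        # Assumption: listed genes are isoenzymes, so the reaction is closed
        # only when none is expressed; unmeasured genes count as expressed.
        if genes and not any(expressed.get(g, True) for g in genes):
            new_bounds[rxn] = (0.0, 0.0)
    return new_bounds


bounds = {"R1": (0.0, 1000.0), "R2": (-1000.0, 1000.0), "R3": (0.0, 1000.0)}
gene_reactions = {"R1": ["geneA"], "R2": ["geneB", "geneC"], "R3": ["geneD"]}
expressed = {"geneA": False, "geneB": False, "geneC": True}

constrained = constrain_by_expression(bounds, gene_reactions, expressed)
print(constrained)
```

Here only R1 is closed: R2 keeps its bounds because one of its two genes is expressed, and R3 keeps its bounds because its gene was not measured. The constrained bounds would then replace the original ones in the flux balance problem.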

Prediction of fluxes in batch cultivations of S. cerevisiae

As a case study, the method outlined above was applied to expression data from batch and chemostat cultivations of S. cerevisiae with glucose as carbon source (Piper et al., 2002; Westergaard et al., manuscript in preparation) in combination with the genome-scale model presented in Forster et al. (2003). Flux balance predictions using the model give good results for aerobic and anaerobic glucose-limited chemostat cultivations but, as mentioned above, the model fails to predict the reduced biomass yield

Robustness: experimental and computational considerations

Control of metabolism rarely resides at single enzymes but is rather distributed over several genes or enzymes, as in glucose repression, where the expression of a large number of genes is affected (Ronne, 1995). The Boolean nature of the presented method may, in particular for lowly expressed genes, give small expression changes in single genes a large impact. On the one hand, one may have to accept that the upper bounds for metabolic performance calculated in the simulations are, although
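The sensitivity of a Boolean on/off call can be illustrated with a small sketch. It assumes, purely for illustration, that a gene is called absent when its signal falls below a fixed intensity cut-off; the text here does not state how absence is determined, and the signal values, gene names, and the helper `absent_genes` are hypothetical.

```python
# Hypothetical signal intensities, for illustration only.
signals = {"geneA": 12.0, "geneB": 48.0, "geneC": 52.0, "geneD": 900.0}


def absent_genes(signals, threshold):
    """Return the set of genes whose signal is below the cut-off."""
    return {gene for gene, value in signals.items() if value < threshold}


# Small shifts of the cut-off around the lowly expressed genes change
# which genes are called absent, and hence which fluxes would be closed.
for threshold in (45.0, 50.0, 55.0):
    print(threshold, sorted(absent_genes(signals, threshold)))
```

In this toy data set the highly expressed geneD is unaffected by the choice of cut-off, whereas geneB and geneC flip between present and absent within a narrow range, which is the kind of behavior the paragraph above cautions against.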

Concluding remarks

While many systems biology approaches neglect the metabolite level, we want to emphasize the vast amount of existing metabolic knowledge and regard metabolism as an important part of understanding cellular systems. As an example of this, we have discussed how detailed metabolic models can be combined with transcription data to get improved predictions of cellular behavior.

The key idea was to exploit regulatory information in transcriptome data to give additional constraints on metabolic fluxes in

Acknowledgements

The authors thank Steen Lund Westergaard and Christoffer Bro for valuable discussions and for sharing experimental data prior to publication. Financial support from the Alf Åkerman—Trygg Hansa foundation, the Danish Biotechnology Instrument Center (DABIC), and the Øresund Bio+IT postdoc program is gratefully acknowledged.

References (48)

S.H. Ackerman et al. ATP10, a yeast nuclear gene required for the assembly of the mitochondrial F1–F0 complex. J. Biol. Chem. (1990)

A.P. Arkin. Synthetic cell biology. Curr. Opin. Biotechnol. (2001)

M.W. Covert et al. Transcriptional regulation in constraints-based metabolic models of E. coli. J. Biol. Chem. (2002)

M.W. Covert et al. Regulation of gene expression in flux balance models of metabolism. J. Theor. Biol. (2001)

E. Dibrov et al. The Saccharomyces cerevisiae TCM62 gene encodes a chaperone necessary for the assembly of the mitochondrial succinate dehydrogenase (Complex II). J. Biol. Chem. (1998)

J.S. Edwards et al. Systems properties of the Haemophilus influenzae Rd metabolic genotype. J. Biol. Chem. (1999)

F. Foury et al. The complete sequence of the mitochondrial genome of Saccharomyces cerevisiae. FEBS Lett. (1998)

A.K. Gombert et al. Mathematical modelling of metabolism. Curr. Opin. Biotechnol. (2000)

M. Johnston. Feasting, fasting and fermenting: glucose sensing in yeast and other cells. Trends Genet. (1999)

J. Nielsen et al. An expanded role for microbial physiology in metabolic engineering and functional genomics: moving towards systems biology. FEMS Yeast Res. (2002)

M.F. Paul et al. A single amino acid change in subunit 6 of the yeast mitochondrial ATPase suppresses a null mutation in ATP10. J. Biol. Chem. (2000)

M.D.W. Piper et al. Reproducibility of oligonucleotide microarray transcriptome analyses: an interlaboratory comparison using chemostat cultures of Saccharomyces cerevisiae. J. Biol. Chem. (2002)

H. Ronne. Glucose repression in fungi. Trends Genet. (1995)

S. Schuster et al. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. (1999)

B.H. ter Kuile et al. Transcriptome meets metabolome: hierarchical and metabolic regulation of the glycolytic pathway. FEBS Lett. (2001)

H. Tourriere et al. mRNA degradation machines in eukaryotic cells. Biochimie (2002)

J.P. van Dijken et al. An interlaboratory comparison of physiological and genetic properties of four Saccharomyces cerevisiae strains. Enz. Microb. Technol. (2000)

Affymetrix, 2000. Affymetrix GeneChip Expression Analysis Technical Manual. Affymetrix Inc., Santa Clara, CA,...

S. Aiba et al. Identification of metabolic model: citrate production from glucose by Candida lipolytica. Biotechnol. Bioeng. (1979)

D. Bertsimas et al. Introduction to Linear Optimization. (1997)

A.P. Burgard et al. Probing the performance limits of the E. coli metabolic network subject to gene additions or deletions. Biotechnol. Bioeng. (2001)

B. Christensen et al. Metabolic network analysis of Penicillium chrysogenum using C-13-labeled glucose. Biotechnol. Bioeng. (2000)

J.S. Edwards et al. The E. coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. (2000)

J.S. Edwards et al. In silico predictions of E. coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol. (2001)

¹: Current address: Novo Nordisk A/S, BioProcess Laboratories, Novo Allé, DK-2880 Bagsvaerd, Denmark.
