Flux balance analysis: A geometric perspective

https://doi.org/10.1016/j.jtbi.2009.01.027

Abstract

Advances in the field of bioinformatics have led to reconstruction of genome-scale networks for a number of key organisms. The application of physicochemical constraints to these stoichiometric networks allows researchers, through methods such as flux balance analysis, to highlight key sets of reactions necessary to achieve particular objectives. The key benefits of constraint-based analysis lie in the minimal knowledge required to infer systemic properties. However, network degeneracy leads to a large number of flux distributions that satisfy any objective; moreover, these distributions may be dominated by biologically irrelevant internal cycles. By examining the geometry underlying the problem, we define two methods for finding a unique solution within the space of all possible flux distributions; such a solution contains no internal cycles, and is representative of the space as a whole. The first method draws on typical geometric knowledge, but cannot be applied to large networks because of the high computational complexity of the problem. Thus a second method, an iteration of linear programs which scales easily to the genome scale, is defined. The algorithm is run on four recent genome-scale models, and unique flux solutions are found. The algorithm set out here will allow researchers in flux balance analysis to exchange typical solutions to their models in a reproducible format. Moreover, having found a single solution, statistical analyses such as correlations may be performed.

Introduction

Recent advances in genome sequencing techniques and bioinformatic analyses have led to an explosion of systems-wide biological data. In turn the reconstruction of genome-scale networks for micro-organisms has become possible. Whilst the first stoichiometric model of E. coli was limited to the central metabolic pathways (Varma and Palsson, 1993), the most recent reported model is much more comprehensive, consisting of 2077 reactions and 1039 metabolites (Feist et al., 2007). Reaction networks for S. cerevisiae have been similarly expanded through incorporation of more genes and their corresponding metabolites—a recent consensus model consists of 1761 reactions and 1168 metabolites (Herrgård et al., 2008). Genome-scale stoichiometric models for other micro-organisms (Kim et al., 2008) and even H. sapiens (Duarte et al., 2007) have been developed.

The ability to analyse, interpret and ultimately predict cellular behaviour has been a long sought-after goal. The genome sequencing projects are defining the molecular components within the cell, and describing the integrated function of these molecular components will be a challenging task (Edwards and Palsson, 2000). Ideally, one would like to use kinetic modelling to characterize fully the mechanics of each enzymatic reaction, in terms of how changes in metabolite concentrations affect local reaction rates. However, a considerable amount of data is required to parameterize even a small mechanistic model; the determination of such parameters is costly and time-consuming, and moreover many may be difficult or impossible to determine experimentally. Instead, genome-scale metabolic modelling has relied on constraint-based analysis (Beard et al., 2002; Covert et al., 2003; Kim et al., 2008; Price et al., 2004), which uses physicochemical constraints such as mass balance, energy balance, and flux limitations to describe the potential behaviour of an organism. In particular, flux balance analysis (FBA) (Kauffman et al., 2003) highlights the most effective and efficient paths through the network in order to achieve a particular objective function, such as the maximization of biomass or ATP production. The key benefit of FBA and similar techniques lies in the minimal amount of biological knowledge and data required to make quantitative inferences about network behaviour (Bonarius et al., 1997).
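For concreteness, the FBA problem described above is commonly written as a linear program. The notation below is an assumed, standard textbook form rather than notation quoted from this paper: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a weight vector selecting the objective (for example the biomass reaction), and v^min, v^max the lower and upper flux limits.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The equality S v = 0 expresses mass balance at steady state, and the inequalities encode flux limitations such as irreversibility or nutrient uptake limits.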

In general there is degeneracy in stoichiometric networks, leading to an infinite number of flux distributions satisfying the given optimality criteria. A major focus of the FBA community is to reduce the size of this optimal flux space by imposing tighter limits on each flux, based for instance on measurements of intracellular fluxes with nuclear magnetic resonance, or by adding other constraints (Kim et al., 2008). Despite the use of these techniques, the resultant solution remains a space of fluxes, rather than a unique flux. In a recent paper, Price et al. (2004) do not view this as a problem, stating:

“The mathematical notion of equivalent optimal states is coincident with the biological notion of silent phenotypes. This property distinguishes in silico modelling in biology from that in the physicochemical sciences where a single and unique solution is sought.”

While it is true that equivalent solutions may be important in representing biological reality, we believe that the ability to state a well-defined, single solution that is representative of the space of all possible fluxes would be of great benefit to the modelling community. From a practical perspective, researchers often do quote a single flux that results from their analysis; since this is chosen arbitrarily from a large space of possible fluxes, their results are irreproducible and depend entirely on the software or algorithm used to solve the linear programming (LP) problem. From a scientific perspective, the ability to extract a representative solution from the space would allow us to perform typical analyses, for example correlating flux with associated protein levels.
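The degeneracy discussed above can be made concrete with a small sketch. The four-reaction network below is hypothetical (it is not the network of Fig. 1), and the names S, LOWER, UPPER, OBJECTIVE, is_feasible and objective are illustrative only. Reactions R2 and R3 both convert A to B, so any split between them carries the same objective flux; because R3 is assumed to be reversible, a large internal cycle through R2 and R3 is also feasible. The code only checks candidate flux vectors against the constraints; it does not solve the LP.

```python
# Hypothetical toy network (an assumption for illustration, not Fig. 1):
#   R1:   -> A    uptake,                      0 <= v1 <= 10
#   R2: A -> B    irreversible,                0 <= v2 <= 1000
#   R3: A <-> B   reversible parallel route, -1000 <= v3 <= 1000
#   R4: B ->      objective (output) flux,     0 <= v4 <= 1000

# Stoichiometric matrix: one row per metabolite, one column per reaction.
S = [
    [1, -1, -1, 0],   # metabolite A
    [0, 1, 1, -1],    # metabolite B
]
LOWER = [0, 0, -1000, 0]
UPPER = [10, 1000, 1000, 1000]
OBJECTIVE = [0, 0, 0, 1]  # maximize the flux through R4


def is_feasible(v, tol=1e-9):
    """Return True if v satisfies steady state (S v = 0) and the flux bounds."""
    balanced = all(
        abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S
    )
    bounded = all(
        lo - tol <= x <= hi + tol for lo, x, hi in zip(LOWER, v, UPPER)
    )
    return balanced and bounded


def objective(v):
    """Value of the linear objective c^T v for flux vector v."""
    return sum(c * x for c, x in zip(OBJECTIVE, v))


# Three different flux distributions, each ordered as (v1, v2, v3, v4).
candidates = {
    "all via R2": [10, 10, 0, 10],
    "even split": [10, 5, 5, 10],
    "internal cycle": [10, 1000, -990, 10],  # R2 forward, R3 backward
}

for name, v in candidates.items():
    print(f"{name}: feasible={is_feasible(v)}, objective={objective(v)}")
```

All three candidates satisfy mass balance and the bounds, and all achieve an objective value of 10, which is the maximum here because the uptake flux is capped at 10. An LP solver is free to return any optimal point, so different solvers may report different fluxes for the same model, including ones dominated by the internal cycle.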

In this paper we show that, by examining the geometry underpinning the problem, we may find a well-defined, unique flux through application of a well-known and fundamental mathematical theorem. Unfortunately, this method proves impracticable for large, genome-scale models. Thus, we define a second method that gives similar results to the first, and moreover is computationally feasible. The algorithm is applied to a range of existing genome-scale models.

Methods

The problems encountered when performing constraint-based analysis, and the methods we propose to overcome them, are best described by reference to a simple example. Consider the small metabolic network presented in Fig. 1, whose fluxes we wish to estimate. This may be addressed by appealing to FBA. This method allows us to identify the optimal path through the network in order to achieve a particular objective; quantitative predictions will then hold true if the cell

Results and discussion

We run the algorithm presented in method two on genome-scale metabolic models for four organisms to test its efficacy. The four models—with known biomass representations and nutrient limitations—were imported to MATLAB® using libSBML (Bornstein et al., 2008) and solved with GLPK (GNU Linear Programming Kit) (Makhorin, 2001).

The results may be found in Table 1. The maximal biomass yield was found and flux variability analysis performed to find the number of non-fixed fluxes (presented as

Conclusion

The algorithm presented in this paper allows researchers in the flux balance community to choose a unique and well-defined flux from the space of all possible solutions. In turn, any results produced using this algorithm will be fully reproducible, allowing researchers to exchange typical solutions to their models.

From a biological perspective, the exact flux utilized by the cell will be dependent on a wide range of stimuli, and thus impossible to predict from network

Acknowledgements

We acknowledge the support of the BBSRC/EPSRC Grant BB/C008219/1 “The Manchester Centre for Integrative Systems Biology (MCISB)”. We thank Nils Blüthgen, Douglas Kell and Michael Howard for their useful input.

References (26)

  • N.C. Duarte et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc. Natl. Acad. Sci. USA (2007)
  • N.C. Duarte et al. Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Res. (2004)
  • J.S. Edwards et al. The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. USA (2000)