A hybrid approach for efficient and robust parameter estimation in biochemical pathways
Introduction
Building sound dynamic models of biological systems is a key step towards the development of predictive models for cells or whole organisms. Such models can be regarded as the keystones of Systems Biology (Wolkenhauer, 2001, You, 2004), ultimately providing scientific explanations of biological phenomena. Relevant examples of their usefulness can be found in, e.g., metabolomics (Kell, 2004, Goodacre et al., 2004) or in genome expression and regulation (De Jong, 2002, Wolkenhauer, 2002, Wolkenhauer et al., 2003). Since the amount and quality of experimental data continue to increase rapidly, there is a great need for sound model-building methods that can cope with this complexity.
In this work, we consider deterministic, non-linear dynamic models of biochemical pathways, i.e. those described by deterministic ordinary differential equations (ODEs), differential-algebraic equations (DAEs) or partial differential equations (PDEs). In the case of ODEs, a popular statement is the so-called state-space formulation:

$$\frac{d\mathbf{x}}{dt} = f(\mathbf{x}, \mathbf{p}, \mathbf{u}, t), \qquad \mathbf{x}(0) = \mathbf{x}_0, \qquad \mathbf{y} = g(\mathbf{x}, \mathbf{p}, t)$$

where x is the vector of $N_x$ state variables and p is the vector of $N_p$ model parameters. Note that f specifies the model, u the vector of inputs (i.e. for a particular experiment), y the vector of $N_y$ measured states and g the mapping from states to measurements. An experiment is specified by the initial conditions x(0), the inputs u chosen from among some set of possible inputs U, and the observations y. Note that the inputs can be time dependent. Although in the remainder of this paper we consider the above formulation, the parameter estimation methodology presented below can be extended to other model types (e.g. difference equations, stochastic differential equations), although these cases are not treated explicitly here.
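To make the state-space formulation concrete, the sketch below integrates dx/dt = f(x, p, u, t) with a fixed-step classical Runge–Kutta (RK4) scheme and evaluates the outputs y = g(x, p, t) at the sampling times. Here g is an observation function (our notation, an assumption in this sketch) mapping states to measured quantities. The two-step chain model, its parameter values and the names `f_chain`, `g_chain` and `simulate` are hypothetical illustrations, not the benchmark model; stiff biochemical models would normally call for an adaptive, stiff-capable integrator rather than fixed-step RK4.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
RHS = Callable[[Vector, Sequence[float], Callable[[float], float], float], Vector]
Obs = Callable[[Vector, Sequence[float], float], Vector]


def rk4_step(f: RHS, x: Vector, p, u, t: float, h: float) -> Vector:
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x, p, u, t)."""
    k1 = f(x, p, u, t)
    k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], p, u, t + 0.5 * h)
    k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], p, u, t + 0.5 * h)
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)], p, u, t + h)
    return [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(f: RHS, g: Obs, x0: Vector, p, u, t_obs: Sequence[float],
             steps_per_interval: int = 100) -> List[Vector]:
    """Integrate from t = 0 and return y = g(x, p, t) at each (increasing, >= 0) time in t_obs."""
    x, t = list(x0), 0.0
    outputs = []
    for t_next in t_obs:
        h = (t_next - t) / steps_per_interval
        for k in range(steps_per_interval):
            x = rk4_step(f, x, p, u, t + k * h, h)
        t = t_next
        outputs.append(g(x, p, t))
    return outputs


# Hypothetical two-step chain S -> X1 -> X2 driven by an input flux u(t)
def f_chain(x: Vector, p, u, t: float) -> Vector:
    k1, k2 = p
    return [u(t) - k1 * x[0], k1 * x[0] - k2 * x[1]]


def g_chain(x: Vector, p, t: float) -> Vector:
    return [x[1]]  # partial observation: only X2 is measured


y = simulate(f_chain, g_chain, [0.0, 0.0], [1.0, 0.5], lambda t: 1.0, [1.0, 5.0, 50.0])
print(y)  # X2 approaches its steady state u / k2 = 2.0
```

Passing the input as a function u(t) mirrors the remark that inputs can be time dependent; a piecewise-constant experiment design would simply supply a different callable.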
Model building can be regarded as a cycle: starting from a goal definition (purpose of the model) and some a priori knowledge (i.e. preliminary data, basic analysis and initial hypotheses), a model framework is chosen and a model structure is proposed. From the available data, parameter estimation is then performed, leading to a first working model. This initial model must be validated with new experiments, which in most cases will reveal a number of deficiencies. Thus, a new model structure and/or a new experimental design must be planned, and the process is repeated iteratively until the validation step is considered satisfactory. This is the typical model-building cycle as considered in the area of system identification (Ljung, 1999, Walter and Pronzato, 1997).
In this work, we focus on the steps of parameter estimation and optimal experimental design, taking the structure of the non-linear dynamic model as given. Parameter estimation (also known as the inverse problem, or model calibration) aims to find the parameter values that give the best fit of the model to a set of experimental data. Optimal experimental design aims to devise the dynamic experiments that provide the maximum information content for subsequent non-linear model identification, estimation and/or discrimination.
These topics are receiving great attention in the recent Systems Biology literature. For example, experimental design and optimal sampling for parameter estimation have been considered by Cho et al. (2003), Faller et al. (2003) and Kutalik et al. (2004). The important problem of model discrimination and its relation with parameter estimation has been studied by Swameye et al. (2003) and Kremling et al. (2004), while the key issue of identifiability checking has been illustrated by Zak et al. (2003). In the case of parameter estimation, Mendes and Kell (1998) and Moles et al. (2003b) have highlighted the need for global optimisation techniques in order to avoid the spurious solutions often found by traditional gradient-based local methods. In particular, Moles et al. (2003b) demonstrated the challenging nature of inverse problems by considering a benchmark three-step pathway. These authors concluded that only a certain type of stochastic global optimisation method, Evolution Strategies (ES), was able to solve it successfully, although at a rather large computational cost.
In this new contribution, which is an extension of the results presented by Rodríguez et al. (2004), our main objective has been to reduce such computational cost while preserving robustness. In addition, we have also considered other issues (handling of noise, experimental design) not covered by Moles et al. (2003b). As a result, we present a new integrated methodology with a number of significant advantages and improvements:
- reduced computation time (by one order of magnitude) by means of a hybrid stochastic–deterministic optimisation method, which increases efficiency while guaranteeing robustness (i.e. reliability and accuracy of the parameter estimation);
- adequate handling of measurement noise (errors) and partial observations;
- automatic testing of identifiability of the model (both local and practical);
- evaluation of the information content of the experiments via the Fisher information matrix (FIM), with subsequent application to the design of new optimal experiments through dynamic optimisation (extending the procedure outlined by Banga et al., 2002).
This paper is structured as follows. In the next section, we describe the class of parameter estimation problems considered here, with an overview of current solution methods and of the pitfalls and difficulties these methods may experience. Next, we motivate the use of global optimisation methods, focusing on the need for hybrid approaches, and present a novel hybrid method for the parameter estimation problem. We then motivate identifiability analysis, giving details of a numerical procedure to be coupled with the hybrid. Similarly, in the following section, the need for optimal experimental design is justified, and a numerical procedure to be integrated with the hybrid as a post-processing step is detailed. Finally, these methods are applied to a challenging benchmark problem, and conclusions are drawn in the last section. Additional details and data are given in two appendices.
Parameter estimation in non-linear dynamic models
Given a model structure and a set of experimental data, the objective of parameter estimation is to calibrate the model (i.e. to estimate parameters that cannot be measured directly) so as to reproduce the experimental results in the best possible way. This calibration is performed by minimizing a cost function which measures the goodness of the fit.
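A common choice of goodness-of-fit measure is a weighted least-squares cost, with inverse measurement variances as weights; under independent Gaussian errors this corresponds to maximum likelihood. The sketch below assumes that form and treats missing samples (`None`) as unmeasured, which is one simple way to accommodate partial observations. The exponential model, the data values and the names `weighted_lsq_cost` and `decay_model` are hypothetical and serve only to show how the cost discriminates between trial parameter values.

```python
import math
from typing import List, Optional, Sequence


def weighted_lsq_cost(y_meas: Sequence[Sequence[Optional[float]]],
                      y_model: Sequence[Sequence[float]],
                      sigma: Sequence[Sequence[float]]) -> float:
    """J = sum over sampling times i and observables j of ((y_meas - y_model) / sigma)^2.

    Entries of y_meas equal to None are treated as not measured and skipped.
    """
    if not (len(y_meas) == len(y_model) == len(sigma)):
        raise ValueError("data, model output and sigma must have the same number of samples")
    total = 0.0
    for row_meas, row_model, row_sigma in zip(y_meas, y_model, sigma):
        for ym, yp, s in zip(row_meas, row_model, row_sigma):
            if ym is None:
                continue
            r = (ym - yp) / s  # residual scaled by the measurement standard deviation
            total += r * r
    return total


def decay_model(k: float, t_obs: Sequence[float], x0: float = 1.0) -> List[List[float]]:
    """Hypothetical one-observable model y(t) = x0 * exp(-k t)."""
    return [[x0 * math.exp(-k * t)] for t in t_obs]


t_obs = [0.5, 1.0, 2.0]
y_meas = [[0.61], [None], [0.13]]  # hypothetical data; the second sample is missing
sigma = [[0.05], [0.05], [0.05]]
for k in (0.5, 1.0, 1.5):
    print(k, weighted_lsq_cost(y_meas, decay_model(k, t_obs), sigma))
```

Among the three trial values, k = 1.0 yields the lowest cost; an optimiser replaces this hand-picked scan with a systematic search over p.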
In other words, once the characterization of the model has been performed, the identification problem is stated as the optimisation of a scalar cost
Global optimisation: hybrid methods
When the traditional Levenberg–Marquardt (L–M) or Gauss–Newton (G–N) methods are used, it is sometimes argued that, in order to avoid convergence to local solutions, multiple runs (e.g. from random initial guesses inside the parameter space) should be carried out, and the best solution attained taken as the global one. This is equivalent to the so-called plain multi-start approach in global optimisation, which has been shown to fail even in relatively simple cases.
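The hybrid idea is to let a stochastic global method locate the basin of the global solution and then hand its best point to a fast deterministic local method for refinement. A minimal sketch under stated assumptions: the stochastic phase is a toy elitist evolution strategy with a 1/5-success step-size rule, and the deterministic phase is a compass (coordinate pattern) search, both chosen only to keep the example dependency-free; they are not the specific solvers used in this work. The multimodal Rastrigin function stands in for a calibration cost, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[float]], float]


def clip(x: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Project a point back into the box lo <= x <= hi."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]


def global_phase(f: Objective, lo, hi, generations=200, offspring=20, seed=0):
    """Stochastic phase: elitist evolution strategy with a 1/5-success step-size rule."""
    rng = random.Random(seed)
    best = [rng.uniform(l, h) for l, h in zip(lo, hi)]
    f_best = f(best)
    step = [0.3 * (h - l) for l, h in zip(lo, hi)]
    for _ in range(generations):
        successes = 0
        for _ in range(offspring):
            cand = clip([b + s * rng.gauss(0.0, 1.0) for b, s in zip(best, step)], lo, hi)
            f_cand = f(cand)
            if f_cand < f_best:  # keep a mutant only if it improves (elitism)
                best, f_best = cand, f_cand
                successes += 1
        # widen the search when more than 1/5 of mutations succeed, otherwise narrow it
        factor = 1.5 if successes > 0.2 * offspring else 0.82
        step = [s * factor for s in step]
    return best, f_best


def local_phase(f: Objective, x0, lo, hi, step=0.1, tol=1e-8, max_iter=10_000):
    """Deterministic phase: compass (coordinate pattern) search started from x0."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                cand = list(x)
                cand[i] = min(max(cand[i] + sign * step, lo[i]), hi[i])
                f_cand = f(cand)
                if f_cand < fx:
                    x, fx, improved = cand, f_cand, True
                    break
        if not improved:
            step *= 0.5  # no poll point improved: refine the mesh
    return x, fx


def hybrid(f: Objective, lo, hi, seed=0):
    """Global stochastic search followed by local refinement of its best point."""
    x_g, f_g = global_phase(f, lo, hi, seed=seed)
    x_l, f_l = local_phase(f, x_g, lo, hi)
    return x_g, f_g, x_l, f_l


def rastrigin(x: Sequence[float]) -> float:
    """Multimodal test function; global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


x_g, f_g, x_l, f_l = hybrid(rastrigin, [-5.12, -5.12], [5.12, 5.12])
print(f"after stochastic phase: {f_g:.3e}, after local refinement: {f_l:.3e}")
```

Here the switch from global to local search happens after a fixed number of generations, the simplest possible rule; when and how to switch is a central design decision in any hybrid method, since switching too early risks refining a local basin and switching too late wastes evaluations.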
Moles et al. (2003b) provide a discussion on this issue,
Identifiability analysis
The problem of parameter estimation, i.e. determining the parameters of a system from input–output data, is often called the identification problem. This is just one aspect of a larger problem, the inverse problem, which includes a priori structural identifiability, a posteriori or practical identifiability and parameter identification (Audoly et al., 2001).
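Practical (a posteriori) identifiability can be probed locally through the sensitivities of the outputs with respect to the parameters. A minimal sketch, assuming i.i.d. Gaussian noise with known standard deviation and computing sensitivities by central finite differences: if two columns of the sensitivity matrix are (nearly) collinear, the Fisher information matrix is (nearly) singular and those parameters cannot be estimated separately. This pairwise collinearity test is a simple diagnostic chosen for illustration, not the identifiability procedure of this work; `product_model` and `exp_model` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float], float], float]


def sensitivity_matrix(model: Model, p: Sequence[float], t_obs: Sequence[float],
                       rel_step: float = 1e-6) -> List[List[float]]:
    """Central finite-difference sensitivities S[i][j] = d y(t_i) / d p_j."""
    S = [[0.0] * len(p) for _ in t_obs]
    for j in range(len(p)):
        h = rel_step * max(abs(p[j]), 1.0)
        p_plus = list(p)
        p_plus[j] += h
        p_minus = list(p)
        p_minus[j] -= h
        for i, t in enumerate(t_obs):
            S[i][j] = (model(p_plus, t) - model(p_minus, t)) / (2 * h)
    return S


def fisher_information(S: List[List[float]], sigma: float) -> List[List[float]]:
    """FIM = S^T S / sigma^2 for i.i.d. Gaussian noise with standard deviation sigma."""
    n_p = len(S[0])
    return [[sum(row[a] * row[b] for row in S) / sigma ** 2 for b in range(n_p)]
            for a in range(n_p)]


def collinearity(F: List[List[float]], a: int, b: int) -> float:
    """|cos| of the angle between sensitivity columns a and b (values near 1 flag trouble)."""
    return abs(F[a][b]) / math.sqrt(F[a][a] * F[b][b])


def product_model(p: Sequence[float], t: float) -> float:
    return p[0] * p[1] * math.exp(-p[2] * t)  # p0 and p1 only enter as a product


def exp_model(p: Sequence[float], t: float) -> float:
    return p[0] * math.exp(-p[1] * t)


t_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
F_bad = fisher_information(sensitivity_matrix(product_model, [2.0, 3.0, 0.5], t_obs), sigma=0.1)
F_ok = fisher_information(sensitivity_matrix(exp_model, [2.0, 0.5], t_obs), sigma=0.1)
print(collinearity(F_bad, 0, 1))  # essentially 1: p0 and p1 are not separately identifiable
print(collinearity(F_ok, 0, 1))   # clearly below 1 (about 0.87)
```

Because the FIM entries are inner products of sensitivity columns, the collinearity index can be read directly from the FIM; note that such a check is local, valid only around the nominal parameter values used.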
The a priori structural identifiability problem reads: can we, under the ideal conditions of noise-free observations and error-free model
Optimal experimental design
Performing experiments to obtain a rich enough set of experimental data is a costly and time-consuming activity. The purpose of optimal experimental design (OED) is to devise the necessary dynamic experiments in such a way that the parameters are estimated from the resulting experimental data with the best possible statistical quality, which is usually a measure of the accuracy and/or decorrelation of the estimated parameters.
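One widely used scalar measure of statistical quality is the D-optimality criterion, which maximises the determinant of the Fisher information matrix (equivalently, it shrinks the volume of the parameter confidence ellipsoid). The method described here designs dynamic experiments via dynamic optimisation; the sketch below treats only a simpler special case, choosing sampling times by brute force. It assumes the hypothetical model y = p0 exp(-p1 t), nominal parameter values and i.i.d. Gaussian noise; the function names are illustrative.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def decay_sensitivities(p: Sequence[float], t: float) -> List[float]:
    """Analytic sensitivities of y = p0 * exp(-p1 * t) with respect to (p0, p1)."""
    e = math.exp(-p[1] * t)
    return [e, -p[0] * t * e]


def fim(p: Sequence[float], times: Sequence[float], sigma: float = 1.0) -> List[List[float]]:
    rows = [decay_sensitivities(p, t) for t in times]
    return [[sum(r[a] * r[b] for r in rows) / sigma ** 2 for b in range(2)] for a in range(2)]


def det2(F: List[List[float]]) -> float:
    return F[0][0] * F[1][1] - F[0][1] * F[1][0]


def d_optimal_times(p_nominal: Sequence[float], candidates: Sequence[float],
                    n_samples: int) -> Tuple[float, ...]:
    """Brute-force D-optimal design: the subset of sampling times maximising det(FIM)."""
    return max(itertools.combinations(candidates, n_samples),
               key=lambda ts: det2(fim(p_nominal, ts)))


grid = [0.5 * i for i in range(21)]  # candidate times 0, 0.5, ..., 10
best = d_optimal_times([2.0, 0.5], grid, 2)
print(best)  # (0.0, 2.0)
```

The result, t = 0 and t = 1/p1, agrees with the analytic optimum for this two-parameter model, since det(FIM) is proportional to (t2 - t1)^2 exp(-2 p1 (t1 + t2)). As is usual for non-linear models, the design depends on the nominal parameters, which is one reason estimation and experimental design are iterated.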
Mathematically, the OED problem can be formulated as a dynamic
Problem statement
We have considered the challenging benchmark problem recently presented by Mendes (2001) and Moles et al. (2003). The detailed statement is presented in Appendix A. For the sake of brevity, the additional data needed to solve this problem under the same conditions (including the experimental data set) are not given here, but can be found in electronic format at http://www.iim.csic.es/∼julio/GR03_statement.txt.
All the computations reported here were performed using a PC/Pentium 4 (1.8 GHz,
Conclusions
In this contribution, we have considered the parameter estimation problem for non-linear dynamic models of biochemical pathways, together with procedures for identifiability checking and generation of new optimal experimental designs. Traditional (gradient-based) local methods for data fitting in non-linear dynamic systems can suffer from slow and/or local convergence, among other problems. However, this is frequently ignored, potentially leading to wrong conclusions about, e.g. the validity of
Acknowledgements
Authors MRF and JRB thank the Spanish Ministry of Science and Technology (MCyT project AGL2001-2610-C02-02) and Xunta de Galicia (grant PGIDIT02PXIC40211PN) for financial support.
References
- Global identifiability of the parameters of nonlinear systems with specified inputs: a comparison of methods. Math. Biosci. (1990).
- Genetic and Nelder–Mead algorithms hybridized for a more accurate global optimization of continuous multi-minima functions. Eur. J. Operational Res. (2003).
- Metabolomics by numbers: acquiring and understanding global metabolite data. Trends Biotechnol. (2004).
- Numerical parameter identifiability and estimability: integrating identifiability, estimability, and optimal sampling design. Math. Biosci. (1985).
- Metabolomics and systems biology: making sense of the soup. Curr. Opin. Microbiol. (2004).
- Hybrid global optimization algorithms for protein structure prediction: alternating hybrids. Biophys. J. (2003).
- Optimal sampling time selection for parameter estimation in dynamic pathway modeling. BioSystems (2004).
- On global identifiability for arbitrary model parametrizations. Automatica (1994).
- Confidence regions of estimated parameters for ecological systems. Ecol. Model. (2003).
- Integrated process design and control via global optimization: a wastewater treatment plant case study. Chem. Eng. Res. Des. (2003).
- System identifiability based on the power series expansion of the solution. Math. Biosci.
- A hybrid algorithm for identifying global and local minima when optimizing functions with many minima. Eur. J. Operational Res.
- Similarity transformation approach to structural identifiability of nonlinear models. Math. Biosci.
- Mathematical modelling in the post-genome era: understanding genome expression and regulation—a system theoretic approach. Biosystems.
- Generalized pattern searches with derivative information. Math. Program.
- Global identifiability of nonlinear models of biological systems. IEEE Trans. Biomed. Eng.
- Dynamic optimization of bioprocesses: deterministic and stochastic strategies.
- Stochastic dynamic optimization of batch and semicontinuous bioprocesses. Biotechnol. Prog.
- Global optimization of chemical processes using stochastic algorithms.
- Global optimization of bioprocesses using stochastic and hybrid methods.
- Computation of optimal identification experiments for nonlinear dynamic process models: a stochastic global optimization approach. Ind. Eng. Chem. Res.
- Recent advances in parameter identification for ordinary differential equations.
- Algorithm 717, subroutines for maximum likelihood and quasi-likelihood estimation of parameters in nonlinear regression models. ACM Trans. Math. Software.
- A hybrid method for the optimal control of chemical processes.
- Experimental design in systems biology, based on parameter sensitivity analysis using a Monte Carlo method: a case study for the TNFα mediated NF-κB signal transduction pathway. Simul. Trans. Soc. Model. Simul. Int.
- Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol.
- Algorithm 573, NL2SOL—an adaptive nonlinear least-squares algorithm. ACM Trans. Math. Software.
- Global optimization for the parameter estimation of differential-algebraic systems. Ind. Eng. Chem. Res.