Elsevier

Biosystems

Volume 83, Issues 2–3, February–March 2006, Pages 248-265

A hybrid approach for efficient and robust parameter estimation in biochemical pathways

https://doi.org/10.1016/j.biosystems.2005.06.016

Abstract

Developing suitable dynamic models of biochemical pathways is a key issue in Systems Biology. Predictive models for cells or whole organisms could ultimately lead to model-based predictive and/or preventive medicine. Parameter estimation (i.e. model calibration) in these dynamic models is therefore a critical problem. In a recent contribution [Moles, C.G., Mendes, P., Banga, J.R., 2003b. Parameter estimation in biochemical pathways: a comparison of global optimisation methods. Genome Res. 13, 2467–2474], the challenging nature of such inverse problems was highlighted using a benchmark problem, and it was concluded that only a certain type of stochastic global optimisation method, Evolution Strategies (ES), was able to solve it successfully, although at a rather large computational cost. In this new contribution, we present a new integrated optimisation methodology with a number of very significant improvements: (i) computation time is reduced by one order of magnitude by means of a hybrid method which increases efficiency while guaranteeing robustness, (ii) measurement noise (errors) and partial observations are handled adequately, (iii) automatic testing of identifiability of the model (both local and practical) is included and (iv) the information content of the experiments is evaluated via the Fisher information matrix, with subsequent application to the design of new optimal experiments through dynamic optimisation.

Introduction

Building sound dynamic models of biological systems is a key step towards the development of predictive models for cells or whole organisms. Such models can be regarded as the keystones of Systems Biology (Wolkenhauer, 2001; You, 2004), ultimately providing scientific explanations of the biological phenomena. Relevant examples of their usefulness can be found in, e.g. metabolomics (Kell, 2004; Goodacre et al., 2004), or in genome expression and regulation (De Jong, 2002; Wolkenhauer, 2002; Wolkenhauer et al., 2003). Since the amount and quality of experimental data continue to increase rapidly, there is a great need for sound model building methods which can cope with this complexity.

In this work, we consider deterministic, non-linear dynamic models of biochemical pathways, i.e. those described by deterministic ordinary differential equations (ODEs), differential-algebraic equations (DAEs) or partial differential equations (PDEs). In the case of ODEs, a popular statement is the so-called state-space formulation:

$$\dot{x}(p,t) = f[x(p,t), u(t), p], \qquad x(0) = x_0, \qquad y(p,t) = g[x(p,t), u(t), p]$$

where x is the vector of Nx state variables and p is the vector of Np model parameters. Here f specifies the model, u is the vector of inputs (i.e. for a particular experiment) and y is the vector of Ny measured states. An experiment is specified by the initial conditions x(0), the inputs u chosen from among some set of possible inputs U and the observations y. Note that the inputs can be time dependent. Although in the remainder of this paper we will consider the above formulation, the parameter estimation methodology presented below can be extended to cover other model types (e.g. difference equations, stochastic differential equations, etc.), although these cases will not be treated explicitly here.
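As an illustration of the state-space formulation, the following minimal sketch simulates a small ODE model and returns its observations. Everything in it is an assumption made for the example rather than part of the benchmark: the two-step pathway, the parameter values, the constant input and the fixed-step fourth-order Runge–Kutta integrator (a production code would normally use an adaptive stiff solver).

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def simulate(
    f: Callable[[Vector, float, Sequence[float]], Vector],
    g: Callable[[Vector, float, Sequence[float]], Vector],
    x0: Sequence[float],
    u: Callable[[float], float],
    p: Sequence[float],
    t_end: float,
    n_steps: int,
) -> Tuple[List[float], List[Vector]]:
    """Integrate dx/dt = f(x, u(t), p), x(0) = x0, with fixed-step RK4.

    Returns the time grid and the observations y = g(x, u(t), p) on it.
    """
    h = t_end / n_steps
    x = list(x0)
    times = [0.0]
    outputs = [g(x, u(0.0), p)]
    for k in range(n_steps):
        t = k * h
        k1 = f(x, u(t), p)
        k2 = f([xi + 0.5 * h * d for xi, d in zip(x, k1)], u(t + 0.5 * h), p)
        k3 = f([xi + 0.5 * h * d for xi, d in zip(x, k2)], u(t + 0.5 * h), p)
        k4 = f([xi + h * d for xi, d in zip(x, k3)], u(t + h), p)
        x = [
            xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
        times.append((k + 1) * h)
        outputs.append(g(x, u((k + 1) * h), p))
    return times, outputs


# Hypothetical two-step pathway: input -> x1 -> x2 -> degradation.
def pathway_rhs(x: Vector, u_t: float, p: Sequence[float]) -> Vector:
    k1, k2 = p
    return [u_t - k1 * x[0], k1 * x[0] - k2 * x[1]]


def observe_x2(x: Vector, u_t: float, p: Sequence[float]) -> Vector:
    # Partial observation: only the second state is measured.
    return [x[1]]


times, y = simulate(
    pathway_rhs, observe_x2, x0=[0.0, 0.0], u=lambda t: 1.0,
    p=[1.0, 0.5], t_end=40.0, n_steps=400,
)
print(f"y at t = {times[-1]:.1f}: {y[-1][0]:.4f}")  # approaches u / k2 = 2
```

Keeping f, g and u as separate callables mirrors the notation above: a different experiment changes only x0 and u, while a different observation scheme changes only g.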

Model building can be regarded as a cycle: starting from a goal definition (purpose of the model), and some a priori knowledge (i.e. preliminary data, basic analysis and initial hypothesis), a model framework is chosen and a model structure is proposed. From the available data, parameter estimation is then performed, leading to a first working model. This initial model must be validated with new experiments, which in most cases will reveal a number of deficiencies. Thus, a new model structure and/or a new experimental design must be planned, and the process is repeated iteratively until the validation step is considered satisfactory. This is the typical model building cycle as considered in the area of system identification (Ljung, 1999; Walter and Pronzato, 1997).

In this work, we will focus on the steps of parameter estimation and optimal experimental design, assuming the structure of the non-linear dynamic model as given. Parameter estimation (also known as the inverse problem, or model calibration) aims to find the parameters of the model which give the best fit to a set of experimental data. Optimal experimental design aims to devise the optimal dynamic experiments which provide the maximum information content for subsequent non-linear model identification, estimation and/or discrimination.

These topics are receiving great attention in the recent Systems Biology literature. For example, experimental design and optimal sampling for parameter estimation have been considered by Cho et al. (2003), Faller et al. (2003) and Kutalik et al. (2004). The important problem of model discrimination and its relation to parameter estimation has been studied by Swameye et al. (2003) and Kremling et al. (2004), while the key issue of identifiability checking has been illustrated by Zak et al. (2003). In the case of parameter estimation, Mendes and Kell (1998) and Moles et al. (2003b) have highlighted the need for global optimisation techniques in order to avoid the spurious solutions often found by traditional gradient-based local methods. In particular, Moles et al. (2003b) demonstrated the challenging nature of inverse problems using a benchmark three-step pathway. These authors concluded that only a certain type of stochastic global optimisation method, Evolution Strategies (ES), was able to solve it successfully, although at a rather large computational cost.

In this new contribution, which is an extension of the results presented by Rodríguez et al. (2004), our main objective has been to reduce such computational cost while preserving robustness. In addition, we have also considered other issues (handling of noise, experimental design) not covered by Moles et al. (2003b). As a result, we present a new integrated methodology with a number of significant advantages and improvements:

  • reduced computation time (by one order of magnitude) by means of a hybrid stochastic–deterministic optimisation method, which increases efficiency while guaranteeing robustness (i.e. reliability and accuracy of the parameter estimation);

  • adequate handling of measurement noise (errors) and partial observations;

  • automatic testing of identifiability of the model (both local and practical);

  • evaluation of the information content of the experiments via the Fisher information matrix (FIM), with subsequent application to the design of new optimal experiments through dynamic optimisation (extending the procedure outlined by Banga et al., 2002).

This paper is structured as follows: in the next section, we describe the class of parameter estimation problems considered here, with an overview of current solution methods and the possible pitfalls and difficulties experienced by these methods. Next, we provide a motivation for using global optimisation methods, focusing on the need for hybrid approaches, and presenting a novel hybrid method for the parameter estimation problem. We then provide the motivation for performing identifiability analysis, giving details of a numerical procedure to be coupled with the hybrid method. Similarly, in the following section, the need for optimal experimental design is justified, detailing a numerical procedure to be integrated with the hybrid method as a post-processing step. Finally, these methods are applied to a challenging benchmark problem, arriving at a set of conclusions in the last section. Additional details and data are given in two appendices.

Section snippets

Parameter estimation in non-linear dynamic models

Given a model structure and a set of experimental data, the objective of parameter estimation is to calibrate the model (looking for parameters which cannot be measured directly) so as to reproduce the experimental results in the best possible way. This calibration is performed by minimizing a cost function which measures the goodness of the fit.
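A minimal sketch of such a cost function is given below. It assumes a weighted least-squares criterion, which is one common choice and is not necessarily the exact cost used in this work. The model (`decay_model`), the sampling times and the parameter values are hypothetical, and the data are synthetic and noise-free.

```python
import math
from typing import Callable, Sequence


def weighted_least_squares(
    p: Sequence[float],
    times: Sequence[float],
    y_meas: Sequence[float],
    weights: Sequence[float],
    model: Callable[[float, Sequence[float]], float],
) -> float:
    """Scalar cost J(p) = sum_i w_i * (y_meas_i - y_model(t_i, p))**2."""
    return sum(
        w * (ym - model(t, p)) ** 2
        for t, ym, w in zip(times, y_meas, weights)
    )


# Hypothetical one-state model with a closed-form solution: y = p0 * exp(-p1 * t).
def decay_model(t: float, p: Sequence[float]) -> float:
    return p[0] * math.exp(-p[1] * t)


p_true = [2.0, 0.5]
t_obs = [0.0, 1.0, 2.0, 4.0, 8.0]
y_obs = [decay_model(t, p_true) for t in t_obs]  # noise-free synthetic data
w_obs = [1.0] * len(t_obs)                       # equal weights (an assumption)

cost_true = weighted_least_squares(p_true, t_obs, y_obs, w_obs, decay_model)
cost_off = weighted_least_squares([2.0, 0.8], t_obs, y_obs, w_obs, decay_model)
print(cost_true, cost_off)  # zero at the true parameters, positive elsewhere
```

With noisy data the weights are typically chosen from the measurement error variances, so that less reliable observations contribute less to the fit.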

In other words, once the characterization of the model has been performed, the identification problem is stated as the optimisation of a scalar cost

Global optimisation: hybrid methods

When the traditional Levenberg–Marquardt (L–M) or Gauss–Newton (G–N) methods are used, it is sometimes argued that, in order to avoid convergence to local solutions, multiple runs (from, e.g. random guesses inside the parameter space) should be carried out, subsequently identifying the best solution attained as the global one. This is equivalent to the so-called plain multi-start approach in global optimisation, which has been shown to fail even in relatively simple cases.
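The plain multi-start idea can be sketched as follows. This is only an illustration under stated assumptions: the one-dimensional cost function is hypothetical, and plain gradient descent stands in for the local method (it is not L–M or G–N). The point is that every local run ends in the minimum of whichever basin its starting guess happened to fall in, so the quality of the final answer depends on how the random starts are distributed.

```python
import random
from typing import List, Tuple


def cost(p: float) -> float:
    """Hypothetical 1-D cost: global minimum near p = -1.04, local one near p = 0.96."""
    return (p * p - 1.0) ** 2 + 0.3 * p


def cost_gradient(p: float) -> float:
    return 4.0 * p * (p * p - 1.0) + 0.3


def local_search(p_start: float, step: float = 0.01, n_iter: int = 2000) -> float:
    """Plain gradient descent; it stays in the basin that contains p_start."""
    p = p_start
    for _ in range(n_iter):
        p -= step * cost_gradient(p)
    return p


def multi_start(
    n_starts: int, lower: float, upper: float, seed: int = 0
) -> Tuple[float, List[float]]:
    """Run local_search from uniformly random starts and keep the best result."""
    rng = random.Random(seed)
    solutions = [local_search(rng.uniform(lower, upper)) for _ in range(n_starts)]
    return min(solutions, key=cost), solutions


best, all_solutions = multi_start(n_starts=20, lower=-2.0, upper=2.0)
n_best = sum(1 for s in all_solutions if abs(s - best) < 1e-3)
print(f"best p = {best:.3f}, cost = {cost(best):.3f}")
print(f"{n_best} of {len(all_solutions)} local runs reached the best solution")
```

In this toy problem the two basins are of similar width, so a handful of starts is enough. In higher-dimensional problems the basin of the global solution can be a very small fraction of the search space, and most of the effort is spent rediscovering the same local solutions.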

Moles et al. (2003b) provide a discussion on this issue,

Identifiability analysis

The problem of parameter estimation, i.e. determining the parameters of a system from input–output data, is often called the identification problem. This is just one aspect of a larger problem, the inverse problem, which includes a priori structural identifiability, a posteriori or practical identifiability and parameter identification (Audoly et al., 2001).
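A simple numerical symptom of poor identifiability is that the output sensitivities with respect to two parameters are (nearly) collinear, so that a change in one parameter can be compensated by a change in the other. The sketch below illustrates this for two hypothetical two-parameter models, using central finite differences at assumed nominal parameter values. It is a local check only and is not the procedure used in this work.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[float, Sequence[float]], float]


def sensitivity_columns(
    model: Model, p: Sequence[float], times: Sequence[float], rel_step: float = 1e-6
) -> List[List[float]]:
    """Central finite-difference estimates of dy/dp_j at every sampling time."""
    columns = []
    for j in range(len(p)):
        h = rel_step * max(abs(p[j]), 1.0)
        p_up, p_down = list(p), list(p)
        p_up[j] += h
        p_down[j] -= h
        columns.append(
            [(model(t, p_up) - model(t, p_down)) / (2.0 * h) for t in times]
        )
    return columns


def collinearity(col_a: Sequence[float], col_b: Sequence[float]) -> float:
    """Absolute cosine of the angle between two sensitivity vectors (1 = collinear)."""
    dot = sum(a * b for a, b in zip(col_a, col_b))
    norm_a = math.sqrt(sum(a * a for a in col_a))
    norm_b = math.sqrt(sum(b * b for b in col_b))
    return abs(dot) / (norm_a * norm_b)


# Hypothetical model in which only the product p0 * p1 affects the output.
def product_model(t: float, p: Sequence[float]) -> float:
    return p[0] * p[1] * math.exp(-t)


# Hypothetical model in which both parameters act differently on the output.
def decay_model(t: float, p: Sequence[float]) -> float:
    return p[0] * math.exp(-p[1] * t)


p_nominal = [2.0, 0.5]
t_obs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

cos_product = collinearity(*sensitivity_columns(product_model, p_nominal, t_obs))
cos_decay = collinearity(*sensitivity_columns(decay_model, p_nominal, t_obs))
print(f"product model: {cos_product:.6f}, decay model: {cos_decay:.3f}")
```

A value close to 1 for the product model signals that its two parameters cannot be estimated separately from these observations, whatever the quality of the data.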

The a priori structural identifiability problem reads: can we, under the ideal conditions of noise-free observations and error-free model

Optimal experimental design

Performing experiments to obtain a rich enough set of experimental data is a costly and time-consuming activity. The purpose of optimal experimental design (OED) is to devise the necessary dynamic experiments in such a way that the parameters are estimated from the resulting experimental data with the best possible statistical quality, which is usually a measure of the accuracy and/or decorrelation of the estimated parameters.
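As a small illustration of how the information content of an experiment can be quantified, the sketch below builds the Fisher information matrix (FIM) for a hypothetical one-state model with analytic sensitivities and compares two sampling schedules using the D-criterion (the determinant of the FIM). The model, the nominal parameters, the noise level and the two designs are assumptions made for the example. It only ranks two fixed designs, whereas a full OED would optimise the experiment itself.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def fisher_information(times: Sequence[float], p: Sequence[float], sigma: float) -> Matrix:
    """FIM for y = p0 * exp(-p1 * t) with independent Gaussian noise of std sigma."""
    fim = [[0.0, 0.0], [0.0, 0.0]]
    for t in times:
        e = math.exp(-p[1] * t)
        s = [e, -p[0] * t * e]  # analytic sensitivities dy/dp0 and dy/dp1
        for i in range(2):
            for j in range(2):
                fim[i][j] += s[i] * s[j] / sigma ** 2
    return fim


def d_criterion(fim: Matrix) -> float:
    """Determinant of a 2 x 2 FIM; larger means a smaller confidence region."""
    return fim[0][0] * fim[1][1] - fim[0][1] * fim[1][0]


p_nominal = [2.0, 0.5]
sigma = 0.1

design_early = [0.0, 0.1, 0.2, 0.3]   # all samples clustered at the start
design_spread = [0.0, 1.0, 2.0, 4.0]  # samples spread over the decay

fim_early = fisher_information(design_early, p_nominal, sigma)
fim_spread = fisher_information(design_spread, p_nominal, sigma)
print(f"D-criterion, early design:  {d_criterion(fim_early):.1f}")
print(f"D-criterion, spread design: {d_criterion(fim_spread):.1f}")
```

By the Cramér–Rao inequality the inverse of the FIM bounds the covariance of any unbiased estimator from below, which is why a design with a larger determinant is expected to give tighter parameter estimates. Because the FIM is evaluated at nominal parameter values, such a comparison is itself only locally valid.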

Mathematically, the OED problem can be formulated as a dynamic

Problem statement

We have considered the challenging benchmark problem recently presented by Mendes (2001) and Moles et al. (2003b). The detailed statement is presented in Appendix A. For the sake of brevity, the additional data needed to solve this problem under the same conditions (including the experimental data set) are not given here, but can be found in electronic format at http://www.iim.csic.es/~julio/GR03_statement.txt.

All the computations reported here were performed using a PC/Pentium 4 (1.8 GHz,

Conclusions

In this contribution, we have considered the parameter estimation problem for non-linear dynamic models of biochemical pathways, together with procedures for identifiability checking and generation of new optimal experimental designs. Traditional (gradient-based) local methods for data fitting in non-linear dynamic systems can suffer from slow and/or local convergence, among other problems. However, this is frequently ignored, potentially leading to wrong conclusions about, e.g. the validity of

Acknowledgements

Authors MRF and JRB thank the Spanish Ministry of Science and Technology (MCyT project AGL2001-2610-C02-02) and Xunta de Galicia (grant PGIDIT02PXIC40211PN) for financial support.

References (57)

  • H. Pohjanpalo. System identifiability based on the power series expansion of the solution. Math. Biosci. (1978)
  • S. Salhi et al. A hybrid algorithm for identifying global and local minima when optimizing functions with many minima. Eur. J. Operational Res. (2004)
  • S. Vajda et al. Similarity transformation approach to structural identifiability of nonlinear models. Math. Biosci. (1989)
  • O. Wolkenhauer. Mathematical modelling in the post-genome era: understanding genome expression and regulation—a system theoretic approach. Biosystems (2002)
  • M.A. Abramson et al. Generalized pattern searches with derivative information. Math. Program. (2004)
  • S. Audoly et al. Global identifiability of nonlinear models of biological systems. IEEE Trans. Biomed. Eng. (2001)
  • E. Balsa-Canto et al. Dynamic optimization of bioprocesses: deterministic and stochastic strategies.
  • J.R. Banga et al. Stochastic dynamic optimization of batch and semicontinuous bioprocesses. Biotechnol. Prog. (1997)
  • J.R. Banga et al. Global optimization of chemical processes using stochastic algorithms.
  • J.R. Banga et al. Global optimization of bioprocesses using stochastic and hybrid methods. (2003)
  • J.R. Banga et al. Computation of optimal identification experiments for nonlinear dynamic process models: an stochastic global optimization approach. Ind. Eng. Chem. Res. (2002)
  • H. Bock. Recent advances in parameter identification for ordinary differential equations.
  • D.S. Bunch et al. Algorithm 717, subroutines for maximum likelihood and quasi-likelihood estimation of parameters in nonlinear regression models. ACM Trans. Math. Software (1993)
  • E.F. Carrasco et al. A hybrid method for the optimal control of chemical processes.
  • K.H. Cho et al. Experimental design in systems biology, based on parameter sensitivity analysis using a Monte Carlo method: a case study for the TNFα mediated NF-κB signal transduction pathway. Simul. Trans. Soc. Model. Simul. Int. (2003)
  • H. De Jong. Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol. (2002)
  • J.E. Dennis et al. Algorithm 573, NL2SOL—an adaptive nonlinear least-squares algorithm. ACM Trans. Math. Software (1981)
  • W.R. Esposito et al. Global optimization for the parameter estimation of differential-algebraic systems. Ind. Eng. Chem. Res. (2000)