Multi-criterion optimization for genetic network modeling
Introduction
One of the biggest challenges in the age of “Functional Genomics” is to infer regulatory interactions between genes from time-course micro-array data. Currently, many different modeling approaches have been suggested [8]. The performance of practically all approaches is severely hampered by the dimensionality problem, i.e. the fact that data sets contain relatively few time-points (micro-arrays) compared to the number of genes. Due to the high costs associated with each micro-array measurement, the number of arrays will always remain relatively small compared to the number of genes. Therefore, it is imperative to employ additional (biological) knowledge about genetic networks, such as limited connectivity, redundancy, stability and robustness, to sensibly constrain the modeling process.
In previous work [15], [9], we showed through comparative studies that models that do not apply constraints, perform poorly when faced with realistic situations, i.e. data with a limited number of time-points and with substantial measurement noise. Recently, we have shown that enforcing robustness dramatically improves the performance [10], [12] and that by explicitly searching for limited connectivity correct interactions can be found more reliably [11]. In a preliminary study [13], we showed that each of these two approaches effectively solves a different bi-criterion optimization problem, i.e. one optimizes data-fit (predictive power) together with robustness and the other optimizes data-fit together with connectivity. The main focus of this paper lies on how to simultaneously apply more than one (biologically inspired) constraint. For this purpose, the problem is viewed as an multi-criterion optimization problem (MCOP).
First, the concepts associated with learning genetic network models and multi-criterion optimization are introduced. The multi-criterion approach to genetic network modeling is illustrated on a problem where three criteria are optimized, i.e. the data-fit is optimized together with both robustness as well as connectivity. Models are tested and compared on artificially generated data and the applicability of the approach is shown on a small subset of the Spellman [6] data-set. Some directions for future work are discussed in the final section.
Section snippets
Learning genetic network models
In this section, the representation of gene expression data and genetic network models is defined, we describe how genetic interactions can be learned from the data and introduce some of the constraints that could be applied.
Data: Let gi(t) represent the gene expression level of gene i at time-point t and let N genes be measured. Then, an expression-state is represented by a column vector of all gene expression values measured at a certain time instant, i.e. . A typical
Solving the multi-criterion optimization problem
Because of the limited amount of information contained in the data, it is not sufficient to just minimize the MSE of the model to obtain reliable estimates of the interactions. In fact, it is necessary to employ the general properties that network solutions should posses, as a way to distinguish more probable network solutions from less probable ones. Thus, a gene regulation matrix is found by simultaneously optimizing fMSE with other (conflicting) criteria. This problem results in a
Mean squared error versus robustness and connectivity
To illustrate the applicability of MCOP for genetic network modeling, an optimization problem with three criteria is considered which is complicated by the fact that one criterion is discrete. More precisely, we search for the optimal GRM that minimizes both the corresponding MSE (data-fit), the sum of squared weights (robustness) and the number of non-zero weights (limited connectivity):
Pareto-front border: In this three-dimensional attribute space the
Experiments
To examine the performance of the Pareto-front estimators more thoroughly, each method was applied on artificially generated data-sets of varying size (5–30 genes), generated from random sparse GRMs with (fixed) connectivity of four. Performances at each condition are averages over 40 different data-sets.
Pareto estimation performance: Fig. 4 shows in each plot one of four different performance measures for each of the different estimators7
Discussion
We have analyzed the problem of how to simultaneously incorporate different network properties to improve the performance of genetic network modeling as a multi-criterion optimization problem. By adding constraints we restrict the search space such that more plausible solutions are returned. However, with each additional constraint, an additional trade-off needs to be determined between criteria. If such a trade-off needs to be determined automatically, the complexity of the problem is not
Acknowledgements
This work was funded by the DIOC-IMDS program of the Delft University of Technology.
References (15)
- et al.
Dissecting the regulatory circuitry of a eukaryotic genome
Cell
(1998) - J. Andersson, A survey of multiobjective optimization in engineering design, Technical Report LiTH-IKP-R-1097,...
- et al.
The hardwiring of development: Organization and function of genomic regulatory systems
Development
(1997) - P. D'Haeseleer, X. Wen, S. Fuhrman, R. Somogyi, Linear modeling of mrna expression levels during cns development and...
- et al.
Ridge regression: applications to nonorthogonal problems
Technometrics
(1970) - P. Spellman, G. Sherlock, M. Zhang, V. Iyer, K. Anders, M. Eisen, P. Brown, D. Botstein, Futcher, Comprehensive...
- D. Thieffry, R. Thomas, Qualitative analysis of gene networks, in: Pacific Symposium on Biocomputing ’98, Vol. 3, World...
Cited by (33)
Search of optimal locations for species- or group-specific primer design in DNA sequences: Non-dominated Sorting Genetic Algorithm II (NSGA-II)
2015, Ecological InformaticsCitation Excerpt :Thus, we expect that those two simple objective functions that manipulate binary data can increase efficiency of pre-screening process. There are many works which use the benefits (compared to single-objective methods) of MOO algorithms in bioinformatics and computational biology (Curteanu et al., 2006; Deb and Raji Reddy, 2003; Handl et al., 2007; Mandal et al., 2005; Shin et al., 2006, 2009; van Someren et al., 2003). Even though applying MOO algorithms to bioinformatics is popular recently, our findings that NSGA-II is applicable to species-specific primer design supports the utility of NSGA-II for other known DNA sequences.
Soft computing methods to predict gene regulatory networks: An integrative approach on time-series gene expression data
2008, Applied Soft Computing JournalA Hybrid Multiobjective Memetic Metaheuristic for Multiple Sequence Alignment
2016, IEEE Transactions on Evolutionary ComputationA Multi-State Optimization Framework for Parameter Estimation in Biological Systems
2016, IEEE/ACM Transactions on Computational Biology and BioinformaticsIdentification of target system operations: The practice of determining the optimal control
2015, Eastern-European Journal of Enterprise Technologies