Elsevier

Signal Processing

Volume 83, Issue 4, April 2003, Pages 763-775
Signal Processing

Multi-criterion optimization for genetic network modeling

https://doi.org/10.1016/S0165-1684(02)00473-5Get rights and content

Abstract

A major problem associated with the reverse engineering of genetic networks from micro-array data is how to reliably find genetic interactions when faced with a relatively small number of arrays compared to the number of genes. To cope with this dimensionality problem, it is imperative to employ additional (biological) knowledge about real genetic networks, such as limited connectivity, redundancy, stability and robustness, to sensibly constrain the modeling process. In previous work (Proceedings of the 2001 IEEE—EURASIP Workshop on Nonlinear Signal and Image Processing, Baltimore, MA, June 2001; Proceedings of the Second International Conference on Systems Biology, Pasadena, CA, November 2, pp. 222–230), we have shown that by applying single constraints, the inference of genetic interactions under realistic conditions can be significantly improved. Recently (Proceedings of the SPIE, San Jose, CA, January 2002), we have made a preliminary study on how these approaches based on single constraints solve the underlying bi-criterion optimization problem. In this paper, we study the problem of how multiple constraints can be combined by formulating genetic network modeling as a multi-criterion optimization problem. Results are shown on artificial as well as on a real data example.

Introduction

One of the biggest challenges in the age of “Functional Genomics” is to infer regulatory interactions between genes from time-course micro-array data. Currently, many different modeling approaches have been suggested [8]. The performance of practically all approaches is severely hampered by the dimensionality problem, i.e. the fact that data sets contain relatively few time-points (micro-arrays) compared to the number of genes. Due to the high costs associated with each micro-array measurement, the number of arrays will always remain relatively small compared to the number of genes. Therefore, it is imperative to employ additional (biological) knowledge about genetic networks, such as limited connectivity, redundancy, stability and robustness, to sensibly constrain the modeling process.

In previous work [15], [9], we showed through comparative studies that models that do not apply constraints, perform poorly when faced with realistic situations, i.e. data with a limited number of time-points and with substantial measurement noise. Recently, we have shown that enforcing robustness dramatically improves the performance [10], [12] and that by explicitly searching for limited connectivity correct interactions can be found more reliably [11]. In a preliminary study [13], we showed that each of these two approaches effectively solves a different bi-criterion optimization problem, i.e. one optimizes data-fit (predictive power) together with robustness and the other optimizes data-fit together with connectivity. The main focus of this paper lies on how to simultaneously apply more than one (biologically inspired) constraint. For this purpose, the problem is viewed as an multi-criterion optimization problem (MCOP).

First, the concepts associated with learning genetic network models and multi-criterion optimization are introduced. The multi-criterion approach to genetic network modeling is illustrated on a problem where three criteria are optimized, i.e. the data-fit is optimized together with both robustness as well as connectivity. Models are tested and compared on artificially generated data and the applicability of the approach is shown on a small subset of the Spellman [6] data-set. Some directions for future work are discussed in the final section.

Section snippets

Learning genetic network models

In this section, the representation of gene expression data and genetic network models is defined, we describe how genetic interactions can be learned from the data and introduce some of the constraints that could be applied.

Data: Let gi(t) represent the gene expression level of gene i at time-point t and let N genes be measured. Then, an expression-state is represented by a column vector of all gene expression values measured at a certain time instant, i.e. g(t)=[g1(t),…,gN(t)]T. A typical

Solving the multi-criterion optimization problem

Because of the limited amount of information contained in the data, it is not sufficient to just minimize the MSE of the model to obtain reliable estimates of the interactions. In fact, it is necessary to employ the general properties that network solutions should posses, as a way to distinguish more probable network solutions from less probable ones. Thus, a gene regulation matrix is found by simultaneously optimizing fMSE with other (conflicting) criteria. This problem results in a

Mean squared error versus robustness and connectivity

To illustrate the applicability of MCOP for genetic network modeling, an optimization problem with three criteria is considered which is complicated by the fact that one criterion is discrete. More precisely, we search for the optimal GRM that minimizes both the corresponding MSE (data-fit), the sum of squared weights (robustness) and the number of non-zero weights (limited connectivity):ŵi=argminwi(fMSE(wi),fSSW(wi),fC(wi)).

Pareto-front border: In this three-dimensional attribute space the

Experiments

To examine the performance of the Pareto-front estimators more thoroughly, each method was applied on artificially generated data-sets of varying size (5–30 genes), generated from random sparse GRMs with (fixed) connectivity of four. Performances at each condition are averages over 40 different data-sets.

Pareto estimation performance: Fig. 4 shows in each plot one of four different performance measures for each of the different estimators7

Discussion

We have analyzed the problem of how to simultaneously incorporate different network properties to improve the performance of genetic network modeling as a multi-criterion optimization problem. By adding constraints we restrict the search space such that more plausible solutions are returned. However, with each additional constraint, an additional trade-off needs to be determined between criteria. If such a trade-off needs to be determined automatically, the complexity of the problem is not

Acknowledgements

This work was funded by the DIOC-IMDS program of the Delft University of Technology.

References (15)

  • F.C. Holstege et al.

    Dissecting the regulatory circuitry of a eukaryotic genome

    Cell

    (1998)
  • J. Andersson, A survey of multiobjective optimization in engineering design, Technical Report LiTH-IKP-R-1097,...
  • A. Arnone et al.

    The hardwiring of development: Organization and function of genomic regulatory systems

    Development

    (1997)
  • P. D'Haeseleer, X. Wen, S. Fuhrman, R. Somogyi, Linear modeling of mrna expression levels during cns development and...
  • A.E. Hoerl et al.

    Ridge regression: applications to nonorthogonal problems

    Technometrics

    (1970)
  • P. Spellman, G. Sherlock, M. Zhang, V. Iyer, K. Anders, M. Eisen, P. Brown, D. Botstein, Futcher, Comprehensive...
  • D. Thieffry, R. Thomas, Qualitative analysis of gene networks, in: Pacific Symposium on Biocomputing ’98, Vol. 3, World...
There are more references available in the full text version of this article.

Cited by (33)

  • Search of optimal locations for species- or group-specific primer design in DNA sequences: Non-dominated Sorting Genetic Algorithm II (NSGA-II)

    2015, Ecological Informatics
    Citation Excerpt :

    Thus, we expect that those two simple objective functions that manipulate binary data can increase efficiency of pre-screening process. There are many works which use the benefits (compared to single-objective methods) of MOO algorithms in bioinformatics and computational biology (Curteanu et al., 2006; Deb and Raji Reddy, 2003; Handl et al., 2007; Mandal et al., 2005; Shin et al., 2006, 2009; van Someren et al., 2003). Even though applying MOO algorithms to bioinformatics is popular recently, our findings that NSGA-II is applicable to species-specific primer design supports the utility of NSGA-II for other known DNA sequences.

  • A Multi-State Optimization Framework for Parameter Estimation in Biological Systems

    2016, IEEE/ACM Transactions on Computational Biology and Bioinformatics
View all citing articles on Scopus
View full text