Multi-criterion optimization for genetic network modeling

doi:10.1016/S0165-1684(02)00473-5

Signal Processing

Volume 83, Issue 4, April 2003, Pages 763-775

https://doi.org/10.1016/S0165-1684(02)00473-5 Get rights and content

Abstract

A major problem associated with the reverse engineering of genetic networks from micro-array data is how to reliably find genetic interactions when faced with a relatively small number of arrays compared to the number of genes. To cope with this dimensionality problem, it is imperative to employ additional (biological) knowledge about real genetic networks, such as limited connectivity, redundancy, stability and robustness, to sensibly constrain the modeling process. In previous work (Proceedings of the 2001 IEEE—EURASIP Workshop on Nonlinear Signal and Image Processing, Baltimore, MA, June 2001; Proceedings of the Second International Conference on Systems Biology, Pasadena, CA, November 2, pp. 222–230), we have shown that by applying single constraints, the inference of genetic interactions under realistic conditions can be significantly improved. Recently (Proceedings of the SPIE, San Jose, CA, January 2002), we have made a preliminary study on how these approaches based on single constraints solve the underlying bi-criterion optimization problem. In this paper, we study the problem of how multiple constraints can be combined by formulating genetic network modeling as a multi-criterion optimization problem. Results are shown on artificial as well as on a real data example.

Introduction

One of the biggest challenges in the age of “Functional Genomics” is to infer regulatory interactions between genes from time-course micro-array data. Currently, many different modeling approaches have been suggested [8]. The performance of practically all approaches is severely hampered by the dimensionality problem, i.e. the fact that data sets contain relatively few time-points (micro-arrays) compared to the number of genes. Due to the high costs associated with each micro-array measurement, the number of arrays will always remain relatively small compared to the number of genes. Therefore, it is imperative to employ additional (biological) knowledge about genetic networks, such as limited connectivity, redundancy, stability and robustness, to sensibly constrain the modeling process.

In previous work [15], [9], we showed through comparative studies that models that do not apply constraints, perform poorly when faced with realistic situations, i.e. data with a limited number of time-points and with substantial measurement noise. Recently, we have shown that enforcing robustness dramatically improves the performance [10], [12] and that by explicitly searching for limited connectivity correct interactions can be found more reliably [11]. In a preliminary study [13], we showed that each of these two approaches effectively solves a different bi-criterion optimization problem, i.e. one optimizes data-fit (predictive power) together with robustness and the other optimizes data-fit together with connectivity. The main focus of this paper lies on how to simultaneously apply more than one (biologically inspired) constraint. For this purpose, the problem is viewed as an multi-criterion optimization problem (MCOP).

First, the concepts associated with learning genetic network models and multi-criterion optimization are introduced. The multi-criterion approach to genetic network modeling is illustrated on a problem where three criteria are optimized, i.e. the data-fit is optimized together with both robustness as well as connectivity. Models are tested and compared on artificially generated data and the applicability of the approach is shown on a small subset of the Spellman [6] data-set. Some directions for future work are discussed in the final section.

Section snippets

Learning genetic network models

In this section, the representation of gene expression data and genetic network models is defined, we describe how genetic interactions can be learned from the data and introduce some of the constraints that could be applied.

Data: Let g_i(t) represent the gene expression level of gene i at time-point t and let N genes be measured. Then, an expression-state is represented by a column vector of all gene expression values measured at a certain time instant, i.e. $g (t)=[g_{1} (t),…,g_{N} (t)]^{T}$ . A typical

Solving the multi-criterion optimization problem

Because of the limited amount of information contained in the data, it is not sufficient to just minimize the MSE of the model to obtain reliable estimates of the interactions. In fact, it is necessary to employ the general properties that network solutions should posses, as a way to distinguish more probable network solutions from less probable ones. Thus, a gene regulation matrix is found by simultaneously optimizing f^MSE with other (conflicting) criteria. This problem results in a

Mean squared error versus robustness and connectivity

To illustrate the applicability of MCOP for genetic network modeling, an optimization problem with three criteria is considered which is complicated by the fact that one criterion is discrete. More precisely, we search for the optimal GRM that minimizes both the corresponding MSE (data-fit), the sum of squared weights (robustness) and the number of non-zero weights (limited connectivity): $w ̂_{i} = arg min w_{i} (f^{MSE} (w_{i}),f^{SSW} (w_{i}),f^{C} (w_{i})).$

Pareto-front border: In this three-dimensional attribute space the

Experiments

To examine the performance of the Pareto-front estimators more thoroughly, each method was applied on artificially generated data-sets of varying size (5–30 genes), generated from random sparse GRMs with (fixed) connectivity of four. Performances at each condition are averages over 40 different data-sets.

Pareto estimation performance: Fig. 4 shows in each plot one of four different performance measures for each of the different estimators⁷

Discussion

We have analyzed the problem of how to simultaneously incorporate different network properties to improve the performance of genetic network modeling as a multi-criterion optimization problem. By adding constraints we restrict the search space such that more plausible solutions are returned. However, with each additional constraint, an additional trade-off needs to be determined between criteria. If such a trade-off needs to be determined automatically, the complexity of the problem is not

Acknowledgements

This work was funded by the DIOC-IMDS program of the Delft University of Technology.

References (15)

F.C. Holstege et al.
Dissecting the regulatory circuitry of a eukaryotic genome
Cell
(1998)
J. Andersson, A survey of multiobjective optimization in engineering design, Technical Report LiTH-IKP-R-1097,...
A. Arnone et al.
The hardwiring of development: Organization and function of genomic regulatory systems
Development
(1997)
P. D'Haeseleer, X. Wen, S. Fuhrman, R. Somogyi, Linear modeling of mrna expression levels during cns development and...
A.E. Hoerl et al.
Ridge regression: applications to nonorthogonal problems
Technometrics
(1970)
P. Spellman, G. Sherlock, M. Zhang, V. Iyer, K. Anders, M. Eisen, P. Brown, D. Botstein, Futcher, Comprehensive...
D. Thieffry, R. Thomas, Qualitative analysis of gene networks, in: Pacific Symposium on Biocomputing ’98, Vol. 3, World...

There are more references available in the full text version of this article.

Cited by (33)

Search of optimal locations for species- or group-specific primer design in DNA sequences: Non-dominated Sorting Genetic Algorithm II (NSGA-II)
2015, Ecological Informatics
Citation Excerpt :
Thus, we expect that those two simple objective functions that manipulate binary data can increase efficiency of pre-screening process. There are many works which use the benefits (compared to single-objective methods) of MOO algorithms in bioinformatics and computational biology (Curteanu et al., 2006; Deb and Raji Reddy, 2003; Handl et al., 2007; Mandal et al., 2005; Shin et al., 2006, 2009; van Someren et al., 2003). Even though applying MOO algorithms to bioinformatics is popular recently, our findings that NSGA-II is applicable to species-specific primer design supports the utility of NSGA-II for other known DNA sequences.
In this study, we applied Non-dominated Sorting Genetic Algorithm II (NSGA-II) to the problem of identifying appropriate locations in cytochrome oxidase I (COI) gene for species- or group-specific primer design. As concerns about ecological management grow, quantification of predator-prey interaction has become a central issue in ecology. Recently, improved techniques have allowed for the extensive use of genomic DNA barcoding for qualitative analysis. However, quantification of DNA barcoding results is important and still needs to be resolved. Even though species- or group-specific primers that can be used for samples containing multiple species are difficult to design, species- or group-specific primers are a practical solution for quantification in the current era. To resolve this issue, we present here an efficient method for discovering the regions of a DNA sequence that have the highest inter-species variability by applying the NSGA-II algorithm. DNA sequence information for the COI gene region was obtained for 24 species from Jo et al. (2014). These sequences were transformed into binary data, either 0 (not different) or 1 (different), to reflect sequence conservation at each base for all combinations of two species. These data were analyzed by two objective functions, the average and standard deviation of the difference, which were used in the NSGA-II algorithm to search for appropriate DNA locations for species-specific primer design. The NSGA-II program identified four solutions (possible primer binding sites); consequently NSGA-II is believed to be a suitable algorithm for species-specific primer design, and is expected to make this difficult and time-consuming process more efficient.
Soft computing methods to predict gene regulatory networks: An integrative approach on time-series gene expression data
2008, Applied Soft Computing Journal
To unravel the controlling mechanisms of gene regulation, in this paper we present the application of sophisticated soft computing methods applied on an important problem from Bioinformatics—inferring gene regulatory networks (GRN) from time series gene expression microarray data. The main questions addressed in this paper are: (a) what knowledge can be derived from different models? (b) Would an integrated approach be more suitable to reveal about the controls of gene regulation? To reduce the number of genes in addition to apply the appropriate clustering methods, here we have also considered the valuable inputs from the biological experiments. To infer the GRN we have applied: three computational intelligence methods—Least Angle Regression (LARS), Expectation Maximization (EM) with Kalman Filter (KF), and an Evolving Fuzzy Neural Network (EFuNN). The methods are applied on time series microarray data of Schizosaccharomyces pombe yeast cell-cycle genes. Each method reveals some new aspects of the problem and it is agreed that to infer the GRN and to understand the processes behind gene regulation it is more suitable to adopt such integrative approach as ours through which some new knowledge is discovered, such as: using LARS we hypothesize—first, an exoglucanase gene exg1 is now implicated to be tied with MCB cluster regulation and second, a mannosidase with histone linked mannoses. A new quantitative prediction is that the time delay of the interaction between two genes seems to be approximately 30 min, or 0.17 cell cycles. Using the method of EM with KF, 25 cell cycle-regulated key genes were successfully clustered into three functionally co-regulated groups. We have also identified two genes namely Cdc22 and Suc22 that indeed interact with each other and are the potential candidates as a control in Ribonucleotide reductase (RNR) activity. Based on the EFuNN results and integrating knowledge from EM-KF method, we hypothesize that interaction between Suc22, Cdc22 and Mrc1 may be mediated by two other genes namely Cds1 and Spd1. The methods discussed and applied here can be used to analyze any kind of short time series of many interacting variables for inferring the regulatory network. Researchers should take such integrative computational intelligence approach seriously to understand the complex phenomenon of gene regulation and thus to simulate the development of the cell.
Setting Some Milestones when Modelling Cell Gene Expression Regulatory Circuits Under Variable-Volume Whole-cell Modelling Framework: I. Generalities
2017, Revista de Chimie
A Hybrid Multiobjective Memetic Metaheuristic for Multiple Sequence Alignment
2016, IEEE Transactions on Evolutionary Computation
A Multi-State Optimization Framework for Parameter Estimation in Biological Systems
2016, IEEE/ACM Transactions on Computational Biology and Bioinformatics
Identification of target system operations: The practice of determining the optimal control
2015, Eastern-European Journal of Enterprise Technologies

View all citing articles on Scopus

View full text

Multi-criterion optimization for genetic network modeling

Abstract

Introduction

Section snippets

Learning genetic network models

Solving the multi-criterion optimization problem

Mean squared error versus robustness and connectivity

Experiments

Discussion

Acknowledgements

Cell

The hardwiring of development: Organization and function of genomic regulatory systems

Development

Ridge regression: applications to nonorthogonal problems

Technometrics