A network flow model for biclustering via optimal re-ordering of data matrices

Journal of Global Optimization

Abstract

The analysis of large-scale data sets using clustering techniques arises in many different disciplines and has important applications. Most traditional clustering techniques require heuristic methods for finding good solutions and produce suboptimal clusters as a result. In this article, we present a rigorous biclustering approach, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix. The physical permutations of the rows and columns are accomplished via a network flow model according to a given objective function. This optimal re-ordering model is used in an iterative framework where cluster boundaries in one dimension are used to partition and re-order the other dimensions of the corresponding submatrices. OREO is demonstrated on metabolite concentration data to validate the proposed method and to compare it with existing clustering methods.
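
To make the idea of re-ordering rows "according to a given objective function" concrete, the following is a minimal Python sketch on a toy matrix. It is not the network flow model of the article: it enumerates every row permutation by brute force, which is only feasible for a handful of rows. The objective (sum of squared differences between vertically adjacent rows) and the names `adjacency_cost` and `best_row_order` are assumptions chosen for illustration.

```python
from itertools import permutations


def adjacency_cost(matrix, order):
    """Sum of squared differences between rows that are adjacent in `order`.

    This objective is an illustrative assumption, not taken from the article.
    """
    return sum(
        (a - b) ** 2
        for r1, r2 in zip(order, order[1:])
        for a, b in zip(matrix[r1], matrix[r2])
    )


def best_row_order(matrix):
    """Exhaustive search over all row permutations (toy sizes only)."""
    return min(
        permutations(range(len(matrix))),
        key=lambda order: adjacency_cost(matrix, order),
    )


# Hypothetical data: rows 0 and 2 are similar, rows 1 and 3 are similar.
data = [
    [1.0, 1.1, 9.0],
    [8.0, 8.2, 0.5],
    [1.2, 0.9, 9.1],
    [7.9, 8.1, 0.4],
]

order = best_row_order(data)
print("row order:", order)
print("cost:", adjacency_cost(data, order))
```

In the re-ordered matrix, similar rows end up next to each other, so a large jump between neighbouring rows suggests a cluster boundary. The same search could be applied to the columns by transposing the matrix.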

Author information

Corresponding author

Correspondence to Christodoulos A. Floudas.

About this article

Cite this article

DiMaggio, P.A., McAllister, S.R., Floudas, C.A. et al. A network flow model for biclustering via optimal re-ordering of data matrices. J Glob Optim 47, 343–354 (2010). https://doi.org/10.1007/s10898-008-9349-z

  • DOI: https://doi.org/10.1007/s10898-008-9349-z
