Journal of Molecular Biology
Volume 356, Issue 1, 10 February 2006, Pages 222-236
Journal home page for Journal of Molecular Biology

Inferring Meaningful Pathways in Weighted Metabolic Networks

https://doi.org/10.1016/j.jmb.2005.09.079Get rights and content

An approach is presented for computing meaningful pathways in the network of small molecule metabolism comprising the chemical reactions characterized in all organisms. The metabolic network is described as a weighted graph in which all the compounds are included, but each compound is assigned a weight equal to the number of reactions in which it participates. Path finding is performed in this graph by searching for one or more paths with lowest weight. Performance is evaluated systematically by computing paths between the first and last reactions in annotated metabolic pathways, and comparing the intermediate reactions in the computed pathways to those in the annotated ones. For the sake of comparison, paths are computed also in the un-weighted raw (all compounds and reactions) and filtered (highly connected pool metabolites removed) metabolic graphs, respectively. The correspondence between the computed and annotated pathways is very poor (<30%) in the raw graph; increasing to ∼65% in the filtered graph; reaching ∼85% in the weighted graph. Considering the best-matching path among the five lightest paths increases the correspondence to 92%, on average. We then show that the average distance between pairs of metabolites is significantly larger in the weighted graph than in the raw unfiltered graph, suggesting that the small-world properties previously reported for metabolic networks probably result from irrelevant shortcuts through pool metabolites. In addition, we provide evidence that the length of the shortest path in the weighted graph represents a valid measure of the “metabolic distance” between enzymes.

We suggest that the success of our simplistic approach is rooted in the high degree of specificity of the reactions in metabolic pathways, presumably reflecting thermodynamic constraints operating in these pathways. We expect our approach to find useful applications in inferring metabolic pathways in newly sequenced genomes.

Introduction

Small molecules metabolism (SMM) comprises the ensemble of chemical reactions used by the cell to process medium size and small molecules. These molecules serve as essential building blocks for vital macromolecules, such as proteins, nucleic acids, lipids and sugars, or act as co-factors, regulators and modulators of enzyme activity, gene expression, and signal transduction. SMM is commonly organized in pathways,1 where chemical reactions are connected to each other through the small molecules (the metabolites), the product of one reaction acting as the substrate of the next. Most, but not all, reactions of this network are catalyzed by enzymes.

Much of our knowledge about metabolic pathways has been obtained from experiments on model organisms such as Escherichia coli, a few other bacteria, the yeast Saccharomyces cerevisiae, and focused studies in mouse and human. With genome sequences becoming available for a growing number of different organisms, efforts have been devoted to extending this knowledge to other organisms. A method for accomplishing this task is metabolic reconstruction. It involves inferring the metabolic pathways of an organism from its genome sequence, using as template the metabolic pathways characterized in related organisms.2, 3 To store information on known and inferred metabolic pathways, specialized databases have been developed. These include WIT,4, 5 EcoCyc, MetaCyc,6, 7, 8 KEGG9, 10, 11 and aMAZE.12, 13, 14

In addition, analyses of the networks formed by connecting all the reactions and enzymes involved in SMM have attracted much attention. In particular, global properties of these networks have been investigated. Representing these networks as graphs, where reactions are connected to each other via their substrates and products, the distribution of node connectivity, the shortest distance between nodes, and the modularity of the graph were characterized.15, 16, 17, 18

Path finding in metabolic networks has also been used to measure a “metabolic distance”, between pairs of enzymes in E. coli. This distance, defined as the length of the shortest path on the metabolic graph, was compared with protein structure and sequence similarity measures and with distances between the corresponding genes in the genome, in order to gain insights on how the organization and function of metabolic enzymes might have evolved.19, 20, 21 In another study,22 a “universal” metabolic graph was built by linking all the metabolic reactions known to occur, irrespective of the organism and including those taking place in microbes specialized in degrading toxic compounds. Metabolic distances between compounds in this graph were then used to suggest a mechanism whereby degradation pathways may have evolved from central metabolism.

While the above-mentioned studies provided useful insights, they have not addressed some of the fundamental issues that determine our ability to build and analyze graphs that represent functional networks such as SMMs in a biologically meaningful manner. Rison et al.20 restricted their metabolic distance calculations to well-defined portions of the SMM network, restricted to reactions known to occur in one organism (E. coli). Moreover, their distances were computed considering separate subsets of reactions from the E. coli network (groupings of metabolic pathways). Due to these restrictions their procedure does not consider alternative pathways that would be chemically feasible or those that pass through transversal links between known pathways.

Exploring such alternative connections and links should be useful for analyzing functional relations between proteins, and for discovering alternative pathways that may occur under certain cellular conditions or in related organisms. But doing so will dramatically increase the number of possible paths, of which many may be biologically irrelevant.23 To eliminate such irrelevant pathways, additional information must be used. Some studies used expression profiles.24, 25 More often, however, considerations based on the chemistry of the compounds have been employed as constraints.26, 27 In an earlier work,25, 28 we showed that navigating the SMM network in a chemically relevant manner is hindered by the presence of compounds such as H2O, O2, NAD+, NADH, NADP+, H+, NADPH. These so-called pool metabolites or co-factors are consumed and produced in many reactions (e.g. 1999 reactions for H2O, 601 for NAD). These pool metabolites are the so-called “hubs” in the scale-free SMM networks described by Jeong & Barabasi.15 Although path finding in metabolic networks from which pool metabolites have been excluded16, 17, 18, 25, 28 avoids some of the obvious pitfalls, it is far from satisfactory. The choice of excluded compounds has to be made somewhat arbitrarily and little is known on how effective the resulting procedure is in deriving biologically meaningful pathways. A much more relevant approach involves taking into account the chemistry of the metabolites and, more particularly, the actual transfer of structural moieties between adjacent reactions.26, 27 This approach requires, however, a detailed annotation of the over 5000 metabolites in order to describe each compound in terms of its atoms and chemical connectivity, and the application of rather sophisticated graph matching procedures, capable of reliably identifying common substructures between substrates and products of the same and adjacent reactions, in order to determine the biochemically relevant connectivity between reactions in a pathway. Other approaches rely on a manual specification of the links in each reaction in order to retain only those that are biochemically relevant.29

Here, we describe a novel approach for representing the universal SMM network, and show that it enables the calculation of biologically meaningful pathways between pairs of reactions, offering the opportunity for discovering new pathways, determining functional distances between enzymes, or building pathways from groups of enzymes. In this approach, the universal SMM network is represented as a weighted graph, where each compound is given a weight equal to its degree of connectivity, i.e. the number of reactions in which it participates as substrate or product. The path finding calculations identify the k paths with minimal weight(s) (referred to as “lightest paths” in a weighted graph, and “shortest paths” an unweighted graph) between a pair of source and target reactions (see Methods for definitions of pathway distances and weights). When these calculations are performed on reaction pairs representing the first and last reactions in every known metabolic pathway of E. coli and S. cerevisiae annotated in the aMAZE14 or EcoCyc2 databases, the inferred paths are shown to correspond remarkably well to the corresponding annotated pathways. In some cases alternative pathways, which exist in other organisms, are also inferred.

This extremely simple approach requires no pre-processing of the compound data, other than calculating the weight of each metabolite in the universal SMM network, and the reasons why it works at all seem to be rooted in the apparent high degree of specificity of the reactions in biologically relevant pathways. Indeed, with some notable exceptions, metabolic pathways for the synthesis and degradation of amino acids, nucleic acids and sugars, seem to involve compounds and reactions that occur rarely in other pathways. We suggest that this interesting property of metabolic pathways might reflect energetic constraints, which could in turn confer evolutionary advantages.

We also show that much poorer results are obtained when the same calculations are performed using two other network representations. In the first representation, called “raw graph”, the shortest paths are computed from the universal SMM network, represented as an unweighted graph that comprises all compounds including the pool metabolites. In the second representation (filtered graph), a set of highly connected pool metabolites are simply excluded from the network.28

Section snippets

Path finding in SMM graphs

In order to evaluate the performance of our path finding procedure we computed paths between the first (source) and the last (target) reactions of annotated pathways, and compared the intermediate reactions of the inferred paths to those of the annotated pathways. This evaluation was performed on two independent data sets. The first data set comprises a metabolic network compiled from KEGG/LIGAND (4756 compound nodes, 5985 reactions), and pathways annotated in the aMAZE database,14 which are

Discussion

A number of recent studies applied graph theory to analyze metabolic networks.15, 16, 17, 18 In none of these, however, was path finding applied to weighted SMM graphs and, to the best of our knowledge, few if any have systematically evaluated the relevance of the computed paths by comparing them with annotated pathways.

Here, we introduced a novel representation of the SMM network as a weighted graph and showed that this approach presents a number of important advantages. The most obvious of

LIGAND/KEGG metabolic network

Compounds and reactions associated with SMM in all organisms were parsed from the KEGG/LIGAND database†. The universal SMM graph was built from all known metabolic reactions, regardless of the presence of the corresponding enzymes in the considered organism. We did so for two main reasons: (i) genome annotations are currently very fragmentary, so that an enzyme might be present in an organism even if it is not annotated in its genome; (ii) we did not want

Acknowledgements

We gratefully acknowledge the Kyoto Encyclopaedia of Genes and Genomes (KEGG) for free use of the LIGAND database. Many thanks go to Lorenz Wernisch for help with previous versions of the path finding algorithm. We thank Julio Collado-Vides for useful comments on the manuscript. This work was partially supported by funds from the Government of the Brussels Region. The SCMBB laboratory is a partner of the BioSapiens Network of excellence funded under the sixth Framework programme of the European

References (36)

  • E. Selkov et al.

    MPW: the Metabolic Pathways Database

    Nucl. Acids Res.

    (1998)
  • R. Overbeek et al.

    WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction

    Nucl. Acids Res.

    (2000)
  • P.D. Karp et al.

    The EcoCyc and MetaCyc databases

    Nucl. Acids Res.

    (2000)
  • P.D. Karp et al.

    The MetaCyc Database

    Nucl. Acids Res.

    (2002)
  • C.J. Krieger et al.

    MetaCyc: a multiorganism database of metabolic pathways and enzymes

    Nucl. Acids Res.

    (2004)
  • H. Ogata et al.

    KEGG: Kyoto Encyclopedia of Genes and Genomes

    Nucl. Acids Res.

    (1999)
  • M. Kanehisa et al.

    The KEGG resource for deciphering the genome

    Nucl. Acids Res.

    (2004)
  • J. van Helden et al.

    Representing and analysing molecular and cellular function using the computer

    Biol. Chem.

    (2000)
  • Cited by (100)

    • NetFlow: A tool for isolating carbon flows in genome-scale metabolic networks

      2021, Metabolic Engineering Communications
      Citation Excerpt :

      Many algorithms in the field of metabolic pathfinding have been developed for identifying pathways between metabolites within a metabolic network (Kim et al., 2017). These algorithms generally fall into two categories: graph-based approaches (Croes et al., 2005, 2006; Kim et al., 2020; Simeonidis et al., 2003), which utilize the connectivity between metabolites (nodes) as determined by the reactions (edges) in the network, and stoichiometric approaches (Pharkya et al., 2004; Klamt et al., 2017; Chowdhury and Maranas, 2015; Pey et al., 2011), which use optimization to predict stoichiometrically-balanced pathways to generate a given metabolite. A critical challenge for both approaches is limiting the identification of false-positive pathways due to a few highly connected metabolites, e.g. ATP.

    • In Silico Approaches to Metabolic Engineering

      2017, Current Developments in Biotechnology and Bioengineering: Functional Genomics and Metabolic Engineering
    • What is a Complex System, After All?

      2023, Foundations of Science
    • A graph-theory-based method for processing of currency metabolites in metabolic networks

      2022, Shengwu Gongcheng Xuebao/Chinese Journal of Biotechnology
    View all citing articles on Scopus
    View full text