Journal of Molecular Biology
Inferring Meaningful Pathways in Weighted Metabolic Networks
Introduction
Small molecules metabolism (SMM) comprises the ensemble of chemical reactions used by the cell to process medium size and small molecules. These molecules serve as essential building blocks for vital macromolecules, such as proteins, nucleic acids, lipids and sugars, or act as co-factors, regulators and modulators of enzyme activity, gene expression, and signal transduction. SMM is commonly organized in pathways,1 where chemical reactions are connected to each other through the small molecules (the metabolites), the product of one reaction acting as the substrate of the next. Most, but not all, reactions of this network are catalyzed by enzymes.
Much of our knowledge about metabolic pathways has been obtained from experiments on model organisms such as Escherichia coli, a few other bacteria, the yeast Saccharomyces cerevisiae, and focused studies in mouse and human. With genome sequences becoming available for a growing number of different organisms, efforts have been devoted to extending this knowledge to other organisms. A method for accomplishing this task is metabolic reconstruction. It involves inferring the metabolic pathways of an organism from its genome sequence, using as template the metabolic pathways characterized in related organisms.2, 3 To store information on known and inferred metabolic pathways, specialized databases have been developed. These include WIT,4, 5 EcoCyc, MetaCyc,6, 7, 8 KEGG9, 10, 11 and aMAZE.12, 13, 14
In addition, analyses of the networks formed by connecting all the reactions and enzymes involved in SMM have attracted much attention. In particular, global properties of these networks have been investigated. Representing these networks as graphs, where reactions are connected to each other via their substrates and products, the distribution of node connectivity, the shortest distance between nodes, and the modularity of the graph were characterized.15, 16, 17, 18
Path finding in metabolic networks has also been used to measure a “metabolic distance”, between pairs of enzymes in E. coli. This distance, defined as the length of the shortest path on the metabolic graph, was compared with protein structure and sequence similarity measures and with distances between the corresponding genes in the genome, in order to gain insights on how the organization and function of metabolic enzymes might have evolved.19, 20, 21 In another study,22 a “universal” metabolic graph was built by linking all the metabolic reactions known to occur, irrespective of the organism and including those taking place in microbes specialized in degrading toxic compounds. Metabolic distances between compounds in this graph were then used to suggest a mechanism whereby degradation pathways may have evolved from central metabolism.
While the above-mentioned studies provided useful insights, they have not addressed some of the fundamental issues that determine our ability to build and analyze graphs that represent functional networks such as SMMs in a biologically meaningful manner. Rison et al.20 restricted their metabolic distance calculations to well-defined portions of the SMM network, restricted to reactions known to occur in one organism (E. coli). Moreover, their distances were computed considering separate subsets of reactions from the E. coli network (groupings of metabolic pathways). Due to these restrictions their procedure does not consider alternative pathways that would be chemically feasible or those that pass through transversal links between known pathways.
Exploring such alternative connections and links should be useful for analyzing functional relations between proteins, and for discovering alternative pathways that may occur under certain cellular conditions or in related organisms. But doing so will dramatically increase the number of possible paths, of which many may be biologically irrelevant.23 To eliminate such irrelevant pathways, additional information must be used. Some studies used expression profiles.24, 25 More often, however, considerations based on the chemistry of the compounds have been employed as constraints.26, 27 In an earlier work,25, 28 we showed that navigating the SMM network in a chemically relevant manner is hindered by the presence of compounds such as H2O, O2, NAD+, NADH, NADP+, H+, NADPH. These so-called pool metabolites or co-factors are consumed and produced in many reactions (e.g. 1999 reactions for H2O, 601 for NAD). These pool metabolites are the so-called “hubs” in the scale-free SMM networks described by Jeong & Barabasi.15 Although path finding in metabolic networks from which pool metabolites have been excluded16, 17, 18, 25, 28 avoids some of the obvious pitfalls, it is far from satisfactory. The choice of excluded compounds has to be made somewhat arbitrarily and little is known on how effective the resulting procedure is in deriving biologically meaningful pathways. A much more relevant approach involves taking into account the chemistry of the metabolites and, more particularly, the actual transfer of structural moieties between adjacent reactions.26, 27 This approach requires, however, a detailed annotation of the over 5000 metabolites in order to describe each compound in terms of its atoms and chemical connectivity, and the application of rather sophisticated graph matching procedures, capable of reliably identifying common substructures between substrates and products of the same and adjacent reactions, in order to determine the biochemically relevant connectivity between reactions in a pathway. Other approaches rely on a manual specification of the links in each reaction in order to retain only those that are biochemically relevant.29
Here, we describe a novel approach for representing the universal SMM network, and show that it enables the calculation of biologically meaningful pathways between pairs of reactions, offering the opportunity for discovering new pathways, determining functional distances between enzymes, or building pathways from groups of enzymes. In this approach, the universal SMM network is represented as a weighted graph, where each compound is given a weight equal to its degree of connectivity, i.e. the number of reactions in which it participates as substrate or product. The path finding calculations identify the k paths with minimal weight(s) (referred to as “lightest paths” in a weighted graph, and “shortest paths” an unweighted graph) between a pair of source and target reactions (see Methods for definitions of pathway distances and weights). When these calculations are performed on reaction pairs representing the first and last reactions in every known metabolic pathway of E. coli and S. cerevisiae annotated in the aMAZE14 or EcoCyc2 databases, the inferred paths are shown to correspond remarkably well to the corresponding annotated pathways. In some cases alternative pathways, which exist in other organisms, are also inferred.
This extremely simple approach requires no pre-processing of the compound data, other than calculating the weight of each metabolite in the universal SMM network, and the reasons why it works at all seem to be rooted in the apparent high degree of specificity of the reactions in biologically relevant pathways. Indeed, with some notable exceptions, metabolic pathways for the synthesis and degradation of amino acids, nucleic acids and sugars, seem to involve compounds and reactions that occur rarely in other pathways. We suggest that this interesting property of metabolic pathways might reflect energetic constraints, which could in turn confer evolutionary advantages.
We also show that much poorer results are obtained when the same calculations are performed using two other network representations. In the first representation, called “raw graph”, the shortest paths are computed from the universal SMM network, represented as an unweighted graph that comprises all compounds including the pool metabolites. In the second representation (filtered graph), a set of highly connected pool metabolites are simply excluded from the network.28
Section snippets
Path finding in SMM graphs
In order to evaluate the performance of our path finding procedure we computed paths between the first (source) and the last (target) reactions of annotated pathways, and compared the intermediate reactions of the inferred paths to those of the annotated pathways. This evaluation was performed on two independent data sets. The first data set comprises a metabolic network compiled from KEGG/LIGAND (4756 compound nodes, 5985 reactions), and pathways annotated in the aMAZE database,14 which are
Discussion
A number of recent studies applied graph theory to analyze metabolic networks.15, 16, 17, 18 In none of these, however, was path finding applied to weighted SMM graphs and, to the best of our knowledge, few if any have systematically evaluated the relevance of the computed paths by comparing them with annotated pathways.
Here, we introduced a novel representation of the SMM network as a weighted graph and showed that this approach presents a number of important advantages. The most obvious of
LIGAND/KEGG metabolic network
Compounds and reactions associated with SMM in all organisms were parsed from the KEGG/LIGAND database†. The universal SMM graph was built from all known metabolic reactions, regardless of the presence of the corresponding enzymes in the considered organism. We did so for two main reasons: (i) genome annotations are currently very fragmentary, so that an enzyme might be present in an organism even if it is not annotated in its genome; (ii) we did not want
Acknowledgements
We gratefully acknowledge the Kyoto Encyclopaedia of Genes and Genomes (KEGG) for free use of the LIGAND database. Many thanks go to Lorenz Wernisch for help with previous versions of the path finding algorithm. We thank Julio Collado-Vides for useful comments on the manuscript. This work was partially supported by funds from the Government of the Brussels Region. The SCMBB laboratory is a partner of the BioSapiens Network of excellence funded under the sixth Framework programme of the European
References (36)
- et al.
A reconstruction of the metabolism of Methanococcus jannaschii from sequence data
Gene
(1997) - et al.
Computation with the KEGG pathway database
Biosystems
(1998) - et al.
Evolution of enzymes in metabolism: a network perspective
J. Mol. Biol.
(2002) - et al.
Homology, pathway distance and chromosomal localization of the small molecule metabolism enzymes in Escherichia coli
J. Mol. Biol.
(2002) - et al.
Analysis of metabolic networks using a pathway distance metric through linear programming
Metab. Eng.
(2003) Finding paths in graphs avoiding forbidden transitions
Discrete Appl. Math.
(2003)Estimation of standard Gibbs energy changes of biotransformations
J. Biol. Chem.
(1991)- et al.
Synthesis of biochemical production routes
Comput. Chem. Eng.
(1992) Biochemistry
(1995)- et al.
HinCyc: a knowledge base of the complete genome and metabolic pathways of H. influenzae
Proc. Int. Conf. Intell. Syst. Mol. Biol.
(1996)
MPW: the Metabolic Pathways Database
Nucl. Acids Res.
WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction
Nucl. Acids Res.
The EcoCyc and MetaCyc databases
Nucl. Acids Res.
The MetaCyc Database
Nucl. Acids Res.
MetaCyc: a multiorganism database of metabolic pathways and enzymes
Nucl. Acids Res.
KEGG: Kyoto Encyclopedia of Genes and Genomes
Nucl. Acids Res.
The KEGG resource for deciphering the genome
Nucl. Acids Res.
Representing and analysing molecular and cellular function using the computer
Biol. Chem.
Cited by (100)
NetFlow: A tool for isolating carbon flows in genome-scale metabolic networks
2021, Metabolic Engineering CommunicationsCitation Excerpt :Many algorithms in the field of metabolic pathfinding have been developed for identifying pathways between metabolites within a metabolic network (Kim et al., 2017). These algorithms generally fall into two categories: graph-based approaches (Croes et al., 2005, 2006; Kim et al., 2020; Simeonidis et al., 2003), which utilize the connectivity between metabolites (nodes) as determined by the reactions (edges) in the network, and stoichiometric approaches (Pharkya et al., 2004; Klamt et al., 2017; Chowdhury and Maranas, 2015; Pey et al., 2011), which use optimization to predict stoichiometrically-balanced pathways to generate a given metabolite. A critical challenge for both approaches is limiting the identification of false-positive pathways due to a few highly connected metabolites, e.g. ATP.
In Silico Approaches to Metabolic Engineering
2017, Current Developments in Biotechnology and Bioengineering: Functional Genomics and Metabolic EngineeringMetabolic assessment of E. coli as a Biofactory for commercial products
2016, Metabolic EngineeringWhat is a Complex System, After All?
2023, Foundations of ScienceA graph-theory-based method for processing of currency metabolites in metabolic networks
2022, Shengwu Gongcheng Xuebao/Chinese Journal of Biotechnology