Discovering reliable protein interactions from high-throughput experimental data using network topology
Introduction
Technological developments in high-throughput protein–protein interaction (PPI) detection methods, such as yeast-two-hybrid [1] and protein chips [2], have enabled biologists to experimentally detect PPIs at the whole-genome level for many organisms [3], [4], [5], [6], [7]. Unfortunately, a significant proportion of the PPIs obtained from these high-throughput experiments has been found to be false positives. Recent surveys have revealed that the reliability of the popular high-throughput yeast-two-hybrid assay can be as low as 50% [8], [9], [10]. Such errors in experimental PPI data can lead to spurious discoveries that are potentially costly, e.g. wrong drug targets for diseases. It is therefore important to develop systematic methods to detect reliable PPIs from high-throughput experimental data.
Biological studies have shown that interaction clusters formed by contiguous connections that close into loops in PPI networks indicate an increased likelihood of biological relevance for the corresponding potential interactions [3], [11], [12]. Proteins found together within a circular contig in yeast-two-hybrid screens have been observed to include known members of macromolecular complexes as well as signal transduction pathways [11], [12]. We observe that such circular contigs arise from the presence of alternative paths in the interaction network. This has led to the use of alternative interaction paths in PPI networks as a measure of the functional linkage between two proteins [3].
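To make the notion of an alternative path concrete, here is a minimal sketch (not the authors' code) that checks whether an interaction (u, v) lies on a closed loop. It runs a breadth-first search from u that is forbidden from using the direct edge u-v and returns the hop count of the shortest alternative path, or None if the interaction sits on no loop. The function names `to_adjacency` and `shortest_alternative_path` and the toy edge list are hypothetical.

```python
from collections import deque


def to_adjacency(pairs):
    """Build an undirected adjacency map {protein: set(neighbours)} from PPI pairs."""
    adjacency = {}
    for a, b in pairs:
        if a == b:
            continue  # self-interactions cannot form an alternative path
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_alternative_path(adjacency, u, v):
    """Hop count of the shortest u-v path that avoids the direct edge u-v.

    Returns None when no such path exists, i.e. the interaction lies on no
    closed loop in the network.
    """
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, hops = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if {node, nbr} == {u, v}:
                continue  # skip the interaction being assessed
            if nbr == v:
                return hops + 1  # BFS order guarantees this is the shortest
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops + 1))
    return None


# Hypothetical toy network: A, B, C form a triangle; D hangs off C.
adj = to_adjacency([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(shortest_alternative_path(adj, "A", "B"))  # 2 (A-C-B closes a loop)
print(shortest_alternative_path(adj, "C", "D"))  # None (no loop through C-D)
```

An alternative path of k hops together with the direct edge forms a loop of k + 1 interactions, which corresponds to the circular contigs described above.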
In this paper, we propose to use the length and strength of the alternative paths between pairs of interacting proteins as a basis for detecting reliable PPIs from high-throughput experimental data. We introduce a quantitative measure called interaction reliability by alternative path (IRAP) that assesses the reliability of a detected PPI according to the presence of alternative reliable interaction paths in the underlying topology of the experimentally derived PPI network. We devise an AlternativePathFinder algorithm to compute the IRAP values of the interactions in large, complex PPI networks. Using yeast PPI data with annotated functional information as well as other experimental data, we present experimental results that validate IRAP as a good system-wide measure for discovering reliable PPIs in error-prone high-throughput experimental data.
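One way to write such a measure down, assuming each interaction (x, y) in the PPI network G carries a reliability weight w(x, y) in (0, 1] and that a path's collective reliability is the product of its edge weights (both assumptions here; the paper's own weighting is defined in its Section 3), is the following, where Φ_uv denotes the set of simple paths between u and v in G that do not use the direct edge (u, v), and IRAP(u, v) = 0 when Φ_uv is empty:

$$
\mathrm{IRAP}(u,v) = \max_{\varphi \in \Phi_{uv}} \prod_{(x,y) \in \varphi} w(x,y),
\qquad
-\log \mathrm{IRAP}(u,v) = \min_{\varphi \in \Phi_{uv}} \sum_{(x,y) \in \varphi} \bigl(-\log w(x,y)\bigr)
$$

The logarithmic form shows why this reading is computationally convenient: with every w(x, y) in (0, 1], each term −log w(x, y) is nonnegative, so finding the strongest alternative path becomes a shortest-path problem with nonnegative edge lengths.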
The rest of this paper is organized as follows. Section 2 reviews related work and the motivation for this work. Section 3 introduces IRAP as a quantitative measure for the reliability of PPIs detected in high-throughput genome-wide experiments. In Section 4, we describe the AlternativePathFinder algorithm for computing IRAP values in complex PPI networks. Section 5 presents comparative results of using the computed IRAP values to discover reliable PPIs in yeast. Finally, we conclude in Section 6 with a discussion of further work.
Background
The reported high false positive rates associated with high-throughput experimental PPI data [9], [10] have led researchers to develop methods to assess the reliability of PPIs generated by large-scale biological experiments.
One approach is to combine the results from multiple independent detection methods to derive highly reliable data [9]. However, this approach has limited applicability because of the low overlap [9], [13] between the different detection methods.
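The consensus approach can be sketched as a set intersection over interactions treated as unordered protein pairs, so that A-B and B-A count as the same PPI. The Jaccard index used here is one simple (assumed, not source-specified) way to quantify how little two screens agree; the helper names and the dataset contents are hypothetical.

```python
def normalise(pairs):
    """Represent each PPI as an unordered pair so A-B and B-A match; drop self-pairs."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def overlap(dataset_a, dataset_b):
    """Return the interactions reported by both datasets and their Jaccard index."""
    a, b = normalise(dataset_a), normalise(dataset_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical screens: only A-B (reported as B-A in the second) is shared.
screen_1 = [("A", "B"), ("B", "C"), ("C", "D")]
screen_2 = [("B", "A"), ("D", "E")]
shared, jaccard = overlap(screen_1, screen_2)
print(sorted(sorted(p) for p in shared))  # [['A', 'B']]
print(jaccard)  # 0.25
```

Keeping only the shared interactions yields a more reliable set, but when the overlap is small most interactions from each screen are discarded, which is the limitation noted above.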
Another approach is to model
Interaction reliability by alternative path (IRAP)
In this section, we define the proposed interaction reliability measure—interaction reliability by alternative path (IRAP)—that assigns a reliability value to each candidate interacting protein pair in genome-wide PPI data. The reliability of a candidate PPI is indicated by the collective reliability of the strongest alternative path of interactions connecting the two proteins in the underlying PPI network. A reliable PPI is accompanied by at least one reliable alternative interaction path in
AlternativePathFinder algorithm
The yeast PPI network is large and highly loopy: the network constructed for the yeast PPIs in our experiments has more than 4000 nodes and 8000 edges, with many loopy components. Hence, an efficient method is needed to find the strongest alternative path and compute the IRAP value for each candidate interacting pair in G, where G is a PPI network as described in Section 3.1.
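As a rough illustration of the kind of computation involved, and not a reproduction of AlternativePathFinder, the sketch below finds the strongest alternative path with Dijkstra's algorithm. It assumes edge weights in (0, 1] and that a path's reliability is the product of its edge weights; taking −log of each weight turns the maximum product into a minimum sum of nonnegative lengths. The direct edge of the pair being scored is excluded so that only alternative paths count. The helper names `build_graph` and `irap` and the toy weights are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_graph(weighted_edges):
    """Undirected adjacency map {protein: {neighbour: weight}}.

    Weights are assumed to be edge reliabilities in (0, 1]; this weighting
    is a hypothetical stand-in, not the paper's definition.
    """
    graph = defaultdict(dict)
    for u, v, w in weighted_edges:
        if u == v:
            continue  # ignore self-interactions
        if not 0.0 < w <= 1.0:
            raise ValueError(f"weight for {u}-{v} must lie in (0, 1]")
        graph[u][v] = w
        graph[v][u] = w
    return graph


def irap(graph, u, v):
    """Reliability of the strongest alternative u-v path (0.0 if none exists).

    Maximising a product of weights in (0, 1] equals minimising the sum of
    -log(weight), so Dijkstra's algorithm with a binary heap applies.
    """
    dist = {u: 0.0}  # best known -log reliability from u
    heap = [(0.0, u)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == v:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            if {node, nbr} == {u, v}:
                continue  # exclude the interaction being assessed
            nd = d - math.log(w)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return 0.0


# Hypothetical toy network with made-up reliabilities.
g = build_graph([("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.5), ("C", "D", 0.7)])
print(round(irap(g, "A", "C"), 2))  # 0.72, via A-B-C (0.9 * 0.8)
print(irap(g, "C", "D"))            # 0.0, C-D lies on no loop
```

Each query costs one Dijkstra run, O((|V| + |E|) log |V|), so scoring every interaction in a network of this size means one search per edge; the paper's own algorithm may organise the computation differently.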
Based on the definition of IRAP, the strongest alternative path is not necessarily
Experimental validations of IRAP
We implement the AlternativePathFinder algorithm in C++ and apply it to compute the IRAP values of PPIs in large PPI networks generated from high-throughput genome-wide experimental data. We combine the following publicly available yeast PPI datasets:
(1) From Ito et al. [3], we download the core dataset containing 841 PPIs, available from the BRITE web site at KEGG [19] at http://www.genome.ad.jp/brite (accessed: 11 April 2004). The core set of Ito is formed by cases in which
Conclusions
The dissection of the protein interactome is important for extracting invaluable biological knowledge about the molecular mechanisms of the cell, and may eventually lead to the discovery of new drugs and drug targets for various human diseases. Thus far, most recent technological advances in this field have focused on the high-throughput detection of PPIs in order to map the vast protein interactome. Unfortunately, the PPI data that have been generated in
Acknowledgments
We would like to thank Dr. Limsoon Wong (Institute for Infocomm Research), and Dr. Prasanna Ratnakar Kolatkar (Genome Institute of Singapore) for their invaluable comments and advice on this work.
References (26)
- et al. Protein–protein interaction maps: a lead towards cellular functions. Trends Genet (2001).
- et al. How reliable are experimental protein–protein interaction data? J Mol Biol (2003).
- et al. Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteom (2002).
- et al. A novel genetic system to detect protein–protein interactions. Nature (1989).
- et al. Global analysis of protein activities using proteome chips. Science (2001).
- et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA (2001).
- et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature (2000).
- et al. Genome-wide analysis of vaccinia virus protein–protein interactions. Proc Natl Acad Sci USA (2000).
- et al. A protein–protein interaction map of the Caenorhabditis elegans 26S proteasome. EMBO Rep (2001).
- et al. The protein–protein interaction map of Helicobacter pylori. Nature (2001).
- Comparative assessment of large-scale data sets of protein–protein interactions. Nature.
- Protein interaction mapping in C. elegans using proteins involved in vulval development. Science.
- Yeast two-hybrid systems and protein interaction mapping projects for yeast and worm. Yeast.
Cited by (36)
Increasing the reliability of protein-protein interaction networks via non-convex semantic embedding
2013, Neurocomputing. Citation excerpt: More specifically, IG [13,14] use the local topology of a pair of proteins to rank their interaction probability. Chen [15] introduced a more global measure called IRAP, which is defined as the collective reliability of the strongest alternative path between two proteins. A reliability index called CD-Dist, which is defined as the proportion of interaction partners that two proteins have in common, was originally introduced to predict the protein function by Brun et al. [16].
Exploitation of genetic interaction network topology for the prediction of epistatic behavior
2013, Genomics. Citation excerpt: Finally, the networks are re-densified by adding links uniformly at random from the whole set of links that are not present in the sparsified network, and only the techniques that turned out to be more robust in the sparsification process are used in this case (the random predictor is also considered). Performance evaluation by GO is the preferred tool to measure precision in interaction reliability assessment and prediction [14,16,22,24,32,33], its use has been motivated by the guilt-by-association principle [31] and, in fact, the GO is used after high-throughput detection of epistasis to discriminate between true and false positives [34]. Nevertheless, GO annotations may present experimental bias or inherent errors [35].
Increasing reliability of protein interactome by fast manifold embedding
2013, Pattern Recognition Letters. Citation excerpt: More specifically, Saito et al. (2002) and Saito et al. (2003) developed two indices called IG1 and IG2, which use the local topology of a pair of proteins to rank their interaction probability. Chen et al. (2005) and Chen et al. (2006) introduced a more global measure called IRAP, which is defined as the collective reliability of the strongest alternative path between two proteins. A reliability index called CD-Dist, which is defined as the proportion of interaction partners that two proteins have in common, was originally introduced to predict the protein function by Brun et al. (2003).
Clustering of high throughput gene expression data
2012, Computers and Operations Research. Citation excerpt: Abstract networks such as co-expression networks use edges from hypothetical inference, whereas concrete ones such as PPI use edges inferred from physical interactions [150]. Chen et al. [22] construct a graph for experimentally detected PPI. Nodes represent proteins and edges are the interactions with edge weights calculated based on a pre-defined formula.
A survey on computational models for predicting protein-protein interactions
2021, Briefings in Bioinformatics