Elsevier

Artificial Intelligence in Medicine

Volume 35, Issues 1–2, September–October 2005, Pages 37-47
Discovering reliable protein interactions from high-throughput experimental data using network topology

https://doi.org/10.1016/j.artmed.2005.02.004

Summary

Objective:

Current protein–protein interaction (PPI) detection via high-throughput experimental methods, such as yeast-two-hybrid, has been reported to be highly error-prone, leading to potentially costly spurious discoveries. This work introduces a novel measure called IRAP, i.e. “interaction reliability by alternative path”, for assessing the reliability of protein interactions based on the underlying topology of the PPI network.

Methods and materials:

A candidate PPI is considered to be reliable if it is involved in a closed loop in which the alternative path of interactions between the two interacting proteins is strong. We devise an algorithm called AlternativePathFinder to compute the IRAP value for each interaction in a complex PPI network. Validation of the IRAP as a measure for assessing the reliability of PPIs is performed with extensive experiments on yeast PPI data. All the data used in our experiments can be downloaded from our supplementary data web site at http://www.comp.nus.edu.sg/∼chenjin/data.html.

Results:

Results consistently show that the IRAP measure is an effective way of discovering reliable PPIs in large datasets of error-prone, experimentally derived PPIs. Results also indicate that IRAP is better than IG2, and markedly better than the more simplistic IG1 measure.

Conclusion:

Experimental results demonstrate that a global, system-wide approach such as IRAP, which considers the entire interaction network instead of merely local neighbors, is much more promising for assessing the reliability of PPIs.

Introduction

Technological developments in high-throughput protein–protein interaction (PPI) detection methods, such as yeast-two-hybrid [1] and protein chips [2], have enabled biologists to experimentally detect PPIs at the whole genome level for many organisms [3], [4], [5], [6], [7]. Unfortunately, a significant proportion of the PPIs obtained from these high-throughput biological experiments have been found to be false positives. Recent surveys have revealed that the reliability of the popular high-throughput yeast-two-hybrid assay can be as low as 50% [8], [9], [10]. These errors in the experimental PPI data can lead to spurious discoveries that are potentially costly, e.g. wrong drug targets for diseases. It is therefore important to develop systematic methods to detect reliable PPIs from high-throughput experimental data.

Biological studies have shown that interaction clusters obtained from contiguous connections that form closed loops in PPI networks indicate an increased likelihood of biological relevance for the corresponding potential interactions [3], [11], [12]. In yeast-two-hybrid screens, proteins found together within a circular contig have included known proteins in macromolecular complexes as well as in signal transduction pathways [11], [12]. We observe that such circular contigs are formed by the presence of alternative paths in the interaction networks. This has led to the use of alternative interaction paths in PPI networks as a measure to indicate the functional linkage between two proteins [3].

In this paper, we propose to use the length and strength of the alternative paths between pairs of interacting proteins as a basis for detecting reliable PPIs from high-throughput experimental data. We introduce a quantitative measure called interaction reliability by alternative path (IRAP) for assessing the reliability of a detected PPI with respect to the presence of alternative reliable interaction paths in the underlying topology of the experimentally derived PPI network. We devise an AlternativePathFinder algorithm to compute the IRAP values of the interactions in large complex PPI networks. Using yeast PPI data with annotated functional information as well as other experimental data, we show positive experimental results that validate IRAP as a good system-wide measure for discovering reliable PPIs in error-prone high-throughput experimental data.

The rest of this paper is organized as follows. Section 2 gives the related work and the motivation for this work. Section 3 introduces IRAP as a quantitative measure for the reliability of PPIs detected in high-throughput genome-wide experiments. In Section 4, we describe the AlternativePathFinder algorithm for computing IRAP values in complex PPI networks. Section 5 presents the various comparative results of using the computed IRAP values for discovering reliable PPIs for yeast. Finally, we conclude in Section 6 with discussions about further work.

Section snippets

Background

The reported high false positive rates associated with high-throughput experimental PPI data [9], [10] have led researchers to develop methods to assess the reliability of PPIs generated by large-scale biological experiments.

One approach is to combine the results from multiple independent detection methods to derive highly reliable data [9]. However, this approach has limited applicability because of the low overlap [9], [13] between the different detection methods.

Another approach is to model

Interaction reliability by alternative path (IRAP)

In this section, we define the proposed interaction reliability measure—interaction reliability by alternative path (IRAP)—that assigns a reliability value to each candidate interacting protein pair in genome-wide PPI data. The reliability of a candidate PPI is indicated by the collective reliability of the strongest alternative path of interactions connecting the two proteins in the underlying PPI network. A reliable PPI is accompanied by at least one reliable alternative interaction path in

AlternativePathFinder algorithm

The yeast PPI network is very large in size and highly loopy. The network constructed for the yeast PPIs in our experiments has more than 4000 nodes and 8000 edges with many loopy components. Hence, it is necessary to develop an efficient method to find the strongest alternative path and compute the IRAP value for each candidate interacting pair (vA,vB) in G, where G is a PPI network as described in Section 3.1.
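
The search for a strongest alternative path can be illustrated with a small sketch. This is not the authors' AlternativePathFinder, whose details are not given in this excerpt. It assumes, hypothetically, that every edge carries a reliability score in (0, 1] and that the strength of a path is the product of its edge scores. Under those assumptions a standard Dijkstra search over negative log scores finds the strongest path between vA and vB once the direct edge between them is excluded. The names `build_graph` and `strongest_alternative_path` are illustrative only.

```python
import heapq
import math


def build_graph(edges):
    """Build an undirected adjacency map from (u, v, reliability) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def strongest_alternative_path(graph, src, dst):
    """Return (strength, path) for the strongest src-dst path that avoids
    the direct src-dst edge, or (0.0, []) if no such path exists.

    Assumption: path strength is the product of edge reliabilities in (0, 1].
    """
    # Maximizing a product of values in (0, 1] is the same as minimizing
    # the sum of their negative logs, so plain Dijkstra applies.
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            if {u, v} == {src, dst} or w <= 0.0:
                continue  # skip the candidate edge itself and unusable edges
            new_cost = cost - math.log(w)
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if dst not in best:
        return 0.0, []
    # Walk the predecessor links back from dst to recover the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-best[dst]), path


# Toy network with made-up reliability scores (illustrative data only).
graph = build_graph([
    ("A", "B", 0.3), ("A", "C", 0.9), ("C", "B", 0.8),
    ("A", "D", 0.5), ("D", "B", 0.5),
])
strength, path = strongest_alternative_path(graph, "A", "B")
print(path, round(strength, 2))
```

In this toy network the candidate interaction A–B has two alternative paths, and the search selects the one through C because 0.9 × 0.8 exceeds 0.5 × 0.5. A single-source search of this kind would have to be repeated for every candidate pair, which is one reason an efficient method matters on a network with thousands of nodes and edges.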

Based on the definition of IRAP, the strongest alternative path is not necessarily

Experimental validations of IRAP

We implement the AlternativePathFinder algorithm in C++, and apply it to compute the IRAP values of PPIs in large PPI networks generated from high-throughput genome-wide biological experimental methods. We combine the following publicly available yeast PPI datasets:

  (1) From Ito et al. [3], we download the core dataset containing 841 PPIs available from the BRITE web site at KEGG [19] at http://www.genome.ad.jp/brite (accessed: 11 April 2004). The core set of Ito is formed by cases in which

Conclusions

The dissection of the protein interactome is important for extracting invaluable biological knowledge for understanding the molecular mechanisms of our cellular system, eventually leading to the discovery of new drugs and drug targets for various human diseases. Thus far, most of the recent technological advances in this field have focused on the high-throughput detection of PPIs in order to map the tremendously vast protein interactome. Unfortunately, the PPI data that have been generated in

Acknowledgments

We would like to thank Dr. Limsoon Wong (Institute for Infocomm Research), and Dr. Prasanna Ratnakar Kolatkar (Genome Institute of Singapore) for their invaluable comments and advice on this work.

References (26)

  • C.V. Mering et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature (2002)
  • A.J. Walhout et al. Protein interaction mapping in C. elegans using proteins involved in vulval development. Science (2000)
  • A. Walhout et al. Yeast two-hybrid systems and protein interaction mapping projects for yeast and worm. Yeast (2000)