Journal of Molecular Biology
Volume 405, Issue 5, 4 February 2011, Pages 1295-1310

iWRAP: An Interface Threading Approach with Application to Prediction of Cancer-Related Protein–Protein Interactions

https://doi.org/10.1016/j.jmb.2010.11.025

Abstract

Current homology modeling methods for predicting protein–protein interactions (PPIs) have difficulty in the “twilight zone” (< 40%) of sequence identities. Threading methods extend coverage further into the twilight zone by aligning the primary sequences of a pair of proteins to a best-fit template complex to predict an entire three-dimensional structure. We introduce a threading approach, iWRAP, which focuses only on the protein interface. Our approach combines a novel linear programming formulation for interface alignment with a boosting classifier for interaction prediction. We demonstrate its efficacy on SCOPPI, a classification of PPIs in the Protein Data Bank, and on the entire yeast genome. iWRAP provides significantly improved prediction of PPIs and their interfaces in stringent cross-validation on SCOPPI. Furthermore, by combining our predictions with a full-complex threader, we achieve a coverage of 13% for the yeast PPIs, which is close to a 50% increase over previous methods at a higher sensitivity. As an application, we effectively combine iWRAP with genomic data to identify novel cancer-related genes involved in chromatin remodeling, nucleosome organization, and ribonuclear complex assembly. iWRAP is available at http://iwrap.csail.mit.edu.

Introduction

Protein–protein interactions (PPIs) play a central role in all biological processes. Akin to the complete sequencing of genomes, a complete description of interactomes is a fundamental step towards a deeper understanding of biological processes and has vast potential to impact systems biology, genomics, molecular biology, and therapeutics. Although high-throughput biochemical approaches for discovering PPIs have proven very successful,1, 2, 3, 4 the coverage of experimentally determined PPI data remains poor (Table S1), and the data are prone to errors.5, 6 This low coverage is partly because the set of possible PPIs to be verified is so large (about 50 million candidate pairs for a species with 10,000 genes) that exhaustive experimental verification would take a long time, even with high-throughput techniques. While the rate of PPI discovery has leveled off in recent years (see Supplementary Fig. S1), the number of solved protein structural complexes has grown rapidly: there was a 40% increase in the number of complex templates in the 14 months between two versions of the Structural Classification of Proteins database (SCOP, 1.65 and 1.69).7 This growing resource of structural data presents an opportunity to use this information for accurate PPI prediction.
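The size of the search space quoted above follows from counting unordered pairs of genes. A minimal check, assuming only pairs of distinct proteins are counted (self-interactions excluded; including them would add 10,000 more candidates):

```python
from math import comb


def candidate_pairs(n_genes: int) -> int:
    """Count distinct unordered protein pairs among n_genes, excluding self-pairs."""
    # n choose 2 = n * (n - 1) / 2
    return comb(n_genes, 2)


print(candidate_pairs(10_000))  # 49995000, roughly 50 million
```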

There have recently been proposals to harness the information provided by structure-based computational approaches as a potentially high-quality, high-coverage data source for large-scale integrative approaches to interactome construction.8, 9, 10, 11, 12 Prieto et al.13 have reviewed publicly available interaction databases of known structural data that facilitate analysis of PPIs.14, 15, 16 In the absence of a solved structure for a pair of protein “query” sequences, structure-based approaches typically rely on aligning the query sequences to either sequence or structure-based “templates” for solved structures in the Protein Data Bank (PDB).17

In one such approach, homology modeling, two protein sequences are assumed to interact based simply on their primary sequence homology to known interacting proteins. Homology modeling has had considerable success at predicting PPIs on a genome scale11, 18, 19, 20 and at reconstructing and predicting three-dimensional multi-protein complexes.9 More recently, Fukuhara and Kawabata have described HOMCOS,21, 22 a web server that, like Aloy and Russell's InterPreTS,9 predicts interactions by homology modeling. MODBase is a database of homology models for protein complexes whose sequence similarity to known structures exceeds 50%.23 ADAN is a specialized database for predicting protein–protein interactions mediated by linear motifs; it uses position-specific matrices to assess putative interactions.24 Other sequence-based methods use genetic information and multiple sequence alignments to predict specific protein–protein interactions.25, 26, 27, 28 However, effective use of homology modeling requires relatively high sequence similarity between the query and template protein pairs.8
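The basic homology-transfer rule can be sketched as follows: predict that a query pair interacts if each query matches its counterpart in a known interacting template pair above a sequence-identity cutoff. This is a simplified illustration, not any of the cited tools. The sequences are assumed to be pre-aligned, and the 40% default is an assumption borrowed from the twilight-zone boundary mentioned in the abstract; `percent_identity` and `transfer_interaction` are hypothetical helpers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither pre-aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def transfer_interaction(identity_a: float, identity_b: float, threshold: float = 40.0) -> bool:
    """Hypothetical transfer rule: both queries must match their template partners
    at or above the identity threshold (40% assumed here)."""
    return min(identity_a, identity_b) >= threshold


id_a = percent_identity("ACDE-G", "ACDFHG")  # 4 matches over 5 ungapped columns
id_b = percent_identity("MKV-LS", "MRVTLA")  # 3 matches over 5 ungapped columns
print(id_a, id_b, transfer_interaction(id_a, id_b))  # 80.0 60.0 True
```

Taking the minimum of the two identities reflects that the weaker of the two query-template matches limits confidence in the transferred interaction.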

In another popular approach, threading, the three-dimensional structure for a pair of protein query sequences is predicted by aligning their sequences to templates for complexes in the PDB, based on both sequence and structure profiles, to see if a similar structure can be found. The goodness of a query pair–template alignment is evaluated using a scoring function. The essential computational components of a PPI threading approach are template construction, alignment of query sequences to templates, and interaction scoring. Lu et al. developed Multiprospector,29 a threading algorithm that constructs statistical potential functions to evaluate potential PPIs.30 Singh et al. further proposed a machine-learning-based threading algorithm, DBLRAP, which also performs full-complex threading, and demonstrated its superiority in predicting PPIs over homology modeling and Multiprospector.8, 31 Because threading identifies compatible structures for proteins that share less sequence similarity with the template, it typically widens the range of proteins for which predictions can be made relative to homology modeling.
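Of the three components listed above, query-to-template alignment is the simplest to illustrate. The sketch below aligns one query sequence globally to a position-specific template profile by dynamic programming (Needleman–Wunsch with a linear gap penalty). The profile format, the gap penalty of -2, and the default score of -1 for residues absent from a profile column are assumptions. Real threaders, including those cited, also use structural terms and score contacts between the two chains, which this sketch does not attempt.

```python
from typing import Dict, List

# Hypothetical profile format: one residue -> score map per template position.
Profile = List[Dict[str, float]]


def align_score(query: str, profile: Profile, gap: float = -2.0) -> float:
    """Best global alignment score of a query sequence against a template profile."""
    n, m = len(query), len(profile)
    # dp[i][j] is the best score aligning query[:i] with profile[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Residues not listed in a profile column get an assumed penalty of -1.
            match = dp[i - 1][j - 1] + profile[j - 1].get(query[i - 1], -1.0)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
print(align_score("ACD", profile))  # 6.0
print(align_score("AD", profile))   # 2.0: A and D match, C position is gapped
```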

While homology modeling and threading approaches work well and have good overall accuracy when sequences are somewhat similar to their putative templates, they perform poorly in the “twilight zone” of sequence identities. In particular, they often give inaccurate alignments in the putative interaction regions for sequences with low similarity and are therefore unable to predict interactions accurately in such cases, as we demonstrated previously for the special case of cytokines.32 Functional residues, such as those at the interface, have been observed to be more conserved than nonfunctional ones, both in sequence33, 34, 35 and in structure.36, 37 Furthermore, it has recently been shown that partial homology models, based only on interface alignments, are good candidates for templates in docking studies.38 Here, we capitalize on these observations by performing threading only on the protein–protein interface after a suitable complex template is identified.

We introduce the program iWRAP (Interface Weighted RAPtor), which predicts whether two proteins interact by combining a novel linear programming approach for interface alignment with a boosting classifier39 for interaction prediction. After constraining alignments to only those residues likely to be involved in the interaction, iWRAP simultaneously aligns both query sequences to templates of protein–protein interfaces, optimizing the interfacial contacts. This approach contrasts with existing threading approaches, which align each sequence individually to an entire protein structure in the complex. We recently demonstrated the utility of interface threading on two cytokine receptor families with LTHREADER,32 for which we manually generated family-specific templates and aligned each query sequence separately to each template. The driving hypothesis of iWRAP's approach is that more accurate prediction of protein–protein interfaces improves prediction of protein–protein interactions. We show here for general PPIs that (i) more accurate interface alignments lead to improved interface contact prediction, which in turn (ii) significantly improves PPI prediction. Thus, by optimizing the interface alignments after identifying a suitable template, iWRAP exploits functional conservation at the interface to predict PPIs.
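To make the classification step concrete, here is a minimal boosting sketch. The text does not specify which boosting variant or which features iWRAP uses, so this assumes discrete AdaBoost with one-feature decision stumps, applied to hypothetical per-pair feature vectors (for example, interface alignment and contact scores) labeled +1 (interacting) or -1 (non-interacting). It illustrates the technique only and is not iWRAP's classifier.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, float]  # (feature index, threshold, polarity, alpha)


def _stump_predict(x: Sequence[float], feat: int, thr: float, pol: int) -> int:
    return pol if x[feat] >= thr else -pol


def train_adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost with decision stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    model: List[Stump] = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for feat in range(len(X[0])):
            for thr in sorted({x[feat] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if _stump_predict(xi, feat, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, feat, thr, pol)
        err, feat, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((feat, thr, pol, alpha))
        # Misclassified examples gain weight; correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * _stump_predict(xi, feat, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:  # a perfect stump was found; further rounds add nothing
            break
    return model


def predict(model: List[Stump], x: Sequence[float]) -> int:
    score = sum(alpha * _stump_predict(x, feat, thr, pol) for feat, thr, pol, alpha in model)
    return 1 if score >= 0 else -1


# Hypothetical training data: [alignment score, contact score] per query pair.
X = [[0.2, 1.0], [0.3, 0.5], [0.7, 0.9], [0.9, 0.4]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y)
print(predict(model, [0.8, 0.1]))  # 1
```

Stumps keep each weak learner interpretable (one score, one cutoff), while the weighted vote lets several scores contribute to the final call.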

We demonstrate the efficacy of these techniques on two data sets: SCOPPI, a database that classifies protein complexes in the PDB,40 and the yeast genome. First, we use SCOPPI as our gold standard to confirm hypothesis (i): we show that interface threading (i.e., localized threading) leads to better interface contact prediction than full-complex threaders. For difficult alignment problems, across a range of sequence identities below 40%, iWRAP outperforms standard threading and sequence-based methods, while for easier problems the methods are comparable. Our results on the full yeast genome scan address hypothesis (ii): we demonstrate that our method, which is novel in using boosting39 to classify iWRAP's interface threading scores for PPI prediction, outperforms methods based on whole-sequence alignments. In particular, we perform a full genome scan of yeast to predict interactions and compare iWRAP's performance on experimental data with that of DBLRAP, which has been shown to have the best performance amongst available structure-based PPI prediction methods.8, 31

As an application, by mapping yeast cancer-related genes and their putative interactions to the human genome, we identify interactions enriched relative to a recent yeast genetic interaction set.41 We find that these interacting genes are involved in chromatin remodeling, ribonuclear complex assembly, and nucleosome organization,42 processes known to be critically involved in cancer. We focus on yeast cancer-related genes and putative interactions because the functions and interactions of yeast genes are much better understood than those of human genes.43 Moreover, the malignant behavior of human cells is often caused by dysregulation of cell cycle, growth, and apoptosis processes that are conserved across eukaryotic organisms at the level of genes and their interactions.44

iWRAP's predictions are made publicly available at its website so that they can be used for further exploration or systems-level integrative approaches.

Section snippets

Overview of the threading algorithm

We develop iWRAP, an algorithm for threading query sequence pairs to only the interface of a suitable complex template. Figure 1 is a schematic of iWRAP, displaying a flowchart of the various stages of the algorithm. In the first stage, template construction, from alignments of multiple protein–protein interfaces,36 we construct specific interface profiles based on amino acid propensities, secondary structure, and solvent accessibilities for discrete environmental classes of the interface.
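One ingredient of such a profile, amino acid propensities per discrete environment class, can be sketched as a log-odds table log(P(aa | env) / P(aa)) with pseudocounts. The environment labels here (pairs of secondary structure and solvent exposure), the pseudocount of 1, and the function name are assumptions for illustration; the exact classes and weighting used by iWRAP are not given in this excerpt.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def environment_propensities(
    observations: Iterable[Tuple[str, Hashable]], pseudocount: float = 1.0
) -> Dict[Hashable, Dict[str, float]]:
    """Log-odds propensity log(P(aa | env) / P(aa)) for each environment class.

    observations: (amino acid, environment class) pairs taken from aligned
    interface positions; non-standard residue codes are skipped.
    """
    obs = [(aa, env) for aa, env in observations if aa in AMINO_ACIDS]
    k = len(AMINO_ACIDS)
    overall = Counter(aa for aa, _ in obs)
    by_env: Dict[Hashable, Counter] = {}
    for aa, env in obs:
        by_env.setdefault(env, Counter())[aa] += 1
    # Background frequency of each residue across all interface observations.
    n_all = len(obs) + pseudocount * k
    p_aa = {aa: (overall[aa] + pseudocount) / n_all for aa in AMINO_ACIDS}
    table: Dict[Hashable, Dict[str, float]] = {}
    for env, counts in by_env.items():
        n_env = sum(counts.values()) + pseudocount * k
        table[env] = {
            aa: math.log(((counts[aa] + pseudocount) / n_env) / p_aa[aa])
            for aa in AMINO_ACIDS
        }
    return table


# Hypothetical observations: leucines in buried helices, lysines in exposed coils.
obs = [("L", ("helix", "buried"))] * 5 + [("K", ("coil", "exposed"))] * 5
props = environment_propensities(obs)
print(props[("helix", "buried")]["L"] > 0)  # True: L is enriched in this class
```

Positive values mark residues over-represented in an environment class relative to the interface background; pseudocounts keep unseen residues from producing log(0).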


Discussion

We introduce the program iWRAP and show that integrating interface profiles into a localized scoring scheme aids in interfacial contact prediction. We introduce the use of across-family templates to mitigate the limited number of templates and also capture convergently evolved interface motifs. We apply our approach to predict interacting proteins encoded by the entire yeast genome. Furthermore, by integrating our predictions in a combined functional and enrichment study of cancer-related genes

Stage 1: Template construction

We utilize the SCOPPI classification of protein–protein interfaces to construct interface profiles. SCOPPI classifies interfaces based on sequence and structural similarity of the interface.40 In addition, for each interacting SCOP family pair, SCOPPI provides a sequence alignment of other interfaces in the same SCOP family pair. Here, we use this classification of interfaces to construct our own multiple interface alignments for each SCOP family pair using CMAPi.36 CMAPi employs a contact-map
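Interface alignments of this kind start from the inter-chain contacts of each template complex. The snippet above is truncated before CMAPi's contact definition, so the sketch below assumes one representative atom per residue and a distance cutoff of 8 Å; both choices, and the helper names, are illustrative rather than CMAPi's actual settings.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def interchain_contacts(
    chain_a: List[Coord], chain_b: List[Coord], cutoff: float = 8.0
) -> List[Tuple[int, int]]:
    """Residue index pairs (i, j) whose representative atoms lie within cutoff angstroms."""
    return [
        (i, j)
        for i, a in enumerate(chain_a)
        for j, b in enumerate(chain_b)
        if math.dist(a, b) <= cutoff
    ]


def interface_residues(contacts: List[Tuple[int, int]]) -> Tuple[Set[int], Set[int]]:
    """Split an inter-chain contact list into the interface residue sets of each chain."""
    return {i for i, _ in contacts}, {j for _, j in contacts}


# Toy coordinates: only residue 0 of chain A and residue 0 of chain B are 5 Å apart.
chain_a = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
chain_b = [(5.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
contacts = interchain_contacts(chain_a, chain_b)
print(contacts, interface_residues(contacts))  # [(0, 0)] ({0}, {0})
```

The all-pairs loop is quadratic in chain length, which is adequate for single interfaces; large-scale processing would typically use a spatial index instead.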

Acknowledgements

Thanks to Rohit Singh, Vinay Pulim, and Daniel Park for help with data and software. Thanks to Jerome Waldispuhl and anonymous reviewers for critical reading of the manuscript. Funding was provided by National Institutes of Health grant 1R01GM081871.

References (66)

  • Uetz P. et al.

    A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae

    Nature

    (2000)
  • Giot L. et al.

    A protein interaction map of Drosophila melanogaster

    Science

    (2003)
  • Björklund A. et al.

    Quantitative assessment of the structural bias in protein–protein interaction assays

    Proteomics

    (2008)
  • Sontag D. et al.

    Probabilistic modeling of systematic errors in two-hybrid experiments

    Proc. Pac. Symp. Biocomput.

    (2007)
  • Singh R. et al.

    Struct2Net: integrating structure into protein–protein interaction prediction

    Proc. Pac. Symp. Biocomput.

    (2006)
  • Aloy P. et al.

    Interrogating protein interaction networks through structural biology

    Proc. Natl Acad. Sci.

    (2002)
  • Kim P. et al.

    Relating three-dimensional structures to protein networks provides evolutionary insights

    Science

    (2006)
  • Aloy P. et al.

    Structural systems biology: modelling protein interactions

    Nat. Rev. Mol. Cell Biol.

    (2006)
  • Aytuna A. et al.

    Prediction of protein–protein interactions by combining structure and sequence conservation in protein interfaces

    Bioinformatics

    (2005)
  • Prieto C. et al.

    Structural domain–domain interactions: assessment and comparison with protein–protein interaction data to improve the interactome

    Nucleic Acids Res.

    (2006)
  • Stein A. et al.

    3did: interacting protein domains of known three-dimensional structure

    Nucleic Acids Res.

    (2005)
  • Jefferson E. et al.

    SNAPPI-DB: a database and API of structures, interfaces and alignments for protein–protein interactions

    Nucleic Acids Res.

    (2007)
  • Finn R. et al.

    iPfam: visualization of protein–protein interactions in PDB at domain and amino acid resolutions

    Bioinformatics

    (2005)
  • Berman H. et al.

    The Protein Data Bank

    Nucleic Acids Res.

    (2000)
  • Ben-Hur A. et al.

    Kernel methods for predicting protein–protein interactions

    Bioinformatics

    (2005)
  • Deng M. et al.

    Inferring domain–domain interactions from protein–protein interactions

    Genome Res.

    (2002)
  • Betel D. et al.

    Structure-templated predictions of novel protein interactions from sequence information

    PLoS Comput. Biol.

    (2007)
  • Fukuhara N. et al.

    Prediction of interacting proteins from homology-modeled complex structure using sequence and structure scores

    Biophys. J.

    (2007)
  • Fukuhara N. et al.

    HOMCOS: a server to predict interacting protein pairs and interacting sites by homology modeling of complex structures

    Nucleic Acids Res. (Web Server Issue)

    (2008)
  • Pieper U. et al.

    MODBASE: a database of annotated comparative protein structure models and associated resources

    Nucleic Acids Res.

    (2009)
  • Encinar J. et al.

    ADAN: a database for prediction of protein–protein interaction of modular domains mediated by linear motifs

    Bioinformatics

    (2009)
  • Valencia A. et al.

    In silico two-hybrid system for the selection of physically interacting protein pairs

    Proteins

    (2002)
  • Burger L. et al.

    Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method

    Mol. Syst. Biol.

    (2008)