Abstract
Protein–protein interactions (PPIs) play crucial roles in many biological processes. Protein interaction networks (PINs) have recently been generated for several model organisms and for humans, but few large-scale studies of the mouse have been carried out, either experimentally or computationally. In this work, we set out to map a mouse PIN from protein interactions buried in the vast biomedical literature. Following a co-occurrence-based text-mining step, a probabilistic model (naïve Bayes) was used to filter out false-positive interactions by integrating heterogeneous evidence from genomic and proteomic datasets. A support vector machine algorithm was then used to select protein pairs with physical interactions. Comparison with currently available PPI datasets from several model organisms and humans showed that the derived mouse PINs have similar topological properties at the global level but high divergence at the local level. The mouse protein interaction dataset is stored in the Mouse protein–protein interaction DataBase (MppDB), a useful source of information for system-level understanding of gene function and biological processes in mammals. MppDB is publicly available at http://bio.scu.edu.cn/mppi.
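The naïve Bayes filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prior odds, the likelihood ratios, the threshold, the evidence categories, and the protein names below are all hypothetical values chosen for illustration. The sketch only shows the general idea that, under the naïve Bayes independence assumption, the posterior odds of an interaction are the prior odds multiplied by one likelihood ratio per evidence source.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes combination: prior odds times the product of
    per-evidence likelihood ratios (evidence assumed independent)."""
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)


def filter_pairs(pairs, prior_odds, threshold=1.0):
    """Keep candidate pairs whose posterior odds exceed the threshold.

    `pairs` maps a (protein, protein) tuple to a list of likelihood
    ratios, one per evidence source.
    """
    kept = {}
    for pair, ratios in pairs.items():
        odds = posterior_odds(prior_odds, ratios)
        if odds > threshold:
            kept[pair] = odds_to_probability(odds)
    return kept


# Hypothetical prior: one true interaction per 100 co-occurring pairs.
PRIOR_ODDS = 1 / 100

# Hypothetical likelihood ratios for three illustrative evidence sources
# (for example co-expression, shared annotation, domain-domain support).
candidate_pairs = {
    ("ProteinA", "ProteinB"): [8.0, 5.0, 4.0],
    ("ProteinC", "ProteinD"): [0.5, 1.2, 0.8],
}

kept = filter_pairs(candidate_pairs, PRIOR_ODDS)
for pair, probability in kept.items():
    print(pair, round(probability, 3))
```

In this toy setting the first pair is retained because its combined evidence outweighs the low prior, while the second pair is discarded. In practice the likelihood ratios would be estimated from reference sets of positive and negative interactions, and the threshold would be chosen to balance precision against coverage.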
Acknowledgments
We are grateful to Dr. Han Hu for his assistance with the website construction. We also thank Prof. Yongsheng Liu, Dr. Bo Liu and Dr. Guan Song for reading the manuscript and for their useful suggestions. This work was supported in part by the Doctoral Fund of the Ministry of Education of China (Grant No. 200806101013) and the Important National Science & Technology Specific Projects of China (Grant No. 2009ZX10005-020).
Additional information
X. Li and H. Cai contributed equally to this work.
Cite this article
Li, X., Cai, H., Xu, J. et al. A mouse protein interactome through combined literature mining with multiple sources of interaction evidence. Amino Acids 38, 1237–1252 (2010). https://doi.org/10.1007/s00726-009-0335-7
DOI: https://doi.org/10.1007/s00726-009-0335-7