Comparative study of network-based prioritization of protein domains associated with human complex diseases

  • Research Article
Frontiers of Electrical and Electronic Engineering in China

Abstract

Domains are the basic structural and functional units of proteins. Exploring associations between protein domains and human inherited diseases can therefore improve our understanding of the pathogenesis of human complex diseases and benefit their prevention, diagnosis, and treatment. Based on the assumption that deleterious nonsynonymous single nucleotide polymorphisms (nsSNPs) underlying human complex diseases may change the structures of protein domains, affect the functions of the corresponding proteins, and ultimately result in these diseases, we compile a dataset that contains 1174 associations between 433 protein domains and 848 human disease phenotypes. With this dataset, we compare two approaches (guilt-by-association and correlation coefficient) that use a domain-domain interaction network and a phenotype similarity network to prioritize associations between candidate domains and human disease phenotypes. We implement these methods with three distance measures (direct neighbor, shortest path with Gaussian kernel, and diffusion kernel), demonstrate their effectiveness in three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and whole-genome scan), and evaluate their performance by three criteria (mean rank ratio, precision, and AUC score). Results show that both methods effectively rank domains associated with human diseases at the top of the candidate list, with the correlation coefficient approach achieving slightly higher performance in most cases. Finally, taking advantage of the fact that the correlation coefficient method does not require known disease-domain associations, we calculate a genome-wide landscape of associations between 4036 protein domains and 5080 human disease phenotypes using this method and offer a freely accessible web interface for this landscape.
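
The abstract names three distance measures and a rank-based criterion but gives no formulas for them. As a rough illustration only, the Python sketch below shows one common way such quantities can be computed on a small unweighted network. Everything in it is an assumption and not taken from the article: the function names, the parameters `sigma` and `beta`, the Gaussian form exp(-d^2 / (2 sigma^2)), the kernel exp(-beta L) with L the graph Laplacian, the seed-sum scoring rule, and the definition of rank ratio as rank divided by the number of candidates.

```python
import math
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first hop counts from source in a graph {node: set(neighbors)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def direct_neighbor(adj, a, b):
    """Assumed measure: 1.0 if a and b coincide or share an edge, else 0.0."""
    return 1.0 if a == b or b in adj[a] else 0.0


def gaussian_shortest_path(adj, a, b, sigma=1.0):
    """Assumed measure: Gaussian kernel of the hop distance; 0.0 if unreachable."""
    d = shortest_path_lengths(adj, a).get(b)
    return 0.0 if d is None else math.exp(-d * d / (2.0 * sigma ** 2))


def _matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adj, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L), L = D - A, by a truncated Taylor series.

    Returns the sorted node list and K as a list of rows in that order.
    The truncation is adequate only for small graphs and modest beta.
    """
    nodes = sorted(adj)
    n = len(nodes)
    idx = {u: i for i, u in enumerate(nodes)}
    # M = -beta * L: minus beta times the degree on the diagonal, beta on edges.
    M = [[0.0] * n for _ in range(n)]
    for u in nodes:
        M[idx[u]][idx[u]] = -beta * len(adj[u])
        for v in adj[u]:
            M[idx[u]][idx[v]] = beta
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in _matmul(term, M)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return nodes, K


def guilt_by_association(similarity, candidates, seeds):
    """Assumed scoring rule: sum of a candidate's similarity to each seed."""
    return {c: sum(similarity(c, s) for s in seeds) for c in candidates}


def rank_ratio(scores, target):
    """Assumed definition: rank of target (1 = best) over the candidate count."""
    higher = sum(1 for s in scores.values() if s > scores[target])
    return (higher + 1) / len(scores)
```

On a hypothetical four-node path graph, candidates nearer to a seed node receive higher scores under the Gaussian measure, so a held-out node adjacent to the seed gets the smallest rank ratio. For networks of realistic size, a library matrix exponential would be a more sensible choice than the truncated series used here.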

Author information

Corresponding author

Correspondence to Rui Jiang.

About this article

Cite this article

Zhang, W., Chen, Y. & Jiang, R. Comparative study of network-based prioritization of protein domains associated with human complex diseases. Front. Electr. Electron. Eng. China 5, 107–118 (2010). https://doi.org/10.1007/s11460-010-0018-x

  • DOI: https://doi.org/10.1007/s11460-010-0018-x
