
Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins

  • Full Length Paper
Molecular Diversity

Abstract

In this paper, we apply the AdaBoost algorithm, a popular and effective prediction method, to predict the subcellular locations of prokaryotic and eukaryotic proteins in a dataset derived from SWISSPROT 33.0. Its predictive performance is evaluated with the re-substitution test, leave-one-out cross-validation (LOOCV), and the jackknife test. Compared with several widely used predictors, including the discriminant function, neural networks, and support vector machines (SVM), the AdaBoost predictor achieved better results. We therefore conclude that AdaBoost can serve as a robust method for predicting protein subcellular location. An online web server for predicting the subcellular location of prokaryotic and eukaryotic proteins is available at http://chemdata.shu.edu.cn/subcell/.
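The abstract names AdaBoost and the validation schemes but not the feature encoding or the weak learner. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a 20-dimensional amino acid composition vector per protein, uses decision stumps as weak learners, and swaps in SAMME, a standard multiclass extension of AdaBoost, because subcellular location is a multiclass problem. All function names, the toy sequences, and the location labels are hypothetical. It also shows the re-substitution test (train and test on the same set) and leave-one-out cross-validation (each protein predicted by a model trained on all the others).

```python
import math

# Assumed features: 20-D amino acid composition (the abstract does not specify).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Fraction of each of the 20 standard residues in a protein sequence."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # guard against empty sequences
    return [c / total for c in counts]


def fit_stump(X, y, w, classes):
    """Lowest-weighted-error stump as (error, feature, threshold, left_label, right_label)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = {c: 0.0 for c in classes}
            right = {c: 0.0 for c in classes}
            for xi, yi, wi in zip(X, y, w):
                (left if xi[j] <= t else right)[yi] += wi
            lab_l = max(left, key=left.get)  # weighted majority label on each side
            lab_r = max(right, key=right.get)
            err = sum(w) - left[lab_l] - right[lab_r]
            if best is None or err < best[0]:
                best = (err, j, t, lab_l, lab_r)
    return best


def stump_predict(stump, x):
    _, j, t, lab_l, lab_r = stump
    return lab_l if x[j] <= t else lab_r


def adaboost_fit(X, y, n_rounds=20):
    """Multiclass AdaBoost (SAMME variant) with decision stumps as weak learners."""
    classes = sorted(set(y))
    K = len(classes)
    if K == 1:  # a single class needs no boosting
        return classes, []
    w = [1.0 / len(X)] * len(X)  # uniform initial sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w, classes)
        err = stump[0] / sum(w)
        if err >= 1 - 1 / K:  # no better than random guessing: stop
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = math.log((1 - err) / err) + math.log(K - 1)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, then renormalise to sum to 1.
        w = [wi * math.exp(alpha) if stump_predict(stump, xi) != yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:  # training data already fit perfectly
            break
    return classes, ensemble


def adaboost_predict(model, x):
    """Alpha-weighted vote over the weak learners."""
    classes, ensemble = model
    votes = {c: 0.0 for c in classes}
    for alpha, stump in ensemble:
        votes[stump_predict(stump, x)] += alpha
    return max(votes, key=votes.get)


def resubstitution_accuracy(X, y, n_rounds=20):
    """Train and test on the same data (an optimistic self-consistency check)."""
    model = adaboost_fit(X, y, n_rounds)
    return sum(adaboost_predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)


def loocv_accuracy(X, y, n_rounds=20):
    """Leave-one-out: predict each sample with a model trained on the other n - 1."""
    correct = 0
    for i in range(len(X)):
        model = adaboost_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_rounds)
        correct += adaboost_predict(model, X[i]) == y[i]
    return correct / len(X)


# Hypothetical toy data: invented sequences and labels, for illustration only.
toy = [
    ("KKEEKKEEKKAE", "cytoplasm"), ("EKKEKEEKKAKE", "cytoplasm"),
    ("KEKEKKEEAEKK", "cytoplasm"), ("LLIVLLIVALLV", "membrane"),
    ("ILVLLIVLLAIV", "membrane"), ("VLLIVLIALLVI", "membrane"),
]
X = [aa_composition(s) for s, _ in toy]
y = [label for _, label in toy]
print(resubstitution_accuracy(X, y), loocv_accuracy(X, y))  # 1.0 1.0
```

Re-substitution accuracy is usually optimistic because the model has already seen every test protein; the leave-one-out estimate is the more informative measure of performance on unseen sequences.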




Author information

Correspondence to Wen-Cong Lu or Yu-Dong Cai.

About this article

Cite this article

Niu, B., Jin, YH., Feng, KY. et al. Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins. Mol Divers 12, 41–45 (2008). https://doi.org/10.1007/s11030-008-9073-0

  • DOI: https://doi.org/10.1007/s11030-008-9073-0