  • Letter
Identification of homology in protein structure classification

Abstract

Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins and error-prone using simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for the identification of evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. The problem of different rates of structural divergence between different families is solved by first using structural similarities to produce a global map of folds in protein space and then further subdividing fold neighborhoods into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, the discrimination between analogy and homology of close structural neighbors will lead to functional predictions while avoiding overprediction.
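The validation figures above can be read as pairwise statistics over a reference classification. The following is a minimal sketch, assuming that the 77% figure is coverage (the fraction of reference homologous pairs that were recovered) and that "reliability" is the fraction of predicted pairs that are also homologous in the reference; the abstract does not define either term. The function names and the toy assignments are hypothetical and are not data from the paper.

```python
from itertools import combinations

def homologous_pairs(assignment):
    """Return the set of unordered protein pairs that share a group label."""
    pairs = set()
    for a, b in combinations(sorted(assignment), 2):
        if assignment[a] == assignment[b]:
            pairs.add((a, b))
    return pairs

def coverage_and_reliability(reference, predicted):
    """Compare predicted groupings with a reference classification, pair by pair.

    coverage    = recovered reference pairs / all reference pairs
    reliability = recovered reference pairs / all predicted pairs
    (both definitions are assumptions, see the lead-in)
    """
    ref_pairs = homologous_pairs(reference)
    pred_pairs = homologous_pairs(predicted)
    true_pos = len(ref_pairs & pred_pairs)
    coverage = true_pos / len(ref_pairs) if ref_pairs else 0.0
    reliability = true_pos / len(pred_pairs) if pred_pairs else 0.0
    return coverage, reliability

# Toy example: five proteins, labels are arbitrary group identifiers.
reference = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}
predicted = {"p1": "x", "p2": "x", "p3": "y", "p4": "y", "p5": "z"}

print(coverage_and_reliability(reference, predicted))
```

In this toy case the reference contains four homologous pairs and the prediction contains two, of which one is shared. Counting pairs rather than proteins means large families dominate both statistics, which is worth keeping in mind when comparing numbers across benchmarks.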

Figure 1: Partitioning protein space into homologous families.
Figure 2: Jack-knife evaluation of the prediction accuracy of the neural network.

About this article

Cite this article

Dietmann, S., Holm, L. Identification of homology in protein structure classification. Nat Struct Mol Biol 8, 953–957 (2001). https://doi.org/10.1038/nsb1101-953

  • DOI: https://doi.org/10.1038/nsb1101-953
