Abstract
Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins, and it is error-prone when it relies on simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for identifying evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. Different families diverge structurally at different rates; the method addresses this by first using structural similarities to produce a global map of folds in protein space and then subdividing fold neighborhoods into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, discriminating between analogy and homology among close structural neighbors will lead to functional predictions while avoiding overprediction.
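The validation figures quoted above can be read as pairwise scores against a reference classification. The following is a minimal sketch of one way such scores could be computed; it assumes that "identified" means coverage (the fraction of reference homologous pairs recovered) and that "reliability" means the fraction of predicted pairs that are homologous in the reference. These definitions, the function names, and the toy domain and cluster labels are illustrative assumptions, not taken from the article.

```python
from itertools import combinations


def homologous_pairs(labels):
    """Return all unordered pairs of domains that share the same group label.

    `labels` maps a domain identifier to a group label (for example a
    superfamily in the reference, or a cluster in the prediction).
    """
    pairs = set()
    for a, b in combinations(sorted(labels), 2):
        if labels[a] == labels[b]:
            pairs.add((a, b))
    return pairs


def coverage_and_reliability(predicted, reference):
    """Compare predicted groupings with a reference classification, pair by pair.

    coverage    = correctly predicted pairs / reference homologous pairs
    reliability = correctly predicted pairs / all predicted pairs
    (Assumed definitions; the article may define these differently.)
    """
    pred = homologous_pairs(predicted)
    ref = homologous_pairs(reference)
    true_pos = len(pred & ref)
    coverage = true_pos / len(ref) if ref else 0.0
    reliability = true_pos / len(pred) if pred else 0.0
    return coverage, reliability


# Toy data (hypothetical): five domains, two reference superfamilies.
reference = {"d1": "sfA", "d2": "sfA", "d3": "sfA", "d4": "sfB", "d5": "sfB"}
predicted = {"d1": "c1", "d2": "c1", "d3": "c2", "d4": "c2", "d5": "c2"}

coverage, reliability = coverage_and_reliability(predicted, reference)
print(f"coverage={coverage:.2f} reliability={reliability:.2f}")
```

Scoring pairs rather than individual domains means large superfamilies contribute many more pairs than small ones, so a result reported this way is weighted toward well-populated families.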
Cite this article
Dietmann, S., Holm, L. Identification of homology in protein structure classification. Nat Struct Mol Biol 8, 953–957 (2001). https://doi.org/10.1038/nsb1101-953
DOI: https://doi.org/10.1038/nsb1101-953