Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data

Cheng, Jianlin; Sweredoski, Michael J.; Baldi, Pierre

doi:10.1007/s10618-005-0001-y


Published: 14 July 2005

Volume 11, pages 213–222 (2005)

Data Mining and Knowledge Discovery


Abstract

Intrinsically disordered regions in proteins are relatively frequent and important for our understanding of molecular recognition and assembly, and of protein structure and function. From an algorithmic standpoint, flagging large disordered regions is also important for ab initio protein structure prediction methods. Here we first extract a curated, non-redundant data set of protein disordered regions from the Protein Data Bank and compute relevant statistics on the length and location of these regions. We then develop an ab initio predictor of disordered regions called DISpro, which uses evolutionary information in the form of profiles, predicted secondary structure and relative solvent accessibility, and ensembles of 1D-recursive neural networks. DISpro is trained and cross-validated using the curated data set. The experimental results show that DISpro achieves an accuracy of 92.8% with a false positive rate of 5%. DISpro is a member of the SCRATCH suite of protein data mining tools available through http://www.igb.uci.edu/servers/psss.html.
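The accuracy and false positive rate quoted in the abstract are per-residue figures obtained at a chosen decision threshold. The Python sketch below illustrates one way such figures can be computed; it is not DISpro's code. It assumes that each ensemble member has already produced a disorder probability for every residue, that members are combined by simple averaging (an assumed rule, since the abstract does not specify one), that accuracy is the fraction of residues classified correctly, and that the false positive rate is the fraction of ordered residues predicted as disordered. The helper names `ensemble_average`, `accuracy_and_fpr` and `threshold_for_fpr` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating per-residue evaluation of a disorder
# predictor under stated assumptions; not the DISpro implementation.


def ensemble_average(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue disorder probabilities over ensemble members.

    Simple averaging is an assumed combination rule.
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n = len(member_probs[0])
    if any(len(p) != n for p in member_probs):
        raise ValueError("all members must score the same residues")
    k = len(member_probs)
    return [sum(p[i] for p in member_probs) / k for i in range(n)]


def accuracy_and_fpr(probs: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Return (accuracy, false positive rate) at a given threshold.

    A residue is predicted disordered when its probability >= threshold;
    labels use 1 for disordered and 0 for ordered residues.
    """
    if not probs:
        raise ValueError("no residues to evaluate")
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # FPR: fraction of ordered residues wrongly predicted disordered.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, fpr


def threshold_for_fpr(probs: Sequence[float], labels: Sequence[int],
                      max_fpr: float) -> float:
    """Lowest observed score whose FPR does not exceed max_fpr.

    FPR can only grow as the threshold is lowered, so scan scores from the
    top and stop at the first violation. Returns inf (nothing predicted
    disordered) if even the highest score gives too many false positives.
    """
    best = float("inf")
    for t in sorted(set(probs), reverse=True):
        if accuracy_and_fpr(probs, labels, t)[1] <= max_fpr:
            best = t
        else:
            break
    return best


# Toy example: two ensemble members scoring ten residues.
member_a = [0.0, 0.25, 0.125, 0.25, 0.0, 0.5, 0.5, 0.25, 1.0, 0.75]
member_b = [0.25, 0.25, 0.125, 0.0, 0.25, 0.0, 0.75, 0.25, 0.75, 0.5]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

probs = ensemble_average([member_a, member_b])
t = threshold_for_fpr(probs, labels, max_fpr=0.05)
print(t, accuracy_and_fpr(probs, labels, t))  # 0.875 (0.9, 0.0)
```

With only eight ordered residues in this toy set, each false positive moves the FPR by 0.125, so a 5% cap forces zero false positives; on a realistic test set the threshold is resolved far more finely. The scan recomputes the counts for every candidate threshold for clarity; for large data sets one would sort the scores once and sweep the counts incrementally.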



Acknowledgements

The authors wish to thank the anonymous reviewers for helpful comments. This work was supported by the Institute for Genomics and Bioinformatics at UCI, a Laurel Wilkening Faculty Innovation award, an NIH Biomedical Informatics Training grant (LM-07443-01), an NSF MRI grant (EIA-0321390), a Sun Microsystems award, and a grant from the University of California Systemwide Biotechnology Research and Education Program (UC BREP) to PB.

Author information

Authors and Affiliations

School of Information and Computer Science, Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, CA, 92697, USA
Jianlin Cheng, Michael J. Sweredoski & Pierre Baldi


Corresponding author

Correspondence to Pierre Baldi.


About this article

Cite this article

Cheng, J., Sweredoski, M.J. & Baldi, P. Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data. Data Min Knowl Disc 11, 213–222 (2005). https://doi.org/10.1007/s10618-005-0001-y


Issue Date: November 2005
DOI: https://doi.org/10.1007/s10618-005-0001-y
