Function and structure of inherently disordered proteins
Introduction
Many entire proteins and localized protein regions fail to fold into a 3D structure, yet carry out function. Rather than a linear sequence-to-structure-to-function paradigm, such proteins have been described by a trinity in which function arises from different forms (structured globules, collapsed disordered ensembles, and extended disordered ensembles) and from transitions between different forms such as a disorder-to-structure transition upon binding [1]. The collapsed disordered ensembles were originally thought to be exclusively native molten globules (MGs), with collapse driven by hydrophobic interactions. Recent studies, however, agree with earlier work suggesting that water is a poor solvent for the peptide backbone; thus, polar but uncharged model sequences form compact random coils [2••, 3••], while extended random coils result when polypeptide chains contain significant net charge (Rohit Pappu, unpublished). Water being a poor solvent for polypeptides was also invoked to explain the occurrence of a fourth protein form, the pre-MG, that occurs as an intermediate between the MG and the random coil during protein unfolding [4]. Much more work is needed to understand and relate the various nonstructured protein ensembles and to determine whether the relationship between structure and function should be assembled into a trinity, a quartet, or an even more complicated arrangement.
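As a rough illustration of the net-charge criterion mentioned above, the following sketch computes the net charge per residue of a sequence. It assumes a deliberately simplified charge model at neutral pH (Lys and Arg counted as +1, Asp and Glu as -1, His and the chain termini ignored). The function name and the example sequences are hypothetical and are not taken from the cited studies.

```python
# Simplified residue charges at neutral pH (an assumption for illustration:
# His and the chain termini are ignored).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge_per_residue(sequence):
    """Return (sum of residue charges) / (sequence length)."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Residues not listed in CHARGE contribute zero charge.
    return sum(CHARGE.get(aa, 0) for aa in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical sequences: one polar but uncharged, one highly charged.
    examples = [("polar, uncharged", "Q" * 20), ("highly charged", "EKEEDEKEED")]
    for label, seq in examples:
        print(label, net_charge_per_residue(seq))
```

A value near zero corresponds to the polar, uncharged case discussed above, whereas a large absolute value corresponds to chains carrying significant net charge. The sketch reports only the number and does not predict compaction.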
Here we provide an overview of these proteins, including their structures, functions, and regulation. In all these aspects, the set of non-folding proteins and regions is found to differ greatly from the set of proteins that fold into globular 3D structures.
Prediction of non-folding proteins and regions
Since the amino acid sequence contains the information for protein folding, it was reasoned that, for proteins that do not fold into 3D structures, the amino acid sequence should also specify protein non-folding. To test this hypothesis, predictors were developed to identify sequences that fail to fold [5, 6]. The fact that predictor accuracy was significantly better than expected by chance suggested that the information for failure to fold into a 3D structure is, indeed, likely to be inherent
Frequency of disordered regions
Disorder predictions have been carried out for many whole proteomes. They indicate that the fraction of proteins with substantial amounts of disorder follows the order eukaryotes ≫ archaea ∼ eubacteria, with multicellular eukaryotes having much more predicted disorder than unicellular eukaryotes [11]. These results were confirmed and substantially extended to include functional classification using an improved predictor of disorder [12]. Integrating the results from these and other sources gives some rules
Protein evolution
Non-folding proteins and regions might be expected to change more rapidly during evolution than structured proteins because buried amino acids are highly constrained while disordered regions are not constrained by structure. For example, plots of sequence variability (measured by sequence entropy over alignments) were found to exhibit nearly linear dependence on the inverse of the packing density, until a low packing density was reached at which point sequence variability remained roughly
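The sequence-entropy measure of variability mentioned above can be sketched as a per-column Shannon entropy over a multiple sequence alignment. This is a minimal illustration, assuming equal-length aligned sequences, `-` as the gap character, and gaps excluded from the counts. The function names are hypothetical, and the cited study may have used a different entropy definition or normalization.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0  # all-gap column: treated as zero entropy (an assumption)
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(alignment, gap="-"):
    """Return the per-column entropies of a list of aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) walks the alignment column by column.
    return [column_entropy(col, gap) for col in zip(*alignment)]


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = ["MKTA", "MRTG", "MKS-", "MQTA"]
    print(alignment_entropy(toy))
```

Fully conserved columns give an entropy of zero, and more variable columns give higher values. Relating these values to packing density, as described above, would additionally require structural data that this sketch does not model.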
Partitioning unstructured proteins and regions into groups
Grouping proteins according to structure and function has proven very useful for studying structured proteins. Associating a new protein with an existing structure–function group (by sequence and/or structure alignment) provides important basic information and quickly identifies critical experiments for further characterization. Given the broad array of disordered protein types, their lack of 3D structure, and their sequence variability, it has so far proven difficult to cluster various
Do inherently unstructured proteins retain any preference for certain structures or are they totally unstructured?
One of the key open questions regarding inherently unstructured proteins is whether, in solution, they retain some preferred structure(s), or are just a plethora of many different conformations, rather like ‘cooked spaghetti’. A recent careful study of residual structure in disordered peptides and unfolded proteins was carried out via multivariate analysis and ab initio simulation of Raman optical activity [21•]. This study showed striking differences between the structural characteristics of
Do non-folding proteins have a shorter half-life than other proteins?
Targeted turnover of proteins is a key element in the regulation of many cellular processes. The underlying physicochemical and/or sequence-based signals are not, however, fully understood. This is particularly pertinent in light of the recent recognition that intrinsically unstructured/disordered proteins, common in eukaryotic cells, are extremely susceptible to proteolytic degradation in vitro. An in vivo high-throughput study of the half-lives of all yeast gene products [26] indicated that, in
Functionality of inherently disordered proteins and regions
Non-folding proteins and regions carry out pivotal biological functions, participating in various signaling and regulatory pathways, via specific protein–protein, protein–nucleic acid, and protein–ligand interactions [29, 30, 31, 32]. Enzymatically controlled sites of post-translational modification (PTM) such as acetylation, hydroxylation, ubiquitination, methylation, and phosphorylation, as well as sites of proteolytic attack, are frequently associated with regions of intrinsic disorder [29].
Involvement of inherently disordered proteins in diseases
The fact that many proteins are either wholly intrinsically disordered, or contain large stretches of intrinsically disordered sequences, has been followed by a growing realization that nonstructured proteins are associated with a broad range of human diseases, which led to the introduction of the D2 (disorder in disorders) concept [52]. Diseases involving protein disorder come in a variety of flavors, but we here restrict ourselves to discussing recent work concerning the amyloid diseases, in
Conclusions
The sequence-to-structure-to-function paradigm for proteins was developed from the study of enzymes. Bioinformatics studies indicate that this paradigm applies to enzymes, as well as to transport proteins.
In contrast, proteins and regions of proteins involved in signaling, control, and regulation often use inherently unstructured sequences as the basis for function. There are many structured signaling domains, but these often bind to unstructured protein partners. Moreover, there are numerous
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
This work was supported in part by the grants R01 LM007688-01A1 (to AKD and VNU) and GM071714-01A2 (to AKD and VNU) from the National Institutes of Health and from the Program of the Russian Academy of Sciences ‘Molecular and Cellular Biology’ (to VNU), by the Divadol Foundation, the Benoziyo Center for Neuroscience, the Kimmelman Center, Autism Speaks, the Israel Science Foundation, the Nalvyco Foundation, the Neuman Foundation, a research grant from Mr. Erwin Pearl, the European Commission
References (71)
- et al. Prediction of protein disorder. Methods Mol Biol (2008)
- et al. Evolutionary rate heterogeneity in proteins with long disordered regions. J Mol Evol (2002)
- et al. The intracellular domain of the Drosophila cholinesterase-like neural adhesion protein, gliotactin, is natively unfolded. Proteins (2003)
- et al. Operational definition of intrinsically unstructured protein sequences based on susceptibility to the 20S proteasome. Proteins (2008)
- et al. What properties characterize the hub proteins of the protein–protein interaction network of Saccharomyces cerevisiae? Genome Biol (2006)
- et al. SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS ONE (2007)
- et al. Biophysical properties of the synucleins and their propensities to fibrillate: inhibition of alpha-synuclein assembly by beta- and gamma-synucleins. J Biol Chem (2002)
- et al. Helical alpha-synuclein forms highly conductive ion channels. Biochemistry (2007)
- et al. The protein trinity – linking function and disorder. Nat Biotechnol (2001)
- et al. Fluorescence correlation spectroscopy shows that monomeric polyglutamine molecules form collapsed structures in aqueous solutions. Proc Natl Acad Sci U S A (2006)