Constructing ensembles for intrinsically disordered proteins

https://doi.org/10.1016/j.sbi.2011.04.001

The relatively flat energy landscapes associated with intrinsically disordered proteins make modeling these systems especially problematic. A comprehensive model for these proteins requires one to build an ensemble, a finite collection of structures and their corresponding relative stabilities, that adequately captures the range of accessible states of the protein. In this regard, methods that use computational techniques to interpret experimental data in terms of such ensembles are an essential part of the modeling process. In this review, we critically assess the advantages and limitations of current techniques and discuss new methods for the validation of these ensembles.

Highlights

► We critically discuss methods for modeling intrinsically disordered proteins.
► Both the advantages and limitations of existing methods are analyzed.
► We further outline the major challenges to modeling these proteins and discuss new methods for the validation of these approaches.

Introduction

Thermal fluctuations cause proteins to sample a variety of conformations during their biological lifetime, with the probability of each conformation determined by the topography of the underlying energy landscape. Folded proteins exhibit energy landscapes that have a well-defined global energy minimum (Figure 1a). By contrast, intrinsically disordered proteins (IDPs) correspond to a class of polypeptides with relatively flat energy landscapes (Figure 1b) and, consequently, these proteins sample a relatively large and diverse set of conformations at room temperature [1, 2]. A great deal of interest in understanding the structure of IDPs has emerged because of their proposed role in neurodegenerative disorders such as Parkinson's and Alzheimer's diseases [3, 4, 5, 6, 7, 8, 9, 10, 11]. A detailed characterization of these systems could therefore pave the way to the development of new therapeutics through structure-based drug design [12, 13].
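To make the link between landscape topography and conformational diversity concrete, the sketch below assigns Boltzmann probabilities, p_i ∝ exp(−E_i/k_BT), to two hypothetical ten-conformation toy landscapes and reports exp(S), the exponential of the Shannon entropy of the populations, as a rough count of appreciably populated conformations. The energies are invented for illustration, and the effective-state measure is a choice made here, not one used in the review.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=298.0):
    """Normalized Boltzmann probabilities, p_i proportional to exp(-E_i / k_B T)."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift energies by the minimum for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(factors)  # partition function over this finite set of conformations
    return [f / z for f in factors]


def effective_states(probs):
    """exp(Shannon entropy): a rough count of appreciably populated conformations."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)


# Hypothetical ten-conformation toy landscapes (energies in kcal/mol).
funneled = [0.0] + [5.0] * 9  # one deep global minimum, as for a folded protein
flat = [0.0, 0.3, 0.1, 0.5, 0.2, 0.4, 0.0, 0.3, 0.1, 0.2]  # nearly degenerate minima

print("funneled:", effective_states(boltzmann_weights(funneled)))
print("flat:    ", effective_states(boltzmann_weights(flat)))
```

With these toy energies the funneled landscape has about one effectively populated conformation, while the flat landscape populates nearly all ten, which is why a single structure cannot represent an IDP.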

The earliest attempts at modeling disordered protein states were aimed at describing folded proteins under denaturing conditions [14, 15, 16, 17]. Denatured proteins and IDPs share the characteristic that experimental observables correspond to averages over a diverse ensemble of conformations. Therefore, the typical approach to constructing an ensemble, both for folded proteins under denaturing conditions and for IDPs, is to generate a set of conformations whose ensemble averages agree with experimental values. Formulated this way, the approach is conceptually straightforward: generate a diverse set of conformations, then find a subset of structures and their relative stabilities (or weights) whose averages agree with experiment. In other cases, the ensemble is constructed using purely theoretical methods and the predicted data are compared to experiment [18, 19]. While important insights have been obtained using this latter approach, using experimental data to guide the construction of the ensemble helps to limit the space of possible solutions.
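The "find weights that agree with experiment" step can be posed as a constrained least-squares problem: given a matrix of observables back-calculated for each conformer, find non-negative weights summing to one whose weighted averages match the data. The sketch below solves it by projected gradient descent onto the probability simplex. This is one generic solver chosen for illustration, not the method of any specific work cited here; the function names and toy data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]  # last index satisfying the condition
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def fit_weights(predicted, measured, n_iter=5000):
    """Projected-gradient fit of population weights to ensemble-averaged data.

    predicted: (n_observables, n_conformers) values back-calculated for each conformer.
    measured:  (n_observables,) experimental ensemble averages.
    Minimizes 0.5 * ||predicted @ w - measured||^2 subject to w >= 0 and sum(w) = 1.
    """
    n = predicted.shape[1]
    w = np.full(n, 1.0 / n)  # start from uniform weights
    step = 1.0 / np.linalg.norm(predicted, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = predicted.T @ (predicted @ w - measured)
        w = project_to_simplex(w - step * grad)
    return w


# Toy usage: 5 hypothetical observables back-calculated for 20 library conformers.
rng = np.random.default_rng(0)
library = rng.normal(size=(5, 20))
true_w = rng.dirichlet(np.ones(20))
data = library @ true_w  # noise-free synthetic "measurements"
w = fit_weights(library, data)
print("residual norm:", np.linalg.norm(library @ w - data))
```

In practice one would divide each residual by its experimental uncertainty and account for errors in the back-calculation (forward) model; both are omitted to keep the sketch short.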

In practice, constructing an ensemble from experimental data is quite a challenging task because the amount of data typically available pales in comparison to the number of parameters needed to uniquely define the ensemble. In other words, there are typically many different ensembles that agree with any given set of experimental data, so the optimization problem described above has degenerate solutions. In light of this, how does one reliably infer, from the available experimental data, a set of conformations and weights that captures the essential features of the energy landscape? In this article, we review recent advances in this area and discuss the advantages and limitations of various techniques.
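Degeneracy follows directly from the linear algebra of ensemble averaging: with M measured averages plus the normalization constraint and N > M + 1 conformers, the constraint matrix has a non-trivial null space, and moving the weights along any null-space direction leaves every computed average unchanged. The sketch below finds such a direction by SVD and returns a second, different weight vector that fits the data equally well. The function name and the three-conformer example are hypothetical, and the weights are assumed strictly positive so that a non-zero step exists.

```python
import numpy as np


def alternative_weights(predicted, weights, scale=0.5, tol=1e-10):
    """Return different weights with the same ensemble averages, or None if none exist.

    predicted: (n_observables, n_conformers) back-calculated values per conformer.
    weights:   current weight vector (assumed strictly positive here).
    The normalization constraint sum(w) = 1 is stacked with the observables; any
    direction d in the null space of that stack leaves every average unchanged.
    """
    constraints = np.vstack([predicted, np.ones(predicted.shape[1])])
    _, s, vt = np.linalg.svd(constraints)  # full_matrices=True: vt is n x n
    rank = int(np.sum(s > tol))
    if rank == vt.shape[0]:
        return None  # the data and normalization determine the weights uniquely
    d = vt[rank]  # a direction in weight space that the data cannot "see"
    neg = d < 0   # sum(d) = 0, so d always has some negative entries
    t = scale * np.min(weights[neg] / -d[neg])  # step that keeps all weights >= 0
    return weights + t * d


# Hypothetical example: one observable (e.g., a per-conformer radius of gyration) for
# three conformers. One measured average, plus normalization, cannot fix three weights.
predicted = np.array([[1.0, 2.0, 3.0]])
w = np.array([0.25, 0.5, 0.25])
w_alt = alternative_weights(predicted, w)
print(w, predicted @ w)
print(w_alt, predicted @ w_alt)
```

Both weight vectors reproduce the same average of 2.0 exactly, so no fit statistic computed from this observable alone can distinguish them.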

Section snippets

Sources of experimental data

To date, most of the experimental measurements that have been used to guide the construction of unfolded ensembles correspond to observables obtained via NMR spectroscopy. Examples of such measurements include chemical shifts, which provide information about local conformational preferences [20, 21, 22••], scalar couplings, which report on backbone dihedral angles [23], residual dipolar couplings (RDCs), which report on the angle of a bond relative to an external frame of reference [8, 22••,
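Scalar couplings are linked to backbone dihedral angles through Karplus-type relations, J(φ) = A cos²θ + B cosθ + C with θ = φ − 60°. The sketch below assumes one commonly used parameterization for the ³J(HN,Hα) coupling (A = 6.51, B = −1.76, C = 1.60 Hz); substitute another set as needed. It also illustrates why ensemble averaging matters for IDPs: the measured coupling is the population-weighted average of J over conformers, which generally differs from J evaluated at the average φ. The two-state residue is hypothetical.

```python
import math

# One commonly used Karplus parameterization for 3J(HN,HA) (Vuister & Bax, 1993);
# treat these coefficients as an assumption and swap in another set if preferred.
A, B, C = 6.51, -1.76, 1.60  # Hz


def j_hnha(phi_deg):
    """3J(HN,HA) in Hz from the backbone dihedral phi (degrees) via a Karplus equation."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def ensemble_j(phis, weights):
    """Ensemble-averaged coupling: the weighted mean of J, not J of the mean phi."""
    return sum(w * j_hnha(phi) for phi, w in zip(phis, weights)) / sum(weights)


# Hypothetical two-state residue: half helix-like (phi = -60), half extended (phi = -120).
phis, weights = [-60.0, -120.0], [0.5, 0.5]
print("<J> over the ensemble:", ensemble_j(phis, weights))
print("J at the mean phi:    ", j_hnha(sum(phis) / len(phis)))
```

For these inputs the ensemble-averaged coupling is about 6.99 Hz, whereas J at the mean φ of −90° is about 8.01 Hz, so interpreting a measured coupling as a single "average structure" would be misleading.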

Validation of ensemble building methods

Before discussing specific algorithms for constructing ensembles, it is useful to introduce a technique that we will refer to as the reference ensemble method, which has become a standard tool for evaluating the performance of these methods [22••, 45•, 46, 47]. The reference ensemble method is illustrated in Figure 2. A reference ensemble is a predefined ‘truth,’ that is, a prespecified set of conformations and their statistical weights that can be used to calculate synthetic experimental
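The reference ensemble workflow can be sketched as a small test harness: compute synthetic data from known weights, hand the data to the method under test, and compare the recovered ensemble with the truth as well as with the data. In the sketch below, all names are hypothetical, total-variation distance between weight vectors is one comparison metric chosen for simplicity (published studies may use other similarity measures), and the "method" is a deliberately naive minimum-norm solver that may even return negative weights.

```python
import numpy as np


def total_variation(p, q):
    """Total-variation distance between two weight vectors over the same library (0 = identical)."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


def reference_ensemble_test(library_obs, true_weights, build_ensemble, noise_sd=0.0, rng=None):
    """Score an ensemble-building method against a predefined 'truth'.

    library_obs:    (n_observables, n_conformers) back-calculated observables.
    true_weights:   weights of the reference ensemble.
    build_ensemble: method under test, mapping (library_obs, data) -> weights.
    Returns (RMS misfit to the synthetic data, total-variation error in the weights).
    """
    if rng is None:
        rng = np.random.default_rng()
    synthetic = library_obs @ true_weights  # synthetic "experimental" averages
    if noise_sd > 0:
        synthetic = synthetic + rng.normal(0.0, noise_sd, synthetic.shape)
    w = build_ensemble(library_obs, synthetic)
    data_rms = np.sqrt(np.mean((library_obs @ w - synthetic) ** 2))
    return data_rms, total_variation(w, true_weights)


def min_norm_method(library_obs, data):
    """Hypothetical naive method: minimum-norm solution of the averaging equations with sum(w) = 1."""
    a = np.vstack([library_obs, np.ones(library_obs.shape[1])])
    b = np.append(data, 1.0)
    return np.linalg.lstsq(a, b, rcond=None)[0]


rng = np.random.default_rng(1)
library = rng.normal(size=(3, 30))  # 3 hypothetical observables for 30 conformers
truth = np.zeros(30)
truth[[2, 11, 25]] = 1.0 / 3.0       # reference ensemble: three equally populated states
data_rms, weight_error = reference_ensemble_test(library, truth, min_norm_method, rng=rng)
print(f"data RMS = {data_rms:.2e}, weight error = {weight_error:.2f}")
```

The naive method reproduces the synthetic data essentially exactly yet recovers weights far from the truth; this gap between agreement with data and agreement with the underlying ensemble is what the reference ensemble method is designed to expose.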

Ensemble-restrained MD simulations

Restrained MD simulations introduce a term into the potential function that biases the simulation towards regions of conformational space that agree with experimental observations. For an IDP, the restraints should be applied to an entire ensemble rather than an individual structure [45]. This is accomplished by simulating multiple replicas of the protein in parallel and calculating the biasing potential based on averages taken over all of the replicas [45•, 48]. Ganguly and Chen [29••] used
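The key feature of an ensemble restraint is that the bias depends on the replica-averaged observable rather than on each replica separately. The sketch below computes a harmonic restraint energy, E = (k/2)(⟨O⟩ − O_exp)², and its derivative with respect to each replica's observable value. Function and parameter names are hypothetical, prefactor conventions differ between implementations, and converting dE/dO_r into atomic forces would additionally require the derivative of the forward model ∂O_r/∂x, which is not shown.

```python
def ensemble_restraint(replica_values, target, force_constant):
    """Harmonic restraint on the replica-averaged value of one observable.

    replica_values: current value of the observable in each of the N replicas.
    Returns (energy, gradient), where gradient[r] = dE/dO_r. Because only the
    average is restrained, each replica receives 1/N of the total bias.
    """
    n = len(replica_values)
    average = sum(replica_values) / n
    deviation = average - target
    energy = 0.5 * force_constant * deviation ** 2
    gradient = [force_constant * deviation / n] * n
    return energy, gradient


# Hypothetical three-replica example (arbitrary units).
energy, gradient = ensemble_restraint([2.0, 4.0, 6.0], target=3.0, force_constant=10.0)
print(energy, gradient)
```

Here the energy is 5.0 and every replica has dE/dO_r ≈ 3.33, so even the replica whose own value (2.0) already lies below the target is pushed toward lower values. Individual replicas are free to disagree with the measurement as long as their average agrees, which is what allows a heterogeneous ensemble to emerge.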

Ensemble construction using a predefined conformational library

Another method for constructing ensembles for IDPs is to first generate a library of conformations and then select a subset of conformations from this library such that averages calculated from this subset agree with the experimental data. The initial conformational library may be generated with MD, perhaps using techniques to enhance conformational sampling (see [50] for a review) such as replica exchange [51], accelerated MD [46] or quenched MD [21], by piecing together small peptide
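One simple way to carry out the selection step is simulated annealing over fixed-size subsets of the library: each move swaps one selected conformer for an unselected one, and a Metropolis criterion with a decreasing temperature decides whether to keep the swap. The sketch below assumes equal weights for the selected conformers, a chi-square objective with per-observable uncertainties `sigma`, and a linear cooling schedule; these, and all names, are illustrative choices rather than those of any specific published program.

```python
import math
import random


def chi2(subset, predicted, data, sigma):
    """Chi-square between subset-averaged predictions and data (equal weights per member)."""
    total = 0.0
    for k, (y, s) in enumerate(zip(data, sigma)):
        avg = sum(predicted[i][k] for i in subset) / len(subset)
        total += ((avg - y) / s) ** 2
    return total


def select_subset(predicted, data, sigma, size, n_steps=5000, t0=10.0, seed=0):
    """Simulated-annealing selection of `size` conformers from a library.

    predicted[i][k]: back-calculated value of observable k for library conformer i.
    Returns (sorted indices of the best subset found, its chi-square).
    """
    rng = random.Random(seed)
    n = len(predicted)
    subset = rng.sample(range(n), size)
    current = chi2(subset, predicted, data, sigma)
    best, best_score = list(subset), current
    for step in range(n_steps):
        temp = t0 * (1.0 - step / n_steps) + 1e-6  # linear cooling toward zero
        candidate = rng.randrange(n)
        if candidate in subset:
            continue  # only swap in conformers that are not already selected
        trial = list(subset)
        trial[rng.randrange(size)] = candidate
        score = chi2(trial, predicted, data, sigma)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves
        if score <= current or rng.random() < math.exp(-(score - current) / temp):
            subset, current = trial, score
            if current < best_score:
                best, best_score = list(subset), current
    return sorted(best), best_score


# Toy library: 50 hypothetical conformers with one observable each (values 0, 1, ..., 49).
library = [[float(i)] for i in range(50)]
subset, score = select_subset(library, data=[10.0], sigma=[1.0], size=5)
print(subset, score)
```

Many different five-member subsets average exactly 10.0 in this toy library, so the selected subset depends on the random seed: the degeneracy problem reappears directly in library-based selection.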

Degeneracy and model construction

Degeneracy of the ensembles with respect to the experimental measurements is one problem that plagues the construction of IDP ensembles. At its core, the problem of degeneracy arises because in practice the number of experimental constraints is small relative to the number of degrees of freedom that are needed to uniquely specify the ensemble. Fisher et al. [22••] used the reference ensemble method to show that one can often find many sets of statistical weights (for a prespecified set of

Conclusions and future directions

Any comprehensive description of an IDP necessitates the construction of an ensemble (a finite collection of conformations and weights) that captures the essence of the conformational distribution of the protein. A variety of different approaches have been developed for constructing ensembles for IDPs, each of which has its own advantages and limitations. In the past few years, a number of advances have been made in our ability to model the conformational ensembles of IDPs. Many of these advances,

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • of special interest

  •• of outstanding interest

References (61)

  • K.J. Kohlhoff et al.

    Fast and accurate predictions of protein NMR chemical shifts from interatomic distances

    J Am Chem Soc

    (2009)
  • L. Salmon et al.

    NMR characterization of long-range order in intrinsically disordered proteins

    J Am Chem Soc

    (2010)
  • A.C. Stelzer et al.

    Constructing atomic-resolution RNA structural ensembles using MD and motionally decoupled NMR RDCs

    Methods

    (2009)
  • A.T. Frank et al.

    Constructing RNA dynamical ensembles by combining MD and motionally decoupled NMR RDCs: new insights into RNA dynamics and adaptive ligand recognition

    Nucleic Acids Res

    (2009)
  • M.K. Yoon et al.

    Residual structure within the disordered C-terminal segment of p21(Waf1/Cip1/Sdi1) and its implications for molecular recognition

    Protein Sci

    (2009)
  • W. Rieping et al.

    Inferential structure determination

    Science

    (2005)
  • K. Lindorff-Larsen et al.

    Similarity measures for protein ensembles

    PLoS ONE

    (2009)
  • J.F. Doreleijers et al.

    BioMagResBank database with sets of experimental NMR constraints corresponding to the structures of over 1400 biomolecules deposited in the Protein Data Bank

    J Biomol NMR

    (2003)
  • M. Sickmeier et al.

    DisProt: the database of disordered proteins

    Nucleic Acids Res

    (2007)
  • A. Huang et al.

    Finding order within disorder: elucidating the structure of proteins associated with neurodegenerative disease

    Future Med Chem

    (2009)
  • A.K. Dunker et al.

    The unfoldomics decade: an update on intrinsically disordered proteins

    BMC Genom

    (2008)
  • M. von Bergen et al.

    Assembly of Tau protein into Alzheimer paired helical filaments depends on local sequence motif (306 VQIVYK 311) forming beta-structure

    Proc Natl Acad Sci U S A

    (2000)
  • S. Barghorn et al.

    Structure, microtubule interactions, and paired helical filament aggregation by tau mutants of frontotemporal dementias

    Biochemistry

    (2000)
  • D. Fischer et al.

    Conformational changes specific for pseudophosphorylation at serine 262 selectively impair binding of tau to microtubules

    Biochemistry

    (2009)
  • S. Jeganathan et al.

    Global hairpin folding of tau in solution

    Biochemistry

    (2006)
  • M.D. Mukrasch et al.

    Structural polymorphism of 441-residue tau at single residue resolution

    PLoS Biol

    (2009)
  • M.D. Mukrasch et al.

    Highly populated turn conformations in natively unfolded tau protein identified from residual dipolar couplings and molecular simulation

    J Am Chem Soc

    (2006)
  • E. Mylonas et al.

    Domain conformation of tau protein studied by solution small-angle X-ray scattering

    Biochemistry

    (2008)
  • T.-M. Yao et al.

    Aggregation analysis of the microtubule binding domain in tau protein by spectroscopic methods

    J Biochem

    (2003)
  • S.Y. Huang et al.

    Ensemble docking of multiple protein structures: considering protein structural variations in molecular docking

    Proteins

    (2007)