Free-energy landscape, principal component analysis, and structural clustering to identify representative conformations from molecular dynamics simulations: The myoglobin case

https://doi.org/10.1016/j.jmgm.2009.01.006

Abstract

Several molecular dynamics (MD) simulations were used to sample conformations in the neighborhood of the native structure of holo-myoglobin (holo-Mb), collecting trajectories spanning 0.22 μs at 300 K. Principal component analysis (PCA) and free-energy landscape (FEL) analysis were carried out and integrated with cluster analysis, which was performed considering the position and structure of the individual helices of the globin fold. The coherence between the different structural clusters and the basins of the FEL, together with the convergence of parameters derived by PCA, indicates that an accurate description of the Mb conformational space around the native state was achieved by multiple MD trajectories spanning at least 0.14 μs.

The integration of FEL, PCA, and structural clustering proved to be a very useful approach for gaining an overall view of the conformational landscape accessible to a protein and for identifying representative protein substates. This method could also be used to investigate the conformational and dynamical properties of apo, mutant, or deletion variants of Mb, in which greater conformational variability is expected and, therefore, the identification of representative substates from the simulations is relevant to disclosing structure–function relationships.

Introduction

Myoglobin (Mb) is a small hemoprotein involved in oxygen transport and storage, which has been widely used for functional and structural studies [1], [2], [3], [4]. Mb belongs to the globin family, which is characterized by a globular fold comprising eight α-helices (A to H, from the N- to the C-terminus) linked to each other by short loop regions [1]. Over the last decades, a number of new globins with different active-site structures have been discovered and characterized, providing interesting variants on the theme [5], [6], [7]. Moreover, the possibility of reversible heme removal and the large number of mutant and deletion variants of Mb available [8], [9], [10], [11], [12], [13], [14], [15] have considerably extended the appeal of myoglobin for the investigation of structure and dynamics [2], [3].

It is well known that Mb undergoes functionally relevant conformational transitions upon ligand binding and pH variation, but also in the native state. These transitions have been analyzed by IR spectroscopy [16], [17], NMR [18], [19], electrospray mass spectrometry [20], [21], time-resolved crystallography [22], [23], and molecular dynamics simulations [24], [25], [26], [27], [28]. Most of these studies converge on the conclusion that full insight into the conformational changes of Mb requires a detailed description of the conformational ensemble of the protein in the native state [29]. In this context, it is important to underline that proteins are very complex systems and that the energy surface describing the native state contains multiple minima corresponding to very similar but slightly different conformations, that is, conformational substates [30], [31], [32], [33].

Proper sampling of the protein energy landscape requires the generation of a large sample of molecular conformations, and computational methods can be applied to this task [33], [34], [35]. However, analysis of the conformational space, even for a relatively small molecule, is very demanding because of the extremely high dimensionality of the system [34]. Moreover, a limitation common to all computational sampling procedures is that, although they generate large conformational ensembles, it is hard to assess the extent of sampling [34], [36], [37].

Multiple molecular dynamics (MD) simulations are well suited for generating ensembles of structures [38], [39], [40], [41], [42] at a fixed temperature and have been successfully used to investigate the conformational landscape near a protein native state [38], [39]. However, it has been observed that independent trajectories of the same system can sample the same local region while following distinct paths, whereas in other cases some trajectories sample more than one of the major local regions for significant time periods. Thus, extending the trajectories for a limited time period does not guarantee more extensive sampling [38], [40]. In addition, the ensemble average structure is typically taken as representative of the trajectory snapshots even though, in the case of flexible structures, the average structure may not adequately represent the ensemble [36]. In fact, few studies have been devoted to a detailed analysis of the contribution of individual protein trajectories to the conformational sampling, and to the identification of representative structures from MD ensembles [34], [43], [44], [45]. Care is therefore needed, since the average structures resulting from independent trajectories can be very different, and the identification of conformations belonging to low-energy basins becomes crucial.

MD simulations can provide an atomic-level picture of protein motion and a representation of the free-energy landscape (FEL) [46], [47] close to the protein native state [43], [44]. However, since it is not possible to represent the FEL as a function of 3N-6 coordinates, it is essential to choose an appropriate set of reaction coordinates that allows different conformational substates to be distinguished. Reaction coordinates are usually obtained by Principal Component Analysis (PCA), which describes the largest-amplitude protein motions during a simulation [34], [35], [43], [48], [49], [50]. Once the reaction coordinates are chosen, the FEL can be derived from the probability density function [47], [51].
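The two steps described above can be illustrated with a minimal sketch, which is not the authors' actual workflow. It assumes the frames have already been superposed on a common reference and flattened into an array of shape (n_frames, 3N); the function names, the bin count, and the use of NumPy are illustrative assumptions. The free energy is estimated from the standard relation G = -k_B T ln P, where P is the probability density of the frames projected onto the first two principal components.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def pca_projection(coords, n_components=2):
    """Project frames onto the leading principal components.

    coords: (n_frames, 3N) Cartesian coordinates, assumed to be already
    superposed on a common reference structure.
    Returns the projections and all eigenvalues in descending order.
    """
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # (3N, 3N) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # largest-amplitude modes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return centered @ eigvecs[:, :n_components], eigvals


def free_energy_landscape(proj, bins=30, temperature=300.0):
    """Estimate G(PC1, PC2) = -k_B T ln P on a 2D grid, in kJ/mol.

    Unvisited bins get G = +inf; the lowest sampled bin is shifted to zero.
    """
    hist, xedges, yedges = np.histogram2d(proj[:, 0], proj[:, 1], bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):         # log(0) -> -inf is intended
        g = -K_B * temperature * np.log(prob)
    g -= g[np.isfinite(g)].min()
    return g, xedges, yedges
```

Basins of the resulting grid (bins with low G) correspond to the most populated regions of the projected conformational space; the bin count controls the trade-off between resolution and statistical noise.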

Because the FEL is projected onto a low-dimensional space, it is important to verify that conformations that map to the same free-energy basin are also characterized by similar three-dimensional structures. A further approach to the analysis of the conformational sampling is therefore offered by structural family clustering, a method that groups conformations according to geometric similarity [34], [51], [52].
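As a hedged illustration of clustering by geometric similarity, the sketch below uses a generic neighbor-count scheme with an RMSD cutoff. It is not the helix-based clustering used in this work; the function names and the cutoff are assumptions, and the frames are assumed to be pre-superposed.

```python
import numpy as np


def pairwise_rmsd(coords):
    """All-against-all RMSD for pre-superposed frames.

    coords: (n_frames, n_atoms, 3). Returns an (n_frames, n_frames) matrix.
    Memory grows as n_frames**2 * n_atoms, so this suits small ensembles only.
    """
    diff = coords[:, None, :, :] - coords[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))


def cluster_by_cutoff(rmsd, cutoff):
    """Neighbor-count clustering on a precomputed RMSD matrix.

    Repeatedly picks the unassigned frame with the most unassigned neighbors
    within `cutoff` as a cluster center, then removes it with its neighbors.
    Returns a list of (center_index, member_indices) tuples.
    """
    remaining = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while remaining.any():
        neighbors = (rmsd <= cutoff) & remaining[None, :] & remaining[:, None]
        center = int(np.argmax(neighbors.sum(axis=1)))
        members = np.flatnonzero(neighbors[center])
        clusters.append((center, members.tolist()))
        remaining[members] = False
    return clusters
```

The center of each cluster can be taken as its representative structure, and cluster membership can then be compared with the FEL basin to which each frame projects.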

To explore how to achieve adequate sampling of the native conformational space through MD simulations and to derive representative structures from the MD ensemble, we performed a set of eleven independent MD simulations at 300 K to sample the conformational space of holo-myoglobin (holo-Mb) in the neighborhood of the native structure, collecting trajectories spanning 0.22 μs. In particular, principal component and FEL analyses, integrated with several cluster analyses, were carried out to gain an overall view of the conformational space accessible to this globular protein. We found high coherence between the different clusters and the basins of the FEL, as well as convergence of the parameters derived by PCA, results that are not achieved with less extensive conformational sampling. Our results suggest that multiple MD trajectories spanning 140–160 ns of simulation are sufficient to ensure adequate sampling of a protein characterized by the globin fold, such as myoglobin.
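One common way to quantify the convergence of PCA-derived quantities, not necessarily the measure adopted in this work, is the root mean square inner product (RMSIP) between the leading eigenvectors obtained from two subsets of the sampling. The sketch below assumes pre-superposed coordinates flattened to (n_frames, 3N); the function names and the default of ten eigenvectors are illustrative assumptions.

```python
import numpy as np


def leading_eigenvectors(coords, n_vectors=10):
    """Return the n_vectors covariance eigenvectors with largest eigenvalues.

    coords: (n_frames, 3N) pre-superposed coordinates.
    Output columns are orthonormal eigenvectors, shape (3N, n_vectors).
    """
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_vectors]
    return eigvecs[:, order]


def rmsip(vecs_a, vecs_b):
    """Root mean square inner product between two eigenvector sets.

    Both inputs have orthonormal columns and the same shape (3N, D).
    The value is 1 for identical subspaces and 0 for orthogonal ones.
    """
    overlaps = vecs_a.T @ vecs_b
    return float(np.sqrt((overlaps ** 2).sum() / vecs_a.shape[1]))
```

Computing the RMSIP between eigenvectors from increasingly long portions of the concatenated trajectory and those from the full sampling gives a curve whose plateau indicates how much simulation time is needed for the essential subspace to stabilize.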

MD simulations

MD simulations were performed using version 3.3 of the GROMACS software (www.gromacs.org), implemented on a parallel architecture, with the GROMOS96 force field, which was used in previous MD studies of myoglobin [53] and neuroglobin [54] and provided results in excellent agreement with the experimental data. The X-ray structures of native holo-Mb (PDB entries 1A6N [55] and 1CQ2 [56]) were used as starting points for the MD simulations. Protein structures, including crystallographic water

Stability of the simulations

We performed several independent simulations (replicas) of holo-Mb in order to increase the conformational sampling, using an iterative procedure as explained in Section 2. The MD trajectories were analyzed after discarding the portion of each replica required by the system to reach stable values of the main-chain RMSD and of the radius of gyration (about 1 ns), in order to ensure that the calculated parameters reflect the intrinsic properties of the system (Fig. S1, Supplementary).
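The stability check described above can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are given as a NumPy array of shape (n_frames, n_atoms, 3), frames are already superposed on the reference for the RMSD, and the function names are hypothetical. The default cutoff of 1 ns reflects the equilibration time quoted in the text.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for each frame.

    coords: (n_frames, n_atoms, 3); masses: (n_atoms,). Returns (n_frames,).
    """
    total = masses.sum()
    com = (coords * masses[None, :, None]).sum(axis=1) / total  # center of mass
    sq_dist = ((coords - com[:, None, :]) ** 2).sum(axis=-1)
    return np.sqrt((sq_dist * masses[None, :]).sum(axis=1) / total)


def rmsd_to_reference(coords, reference):
    """RMSD of each pre-superposed frame from a reference structure."""
    return np.sqrt(((coords - reference[None]) ** 2).sum(axis=-1).mean(axis=-1))


def discard_equilibration(coords, times_ns, t_equil_ns=1.0):
    """Keep only frames recorded at or after the equilibration time."""
    keep = times_ns >= t_equil_ns
    return coords[keep], times_ns[keep]
```

In practice, both quantities would be plotted against time for each replica, and the cutoff chosen where they level off, before the retained frames are passed to the PCA and clustering steps.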

Evaluation of the conformational sampling

Since MD

Conclusions

Multiple MD simulations with different initial atomic velocities were used to sample conformations in the proximity of the native structure of holo-myoglobin using an iterative procedure. Eleven independent 20 ns trajectories (replicas), each of which samples only a fraction of the conformational distribution, were concatenated, yielding more than 0.22 μs of simulation at 300 K.

The coordinate space evolution of the different replicas was examined through low-dimensional projections (via

Acknowledgements

The authors thank CINECA (Project 696 – 2008) for the use of computational facilities and Marco Pasi for helpful discussions and critical reading of the manuscript.

References (70)

  • M. Anselmi et al. The kinetics of ligand migration in crystallized myoglobin as revealed by molecular dynamics simulations. Biophys. J. (2008)
  • M. Anselmi et al. Molecular dynamics simulation of the neuroglobin crystal: comparison with the simulation in solution. Biophys. J. (2008)
  • J. Vojtechovsky et al. Crystal structures of myoglobin-ligand complexes at near-atomic resolution. Biophys. J. (1999)
  • S. Bhattacharya et al. The tautomeric state of histidines in myoglobin. Biophys. J. (1997)
  • S. Maguid et al. Exploring the common dynamics of homologous proteins. Application to the globin family. Biophys. J. (2005)
  • H. Frauenfelder et al. Myoglobin: the hydrogen atom of biology and a paradigm of complexity. Proc. Natl. Acad. Sci. USA (2003)
  • F.G. Parak et al. Myoglobin, a paradigm in the study of protein dynamics. Chemphyschem (2002)
  • D.E. Bikiel et al. Modeling heme proteins using atomistic simulations. Phys. Chem. Chem. Phys. (2006)
  • K. Shikama et al. Structure–function relationships in unusual nonvertebrate globins. Crit. Rev. Biochem. Mol. Biol. (2004)
  • J.A. Hoy et al. Plant hemoglobins: a molecular fossil record for the evolution of oxygen transport. J. Mol. Biol. (2008)
  • M.T. Reymond et al. Folding propensities of peptide fragments of myoglobin. Protein Sci. (1997)
  • R. Grandori et al. Cloning, overexpression and characterization of micro-myoglobin, a minimal heme-binding fragment. Eur. J. Biochem. (2002)
  • E.A. Ribeiro et al. Circular permutation and deletion studies of myoglobin indicate that the correct position of its N-terminus is required for native stability and solubility but not for native-like heme binding and folding. Biochemistry (2005)
  • H. Ji et al. The effect of heme on the conformational stability of micro-myoglobin. FEBS J. (2008)
  • M. Jamin. The folding process of apomyoglobin. Protein Pept. Lett. (2005)
  • C. Nishimura et al. Identification of native and non-native structure in kinetic folding intermediates of apomyoglobin. J. Mol. Biol. (2005)
  • H. Dyson et al. The role of hydrophobic interactions in initiation and propagation of protein folding. Proc. Natl. Acad. Sci. USA (2006)
  • A. Ansari et al. Conformational relaxation and ligand binding in myoglobin. Biochemistry (1994)
  • H.H. de Jongh et al. Amide-proton exchange of water-soluble proteins of different structural classes studied at the submolecular level by infrared spectroscopy. Biochemistry (1997)
  • S. Cavagnero et al. Amide proton hydrogen exchange rates for sperm whale myoglobin obtained from 15N-1H NMR spectra. Protein Sci. (2000)
  • D.A. Simmons et al. Conformational dynamics of partially denatured myoglobin studied by time-resolved electrospray mass spectrometry with online hydrogen-deuterium exchange. Biochemistry (2003)
  • D.A. Simmons et al. Characterization of transient protein folding intermediates during myoglobin reconstitution by time-resolved electrospray mass spectrometry with on-line isotopic pulse labeling. Biochemistry (2002)
  • R. Aranda et al. Time-dependent atomic coordinates for the dissociation of carbon monoxide from myoglobin. Acta Crystallogr. D Biol. Crystallogr. (2006)
  • R. Elber et al. Multiple conformational states of proteins: a molecular dynamics analysis of myoglobin. Science (1987)
  • M. Aschi et al. Conformational fluctuations and electronic properties in myoglobin. J. Comput. Chem. (2004)