Introduction

The conformational dynamics of proteins play important roles in their functions and regulating ligand binding. A fundamental appreciation of how proteins work requires study of conformations and dynamics, as well as changes between states of protein motions, such as folded/unfolded and ligand-bound/-free states. For example, protein allosteric effects may be related by either or both conformational and dynamical changes [13]. Molecular dynamics (MD) simulations provide powerful tools for the exploration of the conformational energy landscape accessible to protein molecules because multiple conformations are difficult to probe experimentally [46]. Moreover, recent computer-aided drug discovery studies have focused on protein flexibility in molecular docking processes [79]. Since most docking software prefers proteins to be rigid to avoid intensive computational effort, a promising strategy is to dock ligands into protein conformation ensembles obtained from MD simulations [10, 11]. Several programs provide general tools or special modules for analysis of MD results and clustering conformations, but most are based on Cartesian coordinates [1217]. T-Analyst uses internal bond-angle-torsion (BAT) coordinates, which are efficient in capturing side-chain rotamers and most low-frequency motions [1820]. Our program provides useful tool to analyze MD trajectories. For example, users can easily view proper rotameric states of dihedrals from the output files instead of plotting and correcting them manually. This program allows for efficient analysis of MD simulations to study protein flexibility and extract structural information for virtual screening.

Method

Data manipulation and output

T-Analyst reads NAMD, Amber or CHARMM trajectory files and Amber topology file. The CHARMM-type topology files can be converted to Amber topology files easily with the freely available CHAMBER program [21]. The program implements amber2accent to transform Cartesian coordinates to BAT coordinates [22]. To eliminate repeats, only heavy-atom side-chain torsion angles and ω, φ and ψ angles in backbones are considered. Users can choose angles and residues for analysis, and a dihedral distribution and its rotations during a simulation are output as .agr-format files, which can be viewed with Xmgr/Grace. Moreover, results generated by T-Analyst can be visualized by freely available packages such as VMD, Xmgr/Grace or R [12, 23, 24]. Users can output a series of files, such as distributions of all or selected torsion angles in a protein, or sorted and/or unsorted standard deviations and entropy. T-Analyst also groups different conformations based on rotameric states of residues of interest and outputs coordinates of grouped conformations into different trajectory files. The program also computes pairwise cross-correlation coefficients for all pairwise dihedral angles that users selected. By examining the output correlation map, dihedrals that correlate with each other can be observed.

Torsion angle correction

To correctly reflect rotameric states and capture the angle periodicity, T-analyst detects the angle population in margins, such as ±180° and 360°/0°, depending on how users output the angles. If discontinuity in a dihedral distribution is detected (see Fig. 1a, c), the output angle range will be corrected; thus, energy wells can be illustrated and defined properly (Fig. 1b, d). For example, Fig. 1c illustrates a rotameric state of side-chain Ile 153 of ligand-bound tryptophan synthase (TRPS), whereby one energy well is split into two wells near −180° and +180°. T-Analyst automatically detects the peak discontinuity at each edge of −180° and +180° and determines the range of each discontinued peak. Later, the whole population will be divided into left set (−180° to 0°) and right set (0° to +180°) and be counted in each set. T-Analyst angle correction is then applied by moving the discontinued peak on the set with smaller population to the set with larger population. For example, in Fig. 1c, the set from −180° to 0° has the smaller population, so the peak near the −180° margin was shifted by +360° and merged with the other margin at +180°.

Fig. 1
figure 1

Torsion angle distribution of side-chain χ2 of Ile153 of TRPS. Torsion angle: a before angle correction and b after angle correction. Histogram of torsion angle distribution output by T-Analyst: c before angle correction and d after angle correction

Configurational entropy calculation and dihedral correlation

T-Analyst calculates configurational entropy, Sconf, for each torsional degree of freedom by the Gibbs entropy formula: TSconf(i) = −RT Σ Pi ln(Pi), where Pi is the probability distribution of angle i, R is the gas constant and T is the absolute temperature. T-Analyst calls Xmgr/Grace to generate histogram for each degree of freedom. The bin size for each Pi is 0.5° for ω angle, 1° for φ and ψ angles and 5° for side-chain dihedrals. The value of TSconf(i) has unit of kcal/mol, which allows for direct comparison with energy calculations. Summing TSconf(i) provides a quick approximation of the entropic contribution of a system, although coupling between torsions is ignored [25].

T-Analyst also computes the quasi-harmonic approximation (QH) from BAT coordinates. The covariance matrix C can be computed, with the probability distribution functions approximated by a multidimensional Gaussian distribution function [26]. The configurational entropy from QH is computed by TSQH = 1/2 nRT + 1/2 RT ln[(2π)n det(C)], where n is the number of torsions. Although QH assumes that the probability distribution function is Gaussian, which is accurate for torsions that have only one rotameric state, TSQH provides an upper bound limit for the configurational entropy [25, 27]. Moreover, the off-diagonal elements of the covariance matrix indicate the degree of significance of the coupling between the given torsions. Entropy computed from only the diagonal elements of the covariance matrix, TSQH_diag, is also computed. If TSQH equals TSQH_diag, then there is no coupling among these torsions.

The extent to which pairs of dihedrals are correlated with one another can be assessed by examining the magnitude of their cross-correlation coefficients. T-Analyst computes a correlation matrix of dihedrals and calls levelplot function in R to plot a correlation map. Users can select dihedral angles, e.g. backbone φ and ψ angles of selected residues, to plot a correlation matrix. Typical characteristics of a correlation map include a line of strong cross-correlation along the diagonal (where matrix element i = j), and off-diagonal cross-correlations. The high diagonal values are set to 1.00. Off-diagonal correlations can be either positive or negative, and non-zero values may indicate potentially interesting correlations between two close proximity or non-contiguous regions of a protein system.

Clustering protein conformations

Our program clusters protein conformations on the basis of user-selected rotameric states of residues. Although RMSD-based clustering methods are mostly applied to group conformations with significant differences, small fluctuations are challenging to detect with classical RMSD-based clustering methods. This module is particularly useful for choosing representative conformations based on side-chain rotations of key residues. Users can input specific torsions with rotameric states of interest and the range of each torsion to run T-Cluster, the second part of T-Analyst. T-Analyst will provide all the combinations of groups for further analysis. The program is sensitive to dihedral rotations and can efficiently group user picked backbone or side-chain dihedrals into separate trajectory files. A report file is also generated to record information about each group.

Molecular dynamics simulations

Molecular dynamics simulations on ligand-free and ligand-bound TRPS were performed using the NAMD package [23]. Standard simulation procedures (e.g. ([28]) were followed using the Amber 10 package and ff03 Amber force field and general Amber force field [6, 29]. Initial coordinates of TRPS were taken from PDB code 2J9X and 3CEP [30, 31]. Briefly, after preparation of the system by sequential steps of energy minimization and equilibration, the 30 ns production runs were carried out at 298 K and 1 atm. The systems were solvated by a 12 Å TIP3P water box. Snapshots of the atomic coordinates were recorded every 1 ps. As T-Analyst does not require too many frames to run the analysis, snapshots were saved every 20 ps for T-Analyst and 1,500 frames were used. Molecular dynamics simulations on HIV-1 protease were initiated from crystallographic coordinates with a semi-open flap conformation (PDB code 1HHP) [32]. Amber ff99 force field was used for the protein. Aqueous solvation was modeled implicitly by using the Generalized Born approach [33] and temperature was maintained at 298 K by using Langevin dynamics. Standard simulation procedures (e.g. [34]) were followed with the Amber 9 package. Since the free protease predominantly populates the semi-open conformation, we took a 1.5 ns MD simulation and saved it as one 1,500 frame trajectory which had one flap open state for our analysis.

Analysis of MD simulations

Analysis of protein conformations

Analysis of small backbone fluctuations, as well as conformational changes, involves investigating loop or side-chain motions. T-Analyst adopts torsion angle analysis, which allows for accurate expression of bond rotations. For example, Fig. 1a shows a rotameric state of side-chain Ile 153 of ligand-bound TRPS. Standard deviations of this torsion before and after angle correction are 136.4° and 49.4°, respectively. Large differences in standard deviations usually indicate changes in rotameric states. Of note, proper angle correction is necessary for computing accurate rotamers and their standard deviations.

With careful superposition, flexible regions in proteins can be revealed by computing root mean square deviation and fluctuation (RMSD and RMSF, respectively) in Cartesian coordinates [35]. Sometimes one or a few residues may serve as “hinges,” which mainly control movements of a flexible region. While the RMSD/RMSF method may not be sensitive enough to detect hinges, T-Analyst can illustrate torsion angles in hinges, which are associated with protein motions. User can compare multiple torsion distributions with protein motion to detect the hinge region. Figure 2a shows MD simulations of HIV-1 protease with open flaps, and the distance between flap tips is in Fig. 2c. Highly flexible regions such as flaps and elbows are shown in the RMSF plot (Fig. 2b) but determining whether any residues behave as hinges is not straightforward [34]. Data from T-Analyst imply that the flap β-hairpins move relatively rigidly, but some residues of the elbow, such as Gly 40, act as hinges (Fig. 2d).

Fig. 2
figure 2

Analysis of the hinge regions of HIV-1 protease. a An open conformation of HIV-1 protease obtained from the MD simulation. Flaps: residues 43–58, yellow; elbows: residues 37–42; hinge: Gly 40, red. b RMSF of Cα atoms. c Distance between two flap tips: residues Ile 50 and Ile 50′. d Ψ angle distribution of Gly 40′; values within the dashed lines, between 720 and 880 ps, correspond to open flap conformations shown in c

One common method to express protein flexibility is by showing their rotameric states computed from corrected torsion angle distributions. When comparing the rotameric states for proteins between different states, e.g. ligand-bound/-free states, folded/unfolded states, users can have valuable information regarding the protein conformational or flexibility changes between different states. Different ligand mechanisms, such as induced fit or conformational selection (population shift) can be studied [1, 36] .

Here we use TRPS, a protein with strong allosteric regulation of substrate binding, as an example. In both ligand-free and ligand-bound TRPS, the major peak of Glu 49 remains for both states to keep its favorite rotameric state (Fig. 3a, e), which is often assumed [37]. However, instead of dropping the minor peak, the peak changed while the substrate binds. We found that the change may be due to different hydrogen-bonds formations. In the ligand-bound state, Glu 49 form stable hydrogen-bonds with ligand 3-indole-d-glycerol-3′-phosphate (IGP), Tyr 173 and Val 23, and a residue near Glu 49, Tyr 175, also forms a hydrogen-bond with IGP. These interactions stabilize the local environment for the ligand which is necessary for catalysis. In the ligand-free state, since there is no ligand to interact with the protein in the binding site, more free space is allowed for side-chains to move around, though less stable hydrogen-bonds between side-chains can still formed. Without the ligand, the hydrogen-bond between Glu 49 and Tyr 173 is absent and Tyr 173 flips away from the binding site. Further detailed analysis shows that the distance between Tyr173:OH and Glu49:CD changes from 3.6 ± 0.16 Å in ligand-bound state (red dashed lines in Fig. 4a) to 17.5 ± 0.63 Å in ligand-free state (red dashed lines in Fig. 4b). As a result, instead of completely diminishing one rotameric state of Glu 49, the interaction induces a new rotameric state to alter the local environment to accommodate the ligand. Similarly, the major peak of residue Ile 153 in Fig. 3b, f remains in the same position in both states, but it is more populated in the ligand-bound state. Upon IGP binding, the methyl group of Ile 153 interacts with the non-polar region of the ligand and stays in one peak. Therefore, the population of the rotameric states is shifted.

Fig. 3
figure 3

Clustering conformations of TRPS. ad Ligand-free state, eh ligand-bound state. Torsion angle distribution of side-chain χ3 of Glu 49 in a ligand-free and e ligand-bound state. Torsion angle distribution of side-chain χ2 of Ile 153 in b ligand-free and f ligand-bound state. c, g RMSD of Glu 49 and Ile 153 after structure alignment. d, h conformational groups computed by T-Analyst over the course of MD simulations. Each small diamond represents one frame

Fig. 4
figure 4

Snapshots of surrounding residues of Glu 49 in a ligand-bound and b ligand-free state. Grey dotted lines indicate stable hydrogen-bonds in a and green dotted lines indicate weak hydrogen-bonds in b. Red dashed lines indicate the distance between Tyr173:OH and Glu49:CD. The circle labeled with IGP in b indicates the position of the ligand binding pocket

Clustering protein conformations

Docking potential ligands to target protein is one of the key steps in drug design and discovery process. During ligand–protein docking process, the position of side-chains in binding site can affect ligand–protein interaction significantly and directly, especially when using rigid protein conformations. Here we use the program to group conformations and then select representative conformations for performing docking studies to the α-subunit of TRPS. TRPS is a potential antibiotic target, and two torsion angles, Glu 49 and Ile 153, are known to directly involve in ligand–protein binding (see Fig. 3). Figure 3a, b show the distribution of side-chain torsion angles in both residues in ligand-free state which is output by T-Analyst after angle correction. Two populations are shown in each of the torsion angle distributions. T-Analyst can group the trajectories into four groups which are characterized by—group 1 (Glu 49-a, Ile 153-a), group 2 (Glu 49-a, Ile 153-b), group 3 (Glu 49-b, Ile 153-a), group 4 (Glu 49-b, Ile 153-b). Figure 3c, g show the RMSD distribution of Glu 49 and Ile 153 with simulation time. Figure 3d, h are the group distributions for the two torsion angles. Notably, although different protein conformations are usually clustered based on computed RMSD, the value is less sensitive to small scale conformational changes, such as side-chain rotations in a protein’s active site. As illustrated in Fig. 3c, g, there is no clue from RMSD that may be used for grouping. In contrast, T-Analyst also provides information regarding jumping between groups during a course of MD simulation, see Fig. 3d, h.

Configurational entropy

Protein allosteric effects or post-translational modifications such as phosporylation do not always involve substantial conformational changes. Recent experiments confirmed that in some cases, visual inspection of the active/inactive states may not reveal differences in the shape of the ligand binding site, but changes in protein dynamics [3]. Therefore, the magnitude of configuration entropy computed from dihedral degrees of freedom provides a direct way to examine protein flexibility.

The total entropy of a molecule in solution can be separated into two parts: solvent entropy associated with water motion and configurational entropy (Sconf) associated with molecular motion. The latter can be used to represent protein flexibility. Therefore, quantifying the configurational entropy, especially changes, could help explain important biochemical processes such as protein folding and ligand–protein binding. Figure 5 shows Sconf calculated from selected backbone and side-chain dihedrals in the binding site of TRPS. T-Analyst suggested that the flexibility of a loop region changes significantly when the ligand binds into the pocket.

Fig. 5
figure 5

Entropy distributions in the binding site of TRPS. a Entropy distribution of Ψ torsion angles of residues within 8Å of ligand IGP and b side-chain torsion angles that directly interact with IGP

The calculations can provide quantitative information regarding entropy gain or loss. For example, in Fig. 5b, the side-chain entropies of Tyr 175 and Thr 183 drop notably, around 0.7 and 0.6 kcal/mol, respectively. In the ligand-bound state, Tyr 175 forms a stable hydrogen-bond with ligand IGP which stabilized the phenol ring, and Thr 183 also forms hydrogen-bonds with Asp 60 and Gly 61. In ligand-free state, both side-chains move more freely as there is no stable hydrogen-bond formed.

Although in most cases, one can observe entropy decrease upon ligand binding, there may be some exceptions which are of particular interest and worth further analysis. The entropy changes computed by T-Analyst provide guidance for users to pick up regions of a protein to do detail dynamic analysis. For example, T-Analyst showed an entropy gain in the second side-chain of Asp 60 in the presence of ligand, though not largely. Based on the information, we carried out further investigation near Asp 60, and found that two oxygen atoms in carboxyl group can form hydrogen-bonds with Tyr 102 and Thr 183 in both ligand-bound and ligand-free states. Interestingly, in the ligand-bound state, the two oxygen atoms of Asp 60 can flip very often but retain two stable hydrogen-bonds with Tyr 102 and Thr 183 alternatively. The presence of IGP provides a more hydrophilic environment around the carboxyl group of Asp 60; thus, the oxygen atoms can flip more freely. As a result, the local entropy increases without losing hydrogen-bonding and reducing electrostatic attraction. In contrast, when IGP is absent, the local environment around Asp 60 is mainly hydrophobic. Therefore, Asp 60 forms hydrogen-bond with Thr 183, the other hydrophilic residue, and the residues become less flexible.

Dynamical cross-correlation between dihedral angles

T-Analyst also performs cross-correlation analysis of a trajectory. The resulting cross-correlation map allows the identification of the correlated and anti-correlated motions involved in an entire protein or user selected dihedrals. Similar to calculating standard deviations, dihedral angles used to generate a correlation plot also need to run angle correction, or discontinuities in margins (±180° or 360°/0°) can cause errors when computing their cross-correlation coefficients.

Here we use HIV-1 protease as a test case, which continues to be one of the primary targets of anti-AIDS drug discovery. Understanding the dynamics of free HIV-1 protease has profound implications for designing new therapeutic agents, such as allosteric inhibitors. Figure 6a shows the structure anatomy of the protein. Yellow boxes in the correlation map, computed by the backbone φ and ψ angles with T-Analyst (see Fig. 6b), show the highly correlated regions within the HIV protease monomer. For example, boxes I, IV, VIII and X indicate flexible and concerted motions within fulcrum, flap, cantilever and loop regions. Other boxes highlighted in Fig. 6b indicate the correlations between flap elbow, flap tip, cantilever, fulcrum and loop regions. It has been suggested that motions of the flap tip and elbow are highly correlated, which are illustrated in our plot as well [34]. Interestingly, correlations between the flap elbow and cantilever are identified (Box VI), and the motions of flap tip and cantilever are also correlated (Box VII), even though not quite strongly. We compared our plot to another correlation map generated by the use of the backbone Cα and N atoms in the Cartesian coordinates with the Bio3D package (Fig. 6c) [38]. Both plots suggested similar correlations. However, because of the nature of Cartesian coordinates, patterns of correlations are harder to distinguish in Fig. 6c.

Fig. 6
figure 6

Structure anatomy of HIV-1 protease and correlation maps of one subunit of HIV-1 protease in BAT coordinates and Cartesian coordinates. a Structure anatomy of HIV-1 protease. Color indicates distinct regions. Flaps: residues 43–58, red; flap tips: residues 49–52, yellow; flap elbows: residues 37–42, magenta; cantilevers: residues 59–75, green; fulcrums: residues 10–23, orange; loop area near flaps: residues 76–84, grey; and interleaved β-strand motif forming the dimer interface: residues 1–4 and 96–99, blue/cyan. b Φ and Ψ torsion angle correlation map generated by T-Analyst using BAT coordinates. Yellow lines and boxes indicate correlations between I: fulcrum; II: flap elbow, III: flap with flap elbow, IV: flap, V: flap tip, VI: flap elbow with cantilevel, VII: flap with cantilevel, VIII: cantilevel, IX: fulcrum with loop, and X: loop. c Correlation map generated by the use of the backbone Cα and N atoms in the Cartesian coordinates with the Bio3D package

Conclusions

MD simulations provide invaluable conformational and dynamical landmarks useful for designing new experiments and for theoretical studies. The current analysis method we describe, T-Analyst, can help exam protein motion, identify structural and dynamic features, reveal changes of flexibility in different states, and group conformations based on dihedral rotamers. Analyzing the growing MD data may be the most time-consuming step in simulation studies, and our program facilitates this work. The program can be freely downloaded from http://research.chem.ucr.edu/groups/chang/tools.htm.