Journal of Molecular Biology
Regular articleProtein folding simulations with genetic algorithms and a detailed molecular description1
Introduction
In order to determine protein structure from amino acid sequence, two central problems must be overcome: some form of free energy function must be developed that is able to distinguish between the functional conformation and all the rest, and a conformational search method must be devized that can find that functional conformation using available computing resources.
The most realistic method to date for studying protein motion is Cartesian space molecular dynamics, using an explicit description of all the protein and solvent molecule atoms Allen and Tildesley 1989, Brooks et al 1991. Force fields at this level are highly evolved Aqvist et al 1985, Weiner et al 1984, Brooks et al 1983, Hagler 1985, Lee et al 1995, Kang et al 1996, and a number of studies have shown that there are significant free energy minima near to the functional conformation Brunne et al 1995, Kitson et al 1993. It is not known whether, given enough computer time, simulations started from arbitrary points would converge to the functional conformation. The very large computing requirements have so far prevented simulation of the behavior of proteins or even peptides for more than a few nanoseconds, whereas a period of microseconds to milliseconds would be needed to reproduce observed in vitro folding behavior of peptides (Williams et al., 1996) and small proteins (e.g. Kuszewski et al., 1994). The principal limitation on the simulation scale is that the energy surface with a full atom description is very fine grained. That is, the energy changes rapidly as a function of atomic position. This problem becomes more acute as a molecular system becomes more compact.
To overcome these difficulties, two parallel strategies have been pursued. First, the description of the system has been simplified, to smooth the energy surface. Simplification ranges from the use of implicit solvation models (Lazaridis et al., 1995), slightly (Brooks & Head-Gordon, 1991), moderately (Srinivasan & Rose, 1995) or drastically (Sun et al., 1995) reduced side-chain descriptions, approximate main-chain descriptions (Brooks & Head-Gordon, 1991), and lattice models (Kolinski & Skolnick, 1994), extending to one lattice point per two residues (Park & Levitt, 1996). It is clear that there must be a price to pay for these simplifications, but so far there have been few studies Mounge et al 1995, DeBolt and Skolnick 1996, Huang et al 1996 to establish to what degree the different simplifications affect the ability to identify the functional conformation reliably. We have taken a relatively conservative approach. An implicit solvent description is used, together with all oxygen, nitrogen, carbon and sulfur atoms, as well as polar hydrogen atoms. An empirical force field, including a Coulomb law description of electrostatics with partial atomic charges is used. Advantage is taken of potential of mean force methods (Avbelj & Moult, 1995b) to parameterize this and other terms against experimental protein structures. The ability of this force field to identify conformations close to the functional one has been investigated (Braxenthaler et al., 1996).
The second strategy to over come the search problem is to take larger steps in the conformational space. Covalent geometry (i.e. bond lengths and bond angles, and planarity of conjugated atoms sets) is approximately constant for subgroups of protein atoms, so that conformational freedom may be expressed in terms of the values of the dihedral angles around single bonds. Although molecular dynamics in this space allows a larger time step, greater computational cost partly offsets the gains (Rice & Brünger, 1994). The restrictive move size and the cost of calculating derivatives in traditional molecular dynamics is overcome by using a Monte Carlo (MC) or genetic algorithm (GA) procedure. Much larger changes in conformation can be obtained through the MC procedure. Large changes in one or more dihedral angles are introduced, and the resulting conformation accepted if the evaluated free energy decreases, or if the increase is not too large. A number of MC studies of peptide folding have been made Abagyan and Totrov 1994, Avbelj and Moult 1995a. The disadvantage is that, because the space of a protein molecule is densely packed with atoms, many moves will be rejected as energetically unacceptable.
We have previously investigated the effectiveness of a torsion space MC procedure at finding the experimental conformation of fragments of protein molecules (Avbelj & Moult, 1995a). The method was partly successful, in that in a number of cases low free energy, native-like conformations were generated. However, competing lower free energy but less native-like conformations were often encountered, and the lowest free energy structures were not as low as the corresponding minimized experimental structures. Three possible explanations for these limitations are: (1) Since these are fragments of proteins, the preferred conformation may not necessarily be that found in the full context. (2) The free energy function may have a large number of false minima. (3) The search may not have converged to the lowest free energy conformation possible. The fact that the experimental structures consistently had slightly lower values supported the hypothesis that this last factor played a significant role. We therefore sought a more effective way of searching in torsion space in the compact states of a protein molecule.
Genetic Algorithms (GA) offer one way of searching more effectively in crowded spaces. They were first introduced by Holland (1975) to simulate the processes of natural selection at the genetic level, but rapidly gained acceptance as general search methods (Goldberg, 1989). Possible solutions to a search problem are represented by genes, and a fitness function is used to evaluate the fitness of each gene. A population of genes evolves by three mechanisms. (1) Point mutations, in which the value at a specific place on a single gene is changed. (2) Crossovers, in which a portion of one gene replaces the equivalent portion of another one. (3) Survival of the fittest genes from one generation to the next. Tests on two and three-dimensional lattice models of proteins Unger and Moult 1993a, Unger and Moult 1993b have demonstrated that GAs are more effective than MC at finding the global energy minimum. It is still less clear how well the method performs on more detailed descriptions of a protein chain.
Dandekar and Argos 1992, Dandekar and Argos 1994 used genetic algorithms to fold Cα backbone models of proteins in real space off lattice. A simplified bit-string encoding of φ/ψ space, allowing each φ/ψ pair to adapt one of seven possible angle combinations (Rooman et al., 1991) was used. Each side-chain was represented by a sphere of 1.9 Å and the global energy or fitness function consisted of hydrogen bonding, secondary structure preference and a hydrophobic scatter term, each of which were scaled according to heuristic constants. The method generally relied on the correct pre-assignment of secondary structure for success.
Sun et al. (1995) used a method similar to that of Dandekar and Argos 1992, Dandekar and Argos 1994 in that all secondary structure was explicitly introduced. A full backbone representation was used, with one virtual atom per side-chain. A simulation consisted of mutations of five degree torsion changes and crossovers at non-secondary structure positions. This procedure was able to find conformations which had a low root-mean-square (RMS) deviation to the experimental structure in five of the seven test cases.
Sun (1993) has presented the most detailed GA for protein folding to date. A full atom polypeptide backbone together with one atom at the centroid of each side-chain was used. The GA was driven by a dictionary of di,, tri-, tetra-, and penta-peptides. A peptide segment was chosen from the database of peptides by sequence. The search method was applied to two small proteins melletin and pancreatic polypeptide. The conformation of a small cyclical peptide was also reproduced. All the structures were generated with an RMS deviation of 1.7 Å within the final population in the GA simulation. These results were impressive, although the proteins simulated were present in the dictionary which was used to derive the GA. The search method is said to be approximately 100 to 200 times more efficient than MC simulated annealing protocols.
A more detailed review of the use of GAs for protein structure prediction may be found in Pedersen & Moult (1996).
The search method, presented here, is tested on a set of 28 fragments of protein structures, up to 14 residues long. The fragments were selected on the basis of experimental data and energetic criteria indicating a preference to adopt a native-like structure independent of the presence of the rest of the protein. They were identified from a survey of the current literature on the folding of proteins and protein fragments. These fragments have the advantage of being relatively small, and therefore present a more tractable search problem than complete proteins. A second advantage is that generation of a native-like conformation in an objective search provides added information about the context independence of a fragment, and its possible role in early folding.
The structure of the rest of the paper is as follows. In the first section, the force field is briefly introduced. Next, details of the GA implementation are described, together with a study of optimization of the GA parameters. Then the results of determining the conformation of a standard set of protein fragments are presented, followed by an analysis of some of the factors contributing to problem cases. Finally, we discuss the significance of the results for the further development of search methods and force fields, and explore the implications of the results for the mechanisms of protein folding.
Section snippets
Force field
The force field is the same as that used in a previous MC study Avbelj and Moult 1995a, Avbelj and Moult 1995b, Avbelj 1992. These papers should be consulted for full details. A short summary is given here.
All nitrogen, oxygen, carbon and polar hydrogen atoms are represented explicitly. Hydrogen atoms on aliphatic and aromatic carbon atoms are merged with those atoms in the usual united atom approximation. All intramolecular electrostatic contributions are calculated using atomic partial
Dihedral angle library
For each residue type, a set of observed φ, ψ, and χ angles was compiled from a library of 255 non-redundant protein structures (Holm & Sander, 1994; 20/20 set, release of October, 1994). A processed version of the Brookhaven files in this library is available at (Braxenthaler et al., 1996). Any protein containing a fragment to be simulated was removed from the database (i.e. the procedure was “Jack Knifed”). These angle sets were used to bias selected angles towards the experimental
GA parameter optimization
Trial GA simulations were performed to establish the best set of conditions. The following parameters were optimized: length of population generating MC simulation, population size, fraction of conformations transferred directly to the next generation, frequency of ΔGlocal standard deviation recalculation, extent of joint search, and extent of side-chain annealing. All trial simulations were performed on residues 10 to 22 of the bacterial Ribonuclease Barnase (Baudet & Janin, 1991), referred to
Discussion
There are three questions to be addressed. How effective is the genetic algorithm as a search technique for peptides and proteins? How well does the free energy function perform? What are the implications of the results for early events in protein folding?
Acknowledgements
We thank NIST for use of the IBM SP2 parallel computer, and NIST computing staff for supporting us in its use. This work was partly funded by grants DOC/NIST 60NANBD1594 and NIH GM40134.
References (69)
- et al.
Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins
J. Mol. Biol.
(1994) - et al.
Molecular dynamics simulations of protein unfolding and limited refoldingcharacterization of partially unfolded states of ubiquitin in 60methanol and in water
J. Mol. Biol.
(1995) - et al.
A molecular dynamics study of the C-terminal fragment of the l7/l12 ribosomal proteinsecondary structure motion in a 150 picosecond trajectory
J. Mol. Biol.
(1985) - et al.
Crystal structure of a barnase complex at 1.9 Å resolution
J. Mol. Biol.
(1991) - et al.
The protein databanka computer based archival file for macromolecular structures
J. Mol. Biol.
(1977) - et al.
Folding the main-chain of small proteins with the genetic algorithm
J. Mol. Biol.
(1994) - et al.
Folding of peptide fragments comprising the complete sequence of proteins. Models for initiation of protein folding. I. Myohermerythrin
J. Mol. Biol.
(1992) - et al.
Folding of peptide fragments comprising the complete sequence of proteins. Models for initiation of protein folding. I. Plastocyanin
J. Mol. Biol.
(1992) - et al.
Analysis of the accuracy and implications of simple models for predicting the secondary structure of globular proteins
J. Mol. Biol.
(1978)
Co-operative interactions during protein folding
J. Mol. Biol.
Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations
J. Mol. Biol.
Enthalpic contribution to protein stabilityinsights from atom-based calculations and statistical mechanics
Advan. Protein Chem.
Accurate crystal molecular dynamics simulations using particle-mesh-EwaldRNA dinucleotides ApU and GpC
Chem. Phys. Letters
The folding of an enzyme: V. H/2H exchange-nuclear magnetic resonance studies on the folding pathway of barnase: complementarity to and agreement with protein engineering studies
J. Mol. Biol.
Energy functions that discriminate X-ray and near native folds from well-constructed decoys
J. Mol. Biol.
Genetic algorithms for protein structure prediction
Curr. Opin. Struct. Biol.
Prediction of protein backbone conformation based on seven structural assignments
J. Mol. Biol.
An N-terminal fragment of Barnase has residual helical structure similar to that of a refolding intermediate
J. Mol. Biol.
Recent developments in the theory of protein foldingsearching for the global energy minimum
Biophys. Chem.
Environment and exposure to solvent of protein atoms
J. Mol. Biol.
Effect of mutations on the performance of genetic algorithms suitable for protein folding simulations, in computer aided innovation of new materials
Comput.-Aided Innovat. New Mater.
Genetic algorithms for protein folding simulations
J. Mol. Biol.
Computer Simulation of Liquids
pH-induced denaturation of proteinsa salt bridge contributes 3-5 kcal/mol to the free energy of folding of T4 lysozyme
Biochemistry
Use of a potential of mean force to analyse free energy contributions in protein folding
Biochemistry
Determination of the conformations of folding initiation sites in proteins by computer simulation
Proteins: Struct. Funct. Genet.
The role of electrostatic screening in determining protein main chain conformational preferences
Biochemistry
Folding of protein G B1 domain studied by the conformational characterization of fragments comprising its secondary structure elements
Eur. J. Biochem.
The role of dipole interactions in determining polypeptide conformation
J. Am. Chem. Soc.
Carb biocomputing web pages, force field and parameters
CHARMMa program for macromolecular energy, minimization, and dynamics calculations
J. Comput. Chem.
Virtual rigid body dynamics
Biopolymers
G. Nemethy on proteinsa theoretical perspective of dynamics structure and thermodynamics
Advan. Chem. Phys. Bull. Mathemat. Biol.
Cited by (117)
Stochastic numerical approach for solving second order nonlinear singular functional differential equation
2019, Applied Mathematics and ComputationAPL: An angle probability list to improve knowledge-based metaheuristics for the three-dimensional protein structure prediction
2015, Computational Biology and ChemistrySoft computing methods for the prediction of protein tertiary structures: A survey
2015, Applied Soft Computing JournalCitation Excerpt :For the representation, authors use some amino acid features,e.g., partial charge and van der Waals bond. In [82], a global free energy function of an unfolded conformation is calculated as the sum of threes types of energy: local backbone electrostatic energy, which represents the sum of the interactions between NH and CO groups among amino acids, intra-molecular electrostatic energy, that reflects the rest of intra-molecular interactions, and solvation free energy, that takes into account different interactions as hydrogen bonding, ion–dipole, and dipole–dipole attractions or van der Waals forces. Finally, in [127], author develops a tabu search algorithm for the torsion angles prediction.
Three-dimensional protein structure prediction: Methods and computational strategies
2014, Computational Biology and ChemistryCitation Excerpt :Eq. (1) presents the CHARMM force field (MacKerell et al., 1998). Energy surface search techniques: methods for Ab initio prediction include Molecular Dynamics simulations of proteins and protein-substrate complexes (van Gunsteren and Berendsen, 1990; Rapaport, 2004; Koza, 1992); Monte Carlo simulations that do not use forces but rather compare energies, via the use of Boltzmann probabilities (Simons et al., 1997); Genetic Algorithms which are based on populations of solutions by iterative cycles of operations (Holland, 1993; Pokala and Handel, 2000) and try to improve on the sampling and the convergence of Monte Carlo approaches (Pedersen and Moult, 1997; Tuffery et al., 1991; Bowie and Eisenberg, 1994), and exhaustive and semi-exhaustive lattice-based studies which are based on using a crude/approximate fold representation (such as two residues per lattice point) and then exploring all or large amounts of the conformational space given the crude representation. There are many computational packages used in Ab initio protein structure simulations.
An efficient hybrid Taguchi-genetic algorithm for protein folding simulation
2009, Expert Systems with Applications
- 1
Edited by F. E. Cohen