Journal of Molecular Biology
Regular article
Conservation helps to identify biologically relevant crystal contacts1
Introduction
Most crystal contacts are artifacts of crystallization that would not occur in solution or in the physiological state. But some of the observed contacts may be biologically relevant. Determining which contacts are biological and which are not is often difficult, particularly when, as frequently seems to be the case for entries in the Protein Data Bank (PDB),1 the oligomeric state of the protein is uncertain or unknown.2
Biological contacts, which here refer to any site of in vivo recognition between macromolecules, have received more attention than non-biological contacts or comparisons of the two. Biological interfaces have been characterized in terms of their geometric features, such as planarity, shape-complementarity and circularity, in terms of their chemistry, such as hydrophobicity, preference for certain amino acid residues, and in terms of residue conservation.3, 4, 5, 6, 7, 8 Although a number of studies have sought to predict the location of biological interfaces based on some of these parameters9, 10 or to dock partners (see Sternberg et al.,11 and references therein), few have attempted to discriminate between biological and non-biological contacts,12 a problem faced by anyone who interprets X-ray data.
Most proteins solved by X-ray analysis and deposited in the PDB have three or more crystal contacts, and some have over 20. Together, these contacts typically bury around 30% of the protein surface, which helps to stabilize the crystal.13
A number of features distinguish biological from non-biological contacts. Biologically relevant interactions tend to be more specific than non-biological ones, although this can be hard to detect in the crystal.14 The promiscuity of non-biological contacts in pancreatic ribonuclease has been demonstrated by Crosio et al.15 They showed that almost any residue on the surface of the protomer can be part of a crystal contact, and that the same residue involved in two alternative contacts may interact with a different set of partners in each. Biological contacts tend to be larger than non-biological ones and usually constitute the biggest contact in the crystal.13, 14, 16, 17 The amino acid composition of non-biological contacts is much like that of the surface as a whole,13 although observed distributions vary slightly with the ionic strength of the solvent.18 The composition of biological contacts depends on their type. Transient contacts, such as those formed in signal transduction, are composed similarly to the rest of the surface, whereas oligomeric contacts have a composition intermediate between that of the surface and that of the protein core.7, 19 Some groups have mutated residues on the surface in order to engineer non-biological contacts and so improve crystal stability.20
Automatic discrimination of biological from non-biological contacts is desirable, and is attempted in the Protein Quaternary Structure database2 (PQS†). Because contact size is such a powerful discriminant, the PQS uses the accessible surface area (ASA) buried in the contact to distinguish biological from non-biological contacts, along with a number of other physical measures, which are not rigorously optimized. The method developed for PQS, when assessed against solution data for a non-redundant subset of proteins, distinguished correctly between true and false homodimers 78% of the time (Hannes Ponstingl, personal correspondence).
Ponstingl et al.12 rigorously tested the utility of ASA and statistical “pair potentials” as discriminants. Pair potentials are putative energies derived from a statistical analysis of observed frequencies of atom pairs at a given separation. These have been used before for predicting the location of putative biological contacts21 and for discriminating between computer-docked protein complexes.22 Ponstingl et al. analysed a dataset of 172 proteins, comprising 76 homodimers and 96 monomers. ASA alone produced a correct classification 84.6% of the time. Their pair potential correctly classified proteins in their dataset 87.5% of the time. A modified ASA score that considered the difference in size between the two largest contacts gave an accuracy of 88.9%.
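The two size-based rules described above can be illustrated with a minimal sketch. It assumes each protein is represented simply as a list of buried contact areas; the function names and the cutoff values used in the example are hypothetical, and the "gap" rule takes a plain difference between the two largest contacts, which may not match the exact form of the modified score used by Ponstingl et al.

```python
def largest_two(areas):
    """Return the two largest contact areas (0.0 where a contact is missing)."""
    ordered = sorted(areas, reverse=True)
    first = ordered[0] if ordered else 0.0
    second = ordered[1] if len(ordered) > 1 else 0.0
    return first, second


def is_dimer_by_size(areas, cutoff):
    """Call the protein a homodimer if its largest contact reaches the cutoff."""
    first, _ = largest_two(areas)
    return first >= cutoff


def is_dimer_by_gap(areas, cutoff):
    """Call the protein a homodimer if the largest contact exceeds the
    second largest by at least the cutoff (assumed form: simple difference)."""
    first, second = largest_two(areas)
    return (first - second) >= cutoff


# Illustrative contact areas; the cutoffs 800 and 500 are made up for the example.
one_dominant = [1500.0, 400.0, 350.0]   # one contact stands out
two_similar = [900.0, 850.0, 300.0]     # two large contacts of similar size

print(is_dimer_by_size(one_dominant, 800.0), is_dimer_by_gap(one_dominant, 500.0))
print(is_dimer_by_size(two_similar, 800.0), is_dimer_by_gap(two_similar, 500.0))
```

The second example shows where the two rules can disagree: the largest contact is big in absolute terms, but it does not stand out from the next largest.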
Conservation has been used successfully to explore patterns of energy and define functional residues at protein binding sites.23, 24, 25, 26 Recently we reported that, within a small and extensively researched dataset, oligomeric interfaces exhibit significant residue conservation compared with comparable-sized regions of the protein surface.8 There is a clear rationale for why biological interfaces should be conserved: the amount by which they vary is circumscribed by the importance and specificity of their physiological role, and the degree of variability required to disrupt them. Conversely, we would expect no such selective evolutionary pressure on non-biological contacts, which are the result of human experiments and not the product of evolution.27 The above suggests conservation may be useful in discriminating between biological contacts, which we assume will be conserved, and non-biological ones, which we assume will not. Moreover, since the measures of conservation and size are orthogonal, it is possible that combining them will provide a truly powerful discriminator.
We assess the utility of size and conservation in addressing the following two questions.
- (1) Is a given crystal contact biological?
- (2) Given all contacts in the crystal of a homodimer, which is the biological one?
These questions differ from those posed in earlier studies, which attempted to distinguish between homodimers and monomers. They test more directly the utility of conservation in identifying the biological relevance of a contact. We develop algorithms that use one or both measures to answer each of the questions above. We compare the efficacy of these algorithms, as well as the relative contributions of size and conservation to their predictive power.
Results
We investigated size and conservation of crystal contacts in 53 families of homodimers and 65 families of monomers. A contact was defined as the set of residues on a protomer that lose their accessibility upon complexation with a partner. Contact conservation was measured probabilistically as PCons. On this scale, values close to zero indicate extremely high conservation (i.e. improbable by chance) and values close to unity indicate extremely low conservation (i.e. high variability in evolution).
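The contact definition and the direction of the PCons scale can be made concrete with a small sketch. The first function follows the definition above, taking per-residue accessible surface areas before and after complexation. The second is only an assumed stand-in for a probabilistic conservation measure: it estimates how often a random same-sized set of surface residues is at least as conserved as the contact. The per-residue variability scores, the sampling of arbitrary (not spatially contiguous) residue sets, and all names are assumptions for illustration, not the calculation used for PCons.

```python
import random
from statistics import mean


def contact_residues(asa_free, asa_complex, min_loss=0.0):
    """Residues whose accessible surface area drops on complexation."""
    return {res for res, free in asa_free.items()
            if free - asa_complex.get(res, free) > min_loss}


def empirical_conservation_p(variability, contact, surface,
                             n_samples=10000, seed=0):
    """Fraction of random same-sized surface subsets whose mean variability
    is no higher than the contact's. Near 0: contact unusually conserved.
    Near 1: contact no more conserved than chance."""
    rng = random.Random(seed)
    pool = sorted(surface)
    observed = mean(variability[r] for r in contact)
    k = len(contact)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(pool, k)
        if mean(variability[r] for r in sample) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)


# Toy data: residues 1 and 4 lose accessibility in the complex.
asa_free = {1: 50.0, 2: 30.0, 3: 0.0, 4: 40.0}
asa_complex = {1: 10.0, 2: 30.0, 3: 0.0, 4: 35.0}
print(sorted(contact_residues(asa_free, asa_complex)))
```

A realistic version would draw spatially contiguous surface patches of comparable size rather than arbitrary residue sets, since neighbouring residues tend to have correlated conservation.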
Discussion
The results show that biological crystal contacts are typically larger and more conserved than non-biological ones. Our analysis of contact size agrees with that of previous studies: biological contacts are invariably large and are usually the largest contact made in a crystal. Figure 1 suggests there may be some upper bound on the size of non-biological contacts. The reason for this could be principally biophysical. Consider two interacting protomer surfaces. Small sites of interaction that
Conclusion
Conservation alone provides information, orthogonal to that of size, that helps to predict the biological relevance of a crystal contact. Together, conservation and size form a potent combination for discriminating biological from non-biological contacts. Size remains the more powerful discriminator, but conservation can discriminate between borderline cases.
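The idea that size leads and conservation settles borderline cases can be sketched as a simple selection rule for the contacts of one homodimer crystal. This is a hand-written heuristic for illustration only, not the algorithm developed in the paper; the `Contact` fields, the `size_tolerance` parameter and its default value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    area: float    # buried accessible surface area, in Å^2
    p_cons: float  # near 0 = highly conserved, near 1 = highly variable


def pick_biological(contacts, size_tolerance=0.1):
    """Shortlist contacts within `size_tolerance` (as a fraction) of the
    largest area, then return the most conserved contact on the shortlist."""
    if not contacts:
        raise ValueError("at least one contact is required")
    largest = max(c.area for c in contacts)
    shortlist = [c for c in contacts
                 if c.area >= (1.0 - size_tolerance) * largest]
    # Lowest p_cons wins; a larger area breaks exact ties.
    return min(shortlist, key=lambda c: (c.p_cons, -c.area))


candidates = [
    Contact("A", area=1000.0, p_cons=0.50),
    Contact("B", area=950.0, p_cons=0.01),
    Contact("C", area=300.0, p_cons=0.001),
]
print(pick_biological(candidates).name)
```

Here contact C is the most conserved but far too small to be shortlisted, so conservation only decides between the two contacts of similar size.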
Neural networks generalize the information from homodimer data well, using it to correctly infer biological
Dataset
The dataset of Ponstingl et al.12 was used to provide a starting point for further filtering. This comprised 172 non-homologous protein crystal structures of which 76 were homodimers and 96 were monomers. Atom coordinates were taken from the PDB.1 A program written by Hannes Ponstingl was used to generate hypothetical contacts for each structure. It works by applying crystallographic symmetry operations to a given protomer chain to recreate atom coordinates in the asymmetric unit of the
Acknowledgements
W.V. is funded by a BBSRC special studentship. We thank Hannes Ponstingl, Adrian Shepherd, Roman Laskowski, Irene Nooren and Thomas Kabir for helpful discussions and assistance with data preparation.
References (53)
- et al. PQS: a protein quaternary structure file server. Trends Biochem. Sci. (1998)
- et al. Surface, subunit interfaces and interior of oligomeric proteins. J. Mol. Biol. (1988)
- et al. Protein-protein interactions: a review of protein dimer structures. Prog. Biophys. Mol. Biol. (1995)
- et al. Prediction of protein-protein interaction sites using patch analysis. J. Mol. Biol. (1997)
- et al. Predictive docking of protein-protein and protein-DNA complexes. Curr. Opin. Struct. Biol. (1998)
- et al. Crystal packing in six crystal forms of pancreatic ribonuclease. J. Mol. Biol. (1992)
- et al. Ionic strength and intermolecular contacts in protein crystals. J. Cryst. Growth (2000)
- et al. Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. (1997)
- et al. Studies on engineering crystallizability by mutation of surface residues of human thymidylate synthase. J. Cryst. Growth (1992)
- et al. A soft, mean-field potential derived from crystal contacts for predicting protein-protein interactions. J. Mol. Biol. (1998)
- An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol.
- Identification of functional surfaces of the zinc binding domains of intracellular receptors. J. Mol. Biol.
- ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J. Mol. Biol.
- Lectins. Curr. Opin. Struct. Biol.
- The differential pattern of tissue-specific expression of ruminant pancreatic type ribonucleases may help to understand the evolutionary history of their genes. Gene
- Raster3D: photorealistic molecular graphics. Methods Enzymol.
- The Protein Data Bank. Nucl. Acids Res.
- Principles of protein-protein recognition. Nature
- Hydrophobic patches on protein subunit interfaces: characteristics and prediction. Proteins: Struct. Funct. Genet.
- The atomic structure of protein-protein recognition sites. J. Mol. Biol.
- Protein-protein interfaces: analysis of amino acid conservation in homodimers. Proteins: Struct. Funct. Genet.
- A role for surface hydrophobicity in protein-protein recognition. Protein Sci.
- Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins: Struct. Funct. Genet.
- Protein-protein crystal-packing contacts. Protein Sci.
- Specific versus non-specific contacts in protein crystals. Nature Struct. Biol.
- Protein-protein interactions at crystal contacts. Proteins: Struct. Funct. Genet.
1 Edited by F. Cohen