Journal of Molecular Biology
Volume 313, Issue 2, 19 October 2001, Pages 399-416

Regular article
Conservation helps to identify biologically relevant crystal contacts1

https://doi.org/10.1006/jmbi.2001.5034

Abstract

Some crystal contacts are biologically relevant; most are not. We assess the utility of combining measures of size and conservation to discriminate between biological and non-biological contacts. Conservation and size information is calculated for crystal contacts in 53 families of homodimers and 65 families of monomers. Biological contacts are shown to be usually conserved and typically the largest contact in the crystal. A range of neural networks accepting different combinations and encodings of this information is used to answer the following questions: (1) is a given crystal contact biological, and (2) given all crystal contacts in a homodimer, which is the biological one? Predictions for (1) are performed on both homodimer and monomer datasets. The best performing neural network combined size and conservation inputs. For the homodimers, it correctly classified 48 out of 53 biological contacts and 364 out of 366 non-biological contacts, giving a combined accuracy of 98.3 %. A more robust performance statistic, the phi-coefficient, which accounts for imbalances in the dataset, gave a value of 0.92. Taking all 535 non-biological contacts from the 65 monomers, this predictor made erroneous classifications only 4.3 % of the time. Predictions for (2) were performed on homodimers only. The best performing network achieved a prediction accuracy of 98.1 % using size information alone. We conclude that in answering question (1) size and conservation combined discriminate biological from non-biological contacts better than either measure alone. For answering question (2), we conclude that in our dataset size is so powerful a discriminant that conservation adds little predictive benefit.
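The two performance figures quoted for the homodimer set can be checked from the four counts given in the abstract. The sketch below assumes the phi-coefficient is the standard correlation coefficient of a 2x2 confusion table (the abstract does not spell out the formula); the function names are illustrative only.

```python
import math

def accuracy(tp, fn, tn, fp):
    """Fraction of all contacts that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

def phi_coefficient(tp, fn, tn, fp):
    """Phi-coefficient of a 2x2 confusion table (assumed standard form)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Counts from the abstract: 48 of 53 biological contacts and
# 364 of 366 non-biological contacts were classified correctly.
tp, fn = 48, 53 - 48
tn, fp = 364, 366 - 364

print(f"accuracy = {100 * accuracy(tp, fn, tn, fp):.1f} %")  # accuracy = 98.3 %
print(f"phi = {phi_coefficient(tp, fn, tn, fp):.2f}")        # phi = 0.92
```

Because non-biological contacts outnumber biological ones roughly seven to one, a classifier that labelled every contact non-biological would still score about 87 % accuracy while its phi-coefficient would be zero, which is why the second statistic is the more informative one here.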

Introduction

Most crystal contacts are artifacts of crystallization that would not occur in solution or in the physiological state. But some of the observed contacts may be biologically relevant. Determining which contacts are biological and which are not is often difficult, particularly when, as frequently seems to be the case for entries in the Protein Data Bank (PDB),1 the oligomeric state of the protein is uncertain or unknown.2

Biological contacts, which here refer to any site of in vivo recognition between macromolecules, have received more attention than non-biological contacts or comparisons of the two. Biological interfaces have been characterized in terms of their geometric features, such as planarity, shape-complementarity and circularity, in terms of their chemistry, such as hydrophobicity and preference for certain amino acid residues, and in terms of residue conservation.3, 4, 5, 6, 7, 8 Although a number of studies have sought to predict the location of biological interfaces based on some of these parameters9, 10 or to dock partners (see Sternberg et al.,11 and references therein), few have attempted to discriminate between biological and non-biological contacts,12 a problem faced by anyone who interprets X-ray data.

Most proteins solved by X-ray analysis and deposited in the PDB have three or more crystal contacts, and some have over 20. The sum of these contacts typically buries around 30 % of the protein surface to ensure crystal stability.13

A number of features distinguish biological from non-biological contacts. Biologically relevant interactions tend to be more specific than non-biological ones, although this can be hard to detect in the crystal.14 The promiscuity of non-biological contacts in pancreatic ribonuclease has been demonstrated by Crosio et al.15 They showed that almost any residue on the surface of the protomer can be part of a crystal contact and that the same residue involved in two alternative contacts may interact with a different set of partners. Biological contacts tend to be larger than non-biological ones and usually constitute the biggest contact in the crystal.13, 14, 16, 17 The amino acid composition of non-biological contacts is much like that of the surface as a whole,13 although observed distributions vary slightly with ionic strength of solvent.18 The composition of biological contacts depends on their type. Transient contacts, such as those formed in signal transduction, are composed similarly to the rest of the surface, whereas oligomeric contacts have a composition intermediate between the surface and the protein core.7, 19 Some groups have mutated residues on the surface in order to engineer non-biological contacts and so improve crystal stability.20

Automatic discrimination of biological from non-biological contacts is desirable, and is attempted in the Protein Quaternary Structure database2 (PQS). Because contact size is such a powerful discriminant, the PQS uses the accessible surface area (ASA) buried in the contact to distinguish biological from non-biological contacts, along with a number of other physical measures, which are not rigorously optimized. The method developed for PQS, when assessed against solution data for a non-redundant subset of proteins, distinguished correctly between true and false homodimers 78 % of the time (Hannes Ponstingl, personal correspondence).

Ponstingl et al.12 rigorously tested the utility of ASA and statistical “pair potentials” as discriminants. Pair potentials are putative energies derived from a statistical analysis of observed frequencies of atom pairs at a given separation. These have been used before for predicting the location of putative biological contacts21 and for discriminating between computer-docked protein complexes.22 Ponstingl et al. analysed a dataset of 172 proteins, with 76 homodimers and 96 monomers. Straight ASA produced a correct classification 84.6 % of the time. Their pair potential correctly classified proteins in their dataset 87.5 % of the time. A modified ASA score that considered the difference in size between the two largest contacts gave an accuracy of 88.9 %.

Conservation has been used successfully to explore patterns of energy and define functional residues at protein binding sites.23, 24, 25, 26 Recently we reported that, within a small and extensively researched dataset, oligomeric interfaces exhibit significant residue conservation compared with comparable-sized regions of the protein surface.8 There is a clear rationale for why biological interfaces should be conserved: the amount by which they vary is circumscribed by the importance and specificity of their physiological role, and the degree of variability required to disrupt them. Conversely, we would expect no such selective evolutionary pressure on non-biological contacts, which are the result of human experiments and not the product of evolution.27 The above suggests conservation may be useful in discriminating between biological contacts, which we assume will be conserved, and non-biological ones, which we assume will not. Moreover, since the measures of conservation and size are orthogonal, it is possible that combining them will provide a truly powerful discriminator.

We assess the utility of size and conservation in addressing the following two questions.

  • (1) Is a given crystal contact biological?

  • (2) Given all contacts in the crystal of a homodimer, which is the biological one?

These questions are different from those posed in earlier studies that have attempted to distinguish between homodimers and monomers. They more directly test the utility of conservation in identifying the biological relevance of a contact. We develop algorithms that use one or both measures to answer each of the questions above. We compare the efficacy of these algorithms, as well as the relative contributions of size and conservation to their predictive power.
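Question (2) is a choice among the contacts of a single crystal rather than a yes/no call on one contact. As a rough illustration of what a size-only answer looks like, the sketch below simply picks the largest contact. It is a baseline written for this summary, not the neural networks assessed in the paper, and the contact names and sizes are invented.

```python
def pick_largest_contact(contact_sizes):
    """Return the identifier of the largest contact in one crystal.

    contact_sizes maps a contact identifier to its size (any consistent
    unit). This is a naive size-only rule, not the authors' predictor.
    """
    if not contact_sizes:
        raise ValueError("no contacts given")
    return max(contact_sizes, key=contact_sizes.get)

# Hypothetical contact sizes for one homodimer crystal (invented values).
crystal = {"contact_A": 850.0, "contact_B": 310.0, "contact_C": 120.0}
print(pick_largest_contact(crystal))  # contact_A
```

A rule of this kind can only rank the contacts within one crystal; it cannot say whether any of them is biological, which is the harder problem posed by question (1).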

Results

We investigated size and conservation of crystal contacts in 53 families of homodimers and 65 families of monomers. A contact was defined as the set of residues on a protomer that lose their accessibility upon complexation with a partner. Contact conservation was measured probabilistically as PCons. On this scale, values close to zero indicate extremely high conservation (i.e. improbable by chance) and values close to unity indicate extremely low conservation (i.e. high variability in evolution).
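The contact definition above can be sketched in a few lines, assuming per-residue accessible surface areas are already available for the isolated protomer and for the protomer in complex. The residue labels, the ASA values, the `tolerance` parameter and the choice of summed buried area as the size measure are all assumptions made for illustration; the snippet does not reproduce the program used in the study.

```python
def contact_residues(asa_free, asa_complex, tolerance=0.0):
    """Residues whose accessibility drops on complexation.

    asa_free / asa_complex map residue labels to accessible surface
    area for the isolated protomer and for the protomer in complex.
    A residue missing from asa_complex is treated as unchanged.
    """
    return sorted(
        res for res, free in asa_free.items()
        if free - asa_complex.get(res, free) > tolerance
    )

def buried_area(asa_free, asa_complex, residues):
    """Total accessible surface area lost by the given residues."""
    return sum(asa_free[r] - asa_complex.get(r, asa_free[r]) for r in residues)

# Hypothetical per-residue ASA values (invented numbers).
asa_free = {"A10": 45.0, "A11": 12.5, "A12": 60.0, "A13": 0.0}
asa_complex = {"A10": 5.0, "A11": 12.5, "A12": 20.0, "A13": 0.0}

contact = contact_residues(asa_free, asa_complex)
print(contact)                                       # ['A10', 'A12']
print(buried_area(asa_free, asa_complex, contact))   # 80.0
```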

Discussion

The results show biological crystal contacts are typically larger and more conserved than non-biological ones. Our analysis of contact size agrees with that of previous studies. It finds biological contacts are invariably large and usually the largest contact made in a crystal. Figure 1 suggests there may be some upper bound on the size of non-biological contacts. The reason for this could be principally biophysical. Consider two interacting protomer surfaces. Small sites of interaction that

Conclusion

Conservation alone provides information, orthogonal to that of size, that helps to predict the biological relevance of a crystal contact. Conservation and size provide a potent combination for discriminating biological from non-biological contacts. Ultimately, size remains the most powerful discriminator, but conservation can discriminate between borderline cases.

Neural networks generalize the information from homodimer data well, using it to correctly infer biological

Dataset

The dataset of Ponstingl et al.12 was used to provide a starting point for further filtering. This comprised 172 non-homologous protein crystal structures of which 76 were homodimers and 96 were monomers. Atom coordinates were taken from the PDB.1 A program written by Hannes Ponstingl was used to generate hypothetical contacts for each structure. It works by applying crystallographic symmetry operations to a given protomer chain to recreate atom coordinates in the asymmetric unit of the

Acknowledgements

W.V. is funded by a BBSRC special studentship. We thank Hannes Ponstingl, Adrian Shepherd, Roman Laskowski, Irene Nooren and Thomas Kabir for helpful discussions and assistance with data preparation.

References (53)

  • O. Lichtarge et al. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. (1996)
  • O. Lichtarge et al. Identification of functional surfaces of the zinc binding domains of intracellular receptors. J. Mol. Biol. (1997)
  • A. Armon et al. ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J. Mol. Biol. (2001)
  • M. Vijayan et al. Lectins. Curr. Opin. Struct. Biol. (1999)
  • M.P. Sasso et al. The differential pattern of tissue-specific expression of ruminant pancreatic type ribonucleases may help to understand the evolutionary history of their genes. Gene (1999)
  • E.A. Merritt et al. Raster3D: photorealistic molecular graphics. Methods Enzymol. (1997)
  • H.M. Berman et al. The Protein Data Bank. Nucl. Acids. Res. (2000)
  • C. Chothia et al. Principles of protein-protein recognition. Nature (1975)
  • P. Lijnzaad et al. Hydrophobic patches on protein subunit interfaces: characteristics and prediction. Proteins: Struct. Funct. Genet. (1997)
  • L. Lo Conte et al. The atomic structure of protein-protein recognition sites. J. Mol. Biol. (1999)
  • W.S.J. Valdar et al. Protein-protein interfaces: analysis of amino acid conservation in homodimers. Proteins: Struct. Funct. Genet. (2001)
  • L. Young et al. A role for surface hydrophobicity in protein-protein recognition. Protein Sci. (1994)
  • H. Ponstingl et al. Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins: Struct. Funct. Genet. (2000)
  • O. Carugo et al. Protein-protein crystal-packing contacts. Protein Sci. (1997)
  • J. Janin. Specific versus non-specific contacts in protein crystals. Nature Struct. Biol. (1997)
  • J. Janin et al. Protein-protein interactions at crystal contacts. Proteins: Struct. Funct. Genet. (1995)

1 Edited by F. Cohen