The characterization of amino acid sequences in proteins by statistical methods

https://doi.org/10.1016/0022-5193(68)90069-6

Abstract

Three different but related comprehensive statistical analyses of amino acid sequences in proteins are described. The goal in each case is to search for evidence of significant sequence structure in individual proteins relative to a purely random arrangement of the amino acid residues and to attempt to relate any significant structure uncovered to the secondary and/or tertiary configuration of the protein.

In the first of these analyses, which is reviewed briefly in an appendix, amino acids are divided into subgroups according to a variety of side-chain physical properties (e.g. polarity, hydrophobicity). Deviations from randomness are expressed in terms of correlation indices ϱ_ij(c), which are composition-normalized doublet frequencies. Here i and j denote membership in a particular group for the physical property chosen, and c denotes the “lag”, that is, the number of residues along the chain separating the doublet.
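As a rough illustration of a composition-normalized doublet frequency, the sketch below computes an index for every ordered pair of groups at a given lag. The normalization used here (observed doublet frequency divided by the product of the two single-group frequencies, so that 1.0 corresponds to a random arrangement) is an assumption; the abstract does not give the exact definition of ϱ_ij(c). The function name `correlation_index` and the two-letter toy labelling are hypothetical.

```python
from collections import Counter


def correlation_index(groups, c):
    """Return {(i, j): index} for doublets whose members are c positions apart.

    Assumed normalization (not taken from the paper): the observed
    frequency of the ordered doublet (i, j) at lag c, divided by the
    product of the overall frequencies of groups i and j.
    """
    n = len(groups)
    if not 1 <= c < n:
        raise ValueError("lag must satisfy 1 <= c < len(groups)")
    composition = Counter(groups)            # counts of each group label
    doublets = Counter(zip(groups, groups[c:]))  # ordered pairs at lag c
    n_doublets = n - c
    return {
        (i, j): (doublets[(i, j)] / n_doublets)
        / ((composition[i] / n) * (composition[j] / n))
        for i in composition
        for j in composition
    }


# Toy input: a strictly alternating sequence of two hypothetical groups,
# "P" and "N". Real input would be a protein sequence mapped to group labels.
labels = list("PNPNPNPN")
rho_lag1 = correlation_index(labels, 1)
rho_lag2 = correlation_index(labels, 2)
```

With this toy input, unlike pairs are over-represented at lag 1 and like pairs at lag 2, which is the kind of deviation from 1.0 such an index is meant to expose.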

The other, more refined analyses are described in some detail. For both of these, each amino acid in a given protein is replaced by its appropriate value on a continuous physical property scale. Six such scales are employed: bulkiness, polarity, R_F, pI, pK_1 and hydrophobicity. The resulting amino acid index sequences are treated as discrete series and are analyzed first by means of serial correlation methods and subsequently by spectral analysis techniques. Periodicities exhibited in these series are evaluated statistically, and speculations are made concerning the connection between such structure and protein configuration.
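The two steps named above can be sketched on a numeric index series as follows. This is a minimal illustration, not the authors' procedure: the serial correlation is the ordinary sample autocorrelation, and the spectral step is a raw periodogram, chosen here for simplicity in place of whatever spectral estimator the paper uses. The function names and the toy series are hypothetical; a real input would be a protein sequence with each residue replaced by its value on one of the six scales.

```python
import cmath
import math


def serial_correlation(x, lag):
    """Sample autocorrelation of the series x at the given lag (mean removed)."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    total = sum(d * d for d in dev)
    if total == 0:
        raise ValueError("series is constant")
    return sum(dev[k] * dev[k + lag] for k in range(n - lag)) / total


def periodogram(x):
    """Raw periodogram: list of (frequency, power) at frequencies m/n, m = 1..n//2.

    Frequencies are in cycles per residue, so a peak at f suggests a
    repeat every 1/f positions.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    result = []
    for m in range(1, n // 2 + 1):
        s = sum(d * cmath.exp(-2j * math.pi * m * k / n) for k, d in enumerate(dev))
        result.append((m / n, abs(s) ** 2 / n))
    return result


# Toy index series with an exact period of 4 positions (illustrative values only).
series = [1.0, 0.0, -1.0, 0.0] * 6
r4 = serial_correlation(series, 4)
peak_frequency, peak_power = max(periodogram(series), key=lambda fp: fp[1])
```

For a series with a true repeat, the autocorrelation is strongly positive at the repeat length and the periodogram peaks at the matching frequency. Judging whether such a peak is statistically significant requires a separate test, which this sketch does not attempt.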

Although more than forty individual proteins whose primary sequences are known have been analyzed by these methods, results for the cytochrome c series, the hemoglobins and lysozyme are emphasized in the present paper. In the case of the cytochrome c family of proteins, several relationships between primary sequence structure and “evolutionary order” are discussed. In addition, the results of several homogeneity studies are described, in which the sequence structures of various portions of a given protein chain are compared.


    This work was supported in part by a contract [NOnr 228(21)] between the Office of Naval Research and the University of Southern California.

    The authors are pleased to acknowledge the efforts of S. T. Imrich from the Science Center of North American Rockwell Corporation, who programmed the serial correlation and group analysis procedures and helped computationally in many ways throughout the program.

    Present address: The Weizmann Institute of Science, Rehovoth, Israel.
