An Assessment of Amino Acid Exchange Matrices in Aligning Protein Sequences: The Twilight Zone Revisited

https://doi.org/10.1006/jmbi.1995.0340

Abstract

The sensitivity of most protein sequence alignment methods depends strongly on the quality of the comparison matrices used. These matrices, which assign weights or similarity scores to every possible amino acid substitution pair, are used to differentiate among the various possible alignments of two or more sequences. There are many ways to generate these exchange weights, and new matrices are constantly being published. However, there has been no overall assessment of how these matrices perform when applied in different alignment techniques, across many protein folds and families (both closely and distantly related), and with several gap penalty values. In this work, a set of amino acid sequences matched by superposition of known protein tertiary topologies is used to test the alignment accuracy of the different method/matrix/penalty combinations. The comparisons show relatively similar results for the top-scoring matrices, a preference for the global alignment method of Needleman and Wunsch, and the importance of matrix modification and optimized gap penalties. The relationship between the percentage identity of a resulting alignment and the expected level of correctness is given for the top-performing matrix, yielding a better definition of the so-called "twilight zone". Estimates are also made of the probability that two sequences, aligned at a given level of residue percentage identity, are in fact unrelated.
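To make concrete how an exchange matrix and a gap penalty together decide between candidate alignments, here is a minimal sketch of Needleman and Wunsch global alignment in Python. It rests on several assumptions that do not come from the paper: a linear gap penalty (the study varied gap penalty values, but affine opening/extension costs are left out here for brevity), a hypothetical four-letter matrix `TOY_MATRIX` whose values are invented for illustration and are not BLOSUM, PAM, or any matrix assessed in the study, and one common definition of percentage identity (identical aligned pairs divided by the length of the shorter sequence), which may differ from the paper's. The function names `needleman_wunsch` and `percent_identity` are likewise hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical exchange matrix over a toy four-letter alphabet.
# The values are invented for illustration only; they are NOT taken from
# BLOSUM, PAM, Gonnet or any matrix assessed in the study.
TOY_MATRIX: Dict[str, Dict[str, int]] = {
    "A": {"A": 4, "G": 0, "S": 1, "W": -3},
    "G": {"A": 0, "G": 6, "S": 0, "W": -2},
    "S": {"A": 1, "G": 0, "S": 4, "W": -3},
    "W": {"A": -3, "G": -2, "S": -3, "W": 11},
}


def needleman_wunsch(
    s: str, t: str, matrix: Dict[str, Dict[str, int]], gap: int
) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty (each gapped position costs `gap`)."""
    n, m = len(s), len(t)
    # score[i][j] holds the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i  # s[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = -gap * j  # t[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + matrix[s[i - 1]][t[j - 1]],  # substitution
                score[i - 1][j] - gap,  # gap in t
                score[i][j - 1] - gap,  # gap in s
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + matrix[s[i - 1]][t[j - 1]]:
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residue pairs as a percentage of the shorter ungapped sequence."""
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    shorter = min(len(a.replace("-", "")), len(b.replace("-", "")))
    return 100.0 * identical / shorter if shorter else 0.0


score, top, bottom = needleman_wunsch("GAWS", "GWS", TOY_MATRIX, gap=4)
print(score)  # 17
print(top)  # GAWS
print(bottom)  # G-WS
print(percent_identity(top, bottom))  # 100.0
```

Under this identity convention the gapped example scores 100% because gapped positions are excluded from the count; dividing by the full alignment length (4) would instead give 75%. Which convention is used therefore matters when comparing an alignment against percentage-identity thresholds such as those bounding the twilight zone.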

Cited by (163)

  • SVM and SVR-based MHC-binding prediction using a mathematical presentation of peptide sequences

    2016, Computational Biology and Chemistry
    Citation Excerpt:

    As a consequence of multiplication, with this way of encoding is possible loss of information regarding the separation of positive and negative sets for an AA that often occurs in one of the two sets and which has a negative score for substitution. In order to suppress whether such cases occurred in our models, instead of the standard BLOSUM62 matrix, we used VOGG matrix (Vogt et al., 1995) the modification of the BLOSUM62 matrix which has only positive values. In this way, all cases are separated without loss of information because by multiplying with positive values the ratio of frequencies does not change.

  • Optimization techniques in molecular structure and function elucidation

    2009, Computers and Chemical Engineering
    Citation Excerpt:

    Structural alignment (Bourne & Shindyalov, 2003) has been emerging recently as a technique that provides even more meaningful information for comparing proteins than sequence-based approaches. In fact, structural alignment is now used to assess various sequence alignment methods (Orengo et al., 1997; Vogt, Etzold, & Argos, 1995). Qualitatively, the structural alignment problem requires assigning amino acid residues of a given protein to amino acid residues of another given protein, in a way that highlights similarities between the two proteins.

  • Factors influencing estimates of coordinate error for molecular replacement

    2020, Acta Crystallographica Section D: Structural Biology