Comprehensive Evaluation of Protein Structure Alignment Methods: Scoring by Geometric Measures

https://doi.org/10.1016/j.jmb.2004.12.032Get rights and content

We report the largest and most comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning all 8,581,970 protein structure pairs in a test set of 2930 protein domains specially selected from CATH v.2.4 to ensure sequence diversity.

We consider an alignment good if it matches many residues, and the two substructures are geometrically similar. Even with this definition, evaluating structural alignment methods is not straightforward. At first, we compared the rates of true and false positives using receiver operating characteristic (ROC) curves with the CATH classification taken as a gold standard. This proved unsatisfactory in that the quality of the alignments is not taken into account: sometimes a method that finds less good alignments scores better than a method that finds better alignments. We correct this intrinsic limitation by using four different geometric match measures (SI, MI, SAS, and GSAS) to evaluate the quality of each structural alignment. With this improved analysis we show that there is a wide variation in the performance of different methods; the main reason for this is that it can be difficult to find a good structural alignment between two proteins even when such an alignment exists.

We find that STRUCTAL and SSM perform best, followed by LSQMAN and CE. Our focus on the intrinsic quality of each alignment allows us to propose a new method, called “Best-of-All” that combines the best results of all methods. Many commonly used methods miss 10–50% of the good Best-of-All alignments.

By putting existing structural alignments into proper perspective, our study allows better comparison of protein structures. By highlighting limitations of existing methods, it will spur the further development of better structural alignment methods. This will have significant biological implications now that structural comparison has come to play a central role in the analysis of experimental work on protein structure, protein function and protein evolution.

Introduction

The problem of aligning, or establishing a correspondence between, residues of two protein structures is fundamental in computational structural biology. In 1960, Perutz et al.1 showed, using structural alignment, that myoglobin and hemoglobin have similar structures even though their sequences differ. Functionally these two proteins are similar and are involved with the storage and transport of oxygen, respectively. Since then, researchers have continued to look for structural similarity in hope of detecting shared functionality. Because structural similarity is conserved more than sequence similarity, it can be used as a more powerful “telescope” to look back to earlier evolutionary history.

Structural alignment is carried out between two known structures, and is typically based on the Euclidean distance between corresponding residues, instead of the distance between amino acid “types” used in sequence alignment. Structural alignment methods are useful for organizing and classifying known structures.2, 3 Furthermore, for a newly determined structure, fast methods that correctly identify known structures that align with it are indispensable. Lastly, structural alignment methods provide the gold standard for sequence alignment.4, 5 Consequently, many methods for protein structure alignment have been developed, including those described by Taylor & Orengo,6 Subbiah et al.,7 Holm & Sander,8 Holm & Park,9 Kleywegt,10 Shindyalov & Bourne,11 Kedem et al.,12 Yang & Honig,13 Krissinel & Henrick14, 44 and those cited in a review by Koehl.15

Many studies compare sequence alignment methods.5, 16, 17 Given a separable scoring function, the optimal sequence alignment can always be found using dynamic programming.18 Unfortunately, it is not easy to find the parameters of a scoring function that best captures the similarity between amino acid residues. This has led to many studies of substitution matrices that produce biologically meaningful sequence alignments.19, 20 A common task of sequence alignment techniques is to scan existing databases of protein sequences in hope of detecting homologs of a newly found protein sequence. The exponential growth in the size of these sequence databases has led to the development of popular programs, such as BLAST21 or FASTA22 that employ faster heuristics yet find sub-optimal alignments. A large body of literature compares the performances of these heuristic sequence alignment methods. To address the underlying difficulty in identifying the best of many sub-optimal sequence alignments, many of these studies use structure similarity as a gold standard. For example, Brenner et al.17 use the hierarchical protein classification SCOP,23 which is based on structural and sequence similarity, as their gold standard for comparing sequence alignment programs. Others5, 24, 25 also evaluate sequence alignment programs using structural alignment.

When aligning two structures, the situation is reversed: while it is harder to find the optimal alignment, judging which is best among several alignments of a pair of structures is easy. Finding the optimal structural alignment is harder because the rotation and translation of one of the two structures with respect to the other must be found in addition to the alignment itself. Although an approximate optimal solution can be computed,26 it is expensive and all methods available to-date are heuristic. Similarity in structural alignment is geometric and captured by the cRMS deviation of the aligned atoms (generally the CA atoms). Other properties of structural alignments that are likely to be significant are the number of matched residues, and the number and length of alignment gaps. Clearly, better alignments match more residues, have fewer gaps and are more similar (of lower cRMS). Since these alignment properties are not independent (shortening the alignment or introducing many gaps can decrease the cRMS deviation), researchers have devised alignment scores that attempt to balance these values. In this study, we use SAS,7 SI,27 MI27 and GSAS, which is our variant of SAS that penalizes gaps. Deciding which alignment is the most geometrically similar is an easier question than evaluating if an alignment is biologically significant.28, 29

Previous evaluations of structural alignment methods use the CATH2 or SCOP23 classifications as a gold standard, and verify that pairs of structures that are classified the same are similar, whereas all other structure pairs are not. Novotny et al.30 assessed the performance of structural alignment methods, as part of their evaluation of structural alignment (or fold comparison) servers. Their study uses CATH as the gold standard, and queries the servers' databases using approximately 70 query structures. Sierk & Pearson29 compare receiver operating characteristic (ROC) curves31 to evaluate the success of different methods in detecting domains of the same homology or topology, as defined by CATH; they test one query in each of 86 families. Using SCOP as the standard, Leplae & Hubbard32 built a web-server that evaluates structural alignment methods by comparing their ROC curves. Descriptions of structural alignment methods sometimes include evaluations of the methods. For example, Gerstein & Levitt33 evaluated STRUCTAL using SCOP; Shindyalov & Bourne3 compared CE to DALI; Shapiro & Brutlag34 evaluated FoldMiner, VAST and CE by comparing their ROC curves, using SCOP.

In this study we conduct a large-scale computer experiment to compare protein structural alignment methods. We consider a set of 2930 sequence diverse domains from CATH v.2.42 and align all pairs of structures. The methods we test are (listed chronologically): (1) SSAP,6 (2) STRUCTAL,7, 33 (3) DALI,8, 9 (4) LSQMAN,10 (5) CE,11 and (6) SSM.14 Each alignment has a significance (or native) score assigned by the program that found it and four geometric match measures (SAS, SI, MI, GSAS). We first evaluate the above methods by comparing the ROC curves based on their native score, and the four geometric match measures. We take the view that structural alignment methods are, in effect, optimizers of the geometric match measures, and create a better optimizer, the “Best-of-All” method, which takes the best alignment found by all methods. Using this approach, we evaluate the performance of the programs by directly comparing the geometric match measures of their alignments. We also elaborate on problems in assuming the lack of structural similarity between structures that are classified differently by the gold standard. We end by taking another look at hard cases, i.e. ones where only one of the methods succeeds, and analyzing the behavior of the different methods on the four CATH classes (“Mainly α”, “Mainly β”, “Mixed α/β”, “Few Secondary Structure”). The total computer time used in this experiment is over 20,000 hours on a 2.8 GHz processor.

Our main conclusion is that structural alignment methods should be evaluated by comparing the alignments they find. We show that ROC curves are of limited value and that their ranking of the methods is not consistent with the ranking implied by the quality of the alignments the methods find. We also highlight the problems inherent in comparing similarity of pairs of objects based on a hierarchical classification (namely a tree) of those objects: a classification attempts to group objects that share some property but does not guarantee that pairs of objects in different classes are indeed different. We find that the objective geometric match measures provide more relevant information than the native scores given by each particular method. Of the different measures we use, GSAS and SAS perform best in separating good alignments from less good ones. We consider the set of the pairs that are in the same CATH fold class as well as the set of all pairs and find that certain structural alignment methods consistently outperform other methods. More specifically we find that the Best-of-All method (i.e. a combination of all six methods under study) finds the best results, followed by STRUCTAL and SSM. Finally, we identify a set of structurally similar pairs that are found only by a single method, providing a useful test set that will allow structural alignment methods to be improved.

Section snippets

Results

We compare six structural alignment methods by aligning all 4,290,985 pairs in a set of 2930 sequence-diverse CATH v.2.42 domains (2930×2929/2 pairs as (i,j) and (j,i) are treated once). These methods are implemented in the following programs: (1) SSAP,6 (2) STRUCTAL,7, 33 (3) DALI using DaliLite (v.1.8),8, 9 (4) LSQMAN,10 (5) CE,11 and (6) SSM.14 The set of structures†, includes 769 fold classes. The programs output the alignment native

Reservations about ROC curves

In our opinion, the number of disadvantages of using the ROC curves methodology for comparing structural alignment methods exceeds the number of potential advantages. ROC curves seem like a reasonable way to compare structural alignment methods because each curve evaluates a method with respect to an agreed gold standard (in the case of this study, CATH2); moreover, the methods' self-evaluation (i.e. their native scores) can be used without needing any additional geometric match measure. The

Geometric match measures of structural alignments

We aim to compare different structural alignments of pairs of structures by comparing their geometric match measures. There are two types of comparisons: (1) alignments found by different programs for the same pair(s) of structures. Here, the match measures must depend on the geometric and other properties of the particular alignments (cRMS, number of matched residues, number of gaps, and length of the proteins). (2) Alignments found by the same program. Here, we compare different alignments

Acknowledgements

This work was supported by the National Institutes of Health (GM063817) and National Science Foundation (CCR-00-86013). M.L. was also supported by the Fondation de l‘Ecole Normale Supérieure Chaires de Blaise Pascal. We are grateful to the authors of the structural alignment methods for making their programs available.

References (44)

  • M. Gribskov et al.

    Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching

    Comput. Chem.

    (1996)
  • A. Harrison et al.

    Quantifying the similarities within fold space

    J. Mol. Biol.

    (2002)
  • D. Kihara et al.

    The PDB is a covering set of small protein structures

    J. Mol. Biol.

    (2003)
  • Y. Satow et al.

    Phosphocholine binding immunoglobulin Fab McPC603

    J. Mol. Biol.

    (1986)
  • M.F. Perutz et al.

    Structure of myoglobin: a three-dimensional Fourier synthesis at 5.5 Angstrom resolution, obtained by X-ray analysis

    Nature

    (1960)
  • C.A. Orengo et al.

    CATH—a hierarchic classification of protein domain structures

    Structure

    (1997)
  • I.N. Shindyalov et al.

    An alternative view of protein fold space

    Proteins: Struct. Funct. Genet.

    (2000)
  • J.D. Thompson et al.

    BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs

    Bioinformatics

    (1999)
  • J.M. Sauder et al.

    Large scale comparison of protein sequence alignment algorithms with structure alignments

    Proteins: Struct. Funct. Genet.

    (2000)
  • L. Holm et al.

    DaliLite workbench for protein structure comparison

    Bionformatics

    (2000)
  • G.J. Kleywegt

    Use of non-crystallographic symmetry in protein structure refinement

    Acta Crystallog. sect. D

    (1996)
  • I.N. Shindyalov et al.

    Protein structure alignment by incremental combinatorial extension (CE) of the optimal path

    Protein Eng.

    (1998)
  • Cited by (240)

    • Sequence and structure alignments in post-AlphaFold era

      2023, Current Opinion in Structural Biology
    • A Sweep of Earth's Virome Reveals Host-Guided Viral Protein Structural Mimicry and Points to Determinants of Human Disease

      2021, Cell Systems
      Citation Excerpt :

      Protein structure models were built using NEST (Petrey et al., 2003; Xiang and Honig, 2001). Structural neighbor search was run on the set of modeled mosquito proteins using Ska (Petrey et al., 2003; Yang and Honig, 2000) and a structural alignment score (SAS) < 2.5Å was used as a cut-off to identify structurally similar proteins (Kolodny et al., 2005). The modeling and structural neighbor search pipeline reported structural neighbors for 10,895 (65%) mosquito proteins.

    • Extending molecular docking desktop applications with cloud computing support and analysis of results

      2019, Future Generation Computer Systems
      Citation Excerpt :

      The MDRR will search the database for previous docking results that have similar input files (receptor, ligand, and docking configuration) to the currently used one. In this implementation we have chosen to use DeepAlign, although using several structural alignment tools and combining the results can be beneficial [42]. A new thread, composed of a part that runs DeepAlign and part that compares the DeepScore value from the DeepAlign result to the user-provided threshold, is launched for each receptor pair.

    • Homologous protein detection

      2018, Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics
    View all citing articles on Scopus
    View full text