Journal of Molecular Biology
Regular article
Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches¹
Introduction
Protein structure determination inevitably lags far behind the explosive quantitative and qualitative (thanks to the determination of genome sequences of taxonomically diverse organisms) growth of sequence databases. It has been observed, however, that newly determined structures increasingly tend to fall into already known structural folds (Murzin 1996, Murzin 1998). This indicates that the number of folds (the basic types of globular domains) is finite and is unlikely to exceed a few thousand (Chothia 1992, Orengo et al 1994). Moreover, while it is difficult to estimate the total number of folds with greater precision, it seems clear that for most of the widespread folds, representative structures are already available. Thus, it is highly probable that for any new protein sequence that does not have a significant compositional bias and, accordingly, is likely to form one or more globular domains (Wootton, 1994), a structure with the same fold is present in the Protein Data Bank (PDB; Bernstein et al., 1977). In order to obtain structural information about a given protein domain, all one needs is to establish a reliable alignment with the sequence of one of the domains with a known structure. More often than not, however, this task is not trivial. Major transitions in the evolution of life appear to have been accompanied (or in part driven) by the origin of new protein families from preexisting ones, during which sequences rapidly diverge while the structure remains basically conserved (Doolittle, 1995). This erosion of sequence information in the course of evolution is the major obstacle to making structural predictions using homology inferred from sequence similarity. Accordingly, a number of unexpected connections between protein families originally thought to be unrelated have recently been established by comparison of experimentally determined three-dimensional structures (Holm and Sander 1996, Holm and Sander 1997, Murzin 1996, Murzin 1998, Murzin and Bateman 1997).
In order to maximize the rate of structural prediction from protein sequences, increasing the sensitivity of sequence comparison methods is critical. The subtle relationships discovered by structure-structure comparison may be considered the gold standard for sequence analysis methods. Those methods that are sufficiently powerful to detect at least some of the connections originally perceived as “structural only” should be expected to routinely produce non-trivial structural predictions. Most of the advanced sequence database search methods utilize information contained in multiple alignments. The recently developed PSI (Position-Specific Iterating)-BLAST method constructs a multiple alignment from the BLAST hits, converts it into a position-specific weight matrix and iterates the search using this matrix as the query (Altschul et al 1997, Altschul and Koonin 1998). Several in-depth studies of protein families as well as benchmarking experiments suggest that, given the new level of protein sequence diversity coming from whole-genome sequencing, this method may significantly increase our ability to detect subtle sequence similarities and, in particular, to make non-trivial structure predictions (Aravind and Koonin 1998, Aravind et al 1998, Huynen et al 1998, Mushegian et al 1997, Rychlewski et al 1998, Wolf et al 1999).
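The step of converting a multiple alignment into a position-specific weight matrix can be illustrated with a minimal Python sketch. This is not the PSI-BLAST implementation: it assumes an ungapped toy alignment, a uniform background distribution over the 20 amino acids and a fixed pseudocount, whereas the published method uses sequence weighting and a more elaborate pseudocount scheme. The names `build_pssm`, `score_segment` and `toy_alignment` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from an ungapped alignment.

    Assumptions (not from the paper): uniform background frequencies and a
    constant pseudocount added to every residue count.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count only standard residues; anything else is ignored.
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_segment(pssm, segment):
    """Sum the column scores for a segment of the same length as the matrix."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal the number of columns")
    # Non-standard residues contribute a neutral score of 0.
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, segment))


# Invented toy alignment, standing in for the hits of a first BLAST pass.
toy_alignment = ["GKSGT", "GKTGS", "GKSGS", "AKSGT"]
pssm = build_pssm(toy_alignment)
related = score_segment(pssm, "GKSGT")
unrelated = score_segment(pssm, "WWWWW")
```

In an iterative search, a matrix of this kind would replace the original query sequence in the next pass, and the hits of that pass would be used to rebuild the matrix.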
Here, using several previously described cases of relationships between protein families that have been deemed to be detectable only by structure-structure comparison, we show that with appropriate starting points, PSI-BLAST is capable of detecting, at the sequence level, many of these subtle similarities. We demonstrate that typically, the best starting points for the iterative search are those that produce the greatest diversity of hits in the first BLAST pass. We then investigate several new examples of unexpected structural inferences for highly conserved protein domains that have important functional and evolutionary implications.
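The idea of choosing a starting point by the diversity of its first-pass hits can be sketched as follows. The text here does not define a numerical diversity measure, so the proxy below (the sum of one minus fractional identity over the hits of a query) is purely an assumption made for illustration, as are the function names and the invented identity values.

```python
def hit_diversity(identities):
    """Hypothetical diversity proxy: sum of (1 - fractional identity) over hits.

    Many divergent hits give a high value; a few near-identical hits give a
    low value. This measure is an assumption, not taken from the paper.
    """
    return sum(1.0 - pid / 100.0 for pid in identities)


def pick_starting_query(first_pass):
    """Return the query whose first-pass hits have the highest diversity proxy.

    first_pass maps a query name to the percent identities of its hits.
    """
    if not first_pass:
        raise ValueError("no candidate queries supplied")
    return max(first_pass, key=lambda name: hit_diversity(first_pass[name]))


# Invented example values, for illustration only.
candidates = {
    "query_A": [98.0, 95.0, 93.0],        # only close homologs
    "query_B": [60.0, 45.0, 38.0, 30.0],  # a broader, more divergent set
}
best = pick_starting_query(candidates)
```

With these toy numbers the more divergent hit list wins, which mirrors the qualitative observation stated above; any real analysis would need a measure justified by the data.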
The strategy for protein superfamily analysis using PSI-BLAST
For assessing the ability of PSI-BLAST to detect subtle similarity between proteins, we chose several cases where a relationship was originally discovered by structure-structure comparison and has been deemed undetectable at the sequence level (Table 1). The examples include the classical case of structural similarity between actins, the HSP70 class of molecular chaperones and sugar kinases (Bork et al., 1992), as well as more recently described relationships, such as those between antibiotic
Databases
Standard database searches were performed using the non-redundant (NR) protein database at the NCBI. The structural databases used here were PDB and SCOP (Structural Classification of Proteins; Murzin et al 1995, Hubbard et al 1999). SCOP employs a manual process to identify structural relationships between proteins and classifies them into a four-level hierarchy. This hierarchy from top to bottom reflects the protein structural class in terms of secondary structural elements (α-helices and
Acknowledgements
We are grateful to Michael Rozanov for his participation in the early stage of the HSP70 superfamily analysis.
1999 U.S. Government
References (91)
- et al.
Do aligned sequences share the same fold?
J. Mol. Biol.
(1997)
- et al.
Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases
Trends Biochem. Sci.
(1998)
RNA editing in mitochondria of Leishmania tarentolae and Crithidia fasciculata
Semin. Cell Biol.
(1993)
- et al.
The Protein Data Bank: a computer-based archival file for macromolecular structures
J. Mol. Biol.
(1977)
- et al.
Drosophila kelch motif is derived from a common enzyme fold
J. Mol. Biol.
(1994)
- et al.
The suppressor of hairless protein participates in notch receptor signaling
Cell
(1994)
- et al.
Position-based sequence weights
J. Mol. Biol.
(1994)
- et al.
Three-dimensional structure of cyclodextrin glycosyltransferase from Bacillus circulans at 3.4 Å resolution
J. Mol. Biol.
(1989)
- et al.
DNA polymerase β belongs to an ancient nucleotidyltransferase superfamily
Trends Biochem. Sci.
(1995)
- et al.
New structure-novel fold?
Structure
(1997)
Homology-based fold predictions for Mycoplasma genitalium proteins
J. Mol. Biol.
nolO and noeI (HsnIII) of Rhizobium sp. NGR234 are involved in 3-O-carbamoylation and 2-O-methylation of Nod factors
J. Biol. Chem.
Apyrases (ATP diphosphohydrolases, EC 3.6.1.5): function and relationship to ATPases
Biochim. Biophys. Acta
Yeast protein controlling inter-organelle communication is related to bacterial phosphatases containing the Hsp70-type ATP-binding domain
Trends Biochem. Sci.
Prediction and analysis of coiled-coil structures
Methods Enzymol.
Chaperone-assisted protein folding
Curr. Opin. Struct. Biol.
O-sialoglycoprotease from Pasteurella haemolytica
Methods Enzymol.
Structural classification of proteins: new superfamilies
Curr. Opin. Struct. Biol.
How far divergent evolution goes in proteins
Curr. Opin. Struct. Biol.
SCOP: a structural classification of proteins database for the investigation of sequences and structures
J. Mol. Biol.
Cutting apart V(D)J recombination
Curr. Opin. Genet. Dev.
Exopolyphosphate phosphatase and guanosine pentaphosphate phosphatase belong to the sugar kinase/actin/hsp 70 superfamily
Trends Biochem. Sci.
The role of adenylyltransferase and uridylyltransferase in the regulation of glutamine synthetase in Escherichia coli
Curr. Top. Cell Reg.
Fold and function predictions for Mycoplasma genitalium proteins
Fold. Design
Closing the gap on DNA ligase
Structure
Crystal structure of an ATP-dependent DNA ligase from bacteriophage T7
Cell
Initiation of V(D)J recombination in a cell-free system
Cell
Golgi localization and functional expression of human uridine diphosphatase
J. Biol. Chem.
Non-globular domains in protein sequences: automated segmentation using complexity measures
Comput. Chem.
Analysis of compositionally biased regions in sequence databases
Methods Enzymol.
A neutral glycoprotease of Pasteurella haemolytica A1 specifically cleaves O-sialoglycoproteins
Infect. Immun.
Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system
Nature
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucl. Acids Res.
Phosphoesterase domains associated with DNA polymerases of diverse origins
Nucl. Acids Res.
DNA polymerase β-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history
Nucl. Acids Res.
Toprim-a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins
Nucl. Acids Res.
A genome-based approach for the identification of essential bacterial genes
Nature Biotechnol.
Further characterization of Renibacterium salmoninarum extracellular products
Appl. Environ. Microbiol.
Crystal structure of the RAG1 dimerization domain reveals multiple zinc-binding motifs including a novel zinc binuclear cluster
Nature Struct. Biol.
Structural similarities between topoisomerases that cleave one or both DNA strands
Proc. Natl Acad. Sci. USA
Benzoyl-coenzyme A reductase (dearomatizing), a key enzyme of anaerobic aromatic metabolism. ATP dependence of the reaction, purification and some properties of the enzyme from Thauera aromatica strain K172
Eur. J. Biochem.
Predicting functions from protein sequences-where are the bottlenecks?
Nature Genet.
An ATPase domain common to prokaryotic cell cycle proteins, sugar kinases, actin, and hsp70 heat shock proteins
Proc. Natl Acad. Sci. USA
The V(D)J recombination activating protein RAG2 consists of a six-bladed propeller and a PHD fingerlike domain, as revealed by sequence analysis
Cell Mol. Life Sci.
The stringent response
¹ Edited by J. M. Thornton