[22] Using CLUSTAL for multiple sequence alignments
Abstract
We have tested CLUSTAL W in a wide variety of situations, and it is capable of handling some very difficult protein alignment problems. If the data set contains enough closely related sequences that the first alignments are accurate, then CLUSTAL W will usually find an alignment that is very close to ideal. Problems can still occur if the data set includes sequences of greatly different lengths or if some sequences include long regions that are impossible to align with the rest of the data set. Balancing the need for long insertions and deletions in some alignments with the need to avoid them in others is still a problem. The default values for our parameters were chosen empirically using test sets of globular proteins for which some information about the correct alignment was available. These parameter values may not be appropriate for nonglobular proteins.
We have argued that using one weight matrix and two gap penalties is too simplistic to be of general use in the most difficult cases. We have replaced these parameters with a large number of new parameters designed primarily to encourage gaps in loop regions. Although these new parameters are largely heuristic in nature, they perform surprisingly well and are simple to implement. The underlying speed of the progressive alignment approach is not adversely affected. The disadvantage is that the parameter space is now huge; the number of possible combinations of parameters is more than can easily be examined by hand. We justify this by asking the user to treat CLUSTAL W as a data exploration tool rather than as a definitive analysis method. It is not sensible to derive multiple alignments automatically and to trust any particular algorithm always to give the correct answer. One must examine the alignments closely, especially in conjunction with the underlying phylogenetic tree (or an estimate of it), and try varying some of the parameters. Outliers (sequences that have no close relatives) should be aligned carefully, as should fragments of sequences. By default, the program delays the alignment of any sequences that are less than 40% identical to any others until all other sequences are aligned; the user can change this threshold from a menu. It may be useful to build up an alignment of closely related sequences first and then to add the more distant relatives one at a time or in batches, using the profile alignments and weighting scheme described earlier and perhaps a variety of parameter settings.
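The 40% delay rule can be illustrated with a short sketch. It is not CLUSTAL W's own code: it assumes a precomputed symmetric matrix of pairwise percent identities, the function name `split_divergent` and the sequence names are hypothetical, and it reads the rule as "defer a sequence when its best identity to any other sequence is below the cutoff".

```python
from typing import List, Tuple


def split_divergent(
    names: List[str],
    identity: List[List[float]],
    cutoff: float = 40.0,
) -> Tuple[List[str], List[str]]:
    """Partition sequences into (align_first, delayed).

    identity[i][j] is the percent identity between sequences i and j
    (assumed symmetric; the diagonal is ignored). The cutoff of 40.0
    mirrors the default threshold described in the text.
    """
    align_first: List[str] = []
    delayed: List[str] = []
    for i, name in enumerate(names):
        # Identities of sequence i to every other sequence.
        others = [identity[i][j] for j in range(len(names)) if j != i]
        # A lone sequence has nothing to be divergent from.
        best = max(others) if others else 100.0
        if best < cutoff:
            delayed.append(name)
        else:
            align_first.append(name)
    return align_first, delayed


# Hypothetical example data: three related sequences and one outlier.
names = ["seqA", "seqB", "seqC", "outlier"]
identity = [
    [100.0, 85.0, 62.0, 31.0],
    [85.0, 100.0, 58.0, 28.0],
    [62.0, 58.0, 100.0, 35.0],
    [31.0, 28.0, 35.0, 100.0],
]
align_first, delayed = split_divergent(names, identity)
print("align first:", align_first)
print("delayed:", delayed)
```

Raising or lowering `cutoff` changes which sequences are held back, which is one of the parameters worth varying when exploring a data set with outliers.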
We give one example using SH2 domains. SH2 domains are widespread in eukaryotic signalling proteins, where they function in the recognition of phosphotyrosine-containing peptides (ref. 41). In the chapter by Bork and Gibson ([11], this volume), BLAST and pattern/profile searches were used to extract the set of known SH2 domains and to search for new members. (Profiles used in database searches are conceptually very similar to the profiles used in CLUSTAL W; see chapters [11] and [13] for profile search methods.) The profile searches detected SH2 domains in the JAK family of protein tyrosine kinases (ref. 42), which had been thought not to contain SH2 domains. Although the JAK family SH2 domains are rather divergent, they have the necessary core structural residues as well as the critical positively charged residue that binds phosphotyrosine, leaving no doubt that they are bona fide SH2 domains.
The five new JAK family SH2 domains were added sequentially to the existing alignment of 65 SH2 domains using the CLUSTAL W profile alignment option. Figure 6 shows part of the resulting alignment. Despite their divergent sequences, the new SH2 domains have been aligned nearly perfectly with the old set. No insertions were placed in the original SH2 domains. In this example, the profile alignment procedure produced better results than a one-step full alignment of all 70 SH2 domains, and in considerably less time: adding the new sequences one at a time to the existing SH2 alignment is roughly five times faster than recalculating the full alignment. It also gives the user greater control.
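The sequential addition described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: the file names and the helper `profile_add_commands` are hypothetical, the executable name `clustalw2` is an assumption about the local installation, and the `-PROFILE1`, `-PROFILE2`, `-SEQUENCES`, and `-OUTFILE` options are those listed in the ClustalW command-line help, which should be checked against the installed version.

```python
from typing import List


def profile_add_commands(
    base_alignment: str,
    new_sequence_files: List[str],
    exe: str = "clustalw2",  # assumed executable name
) -> List[List[str]]:
    """Build one ClustalW command per new sequence file.

    Each step aligns the sequences in one file against the alignment
    produced by the previous step, so the existing alignment is kept
    as a profile and new sequences are added one at a time.
    """
    commands: List[List[str]] = []
    current = base_alignment
    for step, seq_file in enumerate(new_sequence_files, start=1):
        out = f"step{step}.aln"  # hypothetical intermediate file name
        commands.append([
            exe,
            f"-PROFILE1={current}",   # existing alignment (profile)
            f"-PROFILE2={seq_file}",  # sequence(s) to add
            "-SEQUENCES",             # add profile2 sequences sequentially
            f"-OUTFILE={out}",
        ])
        current = out  # the next step builds on this output
    return commands


# Hypothetical file names for an existing alignment and two new sequences.
cmds = profile_add_commands("sh2_existing.aln", ["jak_a.fasta", "jak_b.fasta"])
for cmd in cmds:
    print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once ClustalW is installed. Inspecting the intermediate alignment after every step is what gives the user the control mentioned above.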
References (42)
- D.J. Bacon et al., J. Mol. Biol. (1986)
- G.J. Barton et al., J. Mol. Biol. (1987)
- D.G. Higgins et al., Gene (1988)
- S. Pascarella et al., J. Mol. Biol. (1992)
- A. Krogh et al., J. Mol. Biol. (1994)
- S. Henikoff et al., J. Mol. Biol. (1994)
- W. Bains, Nucleic Acids Res. (1986)
- E. Sobel et al., Nucleic Acids Res. (1986)
- M.S. Johnson et al., J. Mol. Evol. (1986)
- W.R. Taylor, CABIOS (1987)
- J. Mol. Evol.
- J. Mol. Evol.
- Nucleic Acids Res.
- SIAM J. Appl. Math.
- Mol. Biol. Evol.
- CABIOS
- CABIOS
- Nature (London)
- Mol. Biol. Evol.