Journal of Molecular Biology
Regular Article
Protein Structure Comparison by Alignment of Distance Matrices
Abstract
With a rapidly growing pool of known tertiary structures, the importance of protein structure comparison parallels that of sequence alignment. We have developed a novel algorithm (DALI) for optimal pairwise alignment of protein structures. The three-dimensional co-ordinates of each protein are used to calculate residue-residue (Cα-Cα) distance matrices. The distance matrices are first decomposed into elementary contact patterns, e.g. hexapeptide-hexapeptide submatrices. Then, similar contact patterns in the two matrices are paired and combined into larger consistent sets of pairs. A Monte Carlo procedure is used to optimize a similarity score defined in terms of equivalent intramolecular distances. Several alignments are optimized in parallel, leading to simultaneous detection of the best, second-best and subsequent solutions. The method allows sequence gaps of any length, reversal of chain direction and free topological connectivity of aligned segments. Sequential connectivity can be imposed as an option. The method is fully automatic and identifies structural resemblances and common structural cores accurately and sensitively, even in the presence of geometrical distortions. An all-against-all alignment of over 200 representative protein structures results in an objective classification of known three-dimensional folds in agreement with visual classifications. Unexpected topological similarities of biological interest have been detected, e.g. between the bacterial toxin colicin A and globins, and between the eukaryotic POU-specific DNA-binding domain and the bacterial λ repressor.
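The first two steps described in the abstract (building a Cα-Cα distance matrix and cutting it into hexapeptide-hexapeptide contact patterns) can be illustrated with a minimal Python sketch. This is not the DALI implementation: the function names, the toy helix coordinates and the mean-absolute-difference measure are assumptions made for illustration, and the actual similarity score and Monte Carlo optimization are not reproduced here. The sketch only shows why distance matrices suit structure comparison: they are unchanged by rotation and translation, so no prior superposition is needed.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha positions (list of (x, y, z))."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def submatrix(dmat, i, j, size=6):
    """Contact pattern between the fragments starting at residues i and j.

    size=6 corresponds to the hexapeptide-hexapeptide submatrices of the abstract.
    """
    return [row[j:j + size] for row in dmat[i:i + size]]


def pattern_difference(sub_a, sub_b):
    """Mean absolute difference of two equally sized contact patterns.

    Hypothetical measure chosen for illustration; it is not the DALI score.
    """
    n = len(sub_a) * len(sub_a[0])
    return sum(abs(x - y)
               for row_a, row_b in zip(sub_a, sub_b)
               for x, y in zip(row_a, row_b)) / n


def toy_helix(n, rise=1.5, radius=2.3, turn=math.radians(100)):
    """Toy helix-like chain of n points (illustrative coordinates, not real data)."""
    return [(radius * math.cos(k * turn), radius * math.sin(k * turn), rise * k)
            for k in range(n)]


def rotate_z_and_shift(coords, angle, shift):
    """Apply a rigid-body motion: rotation about the z axis, then a translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


if __name__ == "__main__":
    chain_a = toy_helix(12)
    chain_b = rotate_z_and_shift(chain_a, 0.7, (5.0, -3.0, 2.0))
    pattern_a = submatrix(distance_matrix(chain_a), 0, 6)
    pattern_b = submatrix(distance_matrix(chain_b), 0, 6)
    # The two patterns agree up to floating-point rounding.
    print(pattern_difference(pattern_a, pattern_b))
```

In this sketch the two chains differ only by a rigid-body motion, so their contact patterns coincide; for genuinely different structures, a difference measure of this kind would rank candidate pattern pairs before they are combined into larger consistent sets.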
Cited by (3605)
Unraveling the Aurora kinase A and Epstein-Barr nuclear antigen 1 axis in Epstein Barr virus associated gastric cancer
2023, Virology
Aurora kinase A (AURKA) is one of the crucial cell cycle regulators associated with gastric cancer. Here, we explored Epstein Barr Virus-induced gastric cancer progression through EBV protein EBNA1 with AURKA. We found that EBV infection enhanced cell proliferation and migration of AGS cells and upregulation of AURKA levels. AURKA knockdown markedly reduced the proliferation and migration of the AGS cells even with EBV infection. Moreover, MD-simulation data deciphered the probable connection between EBNA1 and AURKA. The in-vitro analysis through the transcript and protein expression showed that AURKA knockdown reduces the expression of EBNA1. Moreover, EBNA1 alone can enhance AURKA protein expression in AGS cells. Co-immunoprecipitation and NMR analysis of AURKA and EBNA1 depict the interaction between the two proteins. In addition, AURKA knockdown promotes apoptosis in EBV-infected AGS cells through cleavage of Caspase-3, -9, and PARP1. This study demonstrates that the EBV oncogenic modulator EBNA1 possibly modulates AURKA in EBV-mediated gastric cancer progression.
WASCO: A Wasserstein-based Statistical Tool to Compare Conformational Ensembles of Intrinsically Disordered Proteins
2023, Journal of Molecular Biology
The structural investigation of intrinsically disordered proteins (IDPs) requires ensemble models describing the diversity of the conformational states of the molecule. Due to their probabilistic nature, there is a need for new paradigms that understand and treat IDPs from a purely statistical point of view, considering their conformational ensembles as well-defined probability distributions. In this work, we define a conformational ensemble as an ordered set of probability distributions and provide a suitable metric to detect differences between two given ensembles at the residue level, both locally and globally. The underlying geometry of the conformational space is properly integrated, one ensemble being characterized by a set of probability distributions supported on the three-dimensional Euclidean space (for global-scale comparisons) and on the two-dimensional flat torus (for local-scale comparisons). The inherent uncertainty of the data is also taken into account to provide finer estimations of the differences between ensembles. Additionally, an overall distance between ensembles is defined from the differences at the residue level. We illustrate the potential of the approach with several examples of applications for the comparison of conformational ensembles: (i) produced from molecular dynamics (MD) simulations using different force fields, and (ii) before and after refinement with experimental data. We also show the usefulness of the method to assess the convergence of MD simulations, and discuss other potential applications such as in machine-learning-based approaches. The numerical tool has been implemented in Python through easy-to-use Jupyter Notebooks available at https://gitlab.laas.fr/moma/WASCO.
The majority of SARS-CoV-2 therapeutic development work has focussed on targeting the spike protein, viral polymerase and proteases. As the pandemic progressed, many studies reported that these proteins are prone to high levels of mutation and can become drug resistant. Thus, it is necessary to not only target other viral proteins such as the non-structural proteins (NSPs) but to also target the most conserved residues of these proteins. In order to understand the level of conservation among these viruses, in this review, we have focussed on the conservation across RNA viruses, conservation across the coronaviruses and then narrowed our focus to conservation of NSPs across coronaviruses. We have also discussed the various treatment options for SARS-CoV-2 infection. A synergistic melding of bioinformatics, computer-aided drug-design and in vitro/vivo studies can feed into better understanding of the virus and therefore help in the development of small molecule inhibitors against the viral proteins.
The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors
2023, Current Opinion in Structural Biology
The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gaps between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences.
In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted field in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.
Introducing mirror-image discrimination capability to the TSR-based method for capturing stereo geometry and understanding hierarchical structure relationships of protein receptor family
2023, Computational Biology and Chemistry
We have developed a Triangular Spatial Relationship (TSR)-based computational method for protein structure comparison and motif discovery that is both sequence and structure alignment-free. A protein 3D structure is modeled by all possible triangles that are constructed with every three Cα atoms of amino acids as vertices. Every triangle is represented using an integer (a key). The keys are calculated by a rule-based formula which is a function of a representative length, a representative angle, and the vertex labels associated with amino acids. A 3D structure is thereby represented by a vector of integers (TSR keys). Global or local structure comparisons are achieved by computing all keys or a set of keys, respectively. Many enzymatic reactions and notable marketed drugs are highly stereospecific. Thus, in this paper, we propose a modified key calculation formula by including a mechanism for discriminating mirror-image keys to capture stereo geometry. We assign a positive or a negative sign to the integers representing mirror-image keys. Applying the new key calculation function provides the ability to further discriminate mirror-image keys that were previously considered identical. As a result, applying the mirror-image discrimination capability (i) significantly increases the number of distinct keys; (ii) decreases the number of common keys; (iii) decreases structural similarity; (iv) increases the opportunity to identify specific keys for each type of receptor. The specific keys identified in this study, both without and with mirror-image discrimination, can be considered as the structure signatures that exclusively belong to a certain type of receptor. Applying mirror-image discrimination introduces stereospecificity to keys, allowing more precise modeling of ligand-target interactions.
The development of mirror-image TSR keys of Cα atom, in conjunction with the integration of Cα TSR keys with all-atom TSR keys for amino acids and drugs, will lead to a new and promising computational method for aiding drug design and discovery.
Utilizing the scale-invariant feature transform algorithm to align distance matrices facilitates systematic protein structure comparison
2024, Bioinformatics