Pitfalls in QSAR
Introduction
The development of quantitative structure-activity relationships (QSARs) is a science that has grown up without a defined framework, set of rules, or methodological guidelines. The goal of QSAR is to develop models on a training set of compounds; these models then allow the biological activity of related chemicals to be predicted. Ideally, such models should be simple, transparent and mechanistically comprehensible. Ultimately, however, QSARs are predictive techniques based on the relationship, for a series of chemicals, between some form of biological activity and some measure(s) of physico-chemical or structural properties. As such, there are a number of limitations to the use and application of QSARs. It is the authors' concern that these limitations are often not appreciated, or are forgotten, by the developers of QSARs. To assist such developers, and as the basis of this paper, lists of ‘essentials’ and ‘desirables’ for QSARs are given in Table 1 and Table 2, respectively.
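To make the idea of a model developed on a training set concrete, the following is a minimal sketch, not the authors' method, of a one-descriptor QSAR fitted by linear least squares and then used to predict an untested chemical. The choice of logP (the octanol-water partition coefficient) as the descriptor, log(1/C) as the response, and all of the numbers are hypothetical and purely illustrative.

```python
# Minimal one-descriptor QSAR sketch: fit activity = slope * logP + intercept
# by ordinary least squares on a training set, then predict a new chemical.
# All data below are hypothetical, chosen only to illustrate the mechanics.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training set: descriptor (logP) and activity (log 1/C).
train_logp = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
train_activity = [-0.7, -0.3, 0.1, 0.5, 0.9, 1.3]

slope, intercept = fit_line(train_logp, train_activity)

def predict(logp):
    """Predict log(1/C) for a chemical from its logP using the fitted model."""
    return slope * logp + intercept

print(f"log(1/C) = {slope:.2f} logP {intercept:+.2f}")
print(f"predicted log(1/C) at logP 2.8: {predict(2.8):.2f}")
```

The query value of 2.8 was deliberately chosen to lie inside the range of logP covered by the training set; the model says nothing reliable about chemicals far outside that range.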
There are three components to any QSAR, namely the biological data, the physico-chemical and/or structural properties, and some form of statistical technique that relates the two. The aim of this paper is to review the potential pitfalls in the development of QSARs in each of these three areas. It should be noted at the outset that these comments represent the views of the authors, formed over a number of years spent not only developing their own QSARs, but also appraising the literature and taking part in the peer review of journal papers. The purpose of this article is not to criticise the existing literature, but to illustrate pitfalls; to that end, most of the examples have been taken from the authors' own work.
Biological data
Knowledge of the information on which models are based is essential for the development of any predictive system. The function of a QSAR is to predict biological activity, which may be a pharmacological, toxicological or pesticidal response. To make such predictions, a QSAR must, in the first instance, be built upon some biological data. Too often, however, QSARs are developed for which little, or nothing, is known regarding the information
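As one concrete instance of why the biological data must be understood, QSAR responses are conventionally expressed on a molar scale, often as log(1/C), so that the potencies of chemicals with different molecular weights are comparable. The following minimal sketch assumes the endpoint was reported as a concentration in mg/L and that each chemical's molecular weight is known; the function name and the two chemicals are hypothetical.

```python
import math

def mg_per_l_to_log_inverse_molar(conc_mg_per_l, mol_weight_g_per_mol):
    """Convert a potency reported in mg/L into log10(1/C), with C in mol/L."""
    conc_g_per_l = conc_mg_per_l / 1000.0                  # mg/L -> g/L
    conc_mol_per_l = conc_g_per_l / mol_weight_g_per_mol   # g/L -> mol/L
    return math.log10(1.0 / conc_mol_per_l)

# Two hypothetical chemicals with the same mass-based potency (10 mg/L)
# but different molecular weights (g/mol).
for name, mw in [("chemical A", 100.0), ("chemical B", 200.0)]:
    print(name, round(mg_per_l_to_log_inverse_molar(10.0, mw), 2))
```

On a mass basis the two chemicals look equally potent, but 10 mg/L of chemical B is only half as many moles, so B has the higher log(1/C) (about 4.30 versus 4.00). Modelling the mg/L values directly would hide that difference.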
Descriptors of physico-chemical properties
The assumption of a QSAR is that the biological activity of a chemical depends, in some manner, on the physico-chemical and/or structural properties of the chemical. Such a relationship becomes quantitative when a series of chemicals is considered. It is not the purpose of this section to review physico-chemical descriptors per se (excellent reviews exist elsewhere, cf. [27], [28], [29]), but to make some observations regarding the use of descriptors for the development of QSARs. Many of
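One practical check when several descriptors are considered together is whether any pair of them is strongly intercorrelated, since such descriptors carry largely redundant information and can make regression coefficients unstable. The sketch below flags descriptor pairs whose absolute Pearson correlation reaches a threshold. The threshold of 0.9, the descriptor names and the values are assumptions made for illustration and are not taken from the paper.

```python
import math
from itertools import combinations

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def flag_collinear(descriptors, threshold=0.9):
    """Return (name_a, name_b, r) for every descriptor pair with |r| >= threshold."""
    flagged = []
    for (name_a, vals_a), (name_b, vals_b) in combinations(descriptors.items(), 2):
        r = pearson_r(vals_a, vals_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical descriptor values for five chemicals (one list per descriptor).
descriptors = {
    "logP": [1.0, 1.5, 2.0, 2.5, 3.0],
    "molar_volume": [80.0, 95.0, 110.0, 125.0, 140.0],
    "E_LUMO": [0.5, -0.2, 0.3, -0.1, 0.4],
}

print(flag_collinear(descriptors))
```

Here the invented molar volumes rise exactly in step with logP, so that pair is flagged, while E_LUMO is nearly uncorrelated with both. A flagged pair is a prompt to keep only one of the descriptors or to justify using both, not an automatic rejection.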
Statistical analyses
A statistical technique is required to forge the link between the biological activities of a series of chemicals and their physico-chemical properties. These techniques commonly range from linear least-squares regression analysis, through multivariate techniques such as principal component analysis and partial least squares, to a variety of neural networks. Different techniques are required for continuous and categoric data. All techniques have advantages and
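For continuous data, the simplest of the techniques listed above is linear least-squares regression. The sketch below, a hypothetical illustration rather than the procedure used in the paper, fits such a model and contrasts the fitted r² with a leave-one-out cross-validated q²: each compound is left out in turn, the model is refitted on the rest, and the omitted compound is predicted. Leave-one-out validation is simply one common choice assumed here, and the data are invented. Categoric endpoints would instead require classification statistics, which are not shown.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def total_sum_of_squares(y):
    """Sum of squared deviations of the activities from their mean."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def r_squared(x, y):
    """Goodness of fit of the model to its own training data."""
    slope, intercept = fit_line(x, y)
    rss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - rss / total_sum_of_squares(y)

def q_squared_loo(x, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict the compound that was left out.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(y)

# Hypothetical descriptor (logP) and activity values, for illustration only.
logp = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
activity = [-0.6, -0.4, 0.2, 0.4, 1.0, 1.2]

r2 = r_squared(logp, activity)
q2 = q_squared_loo(logp, activity)
print(f"r2 = {r2:.3f}, q2(LOO) = {q2:.3f}")
```

For an ordinary least-squares model, q² can never exceed r², because every left-out residual is at least as large as the corresponding fitted residual. A large gap between the two can therefore indicate that the apparent fit depends heavily on individual compounds.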
Conclusions
This paper describes a large number of pitfalls encountered in the development of QSARs. These range from data quality to the appropriate use of physico-chemical descriptors and statistical techniques. The variety of potential pitfalls emphasises that QSAR is a multi-disciplinary practice: it requires biologists, chemists and statisticians who understand what they are attempting to do. Problems normally arise when specialists in one field make assumptions about subjects
Acknowledgements
Stimulating discussions with Dr Alex Tropsha from the School of Pharmacy, University of North Carolina at Chapel Hill, USA, are gratefully acknowledged.
References (51)
Toxicology (2001)
et al. Chem. Health Saf. (1999)
et al. Chemosphere (1999)
et al. Sci. Tot. Environ. (1997)
Sci. Tot. Environ. (1991)
et al. Sci. Tot. Environ. (1998)
et al. Eur. J. Pharm. Sci. (1999)
et al. J. Invest. Dermatol. (1969)
et al. J. Pharm. Sci. (1995)
et al. Aquat. Toxicol. (2001)
J. Mol. Struct. (Theochem)
J. Mol. Graphics Mod.
Environ. Toxicol. Chem.
Toxicol. Meth.
SAR QSAR Environ. Res.
Water Pollut. Res. J. Can.
Analyst
SAR QSAR Environ. Res.
Environ. Health Perspect.
SAR QSAR Environ. Res.
Environ. Health Perspect.
Environ. Health Perspect.
Handbook of Carcinogenicity Potency and Genotoxicity Databases
Anal. Chem.
Cited by (322)
QSAR facilitating safety evaluation and risk assessment
2023, QSAR in Safety Evaluation and Risk Assessment
Quantitative structure-activity relationships (QSARs) in medicinal chemistry
2023, Cheminformatics, QSAR and Machine Learning Applications for Novel Drug Development
Quantum similarity description of a unique classical and quantum QSPR algorithm in molecular spaces: the connection with Boolean hypercubes, algorithmic intelligence, and Gödel's incompleteness theorems
2023, Chemical Reactivity: Volume 1: Theories and Principles
Semi-automated harmonization and selection of chemical data for risk and impact assessment
2022, Chemosphere
Citation excerpt: For example, a strict selection of only high-quality data is required in a regulatory safety assessment context, disregarding low-quality information. Likewise, only high-quality data are considered when developing extrapolations for substances without available information, i.e., predictive approaches (Aurisano et al., 2019; Cronin and Schultz, 2003; Posthuma et al., 2019). A more inclusive approach (i.e., high data coverage but reduced average data quality) is suitable for screening level prioritization or substitution of chemicals across thousands of substances or for characterizing hundreds of chemicals associated with a given product life cycle (Aurisano et al., 2021a, 2021b, 2022; Fantke et al., 2020, 2021a; Tickner et al., 2019).
Prediction of degradability of micropollutants by sonolysis in water with QSPR - a case study on phenol derivates
2022, Ultrasonics Sonochemistry
Citation excerpt: To the best of our knowledge, this is the first QSPR model applied to sonolysis as Advanced Oxidation Process in wastewater treatment and in water research the first model evaluated with extensive amounts of statistical methods, e.g. multiple validation methods, tests for chance correlation and multicollinearity. To address some problems mentioned in previous publications [40,41,43,47], the experimental data was obtained with a standardized setup and protocol under reproducible condition to ensure the homogeneity of the experimental dataset. The introduced workflow possesses a variety of statistical methods to ensure the stability and reliability of the obtained model to reduce many problems as mentioned before.
Topical drug delivery: History, percutaneous absorption, and product development
2021, Advanced Drug Delivery Reviews