In silico ADME modelling: prediction models for blood–brain barrier permeation using a systematic variable selection method
Graphical abstract
Predictive models for blood–brain barrier permeation were derived from 116 diverse compounds and 324 molecular descriptors using VSMP, a systematic variable selection method, together with multiple linear regression. Validation tests demonstrate that the models possess excellent predictive power and can be applied to virtual screening studies.
Introduction
The escalating cost of drug discovery and development1 is due to liabilities of drug candidates that are often not identified until a compound reaches the clinic, the most expensive phase of pharma R&D. The liabilities in question include non-optimum values of absorption, distribution, metabolism and excretion, as well as toxicity, usually referred to as ADMET.2, 3, 4, 5, 6, 7, 8, 9, 10 Blood–brain barrier (BBB) permeation of new chemical entities (NCEs) is one of the most important ADMET properties considered in drug discovery and development. The BBB, a complex cellular system consisting of endothelial cells of the brain capillaries, maintains the homeostasis of the central nervous system (CNS) by separating the brain from the systemic blood circulation.11 The distribution of potential drugs between the blood and the brain depends on the ability of compounds to penetrate the BBB. Lipophilic drugs can easily cross the BBB by passive diffusion. Polar molecules normally do not cross the BBB, although active transport processes sometimes facilitate their permeation.
In the search for new drugs targeted at CNS disease, the ideal drug candidates must be able to penetrate the BBB effectively. On the other hand, peripherally acting drugs must have limited ability to cross the BBB to avoid adverse CNS effects. In experiments, the relative affinity of a drug for the blood or brain tissue can be expressed in terms of the blood–brain partition coefficient, logBB = log(C_brain/C_blood), where C_brain and C_blood are the equilibrium concentrations of the drug in the brain and the blood, respectively. Experimental determination of BBB permeation is time consuming, expensive and requires a sufficient quantity of the pure compounds, often in radiolabelled form. These requirements make it unsuitable for high-throughput screening of large compound libraries.3, 4 A reliable and easily applicable computational model12, 13 for predicting BBB permeation of drug candidates can help in the early identification of compounds with a poor BBB penetration profile, even prior to chemical synthesis, and will therefore have a significant impact on drug discovery and development.
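As a small worked illustration of the partition coefficient defined above, the following sketch computes logBB from a pair of equilibrium concentrations. It assumes the conventional base-10 logarithm and that both concentrations are given in the same units; the function name `log_bb` and the example values are hypothetical and not taken from the paper.

```python
import math


def log_bb(c_brain: float, c_blood: float) -> float:
    """Return logBB = log10(C_brain / C_blood).

    Both arguments are equilibrium concentrations in the same units.
    The base-10 logarithm is an assumption (the usual logBB convention).
    """
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)


# Hypothetical concentrations, for illustration only.
print(log_bb(10.0, 1.0))  # positive: compound is enriched in the brain
print(log_bb(1.0, 10.0))  # negative: compound stays mostly in the blood
```

A positive logBB therefore indicates a higher equilibrium concentration in the brain than in the blood, and a negative value the reverse.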
Various authors14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 have attempted to predict BBB permeation from molecular properties such as lipophilicity (logP), topological indices, polar surface area and quantum chemical descriptors. Some of the reported models rely on computationally intensive calculations, which makes them unsuitable for virtual screening of large compound libraries.15, 18 The most common statistical methods applied in these studies are multiple linear regression14, 15, 16, 18, 19, 22, 23, 24, 25, 26, 27 (MLR), principal component analysis20 (PCA) and partial least squares regression17 (PLS). While these statistical methods have been successfully employed to develop predictive ADMET models, automated variable selection methods have, surprisingly, received less attention, apart from a few scattered reports.5, 25, 29 We believe that automated variable selection approaches30, 31, 32, 33 offer two advantages over variable reduction methods such as PLS and PCA: (a) they can provide multiple models based on different combinations of properties, giving the end user multiple solutions, and (b) they help in better understanding the mechanism of the modelled phenomenon.5 These advantages prompted us to study the application of variable selection methods to generate predictive ADMET models and to understand the molecular properties that influence the various ADMET properties.
In this paper, we describe the derivation of novel QSPR models for BBB permeation using VSMP,32 an efficient systematic variable selection method, along with MLR. To the best of our knowledge, this is the first report to date describing the application of VSMP to predictive ADME model generation. Further, we report the performance of the QSPR models in internal and external validations using three datasets taken from the literature and compare them with other published computational approaches. The applicability of the models across the whole medicinal chemical space is demonstrated by virtual screening experiments on structurally new and diverse datasets. As the models reported herein are based on computed properties, they appear to be valuable tools for virtual screening, where selection and prioritization of candidates is required.
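To make the idea of combining variable selection with MLR concrete, the sketch below fits an ordinary least-squares model to every k-descriptor subset of a small descriptor table and ranks the subsets by r². This is a generic exhaustive subset search, not the VSMP algorithm (whose details are not given in this excerpt). The descriptor names `d1` to `d4`, the toy data and the r² ranking criterion are all assumptions made for illustration.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares fit with intercept via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)  # [intercept, coef_1, ..., coef_k]


def r_squared(rows, y, coef):
    """Coefficient of determination of the fitted model on (rows, y)."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subsets(data, y, names, k, top=3):
    """Fit every k-descriptor subset and return the `top` best by r^2."""
    results = []
    for idx in combinations(range(len(names)), k):
        rows = [[row[i] for i in idx] for row in data]
        try:
            coef = fit_mlr(rows, y)
        except ValueError:  # skip collinear descriptor subsets
            continue
        results.append((r_squared(rows, y, coef), [names[i] for i in idx]))
    results.sort(key=lambda t: t[0], reverse=True)
    return results[:top]


# Toy data: six hypothetical compounds, four hypothetical descriptors.
names = ["d1", "d2", "d3", "d4"]
data = [
    [1.0, 2.0, 0.5, 3.0],
    [2.0, 1.0, 1.5, 1.0],
    [3.0, 4.0, 0.0, 2.0],
    [4.0, 3.0, 2.5, 5.0],
    [5.0, 6.0, 1.0, 4.0],
    [6.0, 5.0, 3.0, 0.0],
]
# Synthetic response that depends only on d1 and d3.
y = [0.5 * r[0] - 1.0 * r[2] + 0.2 for r in data]

for score, subset in best_subsets(data, y, names, k=2):
    print(subset, round(score, 4))
```

Returning several top-ranked subsets rather than a single winner mirrors advantage (a) above: the end user gets multiple candidate models. Exhaustive enumeration scales poorly, though: choosing 4 descriptors out of 324 already gives roughly 4.5 × 10^8 subsets, which is one reason efficient selection schemes are of interest.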
QSPR models for BBB Permeation using VSMP
QSPR models are typically generated from manually selected compounds and molecular properties, often chosen by intuition and experience. Recent advances in computational chemistry have made it easy to calculate many molecular descriptors34, 35, 36, 37 with potential applications in QSPR studies. Consequently, selecting the best combination of descriptors with high significance to a biological property becomes extremely difficult, particularly from a
Conclusion
In this paper, we have described global predictive models, based on three and four descriptors, for blood–brain barrier (BBB) permeation using what is, to the best of our knowledge, the largest dataset to date: 116 diverse drugs and drug-like compounds and 324 molecular descriptors. For this aim, VSMP, a systematic variable selection method, is employed along with multiple linear regression (MLR) for the first time. Unlike many of the reported approaches, we have reported multiple models with
Datasets and BBB permeation property
A database of 116 drugs and drug-like molecules with known logBB data from various literature sources14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 was created (Supplementary material). The members of this database constitute the training set and test sets I and II.
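The split into a training set and separate test sets supports two kinds of validation: internal (on the training set) and external (on held-out compounds). The sketch below shows one common form of each, a leave-one-out cross-validated q² and a test-set RMSE. The specific statistics used in the paper are not stated in this excerpt, so both metrics, the single-descriptor model and all data values here are assumptions chosen to keep the example short.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def loo_q2(x, y):
    """Internal validation: q^2 = 1 - PRESS / SS_tot, leave-one-out."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i.
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


def rmse(x, y, b0, b1):
    """External validation: root-mean-square error on held-out data."""
    sq = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical descriptor values and logBB-like responses.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9]
x_test = [1.5, 3.5]
y_test = [1.4, 3.6]

b0, b1 = fit_line(x_train, y_train)
print("LOO q2:", loo_q2(x_train, y_train))
print("test RMSE:", rmse(x_test, y_test, b0, b1))
```

The key point is that the test compounds never enter the fit: the model is estimated on the training set only and then scored on data it has not seen.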
Training set
Eighty-eight drugs and drug-like molecules constitute the training set. The names and their corresponding logBB values are shown in Table 2. The training set consists of compounds with a wide range of molecular size and
Acknowledgements
We thank Drs. Rajgopal Srinivasan and B. Gopalakrishnan, Advanced Technology Centre, Tata Consultancy Services Limited and Mr. Akash Khandelwal for constructive discussions during the preparation of the manuscript.
References and notes (53)
- et al. Adv. Drug Delivery Rev. (1997)
- et al. Curr. Opin. Chem. Biol. (2001)
- et al. J. Pharma. Sci. (1997)
- et al. J. Pharma. Sci. (1998)
J. Pharma. Sci. (1999)
Eur. J. Med. Chem. (2004)
- et al. Drug Discovery Today (1998)
- et al. J. Biomol. Screening (1999)
- et al. Eur. J. Pharmacol. Sci. (1993)
Pharm. Sci. Technol. Today (1998)