Machine learning for the prediction of phenols cytotoxicity

(1) * Latifa Douali (Department of Computer Science, Regional Centre of Training and Education (CRMEF) Marrakech-Safi, Morocco)
*corresponding author


Quantitative structure-activity relationships (QSAR) are relevant techniques that assist biologists and chemists in accelerating the drug design process and help in understanding many biological and chemical mechanisms. Using classical statistical methods may affect the accuracy and reliability of the QSAR models developed. This work aims to use a machine learning approach to establish a QSAR model for predicting the cytotoxicity of phenols, an issue that concerns many chemists and biologists. In this investigation, the dataset is diverse, and the cytotoxicity data are sparse. A multi-component description of the compounds was therefore considered. A set of molecular descriptors was used as input to train a deep neural network (DNN). The established DNN model was able to predict the cytotoxicity of the phenols with high precision. The correlation coefficient at the fitting stage was equal to 0.943, higher than that of the other statistical methods reported in the literature or developed in the present work, specifically multiple linear regression (MLR) and shallow artificial neural networks (ANN). The predictive capability of the model, as estimated by the coefficient of determination on an external prediction dataset, was high, at about 0.739. This finding could help in implementing many molecular descriptors relevant to describing the compounds and representing the effects governing the phenols' cytotoxicity toward Tetrahymena pyriformis, while avoiding overfitting and outlier exclusion.
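
The two figures of merit quoted above, the correlation coefficient at the fitting stage and the coefficient of determination on an external set, can be illustrated with a minimal sketch. The abstract does not state which definition of the external coefficient of determination was used, so the version below (1 - SS_res/SS_tot with the mean of the evaluated set) is an assumption. The function names and the toxicity values are hypothetical and are not taken from the study.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dt = y_true - y_true.mean()
    dp = y_pred - y_pred.mean()
    return float((dt * dp).sum() / np.sqrt((dt ** 2).sum() * (dp ** 2).sum()))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: SS_tot uses the mean of the evaluated set itself; some QSAR
    conventions use the training-set mean instead.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


# Hypothetical observed and predicted toxicity values for four compounds.
observed = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]

r_fit = pearson_r(observed, predicted)
r2_ext = r_squared(observed, predicted)
print(f"r = {r_fit:.3f}, R^2 = {r2_ext:.3f}")
```

In practice the correlation coefficient would be computed on the training compounds and the coefficient of determination on compounds held out from training. The two metrics are not interchangeable, since the latter penalizes systematic offsets that the former ignores.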


Cytotoxicity; Deep neural networks; Phenols; QSAR; Tetrahymena pyriformis; Risk assessment



