IN SILICO TOXICOLOGY IN DRUG DEVELOPMENT
In silico toxicology predicts chemical hazards using computational models, which rest on the assumption that a chemical's structure is related to its biological activity. These models can be built by experts (e.g., structural alerts or read-across) or derived automatically from data (e.g., by machine learning techniques). In silico toxicology models are applied in the earliest stages of drug development, where a compound needs only to exist virtually to be testable, as well as in risk assessment in time-critical cases where in vitro or in vivo testing is not feasible. Regulatory agencies currently accept in silico predictions under the ICH M7 guideline for the assessment of mutagenic impurities and for data gap filling under the REACH legislation.
Expert methods use the knowledge and experience of experts to predict the toxicity of single compounds as well as whole compound classes. There are two main approaches: read-across, which infers the toxicity of a compound from data on related compounds, and structural alerts, which highlight potential hazards and help explain the underlying mechanism.
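As a rough illustration of how a structural-alert screen works, the sketch below flags compounds whose SMILES string contains a known fragment. The alert list, the fragment notation, and the function name `find_alerts` are assumptions made for this example. Real alert systems match curated substructure patterns with a cheminformatics toolkit; plain substring search is a deliberate simplification that misses equivalent ways of writing the same group.

```python
# Toy structural-alert screen (illustrative only).
# Assumption: alerts are given as literal SMILES fragments, and matching is
# plain substring search. This is NOT real substructure matching.

# Hypothetical alert list: names and fragments chosen for illustration.
ALERTS = {
    "nitro group": "[N+](=O)[O-]",
    "azo group": "N=N",
    "epoxide": "C1OC1",
}


def find_alerts(smiles: str) -> list[str]:
    """Return the names of all alert fragments that occur in the SMILES text."""
    return [name for name, fragment in ALERTS.items() if fragment in smiles]


nitrobenzene = "c1ccccc1[N+](=O)[O-]"
ethanol = "CCO"

print(find_alerts(nitrobenzene))  # ['nitro group']
print(find_alerts(ethanol))       # []
```

A hit only marks a potential hazard for expert review. It does not by itself establish that the compound is toxic.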
Machine learning is a branch of artificial intelligence that uses algorithms to let computers learn from data and make predictions. Algorithms such as genetic algorithms, random forests, and artificial neural networks can be used, alone or in combination, to optimize traditional QSAR (quantitative structure-activity relationship) models for predicting a drug’s toxicity or other biological activities.
Traditional machine learning usually refers to techniques such as k-nearest neighbors (kNN), random forests, or support vector machines. To train traditional machine learning models, datasets of 100 compounds or more should be used. For training, the molecules have to be transformed into a suitable numerical representation such as descriptors or fingerprints. A model can be either a regression model, which predicts a continuous variable such as the LD50, or a classification model, which predicts a category such as mutagenic versus non-mutagenic.
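The two steps described above, turning molecules into fingerprints and then training a classifier, can be sketched with a k-nearest-neighbors model. Everything here is a simplified assumption: `toy_fingerprint` hashes SMILES substrings rather than real chemical substructures, the four training compounds carry invented placeholder labels (not experimental results), and the data set is far smaller than the 100 or more compounds a real model would need.

```python
import zlib
from collections import Counter

N_BITS = 256  # fingerprint length, chosen arbitrarily for this sketch


def toy_fingerprint(smiles: str, n: int = 3) -> frozenset[int]:
    """Hash every substring of up to n characters to an 'on' bit position.

    Assumption: text substrings stand in for chemical substructures.
    """
    bits = set()
    for size in range(1, n + 1):
        for i in range(len(smiles) - size + 1):
            bits.add(zlib.crc32(smiles[i:i + size].encode()) % N_BITS)
    return frozenset(bits)


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by all on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_classify(query: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Majority vote among the k training compounds most similar to the query."""
    fp = toy_fingerprint(query)
    ranked = sorted(
        training,
        key=lambda item: tanimoto(fp, toy_fingerprint(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Placeholder training set: the labels are invented for illustration.
TRAINING = [
    ("c1ccccc1[N+](=O)[O-]", "positive"),
    ("Cc1ccccc1[N+](=O)[O-]", "positive"),
    ("CCO", "negative"),
    ("CCCO", "negative"),
]

print(knn_classify("CCCCO", TRAINING))
```

A regression variant would average a continuous value (for example an LD50) over the k nearest neighbors instead of taking a majority vote.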
The advantages of neural networks and deep learning for bioactivity and toxicity prediction are their flexibility with regard to the structural representation and the possibility of multi-task prediction, in which one model predicts several endpoints at once. A deep learning method can extract meaningful features on its own, so no feature generation is needed beforehand. Given images, molecular graphs, 3D grids, or SMILES strings as input, the network learns the necessary properties or patterns directly, based on the assumption that all the information needed is encoded in the structure.
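To make the idea of feeding raw SMILES strings to a network concrete, the sketch below converts a SMILES string into a one-hot matrix, a common input format for sequence models. The function names, the character-level tokenization, and the padding scheme are assumptions for this example. Splitting by character breaks two-letter atoms such as Cl into two tokens, which a production tokenizer would avoid.

```python
def build_vocabulary(smiles_list: list[str]) -> dict[str, int]:
    """Map each character seen in the data to an index (0 is reserved for padding)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def one_hot(smiles: str, vocab: dict[str, int], max_len: int) -> list[list[int]]:
    """Encode a SMILES string as a max_len x (len(vocab) + 1) matrix of 0/1 values.

    Strings longer than max_len are truncated; shorter ones are padded with
    the padding symbol (column 0). Unknown characters raise a KeyError.
    """
    width = len(vocab) + 1
    matrix = [[0] * width for _ in range(max_len)]
    for pos in range(max_len):
        index = vocab[smiles[pos]] if pos < len(smiles) else 0
        matrix[pos][index] = 1
    return matrix


vocab = build_vocabulary(["CCO", "C=O"])
print(vocab)  # {'=': 1, 'C': 2, 'O': 3}

for row in one_hot("CCO", vocab, max_len=4):
    print(row)
# [0, 0, 1, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# [1, 0, 0, 0]
```

Each row marks one character position, so the network receives the structure itself rather than precomputed descriptors and has to learn the relevant patterns from it.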