bims-keminf Biomed News
on Cheminformatics
Issue of 2026–04–19
five papers selected by
Anish Gomatam, Liverpool John Moores University



  1. Regul Toxicol Pharmacol. 2026 Apr 15. pii: S0273-2300(26)00077-2. [Epub ahead of print] 106104
      Read-across is an expert-driven new approach methodology (NAM) used to fill gaps in chemical toxicity data. While qualitative read-across is widely used, quantitative read-across (qRAx) for deriving points of departure (PODs) has received limited attention. We compared the applicability domain, consistency, and conservatism of PODs derived from qRAx, in vitro data, and in silico predictions. Specifically, we first identified 41 substances evaluated by the U.S. EPA's Provisional Peer-Reviewed Toxicity Value (PPRTV) program for qRAx-derived oral chronic PODs. For these same substances, we generated PODs using: (i) in vitro-to-in vivo extrapolation from ToxCast bioactivity; (ii) database-calibrated in silico assessment from ToxValDB; and (iii) three quantitative structure-activity relationship (QSAR) models. Success rates for generating PODs varied considerably: qRAx 83% (34/41), ToxCast 22% (9/41), ToxValDB 66% (27/41), and QSAR 46-100% (19-41/41). qRAx yielded the most conservative PODs in the largest number of cases (44-54%). Combining multiple NAMs, including at least one of ToxValDB or a QSAR model, yields coverage exceeding 90%, and including both in a tiered approach produces PODs that are, on average, within one order of magnitude of the qRAx-derived values. We conclude that well-calibrated in silico methods can rapidly derive PODs with defined uncertainty, supporting time-sensitive health risk decisions.
    Keywords:  New Approach Methodologies; Point of Departure; QSAR; Read-across; Risk Assessment; in vitro-to-in vivo extrapolation
    DOI:  https://doi.org/10.1016/j.yrtph.2026.106104
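The coverage and tiering logic summarised in this abstract can be illustrated with a short Python sketch. It assumes a table of per-method PODs in which a missing value means the method could not produce one, treats the lowest POD as the most conservative, and uses a hypothetical tier order (ToxValDB first, then QSAR). The substance names and values are made-up placeholders, not data from the study.

```python
import math

# Hypothetical PODs (e.g. mg/kg-day); None = method could not produce a POD.
# Placeholder values for illustration only, not results from the paper.
PODS = {
    "substance_A": {"qRAx": 1.0, "ToxCast": None, "ToxValDB": 3.0, "QSAR": 0.5},
    "substance_B": {"qRAx": 0.2, "ToxCast": 0.05, "ToxValDB": None, "QSAR": 1.5},
    "substance_C": {"qRAx": None, "ToxCast": None, "ToxValDB": None, "QSAR": 10.0},
    "substance_D": {"qRAx": 5.0, "ToxCast": None, "ToxValDB": None, "QSAR": None},
}


def success_rate(pods, method):
    """Fraction of substances for which `method` yields a POD."""
    return sum(1 for row in pods.values() if row[method] is not None) / len(pods)


def combined_coverage(pods, methods):
    """Fraction of substances covered by at least one of `methods`."""
    covered = sum(1 for row in pods.values() if any(row[m] is not None for m in methods))
    return covered / len(pods)


def tiered_pod(row, tiers):
    """POD from the first tier (in assumed priority order) that produced one."""
    for method in tiers:
        if row[method] is not None:
            return method, row[method]
    return None, None


def most_conservative(row, methods):
    """Method giving the lowest (most protective) POD among those available."""
    available = {m: row[m] for m in methods if row[m] is not None}
    return min(available, key=available.get) if available else None


def within_one_order(pod, reference):
    """True if the two PODs differ by at most a factor of 10."""
    return abs(math.log10(pod / reference)) <= 1.0


# Hypothetical tiered combination of the two in silico sources.
for name, row in PODS.items():
    method, pod = tiered_pod(row, ["ToxValDB", "QSAR"])
    print(name, method, pod)
```

In this framing, adding a method can only raise `combined_coverage`, while the tier order decides which value is reported when several methods succeed.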
  2. SAR QSAR Environ Res. 2026 Apr 15. 1-18
      Accurate prediction of acetylcholinesterase (AChE) inhibitory activity is important in drug discovery and environmental toxicology because AChE inhibition is a key mechanism underlying the neurotoxicity associated with pharmaceuticals and environmental contaminants. In this study, machine learning approaches were used to develop predictive models of AChE inhibitory activity from experimentally measured bioactivity data for small molecules targeting human AChE. A curated dataset of 5795 molecules was compiled from BindingDB to support reliable model development. Fifteen predictive models were evaluated, comprising twelve individual machine learning and deep learning models and three hybrid fusion models, using multiple molecular representations such as RDKit- and PaDEL-derived physicochemical descriptors and graph-based molecular structures. Among the individual models, tree-based ensemble methods showed strong baseline performance, indicating that physicochemical descriptors capture important chemical features associated with AChE inhibition. Graph neural networks, particularly the Graph Isomorphism Network (GIN), effectively learned structural patterns related to inhibitory activity. To integrate complementary molecular information, a late-fusion hybrid framework combining descriptor-based predictions and graph-based representations was implemented using leakage-safe stacking with a Ridge regression meta-learner. Across ten independent train-test splits, the best-performing hybrid model, which integrated PaDEL-based XGBoost and GIN, achieved r² = 0.7400 ± 0.0138, demonstrating improved and stable predictive performance over the individual models.
    Keywords:  AChE activity; QSAR; hybrid model; machine learning ensembles; physicochemical descriptors; predictive toxicology
    DOI:  https://doi.org/10.1080/1062936X.2026.2647201
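Leakage-safe stacking usually means that the meta-learner is trained only on out-of-fold predictions, so no base model ever scores a molecule it was fitted on. The pure-Python sketch below shows that idea with two toy one-feature linear regressors standing in for the PaDEL-based XGBoost and GIN models, a closed-form ridge meta-learner, and mean ± SD r² over ten random train-test splits. The base learners, fold count, ridge penalty, and synthetic data are all assumptions for illustration, not the authors' setup.

```python
import random
import statistics


def fit_linear_1d(xs, ys):
    """Least-squares line y = a + b*x; a toy stand-in for a real base learner."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_ridge_2d(z, ys, alpha=1.0):
    """Ridge regression on two meta-features; the intercept is not penalised."""
    n = len(ys)
    m1, m2, my = sum(r[0] for r in z) / n, sum(r[1] for r in z) / n, sum(ys) / n
    c = [(r[0] - m1, r[1] - m2) for r in z]
    yc = [y - my for y in ys]
    s11 = sum(u * u for u, _ in c) + alpha
    s22 = sum(v * v for _, v in c) + alpha
    s12 = sum(u * v for u, v in c)
    t1 = sum(u * y for (u, _), y in zip(c, yc))
    t2 = sum(v * y for (_, v), y in zip(c, yc))
    det = s11 * s22 - s12 * s12  # > 0 because alpha > 0
    w1 = (s22 * t1 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    w0 = my - w1 * m1 - w2 * m2
    return lambda p: w0 + w1 * p[0] + w2 * p[1]


def fit_base_models(X, ys):
    """One base learner per feature view (view 0 ~ descriptors, view 1 ~ graph)."""
    return [fit_linear_1d([x[v] for x in X], ys) for v in (0, 1)]


def out_of_fold(X, ys, k=5, seed=0):
    """Meta-features for every training row from base models that never saw that row."""
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    oof = [None] * len(ys)
    for f in range(k):
        held = set(idx[f::k])
        train = [i for i in idx if i not in held]
        models = fit_base_models([X[i] for i in train], [ys[i] for i in train])
        for i in held:
            oof[i] = (models[0](X[i][0]), models[1](X[i][1]))
    return oof


def fit_stack(X, ys, k=5, alpha=1.0):
    """Ridge meta-learner on out-of-fold predictions; base models refit on all rows."""
    meta = fit_ridge_2d(out_of_fold(X, ys, k), ys, alpha)
    base = fit_base_models(X, ys)
    return lambda x: meta((base[0](x[0]), base[1](x[1])))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def repeated_splits(X, ys, n_splits=10, test_frac=0.2):
    """Mean and SD of test-set r2 over independent random train-test splits."""
    scores = []
    for s in range(n_splits):
        idx = list(range(len(ys)))
        random.Random(s).shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        tr, te = idx[:cut], idx[cut:]
        model = fit_stack([X[i] for i in tr], [ys[i] for i in tr])
        scores.append(r2([ys[i] for i in te], [model(X[i]) for i in te]))
    return statistics.mean(scores), statistics.stdev(scores)


# Synthetic data: each view carries part of the signal, so fusion should help.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
ys = [2 * a + b + rng.gauss(0, 0.3) for a, b in X]
mean_r2, sd_r2 = repeated_splits(X, ys)
print(f"r2 = {mean_r2:.4f} ± {sd_r2:.4f}")
```

Fitting the meta-learner on in-sample base predictions instead would let it over-trust whichever base model memorised the training rows, which is the leakage this design avoids.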
  3. Int J Mol Sci. 2026 Mar 25. pii: 2968. [Epub ahead of print] 27(7)
      Quantitative structure-activity/property relationship (QSAR/QSPR) modelling is a well-established methodology widely used to model molecular properties from structure, with applications in fields such as drug design and environmental protection. Here, the knowledge and procedures developed in QSPR modelling are applied to the validation of protein folding rate models. Understanding the protein folding process is considered one of the most important scientific topics, and identifying the fundamental factors that govern protein folding has been the subject of intensive research over the past 30 years. Among the structural descriptors that determine the protein folding rate, the most important are the length of the protein sequence, the content of regular secondary structure, and the average sequence separation between contacting amino acids in the 3D structure (contact order). Comparative studies of methods for predicting protein folding rates are published only occasionally; we conducted one such study. We found that the experimental data in literature databases and the data available online are inconsistent and scattered. This is partly due to differences in experimental data and protein sequence lengths, but more so to the questionable quality of the models themselves. We observed very large deviations in the ln(kf) values predicted by some of the analysed models implemented as web servers. For some of these models, the root mean square error (RMSE) in predicting ln(kf) for a new external set of proteins is much larger than the RMSE obtained for the same model on its training set. External validation shows that protein folding rate models available on web servers achieve accuracy on external protein sets comparable to that of a simple model based solely on the logarithm of protein chain length. This finding, which highlights the importance of external model validation as recommended by the OECD guidelines for QSAR validation, is fundamental and offers a new perspective for improving protein folding rate models by applying the knowledge and procedures used in QSPR methodology.
    Keywords:  OECD guidelines; QSAR; QSPR; comparative analysis; experimental data deviation; external validation; model validation; protein folding rates; sequence length; web server
    DOI:  https://doi.org/10.3390/ijms27072968
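The yardstick used in this comparison, a model that predicts ln(kf) from the logarithm of chain length alone, and the training-versus-external RMSE check can be sketched in a few lines of Python. Fitting the baseline by ordinary least squares is an assumption about how such a model would be built, and the lengths and ln(kf) values below are hypothetical placeholders, not data from the paper.

```python
import math


def fit_log_length_baseline(lengths, ln_kf):
    """OLS fit of ln(kf) = a + b * ln(L), where L is the chain length in residues."""
    xs = [math.log(n) for n in lengths]
    mx, my = sum(xs) / len(xs), sum(ln_kf) / len(ln_kf)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ln_kf)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda n: a + b * math.log(n)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical training and external sets (chain length, ln(kf)).
train_L, train_lnkf = [45, 60, 85, 110, 150, 210], [8.1, 6.4, 4.9, 3.0, 1.2, -1.5]
ext_L, ext_lnkf = [70, 130, 180], [5.0, 2.5, 0.9]

baseline = fit_log_length_baseline(train_L, train_lnkf)
train_rmse = rmse(train_lnkf, [baseline(n) for n in train_L])
ext_rmse = rmse(ext_lnkf, [baseline(n) for n in ext_L])
print(f"training RMSE = {train_rmse:.2f}, external RMSE = {ext_rmse:.2f}")
```

A published model whose external RMSE is no better than this baseline's adds little beyond chain length, and a large gap between its training and external RMSE is the warning sign that external validation is meant to catch.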
  4. Anal Sci. 2026 Apr 13.
      Predicting odors from molecular structures is a long-standing challenge in chemoinformatics, especially in cases where structurally similar compounds, such as optical isomers, exhibit distinct odor perceptions. To address this, we developed a multi-stage odor prediction framework that integrates both molecular structures and olfactory receptor (OR) binding information. Recognizing that human olfaction is mediated by complex receptor-ligand interactions, we divided the process into three mechanistic stages: (1) prediction of molecular binding to ORs (classification), (2) estimation of binding strength (regression), and (3) prediction of odor presence based on receptor responses (classification). We further introduced a novel interpretability metric, Positive likeness, which estimates the contribution of specific receptors to the likelihood of each odor label. Using this framework, we demonstrated the ability to distinguish odor differences between optical isomers and to identify ORs that are potentially responsible for the perception of specific odor attributes. The model also enabled extrapolative odor prediction for molecules with unknown odor annotations, leveraging receptor information and label propagation. Our results highlight the importance of receptor-level descriptors in enhancing predictive performance and biological interpretability. This study provides a foundation for receptor-guided odor modeling and supports applications in fragrance design and sensory informatics.
    Keywords:  Machine learning; Multi-label classification; Odor prediction; Olfactory receptors; Positive likeness; Receptor binding profiles
    DOI:  https://doi.org/10.1007/s44211-026-00900-6
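One way to read the three-stage design is as a chain in which stage 1 decides which receptors respond, stage 2 supplies a response strength, and stage 3 maps the resulting receptor profile to odor labels. The sketch below assumes that gating scheme (predicted non-binders contribute 0.0) and uses toy lambdas in place of trained models. The receptor names `OR_a` and `OR_b`, the odor labels, and the class name are hypothetical, and the Positive likeness metric is not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Molecule = Sequence[float]  # hypothetical molecular descriptor vector


@dataclass
class ThreeStageOdorModel:
    receptors: List[str]
    bind_prob: Callable[[Molecule, str], float]        # stage 1: P(molecule binds OR)
    bind_strength: Callable[[Molecule, str], float]    # stage 2: predicted binding strength
    odor_probs: Callable[[Dict[str, float]], Dict[str, float]]  # stage 3: labels from profile
    threshold: float = 0.5

    def receptor_profile(self, mol: Molecule) -> Dict[str, float]:
        """Stages 1-2: strength for predicted binders, 0.0 otherwise (assumed gating)."""
        return {
            r: self.bind_strength(mol, r) if self.bind_prob(mol, r) >= self.threshold else 0.0
            for r in self.receptors
        }

    def predict(self, mol: Molecule) -> Dict[str, float]:
        """Stage 3: multi-label odor prediction from the receptor profile."""
        return self.odor_probs(self.receptor_profile(mol))


# Toy stand-ins for trained models.
IDX = {"OR_a": 0, "OR_b": 1}
model = ThreeStageOdorModel(
    receptors=["OR_a", "OR_b"],
    bind_prob=lambda mol, r: mol[IDX[r]],
    bind_strength=lambda mol, r: 10.0 * mol[IDX[r]],
    odor_probs=lambda prof: {
        "fruity": float(prof["OR_a"] > 5.0),
        "woody": float(prof["OR_b"] > 5.0),
    },
)
prediction = model.predict((0.9, 0.1))
print(prediction)
```

Because stage 3 sees only the receptor profile, two molecules with similar structures (for example, optical isomers) receive different odor predictions exactly when stages 1 and 2 assign them different receptor responses.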
  5. ACS Omega. 2026 Apr 07. 11(13): 20400-20410
      Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer that accounts for 95% of pancreatic cancer cases. It arises in the pancreatic ducts and is highly drug resistant. In this study, we proposed an artificial intelligence (AI) framework to predict the pharmacokinetic (PK) properties of drugs repurposed for PDAC. First, molecular features of drugs repurposable for PDAC were generated using three types of molecular descriptors: RDKit, MACCS, and ECFP6. Next, the corresponding absorption (Caco-2 cell permeability), distribution (volume of distribution), metabolism (CYP2C9 inhibition), excretion (half-life), and toxicity (hERG) properties of the drugs were obtained from ADMETlab 3.0. We constructed AI models such as multilayer perceptron (MLP), random forest (RF), extreme gradient boosting (XGB), and a one-dimensional convolutional neural network (1D-CNN), with different combinations of molecular descriptors as input. Model performance was evaluated on the PDAC data set and on the open-access Therapeutics Data Commons (TDC) data set using mean absolute error (MAE), Spearman correlation (SC), and area under the precision-recall curve (AUPRC). Our results show that the best-performing descriptor combination and AI model vary across PK properties. Models on the PDAC data set achieved an MAE of 0.18 (MACCS+XGB), an SC of 0.39 (MACCS+RF), an AUPRC of 59.44% (MACCS+ECFP6+MLP), an SC of 0.68 (RDKit+ECFP6+XGB), and an SC of 0.77 (MACCS+RF) for absorption, distribution, metabolism, excretion, and toxicity, respectively. The corresponding values on the TDC data set are an MAE of 0.26 (RDKit+MACCS+MLP), an SC of 0.62 (MACCS+ECFP6+MLP), an AUPRC of 67.11% (RDKit+MACCS+ECFP6+1D-CNN), an SC of 0.39 (MACCS+RF), and an SC of 0.92 (RDKit+XGB/RDKit+MACCS+RF). These results suggest that combining molecular fingerprints with AI can effectively model PK properties. This approach supports the use of AI to accelerate drug repurposing, especially for specific disease conditions such as PDAC.
    DOI:  https://doi.org/10.1021/acsomega.5c11405