bims-aukdir 2026-06-07 papers

bims-aukdir

Biomed News

on Automated knowledge discovery in diabetes research

Issue of 2026–06–07
twenty-two papers selected by
Mott Given

Prospective Data Curation Enables High-Performance Artificial Intelligence for Diabetic Retinopathy Screening in a Resource-Limited Setting.
Electronic health record-derived machine learning model for hypoglycemia risk prediction in type 2 diabetes mellitus patients: development and validation.
Diabetes Management Through Glucose Dynamics Analysis Network: A Novel Approach for Accurate Blood Glucose Level Forecasting.
Development and external validation of an interpretable machine learning model for diagnosing coronary heart disease in patients with type 2 diabetes and MASLD.
Use of technology and AI tools in type 2 diabetes.
Development of autoencoder-guided attention-LSTM models for predicting nocturnal hypoglycemia risk in Type 1 Diabetes.
Detection of referable diabetic retinopathy using machine learning on routine clinical data.
Identification of BMI-related high-risk feature combinations for diabetes among young adults with normal baseline fasting plasma glucose using interpretable machine learning: a health check-up cohort study.
Predicting Anxiety in Individuals with Diabetes: A Comparative Analysis of Machine Learning Algorithms.
Personalized Type 1 Diabetes Management: Reinforcement Learning-Based Insulin Dosing and Glucose Forecasting.
Development of a machine learning-based classification model for diabetic foot in patients with type 2 diabetes: an exploratory analysis with SHAP interpretation.
Construction of precision clinical-proteomics risk model based on machine learning for predicting heart failure in type II diabetes mellitus.
Editorial: Future horizons in diabetes: integrating gut microbiota, AI, and personalized care.
The Validity and Reliability of Artificial Intelligence Chatbots in the Self-management of Diabetes.
Machine learning and SHAP interpretation for predicting coronary heart disease-diabetes comorbidity with dietary antioxidants.
Explainable ensemble machine learning for predicting diabetes mellitus and identifying key risk factors: a population-based study in northern Bangladesh.
Single-cell Sequencing and Machine Learning Identify Amino Acid Metabolism-related Biomarkers and Regulatory Mechanisms in Diabetic Foot Ulcers.
Development and temporal external validation of a high-specificity XGBoost rule-in model for diabetes in middle-aged and older Korean adults.
[Artificial intelligence in diabetic retinopathy screening].
Personalized non-invasive continuous glucose monitoring via multiparameter-informed machine learning.
Multimodal deep learning fusion model for assessment of fetal lung development in gestational diabetes mellitus and pre-eclampsia.
Proteomic signatures of early retinal neurodegeneration in type 2 diabetes mellitus.

Ophthalmol Sci. 2026 Jun;6(6): 101191

Prospective Data Curation Enables High-Performance Artificial Intelligence for Diabetic Retinopathy Screening in a Resource-Limited Setting.

Cameron M Ashrafzadeh, Milan Bahi, Amira Mostafa, Mostafa El Manhaly, Mohamed Ghoneim, Bassma Al-Bayoumy, Enas Khamis, Merna Mostafa, Heba Abdel Aziz, Nadine Khaled, Ahmed Souka, Paolo S Silva, Lloyd Paul Aiello, Mohamed Ashraf.

   Purpose: To determine whether a high-quality, prospectively curated dataset can, by itself, enable the development of robust and clinically effective artificial intelligence as a medical device (AIaMD) models for diabetic retinopathy (DR) screening, even with minimal artificial intelligence (AI) infrastructure. This study evaluates whether careful data curation, standardized acquisition, and rigorous grading processes can yield high-performing AI models for more-than-mild DR (MTM) and diabetic macular edema (DME) detection from ultra-widefield (UWF) fundus images.
Design: An evaluation of diagnostic technology.
Subjects: Patients with diabetes receiving imaging at Alexandria iCare Retina Reading Center.
Methods: A total of 152 025 UWF color images were collected between February 2022 and June 2024 at a UWF image reading center during routine screening. This was a cross-sectional study with patient consent obtained at the time of imaging. After excluding noncolor or ungradable images, 26 232 UWF color images from 5394 diabetic patients were used to train 2 Inception V3-based convolutional neural networks: one to detect MTM and another for OCT-confirmed DME. Images were split 80:10:10 by patient into training, validation, and test sets. Models were trained on red-green channel inputs using standard augmentation and Adam optimization (learning rate 0.0005). Prospective validation was performed on 12 698 additional images from 3096 patients collected between July and December 2024, following identical imaging and adjudication protocols.
Main Outcome Measures: Area under the curve (AUC), sensitivity, and specificity.
Results: In baseline testing, the MTM model achieved an AUC of 0.962 ± 0.003, sensitivity of 0.922 ± 0.001, and specificity of 0.873 ± 0.010; the DME model achieved an AUC of 0.879 ± 0.014, sensitivity of 0.809 ± 0.022, and specificity of 0.790 ± 0.023. In the prospective dataset, the MTM model maintained strong performance (AUC 0.949; sensitivity 0.86-0.89; specificity 0.85-0.89), while the DME model yielded an AUC of 0.821 with balanced sensitivity (0.76) and specificity (0.73). Gradient-weighted class activation mapping visualizations confirmed focus on clinically relevant lesions.
Conclusions: This study demonstrates that rigorous prospective data collection and quality control can produce high-performing AIaMDs even with limited AI engineering resources. Locally curated datasets aligned with regional populations, equipment, and workflows can yield reliable, regulation-ready tools that advance equitable DR screening in resource-limited settings.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
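The Methods state that images were split 80:10:10 by patient, so that no patient contributes images to more than one subset and test performance is not inflated by near-duplicate eyes seen in training. A minimal sketch of such a patient-grouped split, assuming each image record carries a patient identifier; the helper name `split_by_patient` and the `(patient_id, image_id)` record layout are hypothetical and not taken from the study:

```python
import random
from collections import defaultdict

def split_by_patient(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (patient_id, image_id) records into train/val/test by patient.

    Hypothetical helper: every image of a given patient lands in the same
    subset, so the test set never contains a patient seen during training.
    """
    by_patient = defaultdict(list)
    for patient_id, image_id in records:
        by_patient[patient_id].append(image_id)

    patients = sorted(by_patient)          # deterministic order before shuffling
    random.Random(seed).shuffle(patients)  # seeded shuffle for reproducibility

    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each patient group back into its list of images.
    return {name: [img for p in ids for img in by_patient[p]]
            for name, ids in groups.items()}
```

Because the fractions apply to patients rather than images, the image-level proportions only approximate 80:10:10 when patients contribute different numbers of images.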

Keywords:  Artificial Intelligence; Diabetic Retinopathy; Diabetic macular edema; Screening

DOI:  https://doi.org/10.1016/j.xops.2026.101191
BMC Endocr Disord. 2026 May 30.

Electronic health record-derived machine learning model for hypoglycemia risk prediction in type 2 diabetes mellitus patients: development and validation.

Qian Ran, Xia Qi, Li Liu, Yunqiu Luo, Hong Cheng, Weiwei Xu, Xili Zhao.

   BACKGROUND: Hypoglycemia is a serious complication of diabetes. Early recognition of hypoglycemia can improve clinical prognosis; however, traditional diagnostic tools are often limited. Machine learning offers a promising approach for predicting adverse outcomes in diabetic patients.
OBJECTIVE: This study aims to develop and validate machine learning-based models to predict the risk of hypoglycemia in type 2 diabetes mellitus (T2DM) patients.
METHODS: A cohort study design was employed. Clinical data were collected from the electronic health record system. The dataset was randomly partitioned into training and validation subsets using a 7:3 ratio. Four machine learning algorithms (logistic regression [LR], Extreme Gradient Boosting [XGBoost], random forest [RF], and support vector machine [SVM]) were implemented to develop hypoglycemia risk prediction models. Predictive performance was assessed using sensitivity, specificity, accuracy, precision, F1 score, and the area under the receiver operating characteristic curve (AUC).
RESULTS: A total of 831 T2DM patients were included, with a hypoglycemia incidence of 22.0%. In the training cohort, the AUCs of the LR, XGBoost, SVM, and RF models were 0.82, 0.86, 0.84, and 0.80, respectively; the corresponding AUCs in the validation cohort were 0.76, 0.78, 0.72, and 0.75. The XGBoost model demonstrated the highest overall predictive performance. Feature importance analysis based on the XGBoost model identified creatinine, triglycerides, albumin, HbA1c, C-peptide, aspartate aminotransferase, hemoglobin, and sulfonylurea use as the most influential predictors of hypoglycemia risk.
CONCLUSIONS: The XGBoost model exhibited superior predictive performance, achieving the highest AUC and F1 score, as well as greater accuracy, sensitivity, and specificity. This model enables effective identification of T2DM patients who may require intensified monitoring or targeted interventions to prevent hypoglycemic events.
CLINICAL TRIAL NUMBER: Not applicable.
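The metrics listed in the Methods (sensitivity, specificity, accuracy, precision, F1 score) all derive from the 2×2 confusion matrix at a chosen probability cut-off. A minimal sketch of the standard definitions; the function name and the 0.5 threshold are illustrative assumptions, since the abstract does not state which cut-off was used:

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics for binary labels (1 = hypoglycemia event)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}
```

With a 22.0% event rate, a model that predicted "no hypoglycemia" for every patient would already reach about 78% accuracy, which is why sensitivity, specificity, and F1 are worth reading alongside accuracy.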

Keywords:  EHR; Electronic health record; Hypoglycemia; Machine learning model; Prediction model; T2DM; Type 2 diabetes mellitus

DOI:  https://doi.org/10.1186/s12902-026-02340-9
Diabetes Obes Metab. 2026 Jun 03.

Diabetes Management Through Glucose Dynamics Analysis Network: A Novel Approach for Accurate Blood Glucose Level Forecasting.

Pallavi M Sawant, Rajesh B Ghongade.

   BACKGROUND: Accurate real-time prediction of blood glucose (BG) levels is essential for improving insulin-dosing decision support systems, including closed-loop insulin delivery and bolus calculators. However, existing deep learning models often suffer from high computational complexity, limited utilization of physiological factors, and inadequate handling of temporal glucose dependencies.
METHODS: This study proposes Glucose Dynamics Analysis Network (GlucoDiaNet), a hybrid framework for BG prediction integrating spline interpolation for missing value handling, a Dilated Convolutional Residual Network (DilaConv-ResNet) for spatial-temporal feature extraction, Adamax optimization for feature selection and hyperparameter tuning, and a Bidirectional Long Short-Term Memory network for bidirectional sequence learning. The model was evaluated using the OhioT1DM dataset across multiple prediction horizons ranging from 30 to 60 min.
RESULTS: At the 30-min prediction horizon, GlucoDiaNet achieved a Root Mean Squared Error (RMSE) of 5.2435 mg/dL, Mean Absolute Error (MAE) of 4.3622 mg/dL, R² value of 0.9948, and Mean Squared Error (MSE) of 29.3056. The proposed model consistently outperformed baseline models including LSTM, GRU, and TCN across both short- and long-term forecasting tasks while maintaining robust predictive performance at extended prediction intervals.
CONCLUSION: GlucoDiaNet effectively enhances blood glucose prediction by integrating efficient preprocessing, deep temporal modeling, and optimization strategies. The proposed framework demonstrates strong potential for future deployment in real-time and wearable diabetes monitoring systems, subject to further hardware-level validation and computational efficiency analysis.
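The four error metrics reported in the Results are closely related: MSE is the mean squared residual, RMSE is its square root (and so is in mg/dL), MAE is the mean absolute residual, and R² compares the residual sum of squares with the variance of the reference values. A minimal sketch of these standard definitions, assuming paired lists of reference CGM readings and forecasts at a single horizon; the function name is hypothetical:

```python
import math

def forecast_errors(y_true, y_pred):
    """Standard point-forecast error metrics for glucose values in mg/dL."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n          # units: (mg/dL)^2
    rmse = math.sqrt(mse)                            # units: mg/dL
    mae = sum(abs(r) for r in residuals) / n         # units: mg/dL
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot                      # 1 - SS_res / SS_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```

Because RMSE is the square root of MSE, either value can be recovered from the other, and RMSE is never smaller than MAE for the same set of residuals.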

Keywords:  BiLSTM; blood glucose prediction; continuous glucose monitoring; deep learning models; diabetes management

DOI:  https://doi.org/10.1111/dom.70930
Front Endocrinol (Lausanne). 2026 ;17 1830594

Development and external validation of an interpretable machine learning model for diagnosing coronary heart disease in patients with type 2 diabetes and MASLD.

Chunxia Deng, Ling Feng, Tingting Li, Suosu Wei, Huiming Zhu, Jie Lu.

   Introduction: Patients with type 2 diabetes mellitus (T2DM) and metabolic dysfunction-associated steatotic liver disease (MASLD) face substantially elevated coronary heart disease (CHD) risk, yet no machine learning diagnostic models exist specifically for this population. This study aimed to develop and validate an interpretable machine learning model for identifying CHD in T2DM-MASLD patients.
Methods: Using data from 1,269 patients (development cohort) and 1,058 patients (external validation cohort) from two Chinese hospitals, we compared seven machine learning algorithms. Angiographically confirmed CHD served as the diagnostic endpoint. Nine features were selected by univariate analysis, LASSO regression, and the Boruta algorithm. The best-performing model was selected based on comprehensive evaluation of discrimination, calibration, and clinical utility. Model interpretability was assessed using SHapley Additive exPlanations (SHAP), and external validation was performed in an independent cohort.
Results: Feature selection identified nine predictors: total cholesterol (TC), chest distress, apolipoprotein B (ApoB), male sex, triglycerides (TG), age, chest pain, red cell distribution width (RDW), and cardiac troponin (cTn). The XGBoost model achieved the best performance, with an AUC of 0.896 (95% CI, 0.862-0.930) in internal validation and 0.865 (95% CI, 0.837-0.893) in external validation, with excellent calibration (Brier score: 0.112). To facilitate clinical application, a freely accessible web-based calculator was developed for real-time individualized CHD risk prediction.
Discussion: This is the first interpretable machine learning model externally validated for CHD diagnosis in T2DM-MASLD patients, demonstrating robust performance using nine routinely available clinical parameters. The model's interpretability through SHAP analysis enhances clinical trust and supports individualized risk communication between physicians and patients to guide decisions regarding coronary angiography.
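The Results pair a discrimination measure (AUC) with a calibration measure (Brier score). AUC equals the probability that a randomly chosen patient with CHD receives a higher predicted risk than a randomly chosen patient without CHD (ties count as half); the Brier score is the mean squared difference between predicted probability and the observed 0/1 outcome, so lower is better. A minimal sketch using a simple pairwise count for clarity rather than speed; function names are illustrative:

```python
def auc_pairwise(y_true, y_score):
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5      # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

A model can rank patients well (high AUC) yet be poorly calibrated, or the reverse, which is why the two are reported separately.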

Keywords:  SHAP; XGBoost; coronary heart disease; external validation; machine learning; metabolic dysfunction-associated steatotic liver disease; type 2 diabetes mellitus

DOI:  https://doi.org/10.3389/fendo.2026.1830594
Best Pract Res Clin Endocrinol Metab. 2026 May 22. pii: S1521-690X(26)00052-7. [Epub ahead of print] 102130

Use of technology and AI tools in type 2 diabetes.

Jothydev Kesavadev, Mohamed Abdul Khader M, Anjana Basanth.

The growing burden and heterogeneity of type 2 diabetes mellitus (T2DM) require management strategies that extend beyond episodic clinic visits. Digital health technologies, including mHealth platforms, wearables, telemedicine, continuous glucose monitoring (CGM), and connected insulin delivery, now generate high-frequency, real-world data that can support continuous care. Evidence suggests that the use of mHealth platforms among individuals with T2DM is associated with an approximately 0.5% reduction in HbA1c, highlighting their potential clinical benefit (1). When coupled with artificial intelligence (AI), these systems shift diabetes management toward prediction, personalization, and proactive intervention. This review appraises the clinical evidence for technology-enabled T2DM care and examines how AI improves risk stratification, detection of dysglycemia patterns, treatment optimization, and behavioural support. We discuss implementation challenges related to data quality, interoperability, governance, explainability, bias, regulatory oversight, and equity, with attention to low- and middle-income contexts. We propose a roadmap for translating AI-enabled diabetes ecosystems into routine care while maximizing effectiveness, safety, and accessibility.

DOI: https://doi.org/10.1016/j.beem.2026.102130
IEEE Trans Biomed Eng. 2026 Jun 01. PP

Development of autoencoder-guided attention-LSTM models for predicting nocturnal hypoglycemia risk in Type 1 Diabetes.

Konstantinos Tziavaras, Maria Athanasiou, Konstantina S Nikita.

OBJECTIVE: Nocturnal hypoglycemia (NH) is a major, often undetected risk for individuals with Type 1 Diabetes (T1DM). Current prediction models lack sufficient lead time for proactive, pre-sleep interventions. This study develops a novel, interpretable deep learning approach for predicting NH events up to 12 hours in advance.
METHODS: We propose a hybrid model that infuses physiological knowledge (simulated glucose absorption, insulin kinetics, and subcellular insulin signaling) into an attention-based Long Short-Term Memory (LSTM) network. An Autoencoder-based Attention Weight Mapping (A2WM) framework is introduced to trace model attention back to the original 24-hour inputs (CGM, meals, insulin) for interpretability. The model was developed using the OhioT1DM dataset (open-loop pump users) and externally validated on the SMARTDIAB dataset (open-loop pump users).
RESULTS: On the OhioT1DM test set, the model achieved an AUC of 0.9387, recall of 0.9333, and specificity of 0.8548. On the external SMARTDIAB cohort, it demonstrated robust generalizability, achieving an AUC of 0.8684, recall of 0.6316, and specificity of 0.9615. Decision Curve Analysis confirmed a high net clinical benefit, and the A2WM interpretations identified clinically relevant temporal features.
CONCLUSION: The proposed physiologically informed, attention-based LSTM model can accurately and interpretably predict 12-hour NH risk from historical data and scheduled basal insulin.
SIGNIFICANCE: This work provides a clinically valuable tool for proactive NH management, offering an extended lead time that empowers T1DM patients to implement preventive strategies (e.g., pre-sleep meal or insulin adjustments) well before hypoglycemia occurs.
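The Results cite Decision Curve Analysis, which expresses clinical usefulness as net benefit across threshold probabilities. At threshold p_t, a model's net benefit is TP/n - (FP/n) × p_t/(1 - p_t): true positives are credited and false positives are penalized by the odds of the threshold. Curves are compared with "treat all" and "treat none" (net benefit 0). A minimal sketch of the standard formula; function names and the test values are illustrative, not data from the study:

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of intervening when predicted risk >= pt (0 < pt < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    odds = pt / (1 - pt)           # weight of a false positive relative to a true positive
    return tp / n - (fp / n) * odds

def net_benefit_treat_all(y_true, pt):
    """Reference strategy: intervene for every night/patient regardless of risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero.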

DOI: https://doi.org/10.1109/TBME.2026.3698300
Front Med (Lausanne). 2026 ;13 1807809

Detection of referable diabetic retinopathy using machine learning on routine clinical data.

Young Joon Jeon, Jae Shin Song, Shubham Borghare, Youngju Lee, Young Wook Choi, Junghan Song, Soo Lim, Se Joon Woo.

   Background: Early detection of referable diabetic retinopathy (RDR) is crucial to prevent vision loss. We developed and validated a machine learning (ML) model using clinical and laboratory variables to predict RDR without ophthalmic imaging.
Methods: We enrolled 562 adults with diabetes who underwent fundus examination at a single tertiary center from June 2015 to December 2023, retrospectively and prospectively. RDR was defined as moderate nonproliferative diabetic retinopathy or worse, or diabetic macular edema. Predictors included demographic factors, diabetes duration, glycemic control, blood pressure, lipid profiles, and kidney function markers. Patients were randomly divided into training (n = 175) and validation (n = 387) sets. Four ML models were trained, and performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Predictor importance was assessed using Shapley Additive Explanations (SHAP).
Results: In the validation set, the random forest achieved the highest performance, with an AUROC of 0.932 (95% confidence interval, 0.90-0.96), sensitivity of 85.8%, specificity of 91.2%, and accuracy of 87.9%. SHAP ranked 15 predictors, with age showing the highest importance, followed by diabetes duration, fasting glucose, body mass index, diastolic blood pressure, height, smoking history, cystatin C, systolic blood pressure, hemoglobin A1c, weight, estimated glomerular filtration rate, total cholesterol, insulin use, and sex.
Conclusion: A random forest model using routinely available clinical data identified RDR without fundus imaging. It may serve as a practical tool for early detection of RDR in resource-limited settings, enabling timely referral and supporting integration into clinical decision support systems.

Keywords:  artificial intelligence; clinical decision support system; diabetic retinopathy; machine learning; random forest

DOI:  https://doi.org/10.3389/fmed.2026.1807809
Front Endocrinol (Lausanne). 2026 ;17 1850071

Identification of BMI-related high-risk feature combinations for diabetes among young adults with normal baseline fasting plasma glucose using interpretable machine learning: a health check-up cohort study.

Zhen Xu, Ying Zhang, Huachun Zhang.

  Body mass index (BMI) is an easily obtainable indicator for diabetes risk screening, but its residual risk value among young adults with normal fasting plasma glucose (FPG) remains insufficiently understood. This cohort study investigated the association between BMI and incident diabetes, its nonlinear risk pattern, and BMI-related risk structures among young adults with normal baseline FPG. Data were obtained from the Rich Healthcare Group health check-up database in China. Participants aged <40 years without diabetes at baseline, with complete BMI data and at least one follow-up visit, were included; those with baseline FPG <5.6 mmol/L were defined as the primary analytic population. Cox regression and restricted cubic spline analysis were used to examine the association between BMI and incident diabetes. Four machine learning models were compared, with logistic regression selected as the primary interpretable model and XGBoost used as an exploratory nonlinear model. SHapley Additive exPlanations were applied to interpret model-derived variable contributions. A total of 103,693 participants were included, and 266 incident diabetes events occurred during a median follow-up of 2.99 years. BMI was independently associated with incident diabetes in the multivariable Cox model (HR = 1.284, 95% CI: 1.250-1.319; P <0.001). Restricted cubic spline analysis showed a significant nonlinear association, with risk increasing more steeply beyond approximately 28 kg/m². In the validation set, logistic regression and XGBoost achieved ROC-AUC values of 0.812 and 0.817, respectively; however, their low PR-AUC values indicated limited ability to identify true positive cases under the very low event rate. 
SHAP analysis identified BMI as the most influential predictor in the exploratory XGBoost model and suggested possible model-derived joint contribution patterns involving triglycerides and systolic blood pressure, but formal Cox-based interaction testing did not confirm statistically significant multiplicative interactions. These findings suggest that BMI-related diabetes risk among normoglycemic young adults is nonlinear and embedded within a broader metabolic risk structure. Combining conventional regression with interpretable machine learning may support earlier identification and refined risk stratification of young adults at increased diabetes risk before fasting glucose becomes abnormal.
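The gap between ROC-AUC (above 0.8) and the low PR-AUC follows from the event rate: 266 events among 103,693 participants is about 0.26%. Precision (positive predictive value) depends on prevalence π through Bayes' rule, PPV = sens × π / (sens × π + (1 - spec) × (1 - π)), whereas sensitivity and specificity, and hence ROC-AUC, do not. In the sketch below, the sensitivity and specificity of 0.80 are hypothetical values chosen only for illustration, not results reported by the study, and treating the cumulative event proportion as prevalence is a simplification:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value (precision) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

event_rate = 266 / 103_693   # events / participants reported in the abstract
# Hypothetical operating point, used only to show the effect of a rare outcome.
print(f"event rate = {event_rate:.4f}")
print(f"PPV = {ppv(0.80, 0.80, event_rate):.3f}")
```

Under these assumed values the PPV is roughly 0.01: about one in a hundred flagged individuals would be a true case, even though the same sensitivity and specificity would give a PPV of 0.80 at 50% prevalence.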

Keywords:  SHAP; body mass index; diabetes; interpretable machine learning; normal fasting plasma glucose; young adults

DOI:  https://doi.org/10.3389/fendo.2026.1850071
Probl Endokrinol (Mosk). 2026 May 20. 72(2): 54-60

Predicting Anxiety in Individuals with Diabetes: A Comparative Analysis of Machine Learning Algorithms.

H Bourkhime, N Qarmiche, S Benmaamar, N Lazar, M Omari, M Berraho, N Tachfouti, S El Fakir, H El Ouahabi, N Otmani.

Diabetes is a costly long-term burden that increases individuals' vulnerability to anxiety disorders. Consequently, effective management of anxiety in people with diabetes can significantly improve overall patient care. This paper presents a comparative analysis of three machine learning algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), and Decision Tree (DT), for predicting anxiety among people with diabetes. A Moroccan dataset was used, and a grid search approach was employed for hyperparameter tuning. The findings demonstrate promising results in terms of the algorithms' performance. The Decision Tree algorithm exhibited the highest accuracy, reaching 96% in predicting anxiety among people with diabetes. SVM followed with an accuracy of 69%, while LR achieved 61%. These outcomes provide valuable insights for further research aimed at refining the prediction models. In conclusion, the study highlights the potential of machine learning algorithms for predicting anxiety disorders among individuals with diabetes. The high accuracy demonstrated by the Decision Tree model suggests its potential as a reliable tool in clinical settings. Further investigations are warranted to validate these results and explore the applicability of these models in real-world scenarios, ultimately enhancing the management and well-being of individuals with diabetes and comorbid anxiety disorders.
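The abstract mentions grid search for hyperparameter tuning: every combination of candidate values is evaluated, and the best-scoring combination is kept. A minimal standard-library sketch of that loop; the grid values and the `toy_score` function are hypothetical placeholders, since the abstract does not report the grid, and in practice `evaluate` would run cross-validation on the training data only:

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Exhaustive search: return (best_params, best_score) over all combinations.

    `evaluate` maps a params dict to a score where higher is better
    (e.g. mean cross-validated accuracy computed on training folds only).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical decision-tree grid and a toy scoring function for illustration.
grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]}
toy_score = lambda p: -abs(p["max_depth"] - 4) - 0.01 * p["min_samples_leaf"]
print(grid_search(grid, toy_score))  # → ({'max_depth': 4, 'min_samples_leaf': 1}, -0.01)
```

The number of evaluations is the product of the grid sizes (nine here), which is why grids are usually kept small or replaced by randomized search when many hyperparameters are tuned.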

DOI: https://doi.org/10.14341/probl13459
JMIR Diabetes. 2026 Jun 03. 11 e79195

Personalized Type 1 Diabetes Management: Reinforcement Learning-Based Insulin Dosing and Glucose Forecasting.

Ernest M Taku, Vibhuti Gupta, Ashutosh Singhal.

   Background: Optimizing insulin dosing and predicting future glucose levels for people with type 1 diabetes is challenging due to the dynamic nature of glucose metabolism. Traditional static insulin regimens fail to adapt to individual variability in diet, physical activity, stress, and metabolic fluctuations, leading to suboptimal glycemic control. Reinforcement learning (RL) offers a promising alternative by enabling personalized, real-time insulin adjustments that improve the balance between hyperglycemia and hypoglycemia.
Objective: This study aims to develop a deep Q-network (DQN)-based RL system that dynamically personalizes insulin dosing recommendations using continuous glucose monitoring data, meal intake, and physical activity levels. By leveraging real-time data, the model adapts to patients' evolving physiological states, enhancing glucose control and patient safety.
Methods: We used the OhioT1DM dataset (2018 and 2020), which includes 8 weeks of continuous glucose measurements, insulin dosing records, and physical activity data for twelve people with type 1 diabetes. The RL agent was designed with a state representation consisting of recent blood glucose levels, insulin doses, and lifestyle factors over a 2-hour window. The 2-hour window was selected based on the known pharmacodynamic profile of rapid-acting insulin (peak action within 90-120 min), as well as the typical lag in glycemic response following meals or exercise. This window size captures both recent and delayed physiological effects while balancing data density and model stability. The action space included discrete insulin dose recommendations (eg, 0.5 U, 1 U, and 1.5 U). A reward function incentivized glucose levels within the target range (70-180 mg/dL) while penalizing extreme deviations. The DQN model was trained to maximize reward by learning optimal dosing strategies through iterative trial and error.
Results: Performance evaluation was conducted using both qualitative and quantitative metrics. Time-series analysis compared actual and predicted glucose levels, demonstrating effective glucose regulation. The RL model achieved a mean glucose level of 80.06 mg/dL, with a reward score of 10 during evaluation, indicating that most glucose predictions were maintained within the desired clinical range. This suggests the model has learned to regulate blood glucose effectively through adaptive insulin dosing. The root mean square error (12.39 mg/dL) was only slightly higher than the mean absolute error (9.85 mg/dL), indicating stable predictions with few large errors. Additionally, the percentage time in target range was 64.06%, suggesting that the model maintained glucose within the clinically safe range for the majority of the time.
Conclusions: The DQN-based RL model demonstrated its effectiveness in personalized insulin dosing while minimizing the risk of hypo- and hyperglycemia. This approach represents a significant advancement over conventional methods, offering a scalable and adaptive strategy for real-world diabetes management, along with enhancing clinical trust and transparency through explainability techniques.
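The Methods describe three design pieces: a state built from a 2-hour history window, a discrete action set of insulin doses, and a reward that favours 70-180 mg/dL while penalizing extreme deviations. The abstract does not give the exact reward formula, so the shaping below (+1 in range, a linear penalty outside it), the 5-minute sampling interval, and the epsilon-greedy action rule are assumptions for illustration; time in range uses its standard definition. In an actual DQN, the `q_values` list would be the output of a neural network evaluated on the state.

```python
import random

TARGET_LOW, TARGET_HIGH = 70.0, 180.0   # mg/dL target range from the abstract
DOSES_U = [0.5, 1.0, 1.5]               # example discrete actions from the abstract
WINDOW = 24                             # assumption: 2 h of readings at 5-min intervals

def make_state(cgm_history, insulin_history):
    """Hypothetical state: the most recent WINDOW CGM and insulin values."""
    return list(cgm_history[-WINDOW:]) + list(insulin_history[-WINDOW:])

def reward(glucose):
    """Hypothetical shaping: +1 in range, linear penalty outside it."""
    if TARGET_LOW <= glucose <= TARGET_HIGH:
        return 1.0
    distance = TARGET_LOW - glucose if glucose < TARGET_LOW else glucose - TARGET_HIGH
    return -distance / 10.0   # each 10 mg/dL outside the range costs 1 unit of reward

def time_in_range(readings):
    """Percentage of readings within 70-180 mg/dL."""
    in_range = sum(1 for g in readings if TARGET_LOW <= g <= TARGET_HIGH)
    return 100.0 * in_range / len(readings)

def choose_dose(q_values, epsilon, rng=random):
    """Epsilon-greedy: random dose with probability epsilon, else highest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(DOSES_U)
    best = max(range(len(DOSES_U)), key=lambda i: q_values[i])
    return DOSES_U[best]
```

The penalty here is symmetric for simplicity; because hypoglycemia is the more immediately dangerous excursion, a reward design could reasonably weight the low side more heavily.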

Keywords:  adaptive insulin regimens; artificial intelligence; deep Q-network; health care; machine learning; personalized insulin dosing; reinforcement learning

DOI:  https://doi.org/10.2196/79195
Front Med (Lausanne). 2026 ;13 1806349

Development of a machine learning-based classification model for diabetic foot in patients with type 2 diabetes: an exploratory analysis with SHAP interpretation.

Yuting Pei, Zixin Zhang, Xianglan Hu, Tianshi Wei, Hengjun Liu, Ziyang Liu, Xiangyu Li, Wanqing Liu, Xiaofei Liu, Zhikui Tian.

   Background: Diabetic foot (DF) is one of the most severe complications of type 2 diabetes mellitus (T2DM), contributing to over 85% of diabetes-related lower limb amputations and carrying a 5-year mortality rate comparable to that of certain cancers. Current diagnostic approaches face challenges including over-reliance on single-indicator screening, limited multimodal data integration, and lack of model interpretability.
Methods: A dataset integrating five modalities (sociodemographic characteristics, physiological indicators, traditional Chinese medicine [TCM] tongue features, plantar hardness metrics, and laboratory biomarkers) was prospectively collected from 391 patients (124 T2DM, 267 DF) at a single tertiary hospital between May 2019 and October 2022. The final model was constructed using 18 clinical features from the sociodemographic, physiological, and laboratory modalities. Seven machine learning algorithms were developed and compared, and SHapley Additive exPlanations (SHAP) were used for interpretability analysis.
Results: LightGBM achieved optimal performance (accuracy: 88.61%, sensitivity: 87.76%, specificity: 90.00%, AUC: 0.9519). Key classification features included age, body mass index (BMI), creatinine (Cr), white blood cell count (WBC), and uric acid (UA).
Discussion: These features reflect general systemic inflammation, metabolic burden, and renal function rather than DF-specific pathology. The study contributes (1) an open-source multimodal DF dataset bridging TCM and Western medicine, (2) a classification tool that distinguishes DF from uncomplicated T2DM with reasonable accuracy as a potential supplementary screening instrument pending external validation, and (3) novel mechanistic insights suggesting that systemic inflammatory markers may play an important role in DF pathophysiology.

Keywords:  LightGBM; SHAP values; Traditional Chinese Medicine; diabetic foot; explainable artificial intelligence; machine learning; multimodal data integration; precision medicine

DOI:  https://doi.org/10.3389/fmed.2026.1806349
Nutr Metab Cardiovasc Dis. 2026 May 25. pii: S0939-4753(26)00268-1. [Epub ahead of print] 104806

Construction of precision clinical-proteomics risk model based on machine learning for predicting heart failure in type II diabetes mellitus.

Runnan Shen, Chaoyu Xie, Yechao Huang, Jiexin Li, Pinrong Dong, Kangyuan Huang, Qian Chen, Yingsi Ou, Yang Chen, Jingfeng Wang, Kai Huang, Yangxin Chen.

   BACKGROUND AND AIMS: Heart failure (HF) is a severe complication in type 2 diabetes mellitus (T2DM), but current risk stratification scores have limited predictive accuracy. We aimed to develop novel prediction tools integrating clinical variables with proteomics to improve risk stratification of hospitalization for HF in T2DM.
METHODS AND RESULTS: In this study, we included 2111 UK Biobank participants with T2DM but no prior HF, and profiled 2920 proteins to predict 10-year incident HF hospitalization. Participants were randomly divided into training (70%), tuning (10%), and validation (20%) sets. Three prediction models were developed: a Clinical model based on demographic characteristics, comorbidities, medication use, and laboratory indices; a Protein model based on 40 proteins selected by the Light Gradient Boosting Machine (LGBM); and the Clinical OMics and Protein ASSessment for Heart Failure (COMPASS-HF) model, which integrated both clinical variables and the LGBM-selected proteins. Models were evaluated for area under the curve (AUC), sensitivity, and specificity. During follow-up, 168 participants (7.96%) developed incident HF. The COMPASS-HF model showed better discrimination than the Clinical model, with an AUC of 0.897 (95% CI: 0.850-0.945) versus 0.790 (95% CI: 0.723-0.856). It also demonstrated higher sensitivity (0.882; 95% CI: 0.725-0.967) and consistent performance in subgroups. COMPASS-HF effectively stratified risk of hospitalization for HF, with cumulative incidence rates of 31.9% in the high-risk group and 1.2% in the low-risk group.
CONCLUSIONS: By combining clinical and proteomic variables, we developed a high-performance HF prediction model for T2DM, enabling precise risk stratification and informing early intervention strategies.

Keywords:  Heart failure; Machine learning; Risk prediction model; Risk stratification; Type 2 diabetes mellitus

DOI:  https://doi.org/10.1016/j.numecd.2026.104806
Front Endocrinol (Lausanne). 2026 ;17 1875810

Editorial: Future horizons in diabetes: integrating gut microbiota, AI, and personalized care.

Nazarii Kobyliak, Tetyana Falalyeyeva.



Keywords:  artificial intelligence; diabetes; gut microbiota; personalized care; type 2 diabetes mellitus

DOI:  https://doi.org/10.3389/fendo.2026.1875810
Comput Inform Nurs. 2026 Jun 01.

The Validity and Reliability of Artificial Intelligence Chatbots in the Self-management of Diabetes.

Feyza Dereli, Julide Gulizar Yildirim.

  Artificial intelligence (AI) chatbots are increasingly used to support diabetes self-management, yet their validity and reliability require systematic evaluation. This study aimed to evaluate and compare the validity and reliability of chatbot-generated responses to frequently asked questions in diabetes self-management. Five questions aligned with diabetes self-management parameters (knowledge/diagnosis, partnership in treatment, symptom recognition and management, and coping) were posed to 6 AI chatbots. Two experts assessed the responses using the Global Quality Score. Inter-rater reliability was analyzed using kappa statistics. Validity was evaluated via independent-samples t tests, Cronbach's alpha, and intraclass correlation coefficients. Google Gemini showed perfect agreement for both validity/usefulness and reliability (K=1.000, P=.002), as well as test-retest reliability (α=0.929, 86.3% agreement). ChatGPT 4.0 demonstrated perfect inter-rater agreement for validity/usefulness (α=1.00, 100% agreement; K=1, P<.01) and 57.8% agreement for reliability (α=0.76; K=0.545, P>.05); however, its test-retest reliability was low. All chatbots were generally useful and reliable in the symptom recognition and coping domains. Google Gemini provided superior information for diabetes self-management compared with the other chatbots. However, because of rapid technological change, continuous expert evaluation is recommended to ensure accuracy, reliability, usefulness, and ethical compliance.
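The kappa statistic used for inter-rater agreement corrects raw agreement for the agreement expected by chance. The abstract does not say which kappa variant was applied; the minimal stdlib Python sketch below assumes unweighted Cohen's kappa for two raters, and the expert scores are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical Global Quality Scores (1-5) from two experts for five answers.
expert_1 = [5, 4, 5, 3, 4]
expert_2 = [5, 4, 4, 3, 4]
print(round(cohen_kappa(expert_1, expert_2), 4))  # 0.6875
```

Because the Global Quality Score is ordinal, a weighted kappa is a plausible alternative; the unweighted form is shown only because it is the simplest.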

Keywords:  artificial intelligence; chatbot; diabetes mellitus; patient care

DOI:  https://doi.org/10.1097/CIN.0000000000001573
Sci Rep. 2026 May 30.

Machine learning and SHAP interpretation for predicting coronary heart disease-diabetes comorbidity with dietary antioxidants.

Kangrong Li, Gaoming Zeng, Zixi Zhang, Jiayi Zhu, Siyuan Tan, Zhongjun Ma, Qiuzhen Lin, Zhenjiang Liu, Na Liu, Qiming Liu.

  Coronary heart disease (CHD) and diabetes mellitus frequently co-occur through shared mechanisms such as oxidative stress and inflammation. Whether specific dietary antioxidants mitigate CHD-diabetes comorbidity remains unclear. Using National Health and Nutrition Examination Survey (NHANES) 2005-2018 data (n = 9,279), we developed an interpretable machine-learning pipeline in which standardisation and Synthetic Minority Over-sampling Technique (SMOTE) were embedded inside each fold of tenfold cross-validation to prevent data leakage. Six algorithms (Random Forest, Light Gradient Boosting Machine (LightGBM), K-nearest neighbours, Naive Bayes, support vector machine, eXtreme Gradient Boosting (XGBoost)) were compared on discrimination, calibration and decision-curve net benefit. XGBoost achieved the highest AUC-ROC (0.774, 95% CI 0.759-0.788); Random Forest showed the lowest Brier score (0.111), the calibration slope closest to unity (0.939) and the highest net benefit, and was retained for interpretation. Weighted-quantile-sum regression showed an inverse association between the antioxidant composite and comorbidity risk (OR per quantile 0.87, 95% CI 0.80-0.95; P = 0.001). In mutually adjusted logistic regression, only magnesium retained an independent protective association (per 1 SD: OR 0.80, 95% CI 0.66-0.96; P = 0.016). SHAP identified theobromine (0.020) and lycopene (0.016) as leading protective contributors. Findings support targeted dietary-antioxidant strategies as candidate modifiable factors for cardiometabolic comorbidity prevention.
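The leakage-prevention step described here (standardisation and SMOTE fitted inside each cross-validation fold) can be sketched in stdlib Python. This is an illustration of the general technique, not the authors' pipeline: all function names are hypothetical, label 1 is assumed to be the minority class, folds are stratified, and SMOTE is reduced to its core idea of interpolating between a minority sample and one of its k nearest minority neighbours. In practice a comparable effect is commonly obtained by placing a scaler and SMOTE inside an imbalanced-learn Pipeline, which resamples only during fitting.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Row = List[float]


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """(train_idx, test_idx) pairs; each class is dealt round-robin across k folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    splits = []
    for test in folds:
        held_out = set(test)
        splits.append(([i for i in range(len(y)) if i not in held_out], test))
    return splits


def fit_standardiser(rows: Sequence[Row]) -> Tuple[Row, Row]:
    """Column means and population SDs; an SD of 0 is replaced by 1."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0 for j in range(d)]
    return means, sds


def standardise(rows: Sequence[Row], means: Row, sds: Row) -> List[Row]:
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def smote(minority: Sequence[Row], n_new: int, k: int = 5, seed: int = 0) -> List[Row]:
    """Synthetic rows on segments between a minority row and a near minority neighbour."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        b = rng.randrange(len(minority))
        base = minority[b]
        others = [minority[i] for i in range(len(minority)) if i != b]
        others.sort(key=lambda p: sum((u - v) ** 2 for u, v in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the base-to-neighbour segment
        synthetic.append([u + gap * (v - u) for u, v in zip(base, neighbour)])
    return synthetic


def leakage_safe_folds(X, y, k=10, seed=0) -> Iterator[Tuple[List[Row], List[int], List[Row], List[int]]]:
    """Yield (X_train, y_train, X_test, y_test); only the training fold is fitted and resampled."""
    for train, test in stratified_folds(y, k, seed):
        means, sds = fit_standardiser([X[i] for i in train])  # training fold only
        X_tr = standardise([X[i] for i in train], means, sds)
        y_tr = [y[i] for i in train]
        X_te = standardise([X[i] for i in test], means, sds)  # transformed, never resampled
        y_te = [y[i] for i in test]
        minority = [x for x, lab in zip(X_tr, y_tr) if lab == 1]
        n_new = len(y_tr) - 2 * len(minority)  # rows needed for a 1:1 class balance
        if n_new > 0:
            X_tr += smote(minority, n_new, seed=seed)
            y_tr += [1] * n_new
        yield X_tr, y_tr, X_te, y_te


# Hypothetical toy data: 40 rows, 8 in the minority class.
X = [[float(i), float(i % 7)] for i in range(40)]
y = [1 if i < 8 else 0 for i in range(40)]
for X_tr, y_tr, X_te, y_te in leakage_safe_folds(X, y, k=4):
    print(len(X_tr), y_tr.count(1), len(X_te), y_te.count(1))  # 48 24 10 2 for every fold
```

The key design point is ordering: scaling statistics and synthetic samples come only from the training fold, so the held-out fold stays an honest estimate of performance on unseen, imbalanced data.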

Keywords:  Comorbidity; Coronary heart disease; Diabetes mellitus; Dietary antioxidants; Lycopene; Machine learning; NHANES; SHAP interpretation; Theobromine

DOI:  https://doi.org/10.1038/s41598-026-51080-2
Sci Rep. 2026 May 30.

Explainable ensemble machine learning for predicting diabetes mellitus and identifying key risk factors: a population-based study in northern Bangladesh.

Most Nusrat Jahan Resma, Md Abdul Kayum, Pankaj Bhowmik, Md Earfan Ali Khondaker, Md Kaderi Kibria.

  Diabetes mellitus (DM) is an escalating global public health concern, with a rapidly increasing burden in low- and middle-income countries, including Bangladesh. Despite its growing prevalence and associated complications such as cardiovascular disease, kidney failure and stroke, comprehensive evidence on its determinants and predictive modeling at the population level remains limited. This study aimed to predict DM and identify its associated risk factors using ensemble machine learning (EML) approaches among adults in northern Bangladesh. A community-based cross-sectional study was conducted among 1408 adults in Dinajpur district between March 25 and June 5, 2025, using structured and pilot-tested questionnaires administered through face-to-face interviews. Feature selection was performed using Recursive Feature Elimination, Random Forest importance and Best First Search methods. Six machine learning models were developed, followed by a stacking ensemble model to enhance predictive performance. Model evaluation was based on accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Model interpretability was assessed using SHAP analysis, and findings were validated using multivariable logistic regression. The prevalence of DM was 15.1% in the study population. Among individual models, LightGBM demonstrated the highest performance (accuracy: 89.44%; AUC: 0.958 [95% CI 0.945-0.973]), followed by XGBoost (accuracy: 88.69%; AUC: 0.955 [95% CI 0.945-0.972]). The stacking ensemble model outperformed all base learners, achieving an accuracy of 91.67% and an AUC of 0.967 (95% CI 0.957-0.981). SHAP analysis identified age, family history of diabetes, BMI, weight, dietary behaviors (particularly low vegetable intake and added salt/sugar), family income, and gender as key predictors.
Multivariable logistic regression confirmed these findings, showing that advancing age (especially 51-60 years), female gender, family history of diabetes, hypertension, kidney disease and low vegetable consumption were independently associated with DM. Therefore, stacking-based ensemble learning significantly improves the predictive accuracy for DM while enabling robust identification of key risk factors. The consistency between machine learning and traditional statistical approaches strengthens the validity of the findings. These results highlight the importance of integrating advanced analytical methods into public health research to support early detection, targeted prevention, and evidence-based decision-making in resource-constrained settings such as northern Bangladesh.
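The abstract does not name the stacking meta-learner or how base-model predictions were generated. A common design, assumed here, trains the base models (for example LightGBM and XGBoost) with cross-validation and feeds their out-of-fold predicted probabilities to a simple logistic-regression meta-learner. The stdlib Python sketch below shows only that second stage; the function names, learning rate, epoch count and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(oof_probs: Sequence[Sequence[float]], y: Sequence[int],
                     lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Logistic-regression meta-learner fitted by batch gradient descent.
    Each row holds the out-of-fold probabilities produced by the base models."""
    d, n = len(oof_probs[0]), len(y)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(oof_probs, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_stacked(w: Sequence[float], b: float, base_probs: Sequence[float]) -> float:
    """Combine one participant's base-model probabilities into a stacked probability."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, base_probs)))


# Hypothetical out-of-fold probabilities from two base models (columns) for six adults.
oof = [[0.9, 0.6], [0.8, 0.4], [0.7, 0.5], [0.2, 0.5], [0.1, 0.4], [0.3, 0.6]]
dm = [1, 1, 1, 0, 0, 0]
w, b = fit_meta_learner(oof, dm)
print(predict_stacked(w, b, [0.85, 0.5]), predict_stacked(w, b, [0.15, 0.5]))
```

Using out-of-fold rather than in-sample base predictions keeps the meta-learner from simply rewarding whichever base model overfits the training data most.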

Keywords:  Bangladesh; Diabetes mellitus; Ensemble machine learning; Logistic regression; Prevalence; Risk factors; SHAP analysis

DOI:  https://doi.org/10.1038/s41598-026-55482-0
Endocr Metab Immune Disord Drug Targets. 2026 May 21.

Single-cell Sequencing and Machine Learning Identify Amino Acid Metabolism-related Biomarkers and Regulatory Mechanisms in Diabetic Foot Ulcers.

Wenting Wang, Zhengguo Xia, Yin Wang, Fan Wang, Feng Han.

   INTRODUCTION: Diabetic foot ulcer (DFU) is a serious complication of diabetes with poor healing and high mortality, and effective diagnostic and treatment strategies are still insufficient.
METHODS: Single-cell RNA sequencing dataset GSE165816 was processed for quality control, normalization, dimensionality reduction, clustering, and annotation. Keratinocyte subsets were analyzed using pseudotime trajectory inference, cell-cycle profiling, and assessment of transcription factor activity. Cell-cell communication was evaluated through cadherin signaling analysis. Differential expression and enrichment analyses were performed to identify subgroup-specific functional pathways. A bulk RNA sequencing dataset (GSE134431) was utilized to screen amino acid metabolism-related genes using machine learning approaches, followed by external dataset validation, ROC analysis, molecular docking, and qPCR.
RESULTS: Single-cell RNA sequencing identified 13 cell types, among which keratinocytes showed significant heterogeneity. Four keratinocyte subsets were defined, of which Kera1 exhibited strong stemness, initiated differentiation toward Kera3, and showed distinct functional states across groups. Cell-cell communication analysis revealed enhanced cadherin signaling in DFU non-healing samples, particularly driven by Kera1 autocrine CDH1-CDH1 interactions. Transcription factor analysis highlighted CEBPA and KLF5 as key regulators of Kera1. Machine learning integrated with bulk RNA sequencing identified five amino acid metabolism-related genes (RPL13, ODC1, RPL22L1, GATM, GLUL) with diagnostic value. qPCR further confirmed the dysregulated expression of these genes in clinical samples. Molecular docking suggested ODC1 as a potential therapeutic target of eflornithine.
DISCUSSION: The identification of Kera1-driven cadherin signaling and five key metabolic biomarkers offers a mechanism-based framework for clinical diagnosis and targeted therapy, potentially shifting DFU management toward more precise, molecular-level interventions.
CONCLUSION: Keratinocyte stemness and subtype-specific differentiation contribute to altered cadherin signaling in DFU, while key amino acid metabolism-related genes serve as diagnostic biomarkers and therapeutic targets.

Keywords:  Single-cell sequencing; amino acid metabolism; diabetic foot ulcers; machine learning

DOI:  https://doi.org/10.2174/0118715303451779260511203500
Diabetol Metab Syndr. 2026 Jun 01.

Development and temporal external validation of a high-specificity XGBoost rule-in model for diabetes in middle-aged and older Korean adults.

Soo Myeong Kim, Jung Min Cho.

  Early identification of diabetes in older adults is essential for preventing complications, yet many high-risk individuals remain undetected in community settings. Using recent cycles of the nationally representative Korea National Health and Nutrition Examination Survey (KNHANES 2020-2023), we developed and temporally validated an Extreme Gradient Boosting (XGBoost) model to rule in diabetes among Korean adults aged ≥ 50 years. Candidate predictors included sociodemographic factors, health behaviors, anthropometric indices, blood pressure, medical history, and simple laboratory markers. Data from 2020 to 2022 were used for model development, with the 2023 cycle reserved as a temporal external validation cohort. We prespecified a high-specificity rule-in threshold based on the development cohort and evaluated discrimination (area under the receiver operating characteristic curve (AUROC) and average precision), calibration, Brier score, classification metrics, decision-curve net benefit, and Shapley additive explanation (SHAP) values. In temporal external validation, the XGBoost model demonstrated robust performance (AUROC 0.868; average precision 0.646; Brier score 0.101) and, at the prespecified threshold, achieved high rule-in accuracy (0.866), specificity (97.3%), and positive predictive value (76.7%), with an F1-score of 0.521. Compared with logistic regression and random forest, the model showed superior rule-in performance, and it performed comparably to Light Gradient Boosting Machine (LightGBM), a gradient boosting framework based on decision tree ensembles, in terms of specificity and positive predictive value, while intentionally accepting reduced sensitivity consistent with a high-specificity design. SHAP analyses identified urine creatinine, urine specific gravity, urine albumin, total cholesterol and other lipids, body mass index, waist circumference, and a history of hypertension and dyslipidemia as major contributors to model predictions.
These findings indicate that an XGBoost-based rule-in model using routinely collected survey variables can efficiently identify older Korean adults with a high probability of diabetes and may serve as a practical decision-support tool for prioritizing confirmatory testing and targeted screening in community settings with limited resources.
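A prespecified rule-in threshold is fixed on the development data and then applied unchanged to the later validation cohort. The abstract does not state the target specificity used; the stdlib Python sketch below assumes a hypothetical target of 0.9, hypothetical scores, and the convention that score >= threshold means "rule in". It picks the lowest candidate threshold that reaches the target and then reports the classification metrics listed in the abstract.

```python
import math
from typing import Dict, Sequence


def specificity_at(labels: Sequence[int], scores: Sequence[float], t: float) -> float:
    """Share of non-cases with score below t (score >= t means 'rule in')."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return sum(s < t for s in neg) / len(neg)


def rule_in_threshold(labels: Sequence[int], scores: Sequence[float], target_spec: float) -> float:
    """Lowest candidate cut-off on the development data whose specificity reaches target_spec."""
    for t in sorted(set(scores)) + [math.inf]:
        if specificity_at(labels, scores, t) >= target_spec:
            return t
    return math.inf  # reached only if target_spec > 1


def rule_in_metrics(labels: Sequence[int], scores: Sequence[float], t: float) -> Dict[str, float]:
    """Classification metrics when score >= t is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": sens,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "ppv": ppv,
        "f1": 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0,
    }


# Hypothetical development cohort: 10 adults without and 4 with diabetes.
dev_y = [0] * 10 + [1] * 4
dev_s = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.30, 0.60, 0.70, 0.90]
t = rule_in_threshold(dev_y, dev_s, target_spec=0.9)
print(t)  # 0.5
# The frozen t would then be applied unchanged to the temporal validation cohort.
print(rule_in_metrics(dev_y, dev_s, t))
```

Freezing t before touching the validation data is what makes the reported validation specificity and PPV a fair test of the rule-in design; the price, as the abstract notes, is reduced sensitivity.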

Keywords:  Diabetes mellitus, Type 2; Health surveys; Machine learning; Middle aged; Predictive value of tests

DOI:  https://doi.org/10.1186/s13098-026-02176-2
Orv Hetil. 2026 May 31. 167(22): 865-875

[Artificial intelligence in diabetic retinopathy screening].

Éva Volek, Melinda Pénzes, Balázs Szécsényi-Nagy, Ákos Skribek, Ádám Pál-Jakab, Viktor Kilin, Zsuzsanna Antus, Miklós Resch.

   INTRODUCTION: Visual impairment and blindness continue to represent a substantial disease burden in Hungary. According to national epidemiological data, the combined prevalence of bilateral blindness and severe visual impairment among individuals aged 50 years and older is approximately 0.9%, and international estimates suggest that around 90% of vision loss cases could be prevented or treated with appropriate care. However, the coverage of ophthalmic screening remains low, primarily due to the lack of targeted financing, limited ophthalmology workforce capacity, and the absence of a unified national screening protocol.
OBJECTIVE: The aim of our study is to review the professional, organizational, financial, legal and ethical conditions for the implementation of artificial intelligence-supported ophthalmic screening in Hungary, with a particular focus on diabetic retinopathy.
METHOD: We conducted a targeted narrative literature review of national epidemiological, human resource, and cost data, as well as an analysis of international diabetic retinopathy screening models and the European Union regulatory frameworks for medical devices and artificial intelligence, using sources selected based on clinical and public health relevance.
RESULTS: The level of Hungarian ophthalmological screening practice is insufficient to significantly reduce the burden of preventable vision impairment, primarily due to limited human resources and funding constraints. The current human resource capacity of the Hungarian ophthalmic care system is insufficient to provide the approximately one million diabetic fundus examinations required annually according to professional guidelines. Preventive and screening activities are not organized as dedicated services but are largely delivered as part of routine ophthalmic outpatient care, without separate financing. International experience indicates that the use of artificial intelligence as a decision-support or triage tool can reduce specialist workload while maintaining diagnostic accuracy.
CONCLUSION: Artificial intelligence-supported fundus screening systems have the potential to improve access to screening, consistency, and efficiency. The introduction of artificial intelligence-based fundus screening in Hungary would require the establishment of appropriate financing mechanisms, regulation of task-sharing involving optometrists and allied health professionals, and compliance with relevant regulatory and ethical frameworks. A transitional hybrid model - combining the pilot use of an internationally validated artificial intelligence system in parallel with the launch of domestic development - may offer a realistic pathway toward a structured national screening program and contribute to reducing the disease burden of preventable blindness. Orv Hetil. 2026; 167(22): 865-875.

Keywords:  artificial intelligence; diabetic retinopathy; fundus screening; health policy; human resources

DOI:  https://doi.org/10.1556/650.2026.33572
Biosens Bioelectron. 2026 May 28. pii: S0956-5663(26)00506-3. [Epub ahead of print] 310 118874

Personalized non-invasive continuous glucose monitoring via multiparameter-informed machine learning.

Wangwang Zhu, Xi Li, Jiaqi Hou, Wenjun Li, Jingyun Li, Hao Zheng, Dachao Li, Zhihua Pu.

  Non-invasive continuous glucose monitoring (NCGM) based on reverse iontophoresis (RI) holds considerable promise for diabetes management; however, its clinical translation is constrained by limited predictive accuracy. This limitation arises primarily from inter-individual differences and dynamic variability during continuous monitoring. Here we present a multiparameter-informed machine learning framework that uses glucose, Na+, skin surface pH, temperature, and impedance as physiological inputs to account for their combined effects and enable accurate, personalized NCGM. An accompanying flexible multimodal sensing platform is designed to simultaneously quantify glucose, Na+, and skin surface pH, together with skin temperature and impedance. A physics-based model first generates a physiologically grounded glucose estimate calibrated by Na+ and pH, which is subsequently refined using a CNN-LSTM-Attention architecture. Transfer learning is implemented to enable cross-individual generalization and personalized calibration. In vivo studies in six healthy volunteers over 14 days demonstrate that the proposed multiparameter biosensing-informed machine learning approach achieves closer agreement with reference glucose profiles, reducing the overall mean absolute relative difference (MARD) from 19.01% to 14.47% and further to 10.51%. Clarke error grid analysis indicates enhanced clinical accuracy, with the highest proportion of data points located within zones A and B (99.45%). Shapley Additive Explanation (SHAP) analysis further reveals pronounced inter-individual variability in feature contributions, highlighting the model's capacity for adaptive, personalized weighting of physiological inputs. These results establish a multiparameter-informed strategy that improves the accuracy of RI-based glucose monitoring, providing a viable pathway toward precise and personalized NCGM.
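MARD, the accuracy metric reported above, is the mean of |estimate - reference| / reference over paired readings, expressed as a percentage. A minimal stdlib Python sketch follows; the paired readings are hypothetical and the function name is illustrative only.

```python
from typing import Sequence


def mard(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute relative difference (%) between paired glucose readings."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("need equally long, non-empty paired series")
    if any(r <= 0 for r in reference):
        raise ValueError("reference glucose values must be positive")
    rel = [abs(e - r) / r for r, e in zip(reference, estimate)]
    return 100.0 * sum(rel) / len(rel)


# Hypothetical paired readings in mg/dL (reference vs. sensor estimate).
ref = [100.0, 200.0, 50.0]
est = [110.0, 180.0, 50.0]
print(round(mard(ref, est), 2))  # 6.67
```

Because each error is divided by the reference value, the same absolute error weighs more at low glucose, which is one reason MARD is usually reported alongside Clarke error grid zones.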

Keywords:  Machine learning; Multi-parameter measurements; Non-invasive continuous glucose monitoring; Personalization; Reverse iontophoresis

DOI:  https://doi.org/10.1016/j.bios.2026.118874
Front Endocrinol (Lausanne). 2026 ;17 1832468

Multimodal deep learning fusion model for assessment of fetal lung development in gestational diabetes mellitus and pre-eclampsia.

Yanran Du, Chao Ji, Jing Jiao, Fei Xin, Yunyun Ren, Zhenwei Xia, Yi Guo, Jianqiao Zhou.

   Background: Fetal lung development is highly sensitive to adverse intrauterine conditions such as gestational diabetes mellitus (GDM) and pre-eclampsia (PE). Current clinical evaluation mainly relies on ultrasound imaging, but it provides limited information on related histological and molecular changes. This study aimed to develop a multimodal deep learning framework that combined ultrasound imaging features with molecular and histopathological data to assess fetal lung development.
Methods: Rat models of GDM and PE were established, and fetal lung ultrasound images were obtained. Fetal lung tissues were evaluated by histopathology. The expression of key proteins was analyzed by immunohistochemistry, Western blotting, and quantitative PCR. Gene sequencing was conducted, followed by differential expression and functional enrichment analyses. Deep learning algorithms were used for automated lung segmentation, quantitative feature extraction, and model development. By combining imaging features with molecular and histological data, a rat multimodal fusion model was constructed, which was then validated using human fetal lung ultrasound images through transfer learning and parameter optimization.
Results: In animal studies, significant differences were observed in multiple indicators of fetal lung development among normal, GDM, and PE groups, including quantitative histopathology, immunohistochemical protein expression, qPCR results, gene sequencing profiles, and functional enrichment analysis. The multimodal fusion model performed better than the ultrasound-only and partially integrated models, achieving accuracies of 0.935 (95% CI: 0.898, 0.973) and 0.948 (95% CI: 0.919, 0.970) and average AUCs of 0.954 (95% CI: 0.919, 0.984) and 0.955 (95% CI: 0.932, 0.979) in mid- and late gestation, respectively. In clinical studies, 1,183 images of human fetal lungs were analyzed, and the classification model based on transfer learning showed superior performance, with accuracies of 0.835 (95% CI: 0.786, 0.894) and 0.874 (95% CI: 0.828, 0.907) and average AUCs of 0.830 (95% CI: 0.772, 0.890) and 0.857 (95% CI: 0.824, 0.893) in early and late trimester pregnancy, respectively.
Conclusions: This study demonstrated that integrating multimodal data improved the assessment of fetal lung development in GDM and PE. By linking imaging features with molecular and histopathological alterations, the proposed framework provides new methodological and biological insights and suggests a potential non-invasive strategy for monitoring fetal lung development in high-risk pregnancies.

Keywords:  deep learning; fetal lung development; gestational diabetes mellitus; multimodal; preeclampsia; ultrasound

DOI:  https://doi.org/10.3389/fendo.2026.1832468
PLoS Med. 2026 Jun;23(6): e1004868

Proteomic signatures of early retinal neurodegeneration in type 2 diabetes mellitus.

Huangdong Li, Ziyu Zhu, Shaopeng Yang, Weijing Cheng, Shaoying Tan, Zhuoyao Xin, Lei Zhang, Zhuoting Zhu, Shida Chen, Wenyong Huang, Wei Wang.

BACKGROUND: Retinal neurodegeneration is an early and independent feature of diabetic retinal disease and has been proposed as a window into the systemic neural consequences of diabetes, yet accessible molecular biomarkers and individualized prediction tools remain scarce. We aimed to identify circulating plasma protein signatures of diabetic retinal neurodegeneration (DRN) and to translate them into a clinically usable risk prediction system.
METHODS AND FINDINGS: In this multi-cohort prospective observational study, we integrated high-throughput plasma proteomics with longitudinal optical coherence tomography (OCT) in two independent populations. In the discovery cohort from the Guangzhou Diabetic Eye Study (GDES), 1,492 participants had baseline plasma proteomics and OCT, and 1,218 were followed with repeated OCT over 6 years. DRN was quantified by the annualized OCT-derived retinal nerve fiber layer thinning rate. In multivariable analyses adjusted for age, sex, smoking, systolic blood pressure, HbA1c, and diabetes duration, we identified 71 plasma proteins associated with development and progression of DRN. These proteins mapped onto pathways governing inflammatory immune recruitment, extracellular matrix remodeling, and microvascular homeostasis, providing a plausible biological basis for DRN. We developed a proteomics-based DRN model (Pro-DRN) using eight machine learning (ML) algorithms, including XGBoost and LightGBM. In the independent test set, Pro-DRN achieved a C-index of 0.860, rising to 0.908 when integrated with clinical variables. Compared with six conventional models, Pro-DRN improved discrimination (ΔC-index 0.137 to 0.159; all P < 0.001) and reclassification (IDI 0.212 to 0.245; NRI 0.226 to 0.452; all P < 0.05). In the Hippisley model, the C-index increased from 0.739 (95% CI [0.670, 0.808]) to 0.898 (95% CI [0.858, 0.937]), with IDI 0.245 (95% CI [0.177, 0.318]), NRI 0.452 (95% CI [0.222, 0.673]) (both P < 0.001), and higher net benefit. The proteins most consistently driving model performance included ACTA2, COL6A3, and HSPG2. For clinical translation, we deployed the locked model as an interactive, web-based risk-assessment tool to support early DRN screening and longitudinal monitoring. Cross-ethnic external validation in UK Biobank (n = 502; recruited 2006-2010) reproduced core protein signals and consistent effect directions, confirming robustness across populations.
The principal methodological limitation is the single-time-point proteomic assessment.
CONCLUSION: In this multi-cohort study, we present a proteomics- and ML-based precision prediction system for DRN. Pro-DRN substantially enhanced early risk stratification beyond conventional clinical factors and may support targeted screening and timely neuroprotective interventions, advancing molecularly guided strategies for diabetic eye disease prevention.
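The C-index values above summarise how well predicted risk orders participants by time to DRN. The abstract does not name the estimator; the stdlib Python sketch below assumes Harrell's C for right-censored data and uses hypothetical follow-up times, event flags and risk scores, so it illustrates the metric rather than the Pro-DRN code.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    strictly before subject j's follow-up time; it is concordant when i also
    has the higher predicted risk. Tied risks count as half-concordant.
    Pairs with tied times are skipped (a simplification).
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (years), DRN event flags and model risk scores.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.2, 0.3, 0.1]
print(harrell_c_index(times, events, risks))  # 0.8 (4 of 5 comparable pairs concordant)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported gains (for example 0.739 to 0.898) should be read.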

DOI: https://doi.org/10.1371/journal.pmed.1004868