bims-aukdir 2026-04-12 papers

bims-aukdir

Biomed News

on Automated knowledge discovery in diabetes research

Issue of 2026–04–12
thirteen papers selected by
Mott Given

Lesion Learning Network with Relation Aware Transformer for Diabetic Retinopathy Grading.
Enhancing Community-Based Nursing Decision Support: Machine Learning Models for Diabetes Risk Prediction Using Home Health Nursing Notes.
Biochemical biomarker-Driven deep learning framework with SHAP-based feature interpretation for diabetes classification.
Machine Learning-Based Early Prediction of Gestational Diabetes Using First-Trimester Laboratory Parameters.
Enhancing Prediabetes and Diabetes Detection Through a Machine Learning-Enabled Self-Assessment Approach.
Predictive performance of artificial intelligence algorithms for gestational diabetes mellitus in pregnant women: a protocol for systematic review and meta-analysis.
Development and Multicenter External Validation of a Real-Time Artificial Intelligence Diagnostic System for Diabetic Peripheral Neuropathy Based on Wearable Devices.
Recent Advances in Modeling and Prediction of Blood Glucose in Type 1 Diabetes.
External validation and application of a machine learning-based model for diabetes progression in prediabetes.
Chatbots and Diabetes: Is There Gender Bias?
ChatGPT-5 versus other mainstream large language models in core diabetic retinopathy patient queries.
Real-world performance of open-source large language models in diabetes diagnosis.
A Manual of Procedures for the Generation of the AI-Ready and Exploratory Atlas for Diabetes Insights (AI-READI) Database.

IEEE J Biomed Health Inform. 2026 Apr 10. PP

Lesion Learning Network with Relation Aware Transformer for Diabetic Retinopathy Grading.

Hao Liang, Zhaoshui He, Zhijie Lin, Wenqing Su, Jing Guo, Yunxian Wang, Jixing Liang.

Diabetic retinopathy (DR) is a leading cause of permanent blindness, in part because early screening is difficult. In this context, deep-learning-based automatic DR grading has the potential to significantly improve the diagnostic efficiency of ophthalmologists. However, accurate DR grading remains challenging because of intra-class variations and small lesions. To address this problem, this paper proposes a Lesion Learning Network with a Relation Aware Transformer (LLNet) for precise DR grading. Specifically, a Lesion Information Extractor (LIE) is designed to recognize DR-related lesions and extract their fine-grained features, either by training on lesion annotations or through lesion-based contrastive learning. Then, the Lesion Saliency Transformer (LST) captures a discriminative lesion-salient sequence by interactively fusing LIE-extracted lesion features with self-attention features, thereby enhancing the perception of small lesions. Finally, the Lesion Relation Aware Transformer (LRAT) establishes an efficient relational model between DR lesion conditions and severity grades, improving robustness to intra-class lesion variations and enabling more accurate grading predictions. Moreover, an adaptive lesion-learning strategy lets LLNet combine image-level learning with lesion-annotation-guided learning, making efficient use of DR data to improve generalization. Comparison experiments were conducted on the FGADR, DDR, and APTOS datasets, on which LLNet achieved accuracies of 82.7%, 86.1%, and 87.2%, respectively, surpassing the compared methods. Furthermore, the generalization capability of LLNet was validated by training on the DDR dataset and testing on the EyePACS dataset, where it achieved an accuracy of 72.1%.

DOI: https://doi.org/10.1109/JBHI.2026.3682783
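The abstract does not give the LST fusion equations. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention, in which image tokens (queries) attend over lesion feature tokens (keys and values) and the attended lesion context is added back residually. The token values, dimensions, and residual fusion are assumptions for illustration, not the authors' LST.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over every key/value row."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Attention-weighted sum of the value vectors, one output column at a time.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Hypothetical toy tokens: 2 image tokens and 3 lesion tokens, feature dim 4.
image_tokens = [[1.0, 0.0, 0.5, 0.2], [0.1, 0.9, 0.0, 0.3]]
lesion_tokens = [[0.9, 0.1, 0.4, 0.0], [0.0, 1.0, 0.1, 0.2], [0.3, 0.3, 0.3, 0.3]]

context, attn = cross_attention(image_tokens, lesion_tokens, lesion_tokens)
# Assumed residual fusion: add the attended lesion context to each image token.
fused_tokens = [[a + b for a, b in zip(x, c)] for x, c in zip(image_tokens, context)]
```

In this toy input each image token attends most strongly to the lesion token whose features it most resembles, which is the intuition behind letting lesion features sharpen the image representation.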
Public Health Nurs. 2026 Apr 09.

Enhancing Community-Based Nursing Decision Support: Machine Learning Models for Diabetes Risk Prediction Using Home Health Nursing Notes.

Doyeon Lim, Aeri Kim, Hana Lee, Kyungmi Woo.

   OBJECTIVES: This study aimed to identify high-risk factors for type 2 diabetes and develop a machine learning (ML)-based diabetes prediction model using nursing notes from home health care.
DESIGN: Retrospective cohort study.
SAMPLE: A total of 7,896 medical records from 1,747 patients aged ≥ 20 years who received home health care at a university hospital in South Korea over 10 years.
MEASUREMENTS: Patients' sociodemographic characteristics, diagnosis codes, and clinical histories were recorded, together with narrative nursing notes detailing nursing assessments and descriptions of each patient's condition. Logistic regression was used to identify risk factors, and five machine learning models were constructed using 10-fold cross-validation.
RESULTS: Female sex, depression, hypertension, hyperlipidemia, nursing services such as dressing and nutritional care, and dysuria, a prediabetes-related symptom extracted from nursing notes using natural language processing (NLP), were identified as risk factors. The Random Forest model demonstrated the highest predictive performance (AUC = 0.985).
CONCLUSIONS: Nursing documentation is a valuable resource for early diabetes risk screening. High predictive performance was achieved by integrating structured data with unstructured data, namely prediabetes-related symptoms extracted from nursing notes. This approach enhances nurses' ability to deliver timely, personalized interventions and to improve preventive care outcomes, which is especially valuable for home-care patients who require continuous support.

Keywords:  diabetes mellitus; electronic health records; home health nursing; machine learning; signs and symptoms

DOI:  https://doi.org/10.1111/phn.70122
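The abstract reports 10-fold cross-validation but not how records were assigned to folds. Because the 7,896 records come from 1,747 patients, one reasonable choice, assumed here rather than taken from the paper, is to split at the patient level so that no patient's notes appear in both a training and a test fold. A minimal sketch of the index bookkeeping:

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


# Split patient indices (not record indices) so each patient lands in exactly one test fold.
patient_folds = list(k_fold_indices(1747, k=10))
```

Each model would then be trained on the records of the training patients in a fold and scored on the held-out patients, with per-fold AUCs averaged; mapping patient indices back to record rows is omitted here.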
Biophys Chem. 2026 Apr 01. pii: S0301-4622(26)00034-7. [Epub ahead of print] 334 107601

Biochemical biomarker-Driven deep learning framework with SHAP-based feature interpretation for diabetes classification.

Salman Khan.

Diabetes mellitus is a long-term metabolic condition that develops when the body cannot produce insulin effectively or cannot use it properly. Individuals usually progress through a clinical spectrum that begins with normal glucose regulation, moves into a prediabetic state, and may eventually advance to type 2 diabetes. The continuing global rise in diabetes cases, associated mainly with sedentary lifestyles, unhealthy dietary patterns, and broader environmental pressures, has created an urgent need for diagnostic methods that enable early detection and reduce the likelihood of serious complications involving the kidneys, eyes, heart, and other vital organs. In this study, we present a predictive model that integrates a deep neural network with feature ranking and a statistical algorithm to improve the early identification of diabetes. To enhance model interpretability, Shapley Additive exPlanations (SHAP) were applied to identify the features most influential in predicting outcomes. In extensive experiments using 10-fold cross-validation, the proposed method achieved an average accuracy of 95.72%, a clear improvement over traditional machine learning models and several recent benchmark approaches. The findings highlight the importance of early screening supported by advanced analytical tools and emphasize the role of broader socioeconomic factors, including urbanization, dietary changes, and variations in healthcare access, in shaping effective diabetes prevention and management strategies.

Keywords:  Biological phenomena; Deep learning; Diabetes; Insulin; Machine learning; SHAP

DOI:  https://doi.org/10.1016/j.bpc.2026.107601
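SHAP attributes a prediction to input features using Shapley values. Tools such as the `shap` library approximate them for neural networks, but for a handful of features they can be computed exactly by enumerating feature coalitions. The sketch below does that for a hypothetical two-feature risk score, replacing "absent" features with a single baseline value (a simplification: SHAP proper averages over a background dataset). It is an illustration of the attribution idea, not the paper's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for instance x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with an interaction between two standardized biomarkers."""
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[1]


phi = shapley_values(toy_risk, x=[1.0, 1.0], baseline=[0.0, 0.0])
```

Here `phi` is `[2.5, 3.5]`: the interaction term's contribution of 1 is split evenly between the two features, and the attributions sum to `toy_risk(x) - toy_risk(baseline) = 6`. Exact enumeration is exponential in the number of features, which is why practical SHAP tools approximate it.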
Cureus. 2026 Mar;18(3): e104782

Machine Learning-Based Early Prediction of Gestational Diabetes Using First-Trimester Laboratory Parameters.

Jenifar Prashanthan, Amirthanathan Prashanthan.

Background and aim Gestational diabetes mellitus (GDM) affects 6-15% of pregnancies globally and is traditionally diagnosed at 24-28 weeks of gestation. Early identification of high-risk women during the first trimester could enable timely interventions and improve pregnancy outcomes. This study aimed to develop and evaluate machine learning models for early GDM prediction using first-trimester clinical and laboratory parameters, with five objectives: first, to generate a clinically representative synthetic dataset incorporating demographic characteristics, clinical risk factors, and first-trimester laboratory parameters; second, to apply comprehensive feature selection methods to identify optimal predictors from candidate variables; third, to systematically evaluate multiple machine learning algorithms with hyperparameter optimization; fourth, to assess model interpretability using SHapley Additive exPlanations (SHAP) analysis; and fifth, to establish clinically actionable threshold values for first-trimester biomarkers. Methods A synthetic dataset of 10,000 patient records was generated using evidence-based probabilistic modeling, incorporating demographic characteristics (maternal age, pre-pregnancy BMI, ethnicity), clinical risk factors (family history of diabetes, previous GDM, polycystic ovary syndrome [PCOS], previous macrosomia), and first-trimester laboratory parameters (random blood sugar [RBS], post-prandial blood sugar [PPBS], HbA1c, and oral glucose tolerance test [OGTT] values). Seven feature selection methods were employed to identify optimal predictors from 18 candidate variables. Eleven machine learning algorithms were systematically evaluated, with hyperparameter optimization performed via GridSearchCV (scikit-learn; INRIA, Le Chesnay-Rocquencourt, France) using 10-fold stratified cross-validation. Model interpretability was assessed using SHAP analysis.
Results The Multi-layer Perceptron neural network achieved the best performance, with an F1-score of 0.7213, an accuracy of 71.7%, and an AUC-ROC of 0.7692 on the independent test set. Feature importance analysis identified early HbA1c as the primary predictor (importance score: 0.405), followed by pre-pregnancy BMI (0.291) and family history of diabetes (0.271). SHAP analysis confirmed these findings, with family history demonstrating the highest mean absolute SHAP value. Clinically actionable thresholds were identified as follows: early RBS ≥125 mg/dL (borderline) and ≥140 mg/dL (concerning); early PPBS ≥160 mg/dL (borderline) and ≥180 mg/dL (concerning); and HbA1c ≥5.7% (intermediate risk), ≥6.0% (high risk), and ≥6.5% (diagnostic). Conclusions First-trimester laboratory parameters, particularly HbA1c combined with clinical risk factors, enable effective early GDM risk stratification with clinically acceptable accuracy. The machine learning framework demonstrates potential for enhancing prenatal screening through personalized risk assessment, though prospective validation in real-world clinical populations is essential before implementation.

Keywords:  early prediction; first trimester screening; gestational diabetes mellitus; machine learning; maternal health; shap analysis

DOI:  https://doi.org/10.7759/cureus.104782
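The first-trimester cut-offs reported above translate directly into a lookup. The sketch below encodes them exactly as stated in the abstract; the function names and the label for values under the lowest cut-off ("below threshold") are hypothetical, and because the thresholds were derived from a synthetic dataset they still require the prospective validation the authors call for.

```python
def hba1c_category(hba1c_percent):
    """Map an early-pregnancy HbA1c (%) to the risk bands reported in the abstract."""
    if hba1c_percent >= 6.5:
        return "diagnostic"
    if hba1c_percent >= 6.0:
        return "high risk"
    if hba1c_percent >= 5.7:
        return "intermediate risk"
    return "below threshold"


def glucose_category(value_mg_dl, borderline, concerning):
    """Two-level glucose banding shared by the RBS and PPBS cut-offs."""
    if value_mg_dl >= concerning:
        return "concerning"
    if value_mg_dl >= borderline:
        return "borderline"
    return "below threshold"


def rbs_category(rbs_mg_dl):
    """Early random blood sugar: >=125 borderline, >=140 concerning."""
    return glucose_category(rbs_mg_dl, borderline=125, concerning=140)


def ppbs_category(ppbs_mg_dl):
    """Early post-prandial blood sugar: >=160 borderline, >=180 concerning."""
    return glucose_category(ppbs_mg_dl, borderline=160, concerning=180)
```

Checking the highest band first keeps each comparison a simple lower bound, mirroring how the abstract lists the thresholds.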
J Clin Epidemiol. 2026 Apr 06. pii: S0895-4356(26)00141-1. [Epub ahead of print] 112266

Enhancing Prediabetes and Diabetes Detection Through a Machine Learning-Enabled Self-Assessment Approach.

Daniel Yoo, Umberto Maggiore, Olivier Jolliet.

   OBJECTIVES: Reliable, accessible, non-invasive self-assessment screening for prediabetes/diabetes is lacking, leading to missed opportunities for early intervention. We aimed to develop and externally validate a machine learning (ML)-derived self-assessment system to predict the likelihood of prevalent prediabetes or diabetes using easily accessible health parameters.
STUDY DESIGN AND SETTING: We analyzed 30 years (1988-2018) of NHANES data (N=17,458). ML models predicting prediabetes/diabetes risk (composite outcome: fasting plasma glucose ≥ 100 mg/dL or HbA1c ≥ 5.7% [≥ 39 mmol/mol]) were developed using multimodal data. The Boruta algorithm identified key predictors. Multiple ML models were compared; the best performer (a neural network) formed the Machineborne Early Diabetic Warning And Control System (MEDWACS). A final set of 7 easily accessible parameters suitable for self-assessment was selected. Performance was assessed via ROCAUC and calibration. External validation used NHANES 2021-2023 (N=3,043) and Korea NHANES 2023 (N=5,492).
RESULTS: The final 7-parameter MEDWACS model included age, waist circumference, systolic blood pressure, gender, upper leg length, arm circumference, and BMI. Internally, MEDWACS achieved ROCAUC 0.804 (95% CI, 0.792-0.816) with robust subpopulation performance. External validation confirmed strong performance and generalizability (ROCAUCs: US 0.773 [0.756-0.790], Korea 0.780 [0.768-0.792]) and good calibration. Interpretability analysis identified key drivers. Decision curve analysis showed MEDWACS had superior clinical utility compared to established screening guidelines. An online tool was developed to facilitate home-based self-assessment and clinical use.
CONCLUSIONS: MEDWACS provides a validated, non-invasive ML risk stratification tool that uses 7 accessible parameters to identify individuals likely to have prevalent prediabetes or diabetes. It can help prompt timely clinical evaluations, potentially reducing the public health burden.

Keywords:  Diabetes; NHANES; Prediabetes; early warning system; machine learning; prevention

DOI:  https://doi.org/10.1016/j.jclinepi.2026.112266
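The MEDWACS outcome is a simple composite label, and the abstract quotes the HbA1c cut-off in both NGSP (%) and IFCC (mmol/mol) units. The sketch below reproduces both: the label rule as stated in the abstract, and the standard NGSP-to-IFCC master equation, IFCC = 10.929 × (NGSP - 2.15), which maps 5.7% to about 39 mmol/mol. Function names are hypothetical, and missing-value handling is omitted.

```python
def hba1c_ngsp_to_ifcc(ngsp_percent):
    """Convert HbA1c from NGSP/DCCT units (%) to IFCC units (mmol/mol)."""
    return 10.929 * (ngsp_percent - 2.15)


def prediabetes_or_diabetes(fpg_mg_dl, hba1c_percent):
    """Composite outcome from the abstract: FPG >= 100 mg/dL or HbA1c >= 5.7%."""
    return fpg_mg_dl >= 100 or hba1c_percent >= 5.7


print(round(hba1c_ngsp_to_ifcc(5.7)))  # 39
```

Either criterion alone is enough to assign the positive label, so a participant with normal fasting glucose but HbA1c of 5.8% still counts as a case.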
Syst Rev. 2026 Apr 07.

Predictive performance of artificial intelligence algorithms for gestational diabetes mellitus in pregnant women: a protocol for systematic review and meta-analysis.

Yingni Liang, Meiyan Luo, Jiayu Shen, Yanping Yang, Anran Dai, Zhuolian Zheng, Yinhua Su, Zhongyu Li.

BACKGROUND: Gestational diabetes mellitus (GDM) is a prevalent pregnancy complication that can have numerous adverse health effects on both mothers and newborns. Accurate prediction of GDM risk is a valuable supplement to prenatal education and clinical decision-making. Compared with traditional prediction models, artificial intelligence (AI) algorithms have demonstrated higher predictive accuracy and stronger individualization capabilities. However, the application of AI models to GDM prediction is still at a developmental stage, and their performance and clinical utility have not been thoroughly evaluated. Therefore, this study aims to systematically review and critically appraise the published predictive performance of AI models for GDM prediction and to offer insights for future research and practical application.
METHODS: A systematic literature search will be performed across six databases (PubMed, Web of Science, Cochrane Library, Scopus, EMBASE, and OVID). Screening of titles and abstracts, full-text review, and data extraction will be independently completed by two authors. Qualitative data on the characteristics of the included studies, methodological quality, and the applicability of models will be summarized through narrative descriptions and tabulated formats. For models with predictive performance data from multiple studies, a random-effects meta-analysis or meta-regression will be employed to synthesize the findings, considering potential heterogeneity.
ETHICS AND DISSEMINATION: Ethical approval is deemed not applicable for this systematic review and meta-analysis. The findings will be based on published literature, disseminated through publication in a peer-reviewed journal, and presented at major conferences focused on clinical healthcare.
SYSTEMATIC REVIEW REGISTRATION: PROSPERO registration number CRD42025645913.

Keywords:  Artificial intelligence; Gestational diabetes mellitus; Meta-analysis; Prediction model; Protocols

DOI:  https://doi.org/10.1186/s13643-026-03167-0
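The protocol specifies a random-effects meta-analysis but not the estimator. A common choice, assumed here, is the DerSimonian-Laird method, with C-statistics (AUCs) pooled on the logit scale. A minimal sketch; the study-level AUCs and standard errors in the usage lines are hypothetical inputs, not values from any included study.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooling; returns (pooled estimate, its SE, tau^2)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical study-level AUCs and standard errors, for illustration only.
aucs, ses = [0.78, 0.82, 0.75], [0.03, 0.02, 0.04]
logit_aucs = [logit(a) for a in aucs]
# Delta method: Var(logit AUC) ~= Var(AUC) / (AUC * (1 - AUC))^2
logit_vars = [(se / (a * (1.0 - a))) ** 2 for a, se in zip(aucs, ses)]
pooled_logit, pooled_se, tau2 = dersimonian_laird(logit_aucs, logit_vars)
pooled_auc = inv_logit(pooled_logit)
```

When the studies agree more closely than their sampling error predicts, tau^2 is truncated to zero and the result reduces to fixed-effect pooling; meta-regression on study characteristics would be a separate step.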
Gerontology. 2026 Apr 08. 1-23

Development and Multicenter External Validation of a Real-Time Artificial Intelligence Diagnostic System for Diabetic Peripheral Neuropathy Based on Wearable Devices.

Junlin Ran, Pengcheng Huang, Min He, Liling Deng, Qingqing Chen, Yiwen Qin, David G Armstrong, Bijan Najafi, Edward Jude, Yanzhong Wang, Wuquan Deng, Chenzhen Du.

Diabetic peripheral neuropathy (DPN) ranks among the most common complications of diabetes worldwide, often leading to severe morbidity if undetected early. Traditional screening methods are time-intensive, require specialized personnel, and can be influenced by socioeconomic factors. This study aimed to develop and validate STRIDE (Sensor-based Tracking and Real-time Inference for Diabetic Peripheral Neuropathy Evaluation), an interpretable machine learning model that uses gait and balance parameters from wearable devices to classify current DPN status. In this multicenter diagnostic study, we enrolled 206 participants from Chongqing hospitals, categorized as healthy controls (n=32), people with diabetes without DPN (n=47), asymptomatic DPN (n=48), and symptomatic DPN (n=79), with an independent cohort of 42 participants for external validation. Using LEG-Sys/BalanSens sensors, 68 features were extracted during a one-minute walk and a modified Clinical Test of Sensory Integration and Balance. The STRIDE model integrated random forest and logistic regression algorithms and demonstrated strong classification capability (random forest AUROC=0.80, 95% CI: 0.64-0.92; logistic regression AUROC=0.79, 95% CI: 0.62-0.95). Performance was assessed via AUROC, accuracy, sensitivity, specificity, and F1-score, and SHapley Additive exPlanations (SHAP) were used for interpretability. Key features included stride length variability (SHAP weight=0.61) and double-support time. External validation showed 89.3% concordance with electromyography, and three participants initially classified as false positives developed DPN within one year. STRIDE's real-time scoring shows significant potential for scalable, resource-efficient DPN classification in diverse clinical and non-clinical settings, facilitating early intervention based on current biomechanical signatures.

DOI: https://doi.org/10.1159/000551704
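Stride length variability, the top STRIDE feature, is commonly expressed as a coefficient of variation (CV) across strides. The abstract does not define the exact formula or the sensor processing pipeline, so the sketch below simply assumes a list of per-stride lengths (in metres) already segmented from the wearable signal; the sample values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100.0


# Hypothetical per-stride lengths (m) from a one-minute walk.
stride_lengths = [1.21, 1.18, 1.25, 1.10, 1.23, 1.15, 1.20]
stride_cv = coefficient_of_variation(stride_lengths)
```

A higher CV means less consistent strides. Normalizing by the mean makes the value comparable across people with different typical stride lengths, which suits its use as one input feature among many.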
Dela J Public Health. 2026 Mar;12(1): 46-53

Recent Advances in Modeling and Prediction of Blood Glucose in Type 1 Diabetes.

Yixiang Deng, Yiwei Kong, Xuechun Wang, He Li.

Accurate prediction and control of blood glucose levels are essential for the management of type 1 diabetes, where patients rely on exogenous insulin and are vulnerable to both hypoglycemia and hyperglycemia. The widespread adoption of continuous glucose monitoring systems, insulin pumps, and wearable devices has generated large volumes of physiological and behavioral data, creating new opportunities for computational modeling and intelligent decision support. This review surveys recent advances in glucose prediction and control models, with a primary focus on type 1 diabetes. We examine three major classes of approaches: mechanistic models based on physiological principles, data-driven machine learning methods, and hybrid or biology-informed frameworks that integrate mechanistic knowledge with learning-based techniques. We also discuss the growing role of multimodal data, deep learning architectures, and reinforcement learning for automated insulin dosing and adaptive control in artificial pancreas systems. Despite significant progress, important challenges remain, including handling noisy and heterogeneous data, improving predictive reliability and uncertainty quantification, and enabling real-time deployment on resource-constrained medical devices. Emerging strategies such as edge computing, efficient model design, and hardware-algorithm co-optimization may help bridge this gap. Continued progress will require interdisciplinary collaboration, standardized evaluation on public datasets, and rigorous clinical validation to translate emerging modeling approaches into practical tools that improve patient outcomes.

DOI: https://doi.org/10.32481/djph.2026.03.09
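As one concrete example of the mechanistic class of models described above (our choice of example, not necessarily one the review singles out), the Bergman minimal model couples plasma glucose G with remote insulin action X. The sketch integrates it with a forward Euler step; the parameter values and insulin inputs are hypothetical placeholders, not fitted values.

```python
def simulate_minimal_model(g0, gb, ib, insulin, p1, p2, p3, dt=1.0, x0=0.0):
    """Forward-Euler simulation of the Bergman minimal model.

    dG/dt = -(p1 + X) * G + p1 * Gb
    dX/dt = -p2 * X + p3 * (I(t) - Ib)

    `insulin` holds plasma insulin I(t) at each step; returns the glucose trace.
    """
    g, x = g0, x0
    trace = [g]
    for i_t in insulin:
        dg = -(p1 + x) * g + p1 * gb
        dx = -p2 * x + p3 * (i_t - ib)
        g += dt * dg
        x += dt * dx
        trace.append(g)
    return trace


# Hypothetical parameters (per minute); glucose in mg/dL, insulin in uU/mL.
params = dict(gb=90.0, ib=10.0, p1=0.03, p2=0.02, p3=1e-5)
at_rest = simulate_minimal_model(g0=90.0, insulin=[10.0] * 60, **params)
basal_insulin = simulate_minimal_model(g0=180.0, insulin=[10.0] * 60, **params)
raised_insulin = simulate_minimal_model(g0=180.0, insulin=[40.0] * 60, **params)
```

With insulin held at basal, glucose relaxes toward Gb; raising insulin increases X and pulls glucose down faster. Data-driven and hybrid approaches replace or augment such equations with models learned from CGM and pump data.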
Front Endocrinol (Lausanne). 2026 ;17 1746570

External validation and application of a machine learning-based model for diabetes progression in prediabetes.

Song Wang, Qi Huang, Yuxuan Luo, Yingying Luo, Honghan Wu, Zhouhui Lian, Linong Ji, Xiantong Zou.

   Introduction: This study externally validated a machine learning-based model for type 2 diabetes progression (ML-PR) and evaluated its clinical utility in individuals with prediabetes.
Methods: We included 3,081 participants from the Diabetes Prevention Program (DPP) and the DPP Outcome Study (DPPOS). The ML-PR model was assessed using discrimination, calibration curves, and decision curve analysis, and its performance was compared with existing diabetes prediction models. Based on ML-PR scores, participants were stratified into high- or low-risk categories. Cox proportional hazards and logistic regression models were used to evaluate the incidence of type 2 diabetes, microvascular complications, and cardiovascular events across risk and intervention groups.
Results: The ML-PR model achieved an area under the ROC curve of 0.74 (95% confidence interval: 0.71-0.78) for predicting 3-year progression to type 2 diabetes. Calibration and decision curve analyses indicated good agreement and net clinical benefit. High-risk individuals exhibited a significantly higher risk of developing type 2 diabetes in both the DPP and DPPOS cohorts (P < 0.001), as well as a 67% increased risk of microvascular complications in DPPOS (P < 0.001), though no significant difference in cardiovascular risk was observed. Significant interactions between treatment and risk group were identified, indicating that high-risk participants benefited more from lifestyle modification and metformin interventions (P for interaction = 0.03 in DPP; P = 0.014 in DPPOS).
Discussion: Externally validated in U.S. cohorts, the ML-PR model effectively identifies individuals with prediabetes at elevated risk of progression to diabetes and of microvascular complications. These findings suggest that intensive lifestyle interventions and metformin therapy may be particularly beneficial for individuals at higher risk, highlighting the potential for more precise treatment strategies in type 2 diabetes.

Keywords:  intervention; machine learning; prediabetes; prediction models; risk stratification

DOI:  https://doi.org/10.3389/fendo.2026.1746570
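Decision curve analysis, used above to show net clinical benefit, compares a model against "treat all" and "treat none" (net benefit 0) across threshold probabilities pt, using the standard net-benefit formula NB = TP/n - (FP/n) × pt/(1 - pt). A minimal sketch; the outcomes and predicted risks are hypothetical, not DPP data.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical 3-year outcomes (1 = progressed to diabetes) and model risks.
y = [1, 1, 0, 0]
risk = [0.9, 0.6, 0.4, 0.1]
curve = [(pt, net_benefit(y, risk, pt), net_benefit_treat_all(y, pt))
         for pt in (0.1, 0.2, 0.3, 0.4, 0.5)]
```

A model shows clinical utility at a given threshold when its net benefit exceeds both treat-all and zero; the weight pt/(1 - pt) encodes how many false positives one is willing to accept per true positive.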
J Patient Exp. 2025 ;12 23743735251380954

Chatbots and Diabetes: Is There Gender Bias?

Gloria Wu, Swara Tewari, Adrial Wong, Emily Chung, Ivan Chim, Brian Hoang, Nikki Mansubi, Adam Shams, Milan Del Buono, Sahana Srinivasan, Hrishi Paliath-Pathiyal, Obaid Khan.

This study evaluated the responses of four leading large language models (LLMs) (ChatGPT-o1, DeepSeek-v3, Gemini 2.0 Flash, and Claude 3.7 Sonnet) to a question about diabetic retinopathy. Methods: The following question, in male and female versions, was posed to the four LLMs: "I am a 52-year-old Caucasian [male/female] with out-of-control Type 2 Diabetes Mellitus, and I recently cannot read small print. What should I do?" We analyzed each response using Flesch-Kincaid Grade Level scoring and conducted a content analysis of the responses to evaluate clinical terminology frequency, healthcare recommendations, and privacy considerations. Results: All platforms generated content at high school to college reading levels, exceeding the recommended sixth-grade health literacy guideline. DeepSeek incorporated more specialized clinical terminology, referenced specific diabetes guidelines not mentioned by ChatGPT, and exhibited greater gender discrepancy than the other three LLMs. Conclusion: While LLMs demonstrate promising capabilities for diabetes education, our results indicate that improvements in readability and gender bias mitigation, as well as a reduced risk of inappropriate output, remain essential. Healthcare providers and physicians must review and monitor the answers before sharing them with patients.

Keywords:  artificial intelligence; diabetic retinopathy; patient education; type 2 diabetes

DOI:  https://doi.org/10.1177/23743735251380954
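The Flesch-Kincaid Grade Level used to score the chatbot answers is a fixed formula: 0.39 × (words/sentences) + 11.8 × (syllables/words) - 15.59. The sketch below implements it; the syllable counter is a crude vowel-group heuristic of our own, so scores will differ somewhat from dedicated readability tools (the abstract does not name the tool used).

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a silent final 'e' (but not '-le')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_from_counts(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def flesch_kincaid_grade(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), sentences, syllables)


grade = flesch_kincaid_grade(
    "See an eye doctor soon. Blurred vision can be a sign of diabetic retinopathy."
)
```

Because long sentences and polysyllabic clinical terms both raise the score, terminology-heavy answers drift well above the sixth-grade target even when they are otherwise accurate.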
Front Cell Dev Biol. 2026 ;14 1754221

ChatGPT-5 versus other mainstream large language models in core diabetic retinopathy patient queries.

Xiaomin Cang, Mengxia Ni, Chunyan Song, Jialuo Zhao, Yingxin Guo, Yunyun Zou, Zhe Zhang, Ligang Jiang.

Background: Diabetic retinopathy is a leading cause of preventable vision loss, and patients increasingly seek disease-related information through online consultations. Large language models may support patient education, but their reliability and usability vary across systems, particularly in disease-specific settings.
Methods: Thirty common patient questions about diabetic retinopathy were developed from guidelines and organized into five domains: disease overview, screening and diagnosis, treatment and follow-up, lifestyle and prevention, and prognosis and complication management. From November 10 to 15, 2025, two researchers independently submitted all questions to five models (ChatGPT-5, DeepSeek-V3.1, Doubao, Wenxinyiyan 4.5 Turbo, and Kimi) on public platforms under identical conditions without system prompts. Chat histories were reset before each question. Response time, response length, structural metrics, and table outputs were extracted. Two retinal specialists rated each answer on a 1 to 5 Likert scale across accuracy, logical consistency, coherence, safety, and content accessibility. Inter-rater agreement was assessed with the intraclass correlation coefficient. Group differences were analyzed using analysis of variance or the Kruskal-Wallis H test with Bonferroni-corrected pairwise comparisons.
Results: Significant between-model differences were observed in output efficiency and textual characteristics (all P < 0.001). ChatGPT-5 responded fastest (15.92 ± 4.48 s), whereas Wenxinyiyan 4.5 Turbo and DeepSeek-V3.1 were the slowest (41.89 ± 5.09 s and 38.20 ± 2.96 s). DeepSeek-V3.1 generated the longest answers (1396.37 ± 189.23 words), while Kimi produced the shortest (579.40 ± 182.96 words). Only ChatGPT-5 consistently generated structured tables (median 2.00, IQR 1.00-2.00). Content quality differed significantly across all five dimensions (H = 15.34-37.19, all P ≤ 0.004). ChatGPT-5 achieved the highest median scores for accuracy (5.00, IQR 4.00-5.00) and logical consistency (4.50, IQR 4.00-5.00), whereas Kimi showed the lowest accuracy (3.50, IQR 3.00-4.00). The intraclass correlation coefficient indicated good inter-rater reliability (0.87).
Conclusion: The performance of large language models in diabetic retinopathy patient consultations is model-dependent. ChatGPT-5 demonstrated the best overall usability, combining faster responses, clearer structure, and higher factual accuracy. The other, Chinese-optimized models provided comparable professional information coverage but require improved accessibility and stability for safe patient-facing use.

Keywords:  accuracy and safety; artificial intelligence; diabetic retinopathy; large language models; ophthalmic digital health; patient-initiated consultation

DOI:  https://doi.org/10.3389/fcell.2026.1754221
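The quality ratings were compared with the Kruskal-Wallis H test followed by Bonferroni-corrected pairwise comparisons. A minimal sketch of the H statistic with the usual tie correction (Likert ratings contain many ties), plus the Bonferroni threshold for all pairwise comparisons among the five models. Converting H to a P value (chi-square with k - 1 degrees of freedom) is left to a statistics library such as SciPy's `scipy.stats.kruskal`.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with tie correction; groups is a list of rating lists."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("H is undefined when all ratings are identical")
    return h / correction


models = ["ChatGPT-5", "DeepSeek-V3.1", "Doubao", "Wenxinyiyan 4.5 Turbo", "Kimi"]
pairs = list(combinations(models, 2))
bonferroni_alpha = 0.05 / len(pairs)  # 10 pairwise comparisons
```

With five models there are ten pairwise comparisons, so each is tested at 0.05 / 10 = 0.005.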
Front Endocrinol (Lausanne). 2026 ;17 1747468

Real-world performance of open-source large language models in diabetes diagnosis.

Shuting Yang, Sujie Liu, Yuxi Ma, Baowen Gai, Junwei Liu, Liansheng Wang, Feng Gao, Zhiguang Zhou.

   Background: This study aimed to evaluate the performance of diverse open-source large language models (LLMs) in diagnosing diabetes subtypes and comorbidities from unstructured clinical text, assessing the impact of model characteristics, prompting, and language.
Methods: We conducted a retrospective analysis of 11,329 adult diabetes patients from a large Chinese tertiary center (2010-2020). Various open-source LLMs were tested using four prompting strategies in English and Chinese. Primary outcomes were F1-scores for multi-class diabetes subtyping and binary classification of diabetic kidney disease (DKD) and metabolic syndrome (MetS).
Results: LLMs demonstrated high performance in complex subtyping (peak F1 0.951) but showed limitations in rule-based DKD (F1 0.570) and MetS (F1 0.650) diagnosis. Chain-of-Thought prompting improved MetS classification but degraded DKD performance. Optimal model size was approximately 32B parameters. Notably, English prompts outperformed Chinese prompts on native Chinese text.
Conclusion: Open-source LLMs exhibit strong holistic pattern recognition for complex classification but struggle with rule-based procedural reasoning. These models are promising as clinical co-pilots to augment expert decision-making rather than serving as autonomous diagnostic tools.

Keywords:  artificial intelligence; diabetes; diabetes complication disease; diagnosis; large language model

DOI:  https://doi.org/10.3389/fendo.2026.1747468
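The abstract reports F1-scores for multi-class diabetes subtyping and for binary DKD and MetS classification, without saying how the multi-class score is averaged. Assuming macro-averaging (one common choice), a minimal sketch; the subtype labels in the test values are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as positive and everything else as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in truth or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, c) for c in labels) / len(labels)
```

Macro-averaging weights rare subtypes as heavily as common ones, so a model that ignores an uncommon subtype is penalized even if overall accuracy stays high; for the binary DKD and MetS tasks, `f1_for_label` with the positive class is the relevant score.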
medRxiv. 2026 Apr 04. pii: 2026.03.30.26349552. [Epub ahead of print]

A Manual of Procedures for the Generation of the AI-Ready and Exploratory Atlas for Diabetes Insights (AI-READI) Database.

Dawn S Matthies, Jeffrey C Edberg, Sally L Baxter, Aaron Y Lee, Cecilia S Lee, Gerald McGwin, Julia P Owen, Linda M Zangwill, Cynthia Owsley.

The ability to understand and affect the course of complex, multi-system diseases like diabetes has been limited by a lack of well-designed, high-quality and large multimodal datasets. The NIH Bridge2AI AI-READI project (aireadi.org) aims to address this shortfall by generating an AI-ready dataset to support AI discoveries in type 2 diabetes mellitus (T2DM). This manual of procedures provides a detailed description of the AI-READI protocol.

DOI: https://doi.org/10.64898/2026.03.30.26349552