bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2025–12–14
fourteen papers selected by
Mott Given



  1. Popul Health Manag. 2025 Dec 04.
      Type 2 diabetes mellitus (T2D) is a prevalent metabolic disorder with significant health and economic burdens worldwide. The relationship between inflammation-related indicators and the risk of developing new-onset T2D remains underexplored. This study aims to identify and validate an interpretable predictive model for incident T2D using inflammation-related indicators. We analyzed data from 220,937 participants free of diabetes at baseline in the UK Biobank. Six machine learning algorithms were employed to construct predictive models. Feature selection was performed using Least Absolute Shrinkage and Selection Operator regression. SHapley Additive exPlanations (SHAP) were used to interpret the best-performing model. A genetic risk score (GRS, an aggregate measure of genetic susceptibility to T2D) was constructed, and multivariate Cox regression assessed the combined effects of genetic and inflammatory factors on T2D incidence. The Extreme Gradient Boosting model demonstrated the best performance (training set AUC = 0.863, testing set AUC = 0.838). Key predictors included body mass index, cholesterol, age, alanine aminotransferase, high-density lipoprotein, and Prognostic Nutritional Index (a marker predictive of inflammation and nutritional outcomes). SHAP analysis revealed significant contributions from these features. C-reactive protein and white blood cell count showed strong associations with future T2D risk. Integrating the GRS significantly improved the model's predictive performance (ΔAUC = +0.025, P < 0.05 via DeLong's test). This study presents an interpretable machine learning model for new-onset T2D risk prediction, emphasizing the role of inflammation and genetic factors. The findings provide a valuable tool for early T2D prevention and intervention, offering insights into the complex interplay between inflammation and diabetes development.
    Keywords:  genetic risk score; inflammation; machine learning; new-onset T2D
    DOI:  https://doi.org/10.1177/19427891251401221
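      The AUC values quoted above (0.863 training, 0.838 testing) measure how often a model ranks a participant who goes on to develop T2D above one who does not. A minimal sketch of that rank-based definition follows; the function name and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The double loop is O(n_pos * n_neg),
    which is fine for a small illustration but not for a large cohort.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = developed T2D, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))  # 3 of the 4 pairs are ranked correctly
```

      Comparing two correlated AUCs, as the abstract does with DeLong's test, needs a variance estimate on top of this and is not shown here.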
  2. Comput Methods Biomech Biomed Engin. 2025 Dec 13. 1-11
      Accurate prediction of gestational diabetes mellitus (GDM) is critical for improving maternal and fetal outcomes. This study develops a Transformer-based multimodal fusion model that integrates tabular clinical features and image-encoded electronic health records (EHRs), aiming for accurate end-to-end classification of GDM. Preprocessed EHRs were transformed into grayscale, RGB, and heatmap images, with visual features extracted by a Vision Transformer and tabular features by an MLP. A modality-aware attention mechanism enhances cross-modal fusion. Evaluated on two public datasets, the model improved accuracy over the strongest single-modality models by 3.95% and 0.38%.
    Keywords:  Gestational diabetes mellitus; cross-modal feature fusion; end-to-end prediction; image encoding; transformer
    DOI:  https://doi.org/10.1080/10255842.2025.2573868
  3. Acta Bioeng Biomech. 2025 Sep 01. 27(3): 49-60
      Purpose: The aim of this study was to investigate a novel data mining approach for early and effective diagnosis of Gestational Diabetes Mellitus (GDM). Methods: The GDM dataset contains two classes (healthy and diabetic), 15 features and 3525 instances. In the first stage, the widely used and effective KNN and regression methods were employed to fill in missing data. Then, the data source was transformed into grayscale images, as primary images and multiplexed images. Finally, both the original data and the transformed data were classified with KNN, SVM and CNN using the k-fold cross-validation technique. Performance metrics were compared to identify the most suitable system. Results: The original GDM source and the GDM data with missing values replaced were classified with KNN and SVM methods. In addition, the primary images of this dataset and the multiplexed images were classified with CNN using 50%-50% and 70%-30% train-test splits, respectively. Classification performance reached up to 97.91% with CNN, with recall of 97.61%, specificity of 97.61%, precision of 97.97% and F1-score of 97.79%. This result outperformed all previous studies conducted on the same dataset in the literature. Conclusions: This work demonstrates a new approach that achieves the best classification accuracy when compared with previous studies that applied related methods to identify GDM. It can be clearly stated that applying a data mining method to impute missing values, followed by converting the dataset into images based on certain criteria and classifying with CNN, is the most effective approach for predicting GDM.
    Keywords:  CNN; GDM disease; KNN; SVM; data mining; image conversion
    DOI:  https://doi.org/10.37190/abb/209528
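      The recall, specificity, precision, and F1-score reported above all derive from the four cells of a binary confusion matrix. A minimal sketch of those standard definitions follows; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: diabetic cases predicted diabetic / healthy.
    tn/fp: healthy cases predicted healthy / diabetic.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # sensitivity: diabetic cases found
    specificity = tn / (tn + fp)   # healthy cases correctly cleared
    precision = tp / (tp + fp)     # predicted-diabetic cases that are real
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.4f}")
```

      The sketch assumes every denominator is non-zero; a production version would need to handle empty classes.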
  4. Sci Rep. 2025 Dec 12.
      
    Keywords:  Contrast enhancement; Eye fundus images; Illumination correction pre-processing; Retinal diseases
    DOI:  https://doi.org/10.1038/s41598-025-31339-w
  5. J Diabetes Investig. 2025 Dec 08.
       INTRODUCTION: Identifying patient characteristics predictive of treatment response is crucial for optimizing type 2 diabetes outcomes. Using data from three phase 2/3 imeglimin trials in Japan, this analysis applied machine learning to determine characteristics associated with HbA1c improvement.
    METHODS: Regression tree and random forest methods identified baseline characteristics predictive of HbA1c improvement. Partial dependence plots (PDP) visualized the relationship between HbA1c change and continuous variables deemed important by Boruta.
    RESULTS: For monotherapy, key predictors were baseline HbA1c, hypertension, smoking, type 2 diabetes duration, body mass index (BMI), low-density lipoprotein-cholesterol (LDL-C), metabolic syndrome, and estimated glomerular filtration rate. Nonsmokers with HbA1c ≥8.35% and LDL-C < 3.26 mmol/L at baseline showed the greatest improvement in HbA1c (-1.24%). Random forest analysis and Boruta identified baseline HbA1c, BMI, fatty liver index, smoking, and hypertension as significant predictors of HbA1c improvement. PDPs showed that HbA1c improvement correlated positively with higher baseline HbA1c and negatively with BMI and fatty liver index. For imeglimin add-on to insulin therapy, key predictors were BMI, age, LDL-C, type 2 diabetes duration, systolic blood pressure, and alanine transaminase (ALT). Patients with BMI <25.9, LDL-C < 2.68 mmol/L, and ALT <21 U/L showed the greatest HbA1c improvement (-1.48%). Random forest analysis and Boruta confirmed BMI, age, and LDL-C as significant predictors. PDPs showed that HbA1c improvement correlated positively with older age and negatively with higher BMI and LDL-C.
    CONCLUSIONS: Machine learning effectively identified baseline characteristics predictive of HbA1c response to imeglimin, supporting the potential for personalized type 2 diabetes treatment strategies.
    Keywords:  Imeglimin; Machine learning analysis; Type 2 diabetes
    DOI:  https://doi.org/10.1111/jdi.70215
  6. Sci Rep. 2025 Dec 10.
      
    Keywords:  Body composition data; Deep learning; Diabetes prediction; Generative adversarial networks; Machine learning; Multilayer perceptron
    DOI:  https://doi.org/10.1038/s41598-025-31928-9
  7. Diagnostics (Basel). 2025 Dec 02. pii: 3070. [Epub ahead of print]15(23):
      Background/Objectives: The aim of this study was to identify systemic, metabolic, and host-related prognostic factors for long-term outcomes in patients with a diabetic foot ulcer (DFU). Methods: One hundred patients were selected from a high-risk cohort of 426 individuals with a DFU (January 2021-January 2023) based on predefined inclusion and exclusion criteria. Clinical, laboratory, and imaging data were collected. Outcomes were categorized as favorable (healing) or unfavorable (non-healing, re-ulceration, amputation, or death). Prognostic factors were analyzed using random forest and categorical boosting models, with SHAP values to determine the importance of individual predictors. Results: The median age of participants was 65 years (interquartile range [IQR], 57-69.25), and the median duration of diabetes was 18 years (IQR, 12-26). Over a mean 2.1-year follow-up, unfavorable outcomes occurred in 53% of the whole cohort and in 36% of survivors. The strongest predictors of poor prognosis were prior amputation, elevated inflammatory markers, reduced eGFR, and dyslipidemia. Triglycerides showed a U-shaped association with outcomes. Paradoxically, a lower BMI and shorter diabetes duration were also linked to poorer prognosis. Glycemic control, comorbidities, and local foot characteristics had limited predictive value. Conclusions: Long-term DFU prognosis is driven mainly by systemic and host-related factors rather than by ulcer characteristics alone. Inflammation, renal dysfunction, dyslipidemia (particularly triglycerides), and prior amputation were the strongest predictors of unfavorable outcomes.
    Keywords:  diabetes mellitus; diabetic foot ulcer; machine learning analysis; outcomes; risk factors
    DOI:  https://doi.org/10.3390/diagnostics15233070
  8. Cardiovasc Diabetol Endocrinol Rep. 2025 Dec 10. 11(1): 38
    AMD Annals study group
      
    Keywords:  Cardiorenal risk stratification; Glucagon-like peptide-1 receptor agonists (GLP-1RA); Machine learning analysis; Prescription patterns and patient-centered outcomes; Type 2 diabetes mellitus (T2D); Sodium-glucose co-transporter-2 inhibitors (SGLT2is)
    DOI:  https://doi.org/10.1186/s40842-025-00251-7
  9. Sci Rep. 2025 Dec 12.
      Gestational diabetes mellitus (GDM) is a prevalent condition requiring accurate patient education, yet the reliability and readability of large language models (LLMs) in this context remain uncertain. This study evaluated the performance of four LLMs (ChatGPT-4o, Gemini 2.5 Pro, Grok 3.0, and DeepSeek R-1) using 25 patient-oriented questions derived from clinical scenarios. Seven endocrinologists independently rated the responses with the modified DISCERN (mDISCERN) instrument and the Global Quality Score (GQS). Readability was analyzed using the Flesch Reading Ease (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG), while lexical diversity was assessed through type-token ratio (TTR). Grok and Gemini obtained the highest mDISCERN and GQS scores, whereas ChatGPT performed significantly lower (p < 0.05). DeepSeek generated the most readable outputs, while Grok provided the longest and most complex responses. All models scored below the FRES threshold of 60 recommended for lay audiences. Response length showed strong positive correlations with mDISCERN and GQS, while TTR was inversely related to quality but positively associated with readability. These findings highlight variability among LLMs in GDM education and emphasize the need for model-specific improvements to ensure reliable patient-facing health information.
    Keywords:  Artificial intelligence; Gestational diabetes mellitus; Large language models; Patient education; Readability
    DOI:  https://doi.org/10.1038/s41598-025-27235-y
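      Two of the readability scores named above, FRES and FKGL, are fixed linear formulas over word, sentence, and syllable counts, and TTR is the share of distinct tokens in a text. A minimal sketch follows, assuming the counts are supplied by the caller; syllable counting is tool-dependent and is deliberately left out. The function names and the tokenization rule are illustrative assumptions, not the study's pipeline.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher means easier (60 is the lay-audience cutoff cited above)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def type_token_ratio(text):
    """Distinct lowercase word tokens divided by total word tokens."""
    # Assumed tokenization: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


# Hypothetical counts for a 100-word answer with 5 sentences and 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))
print(round(flesch_kincaid_grade(100, 5, 150), 2))
print(type_token_ratio("the cat and the dog"))
```

      TTR falls as a text gets longer because common words repeat, which is worth keeping in mind when reading the reported links between TTR, response length, and quality.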
  10. Front Nutr. 2025; 12: 1705683
       Background: Type 2 diabetes mellitus (T2DM) is a major global public health issue, with a particularly high prevalence in China, especially among older men. Obesity, dietary habits, and metabolic risk factors are key contributors to the development of T2DM. However, research on the relationship between dietary patterns, obesity, and T2DM in elderly Chinese men remains limited. Objective: This study aims to examine the links between obesity, dietary habits, blood pressure, and the risk of developing T2DM in elderly Chinese men. We utilize unsupervised machine learning methods along with SHAP-based model interpretation to identify significant lifestyle and metabolic factors associated with T2DM risk.
    Methods: A cross-sectional study was conducted with 982 participants aged 60 years and older from community health centers in Heze City, China. Unsupervised machine learning methods (UMAP) were used to identify dietary patterns, and supervised machine learning with SHAP was applied to evaluate the importance of obesity, dietary patterns, and lifestyle factors on T2DM risk. Logistic regression analyses were performed to investigate the associations between obesity, dietary habits, blood pressure, and T2DM risk. Sensitivity analyses were performed to verify the robustness of the findings.
    Results: Four distinct dietary patterns were identified: "high-fiber nutrient-dense," "staple-protein," "seafood-eggs," and "sugary and processed foods." The prevalence of newly diagnosed T2DM in males was 48.37%. Obesity was inversely associated with T2DM risk across all models (odds ratios: 0.272-0.278, all P < 0.05). Compared with the high-fiber nutrient-dense pattern, adherence to the staple-protein, seafood-eggs, and sugary and processed foods patterns was significantly associated with increased obesity and T2DM risk (all P < 0.01). Shapley Additive Explanations (SHAP) analysis highlighted dietary behaviors, total energy intake, and physical activity as major contributors to T2DM prediction. Sensitivity analyses confirmed the robustness of these associations, independent of total caloric intake and BMI.
    Conclusion: In this population of elderly Chinese males, unhealthy dietary patterns are positively associated with obesity and T2DM risk, whereas obesity itself showed an inverse relationship with T2DM. These findings underscore the importance of promoting nutrient-dense diets and targeted lifestyle interventions to reduce T2DM risk in this population.
    Keywords:  SHAP analysis; dietary patterns; obesity; type 2 diabetes mellitus; unsupervised machine learning
    DOI:  https://doi.org/10.3389/fnut.2025.1705683
  11. Sci Rep. 2025 Dec 09.
      In the Healthcare 5.0 environment, IoT devices are used to collect user statistics. Hence, IoT devices can be used for the early detection and staging of diabetes. However, the complex interrelationships among healthcare features make accurate prediction difficult. In this context, this paper presents a self-attention GRU model for predictive diabetes detection. A GRU-based self-attention mechanism captures temporal dependencies and spatial features, which improves model performance. Finally, a CNN with Batch Normalization and ReLU performs the final classification. Experimental results show that the model achieved 93.94% accuracy, 95.28% precision, 93.94% recall, and an AUC of 0.9697, outperforming GRU, LSTM, RNN, and transformer-based baselines.
    DOI:  https://doi.org/10.1038/s41598-025-29674-z
  12. Sci Rep. 2025 Dec 11.
      We investigated the characteristics of neurovascular degeneration in diabetic retinopathy (DR) using optical coherence tomography (OCT) and OCT angiography (OCTA). En-face 3 × 3 mm OCTA images were obtained from 327 eyes of DR patients without macular edema. Nonperfusion squares (NPSs) were defined as 15 × 15-pixel regions lacking vascular signals. Neurovascular parameters were extracted from five subfields of the Early Treatment Diabetic Retinopathy Study grid. High-dimensional data were embedded into a two-dimensional space using Uniform Manifold Approximation and Projection, and clustering revealed three distinct groups: Mild, Intermediate, and Severe. Eyes with central subfield thickness (CST) < 246 μm were classified as having diabetic macular atrophy. The Mild group exhibited lower NPS counts, while the Intermediate group showed increased NPS counts in the deep layer. The Severe group had the highest NPS counts and the lowest CST, with a significant negative correlation between CST and superficial NPS counts (ρ = - 0.252, P = 0.039). Eyes with diabetic macular atrophy in the Severe group demonstrated higher NPS counts, worse visual acuity, and more frequent ellipsoid zone disruption compared to the Mild and Intermediate groups (P < 0.001). These findings suggest a pathological relationship between macular ischemia and retinal atrophy, offering new insights into DR progression.
    Keywords:  Diabetic macular atrophy; Diabetic macular ischemia; Diabetic retinopathy; Neurodegeneration; Uniform manifold approximation and projection
    DOI:  https://doi.org/10.1038/s41598-025-31862-w
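      The nonperfusion square (NPS) defined above, a 15 × 15-pixel region with no vascular signal, lends itself to a simple counting routine. The sketch below assumes a binarized vessel mask and a non-overlapping grid of tiles; the abstract does not say how the squares were placed, so the tiling scheme, the function name, and the toy mask are all assumptions.

```python
def count_nonperfusion_squares(vessel_mask, size=15):
    """Count non-overlapping size x size tiles containing no vessel pixel.

    vessel_mask: list of rows, each a list of 0/1 values (1 = vascular signal).
    Partial tiles at the right and bottom edges are ignored (an assumption).
    """
    rows = len(vessel_mask)
    cols = len(vessel_mask[0]) if rows else 0
    count = 0
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            has_vessel = any(
                vessel_mask[i][j]
                for i in range(r, r + size)
                for j in range(c, c + size)
            )
            if not has_vessel:
                count += 1
    return count


# Toy 4 x 4 mask with 2 x 2 tiles, for illustration only.
toy_mask = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
print(count_nonperfusion_squares(toy_mask, size=2))  # top-left and bottom-right tiles are empty
```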
  13. Sci Rep. 2025 Dec 09. 15(1): 43369
      This work introduces the MSFAUMobileNet model, a complex U-Net structure tailored for retinal blood vessel segmentation, which is a critical process for detecting and monitoring retinal diseases such as diabetic retinopathy, glaucoma, and age-related macular degeneration (AMD). The model uses Multi-Scale Feature Aggregation (MSFA), Residual Connections, and Attention Mechanisms to enhance its segmentation accuracy. Utilizing MobileNetV2 as the encoder, the model is capable of effectively generating 13 bottleneck layers' worth of hierarchical features. While residual connections and attention mechanisms improve the segmentation process and help delineate intricate vascular networks precisely, MSFA extracts spatial information at various resolutions. The model was tested on the DRIVE dataset and produced exceptionally high scores, with accuracy at 99.99%, Dice coefficient at 99.95%, and Intersection over Union (IoU) at 99.94%. These scores show how effectively the model segments the complex retinal network, enabling early detection and treatment of retinal disease. Owing to its computational speed and precision, MSFAUMobileNet is well suited as medical image analysis software for real clinical practice, particularly in the management of retinal disease.
    Keywords:  Attention mechanism; Diabetic retinopathy; Glaucoma; MobileNetV2; Multi-Scale feature aggregation (MSFA); Residual connections; Retinal vessel segmentation; U-Net architecture
    DOI:  https://doi.org/10.1038/s41598-025-28707-x
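      The Dice coefficient and Intersection over Union (IoU) quoted above both score the overlap between a predicted vessel mask and the ground-truth mask. A minimal sketch of the two definitions on flattened binary masks follows; the function name, the empty-mask convention, and the toy masks are assumptions for illustration, not the paper's evaluation code.

```python
def dice_and_iou(pred, truth):
    """Return (dice, iou) for two equal-length binary masks (flattened pixels)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy masks (made up): one overlapping vessel pixel out of two in each mask.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```

      For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.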
  14. BMC Endocr Disord. 2025 Dec 11.
       AIM: To analyze data from non-intensive care unit (non-ICU) inpatients with diabetes to predict the risk of hypoglycemia using electronic health records (EHRs) and point-of-care (POC) blood glucose values.
    METHODS: Patient demographics, laboratory results, POC blood glucose values, and procedures performed during the hospital stay on Days 0-2 were used to predict hypoglycemic episodes (blood glucose ≤ 3.9 mmol/L) on Days 3-6. The dataset was randomly split into a training set and an independent verification set at a 7:3 ratio. Logistic Regression (LR) and Artificial Neural Network (ANN) were compared using the area under the curve (AUC). A nomogram plot was also constructed to display the predicted hypoglycemia probabilities.
    RESULTS: Data from 16,593 diabetic patients (January 2017 to June 2022) were analyzed. Predictive factors from the LR model included the use of insulin; previous hypoglycemia in Days 0-2; respiratory rate; blood urea nitrogen; potassium; D-dimer levels; coefficient of variation of blood glucose (BG CV) > 31%; and blood glucose gap (BG gap, maximum of blood glucose - minimum of blood glucose) > 10 mmol/L. In the verification set, the AUC of ANN was 0.762 and the AUC of LR was 0.763. There was no significant difference in performance between the models built by the two methods. The results showed that the probability predicted by the nomogram using LR is similar to the clinical results. Decision curve analysis (DCA) indicated potential clinical application for the LR model.
    CONCLUSIONS: The LR model demonstrated considerable value in predicting hypoglycemia risk, comparable to ANN. Trials of such models should be conducted to evaluate their utility in reducing inpatient hypoglycemia.
    CLINICAL TRIAL NUMBER: Not applicable.
    Keywords:  Artificial neural network; Glycemic variability; Hypoglycemia; Inpatient diabetes; Prediction
    DOI:  https://doi.org/10.1186/s12902-025-02125-6
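      Two of the predictors listed above, BG CV and BG gap, are simple summaries of a patient's Days 0-2 glucose readings. A minimal sketch of how they could be computed and compared with the reported cutoffs (BG CV > 31%, BG gap > 10 mmol/L, hypoglycemia at ≤ 3.9 mmol/L) follows. The function names and the toy readings are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not specify it.

```python
from statistics import mean, stdev


def glucose_variability(readings_mmol_l):
    """Return (cv_percent, gap_mmol_l) for POC glucose readings in mmol/L."""
    if len(readings_mmol_l) < 2:
        raise ValueError("need at least two readings")
    # CV = SD / mean, as a percentage; sample SD is an assumption.
    cv_percent = stdev(readings_mmol_l) / mean(readings_mmol_l) * 100.0
    # Gap = maximum reading minus minimum reading, as defined in the abstract.
    gap = max(readings_mmol_l) - min(readings_mmol_l)
    return cv_percent, gap


def risk_flags(readings_mmol_l):
    """Apply the cutoffs reported in the abstract to one patient's readings."""
    cv_percent, gap = glucose_variability(readings_mmol_l)
    return {
        "bg_cv_over_31_percent": cv_percent > 31.0,
        "bg_gap_over_10_mmol_l": gap > 10.0,
        "prior_hypoglycemia": min(readings_mmol_l) <= 3.9,
    }


# Toy Days 0-2 readings (made up) for one patient.
print(risk_flags([3.5, 15.0, 8.0]))
```

      These flags are inputs to the logistic regression model, not a risk score in themselves; the fitted coefficients are not given in the abstract.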