bims-aukdir 2025-10-26 papers

Biomed News

on Automated knowledge discovery in diabetes research

Issue of 2025–10–26
eleven papers selected by Mott Given

Privacy preservation in diabetic disease prediction using federated learning based on efficient cross stage recurrent model.
Machine learning-driven Diabetes Health Tracer (DHT): Optimizing prognosis using RaSK_GraDe and RaSK_GraDeL models.
Interpretable Machine Learning Model for Predicting and Assessing the Risk of Diabetic Nephropathy: Prediction Model Study.
Plasma multi-omics and machine learning reveal predictive biomarkers for type 2 diabetes and retinopathy in Qatar biobank cohort.
Enhancing diabetic retinopathy diagnosis and grading: a retrospective study on AI-assisted decision making and cost analysis.
Artificial Intelligence for Early Detection of Preeclampsia and Gestational Diabetes Mellitus: A Systematic Review of Diagnostic Performance.
Diabetic Retinopathy Screening Among Federally Qualified Health Center Patients Using Point-of-Care AI: DRES-POCAI: A Trial Protocol.
Development of a Novel Machine Learning Method for Estimation of Life-Long Chronic Disease Progression and Its Application to Type 2 Diabetes.
A deep learning framework with hybrid stacked sparse autoencoder for type 2 diabetes prediction.
Assessing Lung Injury Induced by Streptozotocin-induced Diabetes: A Deep Neural Network Analysis of Histopathological and Immunohistochemical Images.
Establishment of a Diabetes-Tailored Data Intelligence Platform Enhances Clinical Care, Enables Risk-Based Monitoring, and Facilitates Population-Health-Based Approaches at a Pediatric Diabetes Network.

Sci Rep. 2025 Oct 24. 15(1): 37258

Privacy preservation in diabetic disease prediction using federated learning based on efficient cross stage recurrent model.

R Jayalakshmi, T Tamilvizhi.

Diabetic retinopathy (DR) is a major problem for patients with diabetes: it poses a serious threat to vision and causes irreversible blindness if not diagnosed and treated early. Conventional deep learning-based approaches to DR detection have demonstrated promising results, but their requirement for centralized data aggregation raises privacy and security concerns about sharing healthcare data. Federated learning (FL) based methods were designed to preserve privacy, but computation overhead and inaccurate disease detection limit their performance. Hence, this research introduces a privacy-preserving framework named the federated learning based diabetic retinopathy detection network (FedDRNet). The proposed FedDRNet model includes an efficient cross stage recurrent network (ECSRNet) for training the local and server models, which combines the strengths of ShuffleNet, CSPNet, and GRU to achieve high accuracy and computational efficiency. In addition, to strengthen privacy, homomorphic encryption is applied before updates are shared, securing communication between clients and the central server. Improved K-means clustering (IKMC) based user selection also enhances communication efficiency by reducing the number of communication rounds. Implemented in Python, FedDRNet achieved Accuracy, Precision, Recall, F-Score, and Specificity values of 98.6%, 98.8%, 98.3%, 98.6%, and 98.1%, respectively.
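
The aggregation step at the heart of federated learning can be illustrated with a minimal sketch. It assumes plain sample-weighted federated averaging over flat parameter vectors; the abstract does not state FedDRNet's exact aggregation rule, and the homomorphic encryption, IKMC client selection, and ECSRNet model are deliberately left out. The function name and the example numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Assumption: generic federated averaging, not necessarily the rule
    used by FedDRNet. Raw images never leave the clients; only these
    parameter vectors are sent to the server.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client updates must have the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * n_samples / total
    return averaged


# Hypothetical updates from three clients with different dataset sizes.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
global_weights = federated_average(updates, sizes)
```

In an encrypted variant, the server would perform the same weighted sum on ciphertexts, which is possible because additively homomorphic schemes support addition and scalar multiplication without decryption.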

Keywords:  Client clustering model; Cross stage partial network; Diabetic retinopathy; Federated learning; Homomorphic encryption; Multi-scale filtering; Noise removal; Privacy-preservation; ShuffleNet

DOI: https://doi.org/10.1038/s41598-025-21229-6

PLoS One. 2025;20(10): e0327661

Machine learning-driven Diabetes Health Tracer (DHT): Optimizing prognosis using RaSK_GraDe and RaSK_GraDeL models.

Muhammad Noman, Maria Hanif, Abdul Hameed, Muhammad Babar, Basit Qureshi.

Diabetes mellitus presents a significant global health challenge, particularly in regions like Pakistan, India, and Bangladesh. Machine learning (ML) techniques offer promising solutions for diabetes prediction, surpassing traditional methods in reliability and efficiency. This research conducts a comparative analysis of ML algorithms including Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), K-nearest neighbors (KNN), Gradient Boosting (GB), RaSK_GraDe (Proposed Voting), and RaSK_GraDeL (Proposed Stacking). Evaluation is performed using datasets such as PIMA Indian, Frankfurt Hospitals Diabetes, RTML with Insulin, and the proposed Diabetes Health Tracer (DHT) dataset comprising 2877 observations with nine features. Data pre-processing techniques address missing values, outliers, normalization, and class balancing (SMOTE), enhancing model robustness. Hyperparameter tuning via cross-validation and Random Search optimizes model performance. Additionally, two ensemble methods, a Voting Classifier (RaSK_GraDe) and a Stacking Model (RaSK_GraDeL with Logistic Regression), are applied, achieving notable accuracies of 98.03% and 98.55%, respectively, on the DHT dataset. The study underscores ML's potential in diabetes prediction, advocating for personalized treatment and healthcare management advancements.
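
How a voting ensemble combines its base classifiers can be sketched as follows. The abstract does not say whether RaSK_GraDe uses hard or soft voting, so this sketch assumes hard (majority) voting over binary labels; the function name, tie-breaking rule, and example predictions are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists by majority vote, sample by sample.

    Assumption: hard voting; ties are broken toward the smallest label
    so that the result is deterministic.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same samples")
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


# Hypothetical outputs of three base models on four patients
# (1 = diabetic, 0 = non-diabetic).
model_outputs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
ensemble_labels = hard_vote(model_outputs)
```

A stacking model differs in that the base predictions become input features for a second-level learner (logistic regression in RaSK_GraDeL, per the abstract) instead of being counted directly.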

DOI: https://doi.org/10.1371/journal.pone.0327661

JMIR Med Inform. 2025 Oct 22. 13: e64979

Interpretable Machine Learning Model for Predicting and Assessing the Risk of Diabetic Nephropathy: Prediction Model Study.

Yili Wen, Zhiqiang Wan, Huiling Ren, Xu Wang, Weijie Wang.

   Background: Diabetic nephropathy (DN), a severe complication of diabetes, is characterized by proteinuria, hypertension, and progressive renal function decline, potentially leading to end-stage renal disease. The International Diabetes Federation projects that by 2045, 783 million people will have diabetes, with 30%-40% of them developing DN. Current diagnostic approaches lack sufficient sensitivity and specificity for early detection and diagnosis, underscoring the need for an accurate, interpretable predictive model to enable timely intervention, reduce cardiovascular risks, and optimize health care costs.
Objective: This study aimed to develop and validate a machine learning-based predictive model for DN in patients with type 2 diabetes, with a focus on achieving high predictive accuracy while ensuring transparency and interpretability through explainable artificial intelligence techniques, thereby supporting early diagnosis, risk assessment, and personalized clinical decision-making.
Methods: Our retrospective cohort study investigated 1000 patients with type 2 diabetes using data from electronic medical records collected between 2015 and 2020. The study design incorporated a sample of 444 patients with DN and 556 without, focusing on demographics, clinical metrics such as blood pressure and glucose levels, and renal function markers. Data collection relied on electronic records, with missing values handled via multiple imputation and dataset balance achieved using Synthetic Minority Oversampling Technique (SMOTE). In this study, advanced machine learning algorithms, namely Extreme Gradient Boosting (XGBoost), CatBoost, and Light Gradient-Boosting Machine (LightGBM), were used due to their robustness in handling complex datasets. Key metrics, including accuracy, precision, recall, F1-score, specificity, and area under the curve, were used to provide a comprehensive assessment of model performance. In addition, explainable machine learning techniques, such as Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), were applied to enhance the transparency and interpretability of the models, offering valuable insights into their decision-making processes.
Results: XGBoost and LightGBM demonstrated superior performance, with XGBoost achieving the highest accuracy of 86.87%, a precision of 88.90%, a recall of 84.40%, an F1-score of 86.44%, and a specificity of 89.12%. LIME and SHAP analyses provided insights into the contribution of individual features to elucidate the decision-making processes of these models, identifying serum creatinine, albumin, and lipoproteins as significant predictors.
Conclusions: The developed machine learning model not only provides a robust predictive tool for early diagnosis and risk assessment of DN but also ensures transparency and interpretability, crucial for clinical integration. By enabling early intervention and personalized treatment strategies, this model has the potential to improve patient outcomes and optimize health care resource usage.
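
The metrics reported above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or tp + fp == 0 or tp + fn == 0 or tn + fp == 0:
        raise ValueError("counts leave at least one metric undefined")
    precision = tp / (tp + fp)      # share of predicted positives that are real
    recall = tp / (tp + fn)         # sensitivity: share of real positives found
    specificity = tn / (tn + fp)    # share of real negatives correctly rejected
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting specificity alongside recall matters here because a model for a screening setting can reach high accuracy while still missing many true cases or flagging many healthy patients.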

Keywords:  ML; ML model; diabetes; diabetic nephropathy; early diagnosis; fibrosis; glucose; hypertension; inflammation; interpretability analysis; kidney; machine learning; oxidative stress; patient outcomes; predictive tool; quality of life; renal disease; renal function; risk assessment; type 2 diabetes

DOI: https://doi.org/10.2196/64979

J Transl Med. 2025 Oct 22. 23(1): 1159

Plasma multi-omics and machine learning reveal predictive biomarkers for type 2 diabetes and retinopathy in Qatar biobank cohort.

Ikhlak Ahmed, Ajaz A Bhat, Sujitha Padma Jeya, Wilson K M Wong, Hana Q Sadida, Mya Polkamp, Hrishikesh P Hardikar, Jyothi Lakshmi, Sura Ahmed Hussain, Evonne Chin-Smith, Amaresh K Ranjan, Mugdha V Joglekar, Anandwardhan A Hardikar, Khalid Fakhro, Ammira Al-Shabeeb Akil.

   BACKGROUND: Type 2 diabetes (T2D) and its vascular complications, including diabetic retinopathy (DR), are escalating in prevalence globally, with disproportionately high prevalence in Middle Eastern populations, where genetic predispositions and lifestyle factors intersect. Early detection and precise risk stratification remain critical challenges in this region. We hypothesised that an integrated plasma multi-omics profile, comprising microRNA, mRNA, and protein biomarkers, could accurately distinguish individuals with T2D and its complications in a Middle Eastern cohort.
METHODS: A candidate panel of mRNA and protein biomarkers identified from in vitro hyperglycaemia models, along with a vascular microRNA signature previously defined in an Australian cohort, was evaluated. These multi-omic biomarkers were profiled in 962 individuals (492 controls, 434 T2D and 36 T2D with DR) from the Qatar Biobank (QBB). A Random Forest machine learning workflow was used for risk stratification, with model performance assessed by accuracy and area under the receiver operating characteristic curve. SHAP analysis and penalised regression were used to identify key discriminative biomarkers.
RESULTS: The Random Forest classifier achieved robust performance, with an AUC of 0.83, F1 score of 0.78, and overall accuracy of 0.76 in distinguishing T2D cases from controls. A regulatory axis involving miR-29c (protective) and PROM1 (risk-promoting) was identified as a central driver for T2D and DR progression. Protein biomarkers, including ANGPT2 (fold change = 1.64, p-value = 3.8e-03) and PlGF (fold change = 0.66, p-value = 3.7e-02), were significantly associated with vascular complications.
CONCLUSIONS: Integrating multi-omics data with machine learning enables accurate risk stratification for T2D and DR in Middle Eastern populations. The miR-29c-PROM1 axis and associated proteins represent promising biomarkers for early detection and targeted intervention. Leveraging QBB resources, this study lays the groundwork for precision health initiatives aimed at mitigating diabetes-related complications in a high-risk Middle Eastern cohort.
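
The AUC used to assess the classifier has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it from that definition by pairwise comparison; the function name and the toy labels and scores are hypothetical, not data from the cohort.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    This O(n^2) form is for illustration, not for large cohorts.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Toy example: two controls (0) and two cases (1) with model scores.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under this reading, the reported AUC of 0.83 means that about 83% of case-control pairs are ordered correctly by the model's score.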

Keywords:  Biomarkers; Diabetic retinopathy; Gene expression; Machine learning; Middle east; Multi-omics; PROM1; Random forest; Type 2 diabetes; miR-29c

DOI: https://doi.org/10.1186/s12967-025-07113-x

Br J Ophthalmol. 2025 Oct 20. pii: bjo-2025-327442. [Epub ahead of print]

Enhancing diabetic retinopathy diagnosis and grading: a retrospective study on AI-assisted decision making and cost analysis.

Xieyang Xu, Jiaying Zhang, Xuefei Song, Xinyi Liu, Yan Liu, Lili Feng, Yun Su, Yan Li, Linna Lu, Xianqun Fan.

   BACKGROUND/AIMS: Diabetic retinopathy (DR) is a major ocular complication of diabetes mellitus. While artificial intelligence (AI)-based DR screening tools have gained widespread adoption, most research focuses on comparing AI performance with that of humans, with limited attention to AI's role as an assistant. This study evaluates the impact of AI-assisted decision-making on DR diagnosis and grading based on colour fundus photographs (CFP) and ultra-widefield fundus (UWF) images.
METHODS: A total of 224 retinal images were analysed by 21 ophthalmologists and primary care physicians (PCPs) in China. Participants independently diagnosed and graded DR based on CFP and UWF images. After a 1-week interval, they repeated the task with AI assistance. Diagnosis accuracy was compared with a gold standard before and after AI assistance. Incremental costs and accuracy improvements were assessed using generalized estimating equations (GEE) models.
RESULTS: AI assistance significantly improved DR diagnosis accuracy for both CFP and UWF images. For CFP, accuracy increased from 79.90% to 85.68% for PCPs, 81.19% to 88.69% for ophthalmic residents and 81.41% to 88.05% for ophthalmic attendings. Similar improvements were observed for UWF, with accuracy rising from 83.62% to 89.66% for residents and from 81.31% to 88.98% for attendings. GEE analysis revealed an incremental cost of 4.79 units and an accuracy improvement of 0.35 units with AI assistance.
CONCLUSION: AI assistance shows potential in improving the accuracy of DR diagnosis and grading. Despite the associated costs, AI enables ophthalmologists to achieve superior diagnosis, facilitating earlier DR detection and treatment.
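
The size of each reported improvement is easier to compare when expressed as a percentage-point gain. The short calculation below uses only the before and after accuracies quoted in the RESULTS paragraph; the variable names are illustrative.

```python
# (image type, reader group) -> (accuracy before AI, accuracy with AI), in %.
# Values are the ones quoted in the abstract's RESULTS paragraph.
reported = {
    ("CFP", "PCPs"): (79.90, 85.68),
    ("CFP", "residents"): (81.19, 88.69),
    ("CFP", "attendings"): (81.41, 88.05),
    ("UWF", "residents"): (83.62, 89.66),
    ("UWF", "attendings"): (81.31, 88.98),
}

# Percentage-point gain for each group, rounded to two decimals.
gains = {
    group: round(after - before, 2)
    for group, (before, after) in reported.items()
}

for (image_type, readers), gain in gains.items():
    print(f"{image_type} {readers}: +{gain:.2f} percentage points")
```

By this arithmetic the gains range from 5.78 points (PCPs on CFP) to 7.67 points (attendings on UWF).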

Keywords:  Imaging; Retina

DOI: https://doi.org/10.1136/bjo-2025-327442

Cureus. 2025 Sep;17(9): e92585

Artificial Intelligence for Early Detection of Preeclampsia and Gestational Diabetes Mellitus: A Systematic Review of Diagnostic Performance.

Sahar Altayeb Alfaki Ahmed, Mohammedelfateh Adam, Hanady Me M Osman, Naif Hadi Fahad Alqahtani, Abeer Ebaid Mahdi Gabreldaar, Mona Sidahmed Hassan Abdalla, Ryan Osman Alhessen Saidahmed.

  Preeclampsia (PE) and gestational diabetes mellitus (GDM) are major contributors to maternal and neonatal morbidity and mortality. Early detection is critical, yet current approaches, such as clinical risk scores for PE and glucose challenge/oral glucose tolerance test (OGTT) screening for GDM, often show limited sensitivity and variable predictive accuracy. Artificial intelligence (AI) and machine learning (ML) offer promising avenues for enhancing early prediction and diagnosis. This systematic review, conducted in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines, synthesized evidence from five databases (PubMed, Scopus, Embase, IEEE Xplore, ACM Digital Library) covering January 2020-July 2025. Eligible studies included both model development and validation efforts in pregnant populations. Data were extracted on study characteristics, AI model types, and diagnostic performance metrics. Risk of bias was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). Nine studies met the inclusion criteria, reflecting strict eligibility requirements and limited high-quality research in this area. AI models frequently achieved strong performance, with area under the curve (AUC) values often >0.85. For PE, a neural network model externally validated in Spain achieved AUCs of 0.920 and 0.913 for early and preterm PE, with sensitivity up to 84%. For GDM, an XGBoost model achieved an AUC of 0.946 with an accuracy of 87.5%, while a Random Forest model reached a sensitivity of 75-85% and a specificity of 88-91%. Ensemble methods generally outperformed logistic regression. Seven studies were judged low risk of bias, while two were high risk, particularly in participant selection and analysis domains. Several models also demonstrated good calibration and positive net benefit on decision curve analysis, comparable to established clinical tools. 
AI models show substantial potential for early detection of PE and GDM, though heterogeneity and limited external validation remain barriers. Future research should prioritize multicenter, prospective validation, standardized reporting, and attention to equity and generalizability to ensure safe and effective translation into clinical practice.

Keywords:  artificial intelligence; diagnostic performance; early diagnosis; gestational diabetes; machine learning; predictive models; preeclampsia; systematic review

DOI: https://doi.org/10.7759/cureus.92585

JAMA Netw Open. 2025 Oct 01. 8(10): e2538114

Diabetic Retinopathy Screening Among Federally Qualified Health Center Patients Using Point-of-Care AI: DRES-POCAI: A Trial Protocol.

Edgar A Diaz, Marva L Seifert, Vida Gruning, Nicole A Stadnick, Elizabeth Lugo-Butler, Ariel N Servin, Christian I Rodríguez-Rosales, Carrie Geremia, Chaithanya Ramachandra, Malavika Bhaskaranand, Dan Howard, Oliver Solis, Sharon Velasquez, Brian Snook, Sonia Tucker, Fatima A Muñoz.

Importance: Diabetic retinopathy screening (DRS) rates have historically been low among underserved populations due to barriers in accessing traditional eye care. Although artificial intelligence (AI)-powered DRS provides a potential strategy to improve screening rates, its optimal integration into primary care workflows within federally qualified health centers (FQHCs) requires rigorous evaluation. The clinical workflow of the Diabetic Retinopathy Screening Point-of-Care Artificial Intelligence (DRES-POCAI) trial in FQHCs integrates AI-powered DRS with electronic health records (EHRs) to automate results and prompt referrals, aiming to improve screening rates and facilitate early diagnosis and timely treatment.
Objective: To increase DRS rates, facilitate early-stage DR detection, improve timely eye specialist follow-up, and assess the effect of DRS on patients' knowledge, attitudes, self-efficacy, and satisfaction.
Design, Setting, and Participants: DRES-POCAI is a patient-level, multiclinic, open-label, parallel superiority randomized clinical trial at 2 FQHC sites of San Ysidro Health in San Diego County, California. The study recruitment targets 848 active FQHC patients aged 22 years or older with diabetes, no DRS in the prior 11 months, and scheduled medical visits during the intervention period. Patients with a history of retinopathy or retinal vascular occlusion and other physical or mental conditions are excluded. The study started in June 2024, with recruitment anticipated to conclude in August 2025 and follow-up until February 2026.
Intervention: The intervention arm receives DRS at their primary care clinic using an AI-powered DRS system, with retinal image analysis to identify more than mild DR and vision-threatening DR. Results are immediately available in the EHRs, and practitioners receive risk-stratified referral recommendations. The usual care arm receives referrals to an FQHC optometrist or external eye care practitioner, with results transmitted to the medical home later.
Main Outcomes and Measures: The primary outcome is DRS completion status. Secondary outcomes include DR diagnosis stage, specialist referrals, and participants' DR knowledge, attitudes, and intentions regarding future AI-powered DRS.
Results: Findings will be disseminated in peer-reviewed publications after data collection and analysis.
Conclusions and Relevance: DRES-POCAI will determine the effectiveness of an AI-powered DRS intervention to increase DRS rates in FQHC primary care workflows.
Trial Registration: ClinicalTrials.gov Identifier: NCT06721351.

DOI: https://doi.org/10.1001/jamanetworkopen.2025.38114

Clin Transl Sci. 2025 Oct;18(10): e70351

Development of a Novel Machine Learning Method for Estimation of Life-Long Chronic Disease Progression and Its Application to Type 2 Diabetes.

Yamato Sano, Ryota Jin, Hideki Yoshioka, Yuki Nakazato, Hiromi Sato, Akihiro Hisaka.

  Individual predictions of long-term chronic disease progression from data of limited duration provide valuable insights into estimating patient outcomes and therapeutic needs. Statistical Restoration of Fragmented Time course (SReFT) was developed to address this challenge, yet it is computationally too intensive for large-scale datasets. Although diabetes is a representative chronic disease with significant medical needs, it has been challenging to analyze long-term changes using large-scale patient data due to this limitation. In this study, we aimed to develop a new method (SReFT-machine learning, SReFT-ML) by applying machine learning to the concept of SReFT, and to confirm its performance using synthetic data and the data from a clinical trial, the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial (N = 10,251). SReFT-ML has successfully analyzed both synthetic and clinical data, and reconstructed biomarker trajectories over a 30-year period in patients with diabetes. Decreases in diastolic blood pressure and renal function may be important indicators of disease progression. Furthermore, although age and mortality data were not included in the model, survival analysis demonstrated a clear trend of hazard increases in mortality and diabetes-related outcomes with disease progression. This study introduced machine learning to enhance long-term disease progression modeling. The resulting model characterized a 30-year trajectory of disease risk in diabetes. The results provide a clinically meaningful hypothesis that incorporating systemic factors such as renal function and blood pressure, in addition to classic glycemic control, may enhance comprehensive diabetes care. Trial Registration: ClinicalTrials.gov number: NCT00000620.

Keywords:  biomarkers; diabetes mellitus; disease progression; machine learning; modeling

DOI: https://doi.org/10.1111/cts.70351

Sci Rep. 2025 Oct 21. 15(1): 36678

A deep learning framework with hybrid stacked sparse autoencoder for type 2 diabetes prediction.

Abdussamad, Hanita Daud, Rajalingam Sokkalingam, Muhammad Zubair, Iliyas Karim Khan, Zafar Mahmood.

  Sparse numerical datasets are dominant in fields such as applied mathematics, astronomy, finance, and healthcare, presenting challenges due to their high dimensionality and sparse distribution. The predominance of zero values complicates optimal feature selection, making data analysis and model performance more complex. To overcome this challenge, this study introduces a deep learning-based algorithm, Hybrid Stacked Sparse Autoencoder (HSSAE), which integrates [Formula: see text] and [Formula: see text] regularization with binary cross-entropy loss to improve feature selection efficiency, where [Formula: see text] regularization penalizes large weights, simplifying data representations, while [Formula: see text] regularization prevents overfitting by limiting the total weight size. Additionally, the dropout technique enhances the algorithm's performance by randomly deactivating neurons during training, avoiding over-reliance on specific features. Meanwhile, batch normalization stabilizes weight distributions, reducing computational complexity and accelerating the convergence. The proposed algorithm, HSSAE, was evaluated against traditional classifiers, including Decision Tree, Random Forest, K-Nearest Neighbors, and Naïve Bayes, as well as deep learning-based models, such as Convolutional Neural Network, Long Short-Term Memory, and Stacked Sparse Autoencoder, in terms of Precision, Recall, Accuracy, F1-score, AUC, and Hamming Loss. Quantitatively, the proposed algorithm, HSSAE, was tested on two different sparse datasets, demonstrating superior performance with the highest accuracy of 89% on the health indicator dataset and 93% on the EHRs diabetes prediction dataset, respectively, and outperforming competing classifiers. The proposed algorithm, HSSAE, extracts features effectively and enhances robustness, making it well-suited for sparse data applications, particularly in healthcare, where high prediction accuracy is crucial.
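
A loss of the general kind described above can be sketched in a few lines. The regularization symbols are missing from this abstract ("[Formula: see text]"), so the sketch assumes the common pairing of an L1 penalty (sum of absolute weights) and an L2 penalty (sum of squared weights) added to binary cross-entropy. The function name, penalty strengths, and example values are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def regularized_bce(y_true: Sequence[int], y_prob: Sequence[float],
                    weights: Sequence[float],
                    l1: float = 1e-4, l2: float = 1e-4,
                    eps: float = 1e-12) -> float:
    """Binary cross-entropy plus L1 and L2 weight penalties.

    Assumption: the two elided regularizers are L1 and L2; the
    abstract itself does not show which symbols were used.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("labels and probabilities must align")
    bce = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    bce /= len(y_true)
    l1_penalty = l1 * sum(abs(w) for w in weights)  # encourages sparse weights
    l2_penalty = l2 * sum(w * w for w in weights)   # discourages large weights
    return bce + l1_penalty + l2_penalty


# Hypothetical values: two samples predicted at 0.5, two weights.
loss = regularized_bce([1, 0], [0.5, 0.5], [1.0, -2.0], l1=0.1, l2=0.01)
```

In an autoencoder the penalties would run over every layer's weight matrix; a flat list stands in for them here to keep the sketch dependency-free.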

Keywords:  Autoencoder; Deep learning; Diabetes prediction; Feature selection; Machine learning; Sparse data

DOI: https://doi.org/10.1038/s41598-025-20534-4

Curr Comput Aided Drug Des. 2025 Oct 21.

Assessing Lung Injury Induced by Streptozotocin-induced Diabetes: A Deep Neural Network Analysis of Histopathological and Immunohistochemical Images.

Tuğba Şentürk, Demet Bolat, Arzu Yay, Münevver Baran, Fatma Latifoğlu.

   INTRODUCTION: Diabetes mellitus is an endocrine disorder characterized by metabolic abnormalities and chronic hyperglycemia, caused by insulin deficiency (Type I) or resistance (Type II). It affects various tissues differently, and its complications extend beyond classical targets, such as the kidneys and eyes, to lesser-studied organs, including the lungs. Understanding tissue-specific damage is crucial for effective disease management and the prevention of complications.
OBJECTIVE: This study aims to evaluate the histopathological and immunohistochemical effects of diabetic lung fibrosis using a streptozotocin (STZ)-induced diabetes model. Additionally, it seeks to develop a high-performance image classification system based on deep neural networks to accurately classify tissue damage in diabetic models.
METHODS: Lung tissue samples were collected from the STZ-induced diabetes model and analyzed through histopathological and immunohistochemical techniques. Image data were further processed using convolutional neural networks (CNNs), including pre-trained models, such as ResNet50, VGG16, and SqueezeNet. Classification was conducted in multiple color spaces (RGB, Grayscale, and HSV) and evaluated using performance metrics, including confusion matrix, precision, recall, F1 score, and accuracy.
RESULTS AND DISCUSSION: The use of color significantly enhanced image patch classification performance. Among the models tested, SqueezeNet in the RGB color space demonstrated the highest accuracy, achieving an F1 score of 93.49% ± 0.04 and an accuracy of 93.77% ± 0.04. These results indicated the efficacy of CNN-based classification in detecting lung damage associated with diabetes.
CONCLUSION: Our findings confirmed that diabetes induces histopathological changes in lung tissue, contributing to fibrosis and potential pulmonary complications. Deep learning-based classification methods, particularly when utilizing color space variations and advanced preprocessing techniques, provide a powerful tool for analyzing diabetic tissue damage and may aid in the development of diagnostic support systems.
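
The three color spaces compared in the METHODS paragraph can be produced from the same RGB patch before it is passed to a classifier. The sketch below uses Python's standard `colorsys` module for HSV and assumes the common ITU-R BT.601 luma weights for grayscale; the abstract does not specify the conversion formulas, and the function names and the tiny example patch are hypothetical.

```python
import colorsys
from typing import List, Tuple, Union

Pixel = Tuple[float, float, float]  # RGB channels scaled to [0, 1]


def to_grayscale(pixel: Pixel) -> float:
    """Luma of an RGB pixel (assumed BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def convert_patch(patch: List[List[Pixel]],
                  space: str) -> List[List[Union[float, Pixel]]]:
    """Convert an RGB patch (rows of pixels) to RGB, Grayscale, or HSV."""
    if space == "RGB":
        return [list(row) for row in patch]
    if space == "Grayscale":
        return [[to_grayscale(px) for px in row] for row in patch]
    if space == "HSV":
        return [[colorsys.rgb_to_hsv(*px) for px in row] for row in patch]
    raise ValueError(f"unsupported color space: {space}")


# Hypothetical 1x2 patch: one pure red pixel and one white pixel.
patch = [[(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
hsv_patch = convert_patch(patch, "HSV")
gray_patch = convert_patch(patch, "Grayscale")
```

Grayscale collapses three channels into one, which is consistent with the abstract's finding that removing color information reduced classification performance.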

Keywords:  Convolutional neural networks; color space conversion; deep learning; diabetes; histopathology; image classification

DOI: https://doi.org/10.2174/0115734099387481250930073924

J Diabetes Sci Technol. 2025 Oct 18. 19322968251367776

Establishment of a Diabetes-Tailored Data Intelligence Platform Enhances Clinical Care, Enables Risk-Based Monitoring, and Facilitates Population-Health-Based Approaches at a Pediatric Diabetes Network.

Brent Lockee, Craig A Vandervelden, Daniel R Tilden, Kelsey Panfil, Erin M Tallon, Emily DeWit, Katie Noland, David D Williams, Harpreet Gill, Susana R Patton, Priya Prahalad, Juan Espinoza, Amey Waghmode, Mitchell Barnes, Mark A Clements.

   BACKGROUND: Patient-generated health data (PGHD) represents an opportunity to customize care, particularly in type 1 diabetes (T1D) care where continuous glucose monitor (CGM) and insulin pump usage continues to rise. Previous solutions to integrating CGM data into the electronic health record (EHR) have been limited in their ability to integrate data from multiple sources, ensure data fidelity, integrate data from multiple data streams, and rapidly adapt to changes in data output from numerous vendors. We developed a novel data infrastructure contained outside of the EHR to provide an alternative approach to PGHD integration, enable diabetes centers to identify and predict risk, and to facilitate research and quality improvement.
METHODS: We identified three key capabilities: ingesting and storing a wide variety of data, refining raw data into actionable insights, and visualizing and reporting to decision makers. To meet these requirements, we built a data intelligence platform we coined the diabetes data dock (D-data dock) in the Microsoft Azure cloud platform.
RESULTS: The D-data dock houses approximately 100 million CGM measurements, one million clinical events and insulin bolus records, and a near complete EHR record covering approximately 3000 patients per year from 2016 to 2023. We provide case studies detailing how the D-data dock allows timely monitoring of CGM data, enables novel study designs, and powers machine-learning-informed supplemental care interventions.
CONCLUSIONS: The D-data dock is a novel approach to harnessing disparate data streams to improve patient care, enable timely interventions, and drive innovation to improve the lives and care of people with T1D.

Keywords:  continuous glucose monitor; data integration; data intelligence platform; machine learning; patient generated health data; population health

DOI:  https://doi.org/10.1177/19322968251367776