J Med Internet Res. 2026 Mar 02;28:e79863
LEAPfROG Consortium
Background: Electronic health record (EHR) data, a key form of routinely collected patient data, offer great potential for medical research and the development of artificial intelligence (AI) tools. However, because these data are primarily gathered for health care rather than research, they often lack the quality needed for AI training, raising both methodological and ethical concerns. While previous studies have reviewed the ethical implications of routinely collected patient data and of AI separately, their intersection, where AI is applied to such data, remains largely unexplored.
Objective: This study aimed to examine the ethical challenges that arise at the intersection of EHR data and AI development and to derive practice-oriented recommendations using the Dutch LEAPfROG (Leveraging Real-World Data to Optimize Pharmacotherapy Outcomes in Multimorbid Patients Using Machine Learning and Knowledge Representation Methods) project as a guiding case.
Methods: We used a mixed methods design combining a scoping literature review with a systematic search and 2 stakeholder workshops structured by the guidance ethics approach, reflecting a staged and iterative process aligned with the LEAPfROG project's development phases. The review identified 25 relevant publications from 2014 to 2024. The workshops, conducted with 17 and 13 participants, respectively, included patients, clinicians, ethicists, data officers, and AI developers. Both workshops used dialogue to identify ethical values, impacts, and action points, focusing on a case study of drug-induced acute kidney injury.
Results: The analysis highlighted 4 major themes: (1) data privacy, transparency, and consent, including challenges of meaningful consent and risks of reidentification; (2) public trust and regulatory challenges, such as fragmented oversight and inconsistent governance; (3) fair representation and model generalizability, where incomplete or biased data may perpetuate health inequities; and (4) responsible AI integration in clinical practice, including concerns about clinical tropism, administrative burden, and the risk of overreliance on AI outputs. Both the literature and stakeholder perspectives underscored the risk of decontextualization when EHR data are reused and emphasized the importance of clearly defining the purpose of data reuse to ensure real-world applicability and foster trust.
Conclusions: Responsible AI development requires explicit attention to how EHR data are produced, interpreted, and governed in practice, recognizing that data quality and meaning are shaped by the clinical, institutional, and social contexts in which they originate. Technical solutions or top-down regulation alone are insufficient. Instead, stakeholder-led and context-sensitive approaches are needed to define the purposes, risks, and benefits of medical AI. Grounded in the realities of health care practice and in the perspectives of patients, clinicians, and data custodians, these approaches can strengthen transparency, fairness, and clinical relevance, ensuring that EHR data are used ethically and effectively to support equitable and trustworthy AI innovation.
Keywords: artificial intelligence; ethics; medical informatics; pharmacotherapy; routinely collected health data; stakeholder participation