Eur Heart J Digit Health. 2026 Jun;7(5):ztag070
iCARE4CVD consortium
Aims: Artificial intelligence (AI) tools built on large language models (LLMs) can accelerate scientific literature reviews by automating title, abstract, and full-text screening for relevant patient populations and biomarkers. We developed an AI-based tool that uses LLMs to automate and improve full-text screening and to accurately identify publications that meet complex criteria.
Methods and results: We conducted a literature review using the Population, Intervention-biomarkers, Comparison, Outcome framework to define our inclusion and exclusion criteria, focusing on biomarkers in heart failure with reduced ejection fraction (HFrEF). We created an AI-based full-text screening tool to process 5405 selected publications; it combines multi-level and task-oriented retrieval-augmented generation (RAG) with agent-based methods. Ground truth standards were established to evaluate the performance of both the tool and human reviewers. Intra-LLM reliability was assessed by rerunning screenings on a batch of publications. Among the public and private domain models compared, LLaMA 3.3 70B was selected for its superior accuracy (82%), precision (71%), and recall (100%) in LLM screening of 49 manuscripts. Performance metrics improved significantly during the training phase, which was based on several hundred manuscripts. Validation results showed a sensitivity of 91.4%, specificity of 53.2%, a false positive rate of 46.8%, and a false negative rate of 8.6%. The LLM outperformed human reviewers in F1 score and inter-rater reliability and achieved 100% consistency across multiple runs, each consisting of multiple LLMs applied to 1000 documents.
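The screening metrics reported above are all derived from a confusion matrix of include/exclude decisions against the ground truth. The following is a minimal Python sketch of those standard definitions, not the consortium's code: the names `ConfusionCounts` and `screening_metrics` are hypothetical, and the counts in the usage example are arbitrary illustrative values, not study data. It also makes explicit why the reported false positive and false negative rates are the complements of specificity and sensitivity (100% - 53.2% = 46.8%; 100% - 91.4% = 8.6%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Screening decisions compared with ground truth (hypothetical container)."""
    tp: int  # relevant publications correctly included
    fp: int  # irrelevant publications wrongly included
    tn: int  # irrelevant publications correctly excluded
    fn: int  # relevant publications wrongly excluded


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard binary-classification metrics; assumes no denominator is zero."""
    sensitivity = c.tp / (c.tp + c.fn)   # also called recall
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
        "false_negative_rate": 1.0 - sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Arbitrary illustrative counts, NOT taken from the study.
    example = ConfusionCounts(tp=40, fp=20, tn=30, fn=10)
    for name, value in screening_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

In a screening setting, a missed relevant publication (false negative) is usually costlier than an extra full-text read (false positive), which is one reason to report sensitivity and specificity separately rather than accuracy alone.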
Conclusion: Our study demonstrated that an AI tool can reduce labour-intensive effort while maintaining accuracy in literature reviews, with greater inter-rater agreement than human reviewers.
Keywords: Artificial intelligence (AI); Biomarkers; Full-text screening; Heart failure; Large language models (LLMs); Retrieval-augmented generation (RAG)