bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-01-12
thirty-one papers selected by
Thomas Krichel, Open Library Society



  1. J CME. 2025 ;14(1): 2444726
      Many national meetings and speaker series feature an "Annual Review of the Literature" (ARL) session in which an individual or team presents a sampling of articles, selected and prepared because they represent important current topics or new ideas in the discipline of interest. Despite this, there is little in the medical literature describing how to thoughtfully and systematically develop these sessions. We identify best practices that we have developed and used in the United States Clerkship Directors of Internal Medicine (CDIM) over many years. These include identification of a theme, team assembly, timeline development, search strategy and rubric development and employment, and presentation planning strategies. Employing the steps described can help facilitate this otherwise arduous process.
    Keywords:  Education; clinical clerkship; faculty development; medical; program evaluation; systematic review
    DOI:  https://doi.org/10.1080/28338073.2024.2444726
  2. Clin Spine Surg. 2024 Nov 18.
      The adequacy of the literature search is one of the critical domains that affect the quality of the systematic review. The aim of a literature search in the systematic review should be to obtain thorough, comprehensive, transparent, and reproducible results. Precision (also called "positive predictive value") and sensitivity (also called "recall") have been postulated as 2 markers for rating the quality of literature search in systematic reviews. The reporting of such measures shall help in improving the relevance, transparency, reproducibility, and comprehensibility of the search. A search strategy that maximizes sensitivity with reasonable precision shall improve the quality of the review.
    DOI:  https://doi.org/10.1097/BSD.0000000000001738
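    As a quick illustration of the two measures discussed in this abstract, the sketch below computes precision (positive predictive value) and sensitivity (recall) for a literature search; the screening counts are invented for illustration only.

```python
def search_precision_recall(relevant_retrieved, total_retrieved, total_relevant):
    """Precision (positive predictive value) and sensitivity (recall) of a search."""
    precision = relevant_retrieved / total_retrieved
    sensitivity = relevant_retrieved / total_relevant
    return precision, sensitivity

# Hypothetical screening counts: the search returns 400 records, 80 of which are
# relevant, out of 90 relevant records known from a gold-standard reference set.
p, s = search_precision_recall(80, 400, 90)
print(f"precision = {p:.2f}, sensitivity = {s:.2f}")  # precision = 0.20, sensitivity = 0.89
```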
  3. BioData Min. 2025 Jan 09. 18(1): 1
      Biomedical datasets are the mainstays of computational biology and health informatics projects, and can be found on multiple data platforms online or obtained from wet-lab biologists and physicians. The quality and the trustworthiness of these datasets, however, can sometimes be poor, producing bad results in turn, which can harm patients and data subjects. To address this problem, policy-makers, researchers, and consortia have proposed diverse regulations, guidelines, and scores to assess the quality and increase the reliability of datasets. Although generally useful, however, they are often incomplete and impractical. The guidelines of Datasheets for Datasets, in particular, are too numerous; the requirements of the Kaggle Dataset Usability Score focus on non-scientific requisites (for example, including a cover image); and the European Union Artificial Intelligence Act (EU AI Act) sets forth sparse and general data governance requirements, which we tailored to datasets for biomedical AI. Against this backdrop, we introduce our new Venus score to assess the data quality and trustworthiness of biomedical datasets. Our score ranges from 0 to 10 and consists of ten questions that anyone developing a bioinformatics, medical informatics, or cheminformatics dataset should answer before the release. In this study, we first describe the EU AI Act, Datasheets for Datasets, and the Kaggle Dataset Usability Score, presenting their requirements and their drawbacks. To do so, we reverse-engineer the weights of the influential Kaggle Score for the first time and report them in this study. We distill the most important data governance requirements into ten questions tailored to the biomedical domain, comprising the Venus score. We apply the Venus score to twelve datasets from multiple subdomains, including electronic health records, medical imaging, microarray and bulk RNA-seq gene expression, cheminformatics, physiologic electrogram signals, and medical text. Analyzing the results, we surface fine-grained strengths and weaknesses of popular datasets, as well as aggregate trends. Most notably, we find a widespread tendency to gloss over sources of data inaccuracy and noise, which may hinder the reliable exploitation of data and, consequently, research results. Overall, our results confirm the applicability and utility of the Venus score to assess the trustworthiness of biomedical data.
    Keywords:  Bioinformatics; Biomedical data quality; Cheminformatics; Computational biology; Data documentation; Data trustworthiness; Datasheets for Datasets; EU AI Act; Health informatics; Kaggle; Medical data; Medical text; Trustworthiness; Trustworthy data
    DOI:  https://doi.org/10.1186/s13040-024-00412-x
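    The abstract describes the Venus score only as ten questions yielding a 0-10 total; the generic checklist scorer below sketches that scheme. The items are not the published Venus questions, and real use would substitute the ten questions from the paper.

```python
# Generic sketch of a ten-item, 0-10 dataset checklist in the spirit of the
# Venus score described above. The items are NOT the published Venus questions.
def checklist_score(answers: list[bool]) -> int:
    """One point per 'yes' answer; with ten items the total ranges from 0 to 10."""
    if len(answers) != 10:
        raise ValueError("the Venus score is defined over ten questions")
    return sum(answers)

print(checklist_score([True] * 7 + [False] * 3))  # -> 7
```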
  4. JAMIA Open. 2025 Feb;8(1): ooae129
       Objectives: The National Library of Medicine (NLM) currently indexes close to a million articles each year from more than 5300 medicine and life sciences journals. Of these, a significant number of articles contain critical information about the structure, genetics, and function of genes and proteins in normal and disease states. These articles are identified by the NLM curators, and a manual link is created between these articles and the corresponding gene records in the NCBI Gene database. The information is thus interconnected with all NLM resources and services, which brings considerable value to the life sciences. The NLM aims to provide timely access to all metadata, and this requires that article indexing scale to the volume of the published literature. On the other hand, although automatic information extraction methods have been shown to achieve accurate results in biomedical text mining research, it remains difficult to evaluate them on established pipelines and integrate them into daily workflows.
    Materials and Methods: Here, we demonstrate how our machine learning model, GNorm2, which achieved state-of-the-art performance in identifying genes and their corresponding species while handling innate textual ambiguities, could be integrated with the established daily workflow at the NLM and evaluated for its performance in this new environment.
    Results: We worked with 8 biomedical curator experts and evaluated the integration using these parameters: (1) gene identification accuracy, (2) interannotator agreement with and without GNorm2, (3) GNorm2 potential bias, and (4) indexing consistency and efficiency. We identified key interface changes that significantly helped the curators to maximize the GNorm2 benefit, and further improved the GNorm2 algorithm to cover genes from 135 species, including viral and bacterial genes, based on the biocurator expert survey.
    Conclusion: GNorm2 is currently in the process of being fully integrated into the regular curator's workflow.
    Keywords:  AI workflow implementation; AI-assisted curation; article indexing; gene identification; gene name entity recognition; gene name normalization
    DOI:  https://doi.org/10.1093/jamiaopen/ooae129
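    The abstract reports interannotator agreement with and without GNorm2 but does not name the agreement statistic; Cohen's kappa is one common choice, sketched below with invented gene labels for two hypothetical curators.

```python
# Hedged sketch: interannotator agreement between two curators' gene assignments,
# here measured with Cohen's kappa. The abstract does not state which statistic
# was used; kappa is simply a common choice, and the labels below are invented.
from sklearn.metrics import cohen_kappa_score

curator_a = ["BRCA1", "TP53", "EGFR", "TP53", "KRAS", "EGFR"]
curator_b = ["BRCA1", "TP53", "EGFR", "BRCA1", "KRAS", "EGFR"]

print(f"kappa = {cohen_kappa_score(curator_a, curator_b):.2f}")
```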
  5. bioRxiv. 2024 Dec 17. pii: 2024.12.13.628309. [Epub ahead of print]
      Annotation with widely used, well-structured ontologies, combined with the use of ontology-aware software tools, ensures data and analyses are Findable, Accessible, Interoperable and Reusable (FAIR). Standardized terms with synonyms support lexical search. Ontology structure supports biologically meaningful grouping of annotations (typically by location and type). However, there are significant barriers to the adoption and use of ontologies by researchers and resource developers. One barrier is complexity. Ontologies serving diverse communities are often more complex than needed for individual applications. It is common for atlases to attempt their own simplifications by manually constructing hierarchies of terms linked to ontologies, but these typically include relationship types that are not suitable for grouping annotations. Here, we present a suite of tools for validating user hierarchies against ontology structure, using them to generate graphical reports for discussion and ontology views tailored to the needs of the HuBMAP Human Reference Atlas, and the Human Developmental Cell Atlas. In both cases, validation is a source of corrections and content for both ontologies and user hierarchies.
    DOI:  https://doi.org/10.1101/2024.12.13.628309
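    To make the validation idea concrete: each parent-child link asserted in a user hierarchy should be supported by an ancestor path in the source ontology. The toy graph below illustrates that check; the terms and tooling are illustrative assumptions, not the HuBMAP/HDCA content or the suite of tools described in the preprint.

```python
# Toy sketch: every parent -> child link in a user hierarchy must correspond to
# an ancestor path in the ontology graph. Terms here are illustrative only.
import networkx as nx

# Ontology as a directed graph: edges point from broader term to narrower term.
ontology = nx.DiGraph([
    ("organ", "lung"),
    ("lung", "bronchus"),
    ("bronchus", "bronchial epithelium"),
])

# User hierarchy to validate, as (parent, child) pairs.
user_hierarchy = [
    ("lung", "bronchial epithelium"),  # supported: an indirect path exists
    ("bronchus", "lung"),              # not supported: direction is reversed
]

for parent, child in user_hierarchy:
    ok = (ontology.has_node(parent) and ontology.has_node(child)
          and nx.has_path(ontology, parent, child))
    print(f"{parent} -> {child}: {'supported' if ok else 'NOT supported by the ontology'}")
```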
  6. JMIR Dermatol. 2025 Jan 06. 8 e60210
       Background: Online digital materials are integral to patient education and health care outcomes in dermatology. Acanthosis nigricans (AN) is a common condition, often associated with underlying diseases such as insulin resistance. Patients frequently search the internet for information related to this cutaneous finding. To our knowledge, the quality of online educational materials for AN has not been systematically examined.
    Objective: The primary objective of this study was to profile the readability and quality of the content of publicly available digital educational materials on AN and identify questions frequently asked by patients.
    Methods: This study analyzed publicly available internet sources to identify the questions most frequently searched by patients regarding AN, using the Google RankBrain algorithm. Furthermore, available articles on AN were evaluated for content quality using the Brief DISCERN score, and readability was determined using three scales drawn from the literature: the Flesch-Kincaid score, the Gunning Fog index, and the Coleman-Liau index.
    Results: Patients most frequently accessed facts on AN from government sources, which comprised 30% (n=15) of the analyzed sources. The available articles did not meet quality standards and were at a reading level not appropriate for the general public. The majority of articles (n=29/50, 58%) had substandard Brief DISCERN scores, failing to meet the criteria for good quality.
    Conclusions: Clinicians should be aware of the paucity of valuable online educational material on AN and educate their patients accordingly.
    Keywords:  DISCERN; acanthosis nigricans; dermatology; general public; information behavior; information resource; information seeking; patient education; public health; readability; reading level; skin
    DOI:  https://doi.org/10.2196/60210
  7. Eye Contact Lens. 2024 Dec 31.
       PURPOSE: We aimed to compare the answers given by ChatGPT, Bard, and Copilot and that obtained from the American Academy of Ophthalmology (AAO) website to patient-written questions related to keratoconus in terms of accuracy, understandability, actionability, and readability to find out whether chatbots can be used in patient education.
    METHODS: Twenty patient-written questions obtained from the AAO website related to keratoconus were asked to ChatGPT, Bard, and Copilot. Two ophthalmologists independently assessed the answers obtained from chatbots and the AAO website in terms of accuracy, understandability, and actionability according to the Structure of Observed Learning Outcome taxonomy, Patient Education Materials Assessment Tool-Understandability, and Patient Education Materials Assessment Tool-Actionability tests, respectively. The answers were also compared for readability according to the Flesch Reading Ease scores obtained through the website.
    RESULTS: Bard had significantly higher scores compared with ChatGPT-3.5, Copilot, and the AAO website according to the Structure of Observed Learning Outcome taxonomy and Patient Education Materials Assessment Tool-Understandability (P<0.001 for each), whereas there was no significant difference between the other groups. Bard and ChatGPT achieved significantly higher scores than the AAO website according to the Patient Education Materials Assessment Tool-Actionability scale (P=0.001). The AAO website achieved significantly higher scores than Bard on the Flesch Reading Ease scale (P=0.017), whereas there was no significant difference between the other groups.
    CONCLUSION: Chatbots are promising to provide accurate, understandable, and actionable answers. Chatbots can be a valuable aid in the education of patients with keratoconus under clinician supervision. In this way, unnecessary hospital visits can be prevented, and the burden on the health care system can be alleviated, while patient awareness can be raised.
    DOI:  https://doi.org/10.1097/ICL.0000000000001160
  8. J Clin Med. 2024 Dec 10. pii: 7482. [Epub ahead of print]13(24):
      Background/Objectives: Artificial intelligence (AI), particularly natural language processing (NLP) models such as ChatGPT, presents novel opportunities for patient education and informed consent. This study evaluated ChatGPT's use as a support tool for informed consent before penile prosthesis implantation (PPI) in patients with erectile dysfunction (ED) following radical prostatectomy.
    Methods: ChatGPT-4 answered 20 frequently asked questions across four categories: ED and treatment, PPI surgery, complications, and postoperative care. Three senior urologists independently rated information quality using the DISCERN instrument on a Likert scale ranging from 1 (poor quality) to 5 (good quality). Readability was assessed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas, and inter-rater reliability was measured using intraclass correlation coefficients.
    Results: The inter-rater reliability coefficient was 0.76 (95% CI 0.71-0.80). Mean DISCERN scores indicated moderate quality: 2.79 ± 0.92 for ED and treatment, 2.57 ± 0.98 for surgery, 2.65 ± 0.86 for complications, and 2.74 ± 0.90 for postoperative care. High scores (>4) were achieved for clarity and relevance, while complex issues, such as risks and alternative treatments, scored the lowest (<2). The FRE scores ranged from 9.8 to 28.39, and FKGL scores ranged from 14.04 to 17.41, indicating complex readability suitable for college-level comprehension.
    Conclusions: ChatGPT currently provides variable and often inadequate quality information without sufficient comprehensibility for informed patient decisions, indicating the need for further improvements in quality and readability.
    Keywords:  ChatGPT; erectile dysfunction; informed consent; natural language processing; patient education; penile prosthesis
    DOI:  https://doi.org/10.3390/jcm13247482
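    For reference, the two readability formulas used here (and in several of the following items) can be computed from word, sentence, and syllable counts, as in the sketch below. The syllable counter is a rough vowel-group heuristic; published studies normally rely on established readability software.

```python
# Standard Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL)
# formulas, computed from word, sentence, and syllable counts. The syllable
# counter is a rough heuristic for illustration only.
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # words per sentence
    spw = syllables / len(words)   # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

fre, fkgl = readability("The implant is placed under anaesthesia. Recovery takes weeks.")
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")
```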
  9. Surgery. 2025 Jan 04. pii: S0039-6060(24)01011-0. [Epub ahead of print]180 109024
       BACKGROUND: Improving patient education has been shown to improve clinical outcomes and reduce disparities, though such efforts can be labor intensive. Large language models may serve as an accessible method to improve patient educational material. The aim of this study was to compare readability between existing educational materials and those generated by large language models.
    METHODS: Baseline colorectal surgery educational materials were gathered from a large academic institution (n = 52). Three prompts were entered into Perplexity and ChatGPT 3.5 for each topic: a Basic prompt that simply requested patient educational information on the topic, an Iterative prompt that repeated the instruction while asking for the information to be more health literate, and a Metric-based prompt that requested a sixth-grade reading level, short sentences, and short words. Flesch-Kincaid Grade Level (Grade Level), Flesch-Kincaid Reading Ease (Ease), and Modified Grade Level scores were calculated for all materials, and unpaired t tests were used to compare mean scores between baseline documents and those generated by the artificial intelligence platforms.
    RESULTS: Overall existing materials were longer than materials generated by the large language models across categories and prompts: 863-956 words vs 170-265 (ChatGPT) and 220-313 (Perplexity), all P < .01. Baseline materials did not meet sixth-grade readability guidelines based on grade level (Grade Level 7.0-9.8 and Modified Grade Level 9.6-11.5) or ease of readability (Ease 53.1-65.0). Readability of materials generated by a large language model varied by prompt and platform. Overall, ChatGPT materials were more readable than baseline materials with the Metric-based prompt: Grade Level 5.2 vs 8.1, Modified Grade Level 7.3 vs 10.3, and Ease 70.5 vs 60.4, all P < .01. In contrast, Perplexity-generated materials were significantly less readable except for those generated with the Metric-based prompt, which did not statistically differ.
    CONCLUSION: Both existing materials and the majority of educational materials created by large language models did not meet readability recommendations. The exception to this was with ChatGPT materials generated with a Metric-based prompt that consistently improved readability scores from baseline and met recommendations in terms of the average Grade Level score. The variability in performance highlights the importance of the prompt used with large language models.
    DOI:  https://doi.org/10.1016/j.surg.2024.109024
  10. Int J Obstet Anesth. 2024 Dec 20. pii: S0959-289X(24)00329-7. [Epub ahead of print]61 104317
     INTRODUCTION: Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.
    METHODS: Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent t-test.
    RESULTS: Bard readability scores were high school level, significantly easier than ChatGPT's college level by all scoring metrics (P <0.001). Bard had significantly longer answers (P <0.001), yet with similar accuracy for Bard (85% ± 10) and ChatGPT (87% ± 14) (P = 0.5). PEMAT understandability scores were not significantly different (P = 0.06). Actionability by PEMAT scores was significantly higher for Bard (22% vs. 9%) than for ChatGPT (P = 0.007).
    CONCLUSION: Answers to questions about "labor epidurals" should be accurate, high quality, and easy to read. Bard, at a high school reading level, was well above the goal 4th to 6th grade level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet the standards for readability and understandability of health-related questions, to aid public understanding and enhance shared decision-making.
    Keywords:  Epidural; Generative Artificial Intelligence; Labor analgesia; Patient educational materials; Pregnancy
    DOI:  https://doi.org/10.1016/j.ijoa.2024.104317
  11. Medicine (Baltimore). 2025 Jan 10. 104(2): e41059
      This study evaluates the efficacy of GPT-4, a Large Language Model, in simplifying medical literature for enhancing patient comprehension in glaucoma care. GPT-4 was used to transform published abstracts from 3 glaucoma journals (n = 62) and patient education materials (Patient Educational Model [PEMs], n = 9) to a 5th-grade reading level. GPT-4 was also prompted to generate de novo educational outputs at 6 different education levels (5th Grade, 8th Grade, High School, Associate's, Bachelor's and Doctorate). Readability of both transformed and de novo materials was quantified using Flesch Kincaid Grade Level (FKGL) and Flesch Reading Ease (FKRE) Score. Latent semantic analysis (LSA) using cosine similarity was applied to assess content consistency in transformed materials. The transformation of abstracts resulted in FKGL decreasing by an average of 3.21 points (30%, P < .001) and FKRE increasing by 28.6 points (66%, P < .001). For PEMs, FKGL decreased by 2.38 points (28%, P = .0272) and FKRE increased by 12.14 points (19%, P = .0459). LSA revealed high semantic consistency, with an average cosine similarity of 0.861 across all abstracts and 0.937 for PEMs, signifying topical themes were quantitatively shown to be consistent. This study shows that GPT-4 effectively simplifies medical information about glaucoma, making it more accessible while maintaining textual content. The improved readability scores for both transformed materials and GPT-4 generated content demonstrate its usefulness in patient education across different educational levels.
    DOI:  https://doi.org/10.1097/MD.0000000000041059
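    As a simplified stand-in for the content-consistency check described above, the sketch below computes cosine similarity between vector representations of an original passage and its simplified version. The study applied latent semantic analysis; plain TF-IDF vectors are used here only to keep the example self-contained, and the two sentences are invented.

```python
# Simplified stand-in for the LSA consistency check: cosine similarity between
# TF-IDF vectors of an original text and its simplified version. Example texts
# are invented; the paper's actual pipeline used latent semantic analysis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

original = "Intraocular pressure reduction remains the primary therapeutic goal in glaucoma."
simplified = "Lowering the pressure inside the eye is the main goal of glaucoma treatment."

vectors = TfidfVectorizer().fit_transform([original, simplified])
print(f"cosine similarity = {cosine_similarity(vectors[0], vectors[1])[0, 0]:.3f}")
```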
  12. JB JS Open Access. 2025 Jan-Mar;10(1): pii: e24.00007. [Epub ahead of print]
       Background: This study assesses the effectiveness of large language models (LLMs) in simplifying complex language within orthopaedic patient education materials (PEMs) and identifies predictive factors for successful text transformation.
    Methods: We transformed 48 orthopaedic PEMs using GPT-4, GPT-3.5, Claude 2, and Llama 2. The readability, quantified by the Flesch-Kincaid Reading Ease (FKRE) and Flesch-Kincaid Grade Level (FKGL) scores, was measured before and after transformation. Analysis included text characteristics such as syllable count, word length, and sentence length. Statistical and machine learning methods evaluated the correlations and predictive capacity of these features for transformation success.
    Results: All LLMs improved FKRE and FKGL scores (p < 0.01). GPT-4 showed superior performance, transforming PEMs to a seventh-grade reading level (mean FKGL, 6.72 ± 0.99), with higher FKRE and lower FKGL than other models. GPT-3.5, Claude 2, and Llama 2 significantly shortened sentences and overall text length (p < 0.01). Importantly, correlation analysis revealed that transformation success varied substantially with the model used, depending on original text factors such as word length and sentence complexity.
    Conclusions: LLMs successfully simplify orthopaedic PEMs, with GPT-4 leading in readability improvement. This study highlights the importance of initial text characteristics in determining the effectiveness of LLM transformations, offering insights for optimizing orthopaedic health literacy initiatives using artificial intelligence (AI).
    Clinical Relevance: This study provides critical insights into the ability of LLMs to simplify complex orthopaedic PEMs, enhancing their readability without compromising informational integrity. By identifying predictive factors for successful text transformation, this research supports the application of AI in improving health literacy, potentially leading to better patient comprehension and outcomes in orthopaedic care.
    DOI:  https://doi.org/10.2106/JBJS.OA.24.00007
  13. J Prosthet Dent. 2025 Jan 04. pii: S0022-3913(24)00833-3. [Epub ahead of print]
       STATEMENT OF PROBLEM: Artificial intelligence (AI) chatbots have been proposed as promising resources for oral health information. However, the quality and readability of existing online health-related information is often inconsistent and challenging.
    PURPOSE: This study aimed to compare the reliability and usefulness of dental implantology-related information provided by the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models (LLMs).
    MATERIAL AND METHODS: A total of 75 questions were developed covering various dental implant domains. These questions were then presented to 3 different LLMs: ChatGPT-3.5, ChatGPT-4, and Google Gemini. The responses generated were recorded and independently assessed by 2 specialists who were blinded to the source of the responses. The evaluation focused on the accuracy of the generated answers using a modified 5-point Likert scale to measure the reliability and usefulness of the information provided. Additionally, the ability of the AI-chatbots to offer definitive responses to closed questions, provide reference citation, and advise scheduling consultations with a dental specialist was also analyzed. The Friedman, Mann Whitney U and Spearman Correlation tests were used for data analysis (α=.05).
    RESULTS: Google Gemini exhibited higher reliability and usefulness scores compared with ChatGPT-3.5 and ChatGPT-4 (P<.001). Google Gemini also demonstrated superior proficiency in identifying closed questions (25 questions, 41%) and recommended specialist consultations for 74 questions (98.7%), significantly outperforming ChatGPT-4 (30 questions, 40.0%) and ChatGPT-3.5 (28 questions, 37.3%) (P<.001). A positive correlation was found between reliability and usefulness scores, with Google Gemini showing the strongest correlation (ρ=.702).
    CONCLUSIONS: The 3 AI Chatbots showed acceptable levels of reliability and usefulness in addressing dental implant-related queries. Google Gemini distinguished itself by providing responses consistent with specialist consultations.
    DOI:  https://doi.org/10.1016/j.prosdent.2024.12.016
  14. Clin Otolaryngol. 2025 Jan 07.
       INTRODUCTION: Artificial intelligence (AI) based chat robots are increasingly used by users for patient education about common diseases in the health field, as in every field. This study aims to evaluate and compare patient education materials on rhinosinusitis created by two frequently used chat robots, ChatGPT-4 and Google Gemini.
    METHOD: One hundred nine questions taken from patient information websites were divided into four categories (general knowledge; diagnosis; treatment; surgery and complications) and then asked to the chat robots. The answers given were evaluated by two different expert otolaryngologists, and on questions where the scores were different, a third, more experienced otolaryngologist finalised the evaluation. Questions were scored from 1 to 4: (1) comprehensive/correct, (2) incomplete/partially correct, (3) accurate and inaccurate data, potentially misleading and (4) completely inaccurate/irrelevant.
    RESULTS: For ChatGPT-4, all answers in the diagnosis category were rated comprehensive/correct. For Google Gemini, answers rated completely inaccurate/irrelevant were significantly more frequent in the treatment category, and answers rated incomplete/partially correct were significantly more frequent in the surgery and complications category. In the comparison between the two chat robots, ChatGPT-4 had a significantly higher rate of correct answers than Google Gemini in the treatment category.
    CONCLUSION: The answers given by ChatGPT-4 and Google Gemini chat robots regarding rhinosinusitis were evaluated as sufficient and informative.
    Keywords:  ChatGPT‐4; Google Gemini; artificial intelligence; rhinosinusitis
    DOI:  https://doi.org/10.1111/coa.14273
  15. J Burn Care Res. 2025 Jan 06. pii: irae211. [Epub ahead of print]
      Patients often use Google for their medical questions. With the emergence of artificial intelligence large language models, such as ChatGPT, patients may turn to such technologies as an alternative source of medical information. This study investigates the safety, accuracy, and comprehensiveness of medical responses provided by ChatGPT in comparison to Google for common questions about burn injuries and their management. A Google search was performed using the term "burn," and the top ten frequently searched questions along with their answers were documented. These questions were then prompted into ChatGPT. The quality of responses from both Google and ChatGPT was evaluated by three burn and trauma surgeons using the Global Quality Score (GQS) scale, rating from 1 (poor quality) to 5 (excellent quality). A Wilcoxon paired t-test evaluated the difference in scores between Google and ChatGPT answers. Google answers scored an average of 2.80 ± 1.03, indicating that some information was present but important topics were missing. Conversely, ChatGPT-generated answers scored an average of 4.57 ± 0.73, indicating excellent quality responses with high utility to patients. For half of the questions, the surgeons unanimously preferred their patients receive information from ChatGPT. This study presents an initial comparison of Google and ChatGPT responses to commonly asked burn injury questions. ChatGPT outperforms Google in responding to commonly asked questions on burn injury and management based on the evaluations of three experienced burn surgeons. These results highlight the potential of ChatGPT as a source of patient education.
    Keywords:  Artificial Intelligence; Burn Care and Surgery; Burns; ChatGPT; Google; Patient Education
    DOI:  https://doi.org/10.1093/jbcr/irae211
  16. Cancer Med. 2025 Jan;14(1): e70554
       PURPOSE: Caregivers in pediatric oncology need accurate and understandable information about their child's condition, treatment, and side effects. This study assesses the performance of publicly accessible large language model (LLM)-supported tools in providing valuable and reliable information to caregivers of children with cancer.
    METHODS: In this cross-sectional study, we evaluated the performance of the four LLM-supported tools-ChatGPT (GPT-4), Google Bard (Gemini Pro), Microsoft Bing Chat, and Google SGE-against a set of frequently asked questions (FAQs) derived from the Children's Oncology Group Family Handbook and expert input (In total, 26 FAQs and 104 generated responses). Five pediatric oncology experts assessed the generated LLM responses using measures including accuracy, clarity, inclusivity, completeness, clinical utility, and overall rating. Additionally, the content quality was evaluated including readability, AI disclosure, source credibility, resource matching, and content originality. We used descriptive analysis and statistical tests including Shapiro-Wilk, Levene's, Kruskal-Wallis H-tests, and Dunn's post hoc tests for pairwise comparisons.
    RESULTS: ChatGPT shows high overall performance when evaluated by the experts. Bard also performed well, especially in accuracy and clarity of the responses, whereas Bing Chat and Google SGE had lower overall scores. Disclosure that responses were generated by AI was observed less frequently in ChatGPT responses, which may have affected the clarity of responses, whereas Bard maintained a balance between AI disclosure and response clarity. Google SGE generated the most readable responses, whereas ChatGPT answered with the most complexity. LLM tools varied significantly (p < 0.001) across all expert evaluations except inclusivity. Through our thematic analysis of expert free-text comments, emotional tone and empathy emerged as a unique theme, with mixed feedback on expectations for AI to be empathetic.
    CONCLUSION: LLM-supported tools can enhance caregivers' knowledge of pediatric oncology. Each model has unique strengths and areas for improvement, indicating the need for careful selection based on specific clinical contexts. Further research is required to explore their application in other medical specialties and patient demographics, assessing broader applicability and long-term impacts.
    Keywords:  artificial intelligence; health care communication; health literacy; large language models; patient education; pediatric oncology
    DOI:  https://doi.org/10.1002/cam4.70554
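    A minimal sketch of the non-parametric workflow named in the methods (Shapiro-Wilk, Levene, Kruskal-Wallis) appears below; the expert ratings are invented, and the Dunn post hoc step, which typically requires an additional package such as scikit-posthocs, is omitted to keep the example short.

```python
# Minimal sketch of the statistical workflow described above: Shapiro-Wilk
# normality checks, Levene's test, then a Kruskal-Wallis H-test across tools.
# Ratings are invented for illustration; Dunn's post hoc comparisons omitted.
from scipy import stats

ratings = {
    "ChatGPT":   [5, 4, 5, 4, 5],
    "Bard":      [4, 4, 5, 4, 4],
    "Bing Chat": [3, 3, 4, 3, 3],
    "SGE":       [3, 4, 3, 3, 3],
}

for name, scores in ratings.items():
    print(name, "Shapiro-Wilk p =", round(stats.shapiro(scores).pvalue, 3))

print("Levene p =", round(stats.levene(*ratings.values()).pvalue, 3))
print("Kruskal-Wallis p =", round(stats.kruskal(*ratings.values()).pvalue, 3))
```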
  17. JTCVS Open. 2024 Dec;22 530-539
       Objective: Well-designed patient education materials (PEMs) increase health literacy, which has been linked to better surgical patient outcomes. The quality of lung cancer surgery PEMs is unknown, however. Here we assessed printed lung cancer surgery PEMs for readability, understandability, actionability, and accessibility.
    Methods: Various lung cancer programs throughout the United States were contacted for their lung cancer surgery PEMs. The readability of the received materials was calculated using 6 readability tests. Four thoracic surgeon-advanced practice practitioner dyads scored the PEMs for understandability, actionability, and accessibility using the Patient Education Material Assessment Tool and the Accessibility Assessment Tool, with the recommended minimum threshold of 70%. One-sample t tests were performed to compare each parameter against its recommended threshold.
    Results: Out of 34 institutions contacted, 18 (52.9%) provided PEMs. The average reading level of the PEMs ranged from 7th grade to 11th grade, significantly exceeding the recommended 6th grade health literacy threshold (P < .01). Although mean understandability (73.7 ± 13.2%) and actionability (70.2 ± 17.8%) scores were not significantly different from the minimum threshold, and the mean accessibility score (81.8 ± 13.5%) was significantly higher than the threshold (P < .05), there was wide variation in the scores. Most PEMs scored well in organization and writing but lacked other features that can enhance patient understanding, such as visual aids and summaries.
    Conclusions: PEMs are written at reading levels that are too advanced for patients. Although PEMs scored well in understandability, actionability, and accessibility, analysis of individual items revealed the need for improvement, including the use of shorter sentences, more visual aids and summaries, and expansion of language translations.
    Keywords:  accessibility; lung cancer surgery; patient education materials; readability; understandability
    DOI:  https://doi.org/10.1016/j.xjon.2024.09.005
  18. Int J Med Inform. 2025 Jan 06. pii: S1386-5056(25)00004-8. [Epub ahead of print]195 105787
       BACKGROUND: Large language models (LLMs) are becoming increasingly popular and are playing an important role in providing accurate clinical information to both patients and physicians. This study aimed to investigate the effectiveness of ChatGPT-4.0, Google Gemini, and Microsoft Copilot LLMs for responding to patient questions regarding refractive surgery.
    METHODS: The LLMs' responses to 25 questions about refractive surgery, which are frequently asked by patients, were evaluated by two ophthalmologists using a 5-point Likert scale, with scores ranging from 1 to 5. Furthermore, the DISCERN scale was used to assess the reliability of the language models' responses, whereas the Flesch Reading Ease and Flesch-Kincaid Grade Level indices were used to evaluate readability.
    RESULTS: Significant differences were found among all three LLMs in the Likert scores (p = 0.022). Pairwise comparisons revealed that ChatGPT-4.0's Likert score was significantly higher than that of Microsoft Copilot, while no significant difference was found when compared to Google Gemini (p = 0.005 and p = 0.087, respectively). In terms of reliability, ChatGPT-4.0 stood out, receiving the highest DISCERN scores among the three LLMs. However, in terms of readability, ChatGPT-4.0 received the lowest score.
    CONCLUSIONS: ChatGPT-4.0's responses to inquiries regarding refractive surgery were more intricate for patients compared to other language models; however, the information provided was more dependable and accurate.
    Keywords:  Artificial intelligence; ChatGPT-4.0; Google Gemini; Microsoft copilot; Refractive surgery
    DOI:  https://doi.org/10.1016/j.ijmedinf.2025.105787
  19. Ann Thorac Surg Short Rep. 2024 Sep;2(3): 331-335
       Background: Online resources are becoming the primary educational resource for patients. Quality and reliability of websites about coronary artery bypass graft (CABG) procedures are unknown.
    Methods: We queried 4 search engines (Google, Bing, Yahoo!, and Dogpile) for the terms coronary artery bypass, coronary artery bypass graft, coronary artery bypass graft surgery, and CABG. The top 30 websites from each were aggregated. After exclusions, 85 websites were graded with the DISCERN instrument, patient-focused criteria, and readability calculators by a 2-reviewer system.
    Results: Accessibility was low; 34.1% of websites disclosed authorship, and 23.5% were available in Spanish. Median total score was 55 of 95 (interquartile range [IQR], 44-68); this score varied by website type (P = .048). Professional medical society (median, 76; IQR, 76-76) and governmental agency (median, 69; IQR, 56.6-75.5) scored higher, whereas industry (median, 51.8; IQR, 47.1-56.4) and hospital/health care (median, 49; IQR, 40-61) scored lower. Readability was low, with median Flesch-Kincaid grade level score of 11.1 (IQR, 9.5-12.6) and 75.3% of websites written above eighth-grade reading level.
    Conclusions: Accessibility of online patient educational resources for CABG procedures is limited by language and reading level despite being widely available. Quality and reliability of the information offered varied between website types. Improving readability to ensure patients' understanding and comprehensive decision-making should be prioritized.
    DOI:  https://doi.org/10.1016/j.atssr.2023.12.021
  20. Arthrosc Sports Med Rehabil. 2024 Dec;6(6): 100982
       Purpose: To examine the overall reading levels of anterior cruciate ligament reconstruction online patient education materials (OPEMs) written in English and Spanish.
    Methods: We conducted Google searches for OPEMs using "ACL surgery" and "cirugía LCA" as English and Spanish search terms, respectively. Several measures of readability were used to analyze 25 English-language OPEMs (Flesch Reading Ease, Flesch Reading Ease Grade Level, Flesch-Kincaid Grade Level, Coleman-Liau Index, Gunning Fog Index, and Simple Measure of Gobbledygook) and 25 Spanish-language OPEMs (Fernández-Huerta Index, Fernández-Huerta Grade Level, and Índice de Legibilidad de Flesch-Szigriszt). English- and Spanish-language OPEMs were compared based on mean overall grade level and number of OPEMs written below a seventh- or ninth-grade reading level.
    Results: English-language OPEMs showed a higher mean overall grade level than Spanish-language OPEMs (10.48 ± 1.86 vs 8.64 ± 1.22, P < .001). No significant differences were noted in the number of OPEMs written below a seventh-grade reading level. However, significantly more Spanish-language OPEMs were written below a ninth-grade reading level compared with English-language OPEMs (56% vs 16%, P = .003).
    Conclusions: Although Spanish-language OPEMs were written at a lower reading level, average readability for both English- and Spanish-language OPEMs was significantly higher than the recommended level. Across both languages, only a single English-language webpage met the American Medical Association-recommended sixth-grade reading level. More Spanish-language articles were written at or below the average adult reading level in the United States.
    Clinical Relevance: It is imperative that patient educational materials be written at a reading level that is understood by most patients. This is especially true for OPEMs, where a medical provider is not present to answer questions. Therefore, it is important to evaluate the reading level of OPEMs to determine whether they are written at an appropriate level for the best patient understanding.
    DOI:  https://doi.org/10.1016/j.asmr.2024.100982
  21. Curr Neuropharmacol. 2025 Jan 02.
     BACKGROUND: Today, more and more people search the web for health-related information, risking exposure to misinformation and biased content that may affect their treatment decisions. Cannabidiol (CBD) is among the products for which beneficial effects have been claimed, often at the expense of attention to the risks, compounded by the unreliable information reported on the products themselves.
    OBJECTIVE: This study evaluated the quality of information retrieved by Google on the potential effects of CBD on weight management, also comparing Italian and English contents, hypothesizing generally low quality and language-driven differences in offered information.
    METHODS: Queries regarding cannabidiol and obesity-related terms were entered into Google, ranking the first 50 webpages from both merged Italian and English results for analysis.
    RESULTS: Of the outputs, 37 Italian and 27 English websites addressed the topic and were not related to the medical literature. As expected, a substantial proportion of information was of low quality, with English sites performing better (29.6% low quality) than Italian ones (54%, p = 0.052) in terms of the "JAMA benchmarks" for trustworthiness of information. Also, while most English sites were "health portals" (40.7%) with a neutral stance toward CBD (74.1%), Italian ones were predominantly "commercial" (78.4%, p = 0.001) and promoted CBD use (89.2%, p < 0.001).
    CONCLUSION: Findings suggest the need for better online information, especially in non-English-speaking countries, as scarce and unequal information can lead people to make poor health choices, with potentially harmful consequences.
    Keywords:  Cannabidiol; Quality; World Wide Web; Google; information; internet; weight loss
    DOI:  https://doi.org/10.2174/011570159X333465241121184404
  22. Healthcare (Basel). 2024 Dec 10. pii: 2492. [Epub ahead of print]12(24):
      This study aimed to assess the quality of YouTube (YT) videos providing medical information on cervical spine fractures; secondly, a comparison between two timeframes was conducted. Using Google Chrome with privacy settings to minimize personalization, two searches were conducted, the first on 20 July 2021 and the second on 10 April 2024, using various terms related to cervical spine injuries. Videos were evaluated using the DISCERN (Quality Criteria for Consumer Health Information), GQS (Global Quality Score), and JAMA scoring systems. In total, 91 videos were included. Mechanisms of injury were the most frequent video content (n = 66), and postoperative pain occurred the least (n = 6). The mean DISCERN score was 43.26 (SD = 11.25), the mean GQS 2.67 (SD = 0.74), and the mean JAMA score 2.2 (SD = 0.68). Inclusion of treatment options had an odds ratio of 21.72 for a better-quality video. The largest number of videos was provided by physicians (n = 24). In the DISCERN assessment, risks of treatment received the lowest grade (1.9). Newer videos achieved higher scores in the DISCERN, GQS, and JAMA scoring systems, reaching 52.5, 3, and 2.75, respectively. These scores suggest inadequate information provision in the videos, hindering patients' understanding of their condition. Due to the insufficient information presented in current videos, patients are not fully informed.
    Keywords:  YouTube; cervical; fracture; spine
    DOI:  https://doi.org/10.3390/healthcare12242492
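    For context on the odds ratio reported above, the sketch below shows how such a ratio is derived from a 2x2 table of video quality versus inclusion of treatment options; the counts are invented and are not the study's data.

```python
# How an odds ratio is computed from a 2x2 table (better-quality video vs.
# inclusion of treatment options). Counts below are invented for illustration.
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome."""
    return (a * d) / (b * c)

# e.g. 30 of 40 videos mentioning treatment options rated better quality,
# versus 5 of 51 videos that did not mention them.
print(round(odds_ratio(30, 10, 5, 46), 2))  # -> 27.6
```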
  23. Abdom Radiol (NY). 2025 Jan 06.
       PURPOSE: To evaluate the quality of YouTube videos on patient education concerning prostatic artery embolization (PAE).
    METHODS: All PAE videos on YouTube were evaluated in December 2023. The quality of the videos was evaluated utilizing the DISCERN Scale Criterion. The popularity and engagement of each video were assessed using the Video Power Index (VPI) and Viewer Impact Score (VIS), respectively. Comparisons of these metrics were conducted and stratified by video source type, including academic institution, interventional radiologist, and patient testimony. Data describing discussion of risks, benefits, and indications were further collected.
    RESULTS: Of the 43 videos, video characteristics included duration (mean = 4.6 min), views (mean = 16,885), and likes (mean = 139). The mean DISCERN, VPI, and VIS scores were 47.9, 15.0, and 36.9, respectively. There was no correlation between quality and popularity (R2 = 0.09) or engagement (R2 = 0.01). Videos featuring board-certified physicians did not have significantly higher DISCERN (p = 0.13), VPI (p = 0.15), or VIS (p = 0.39) scores compared with those without. Content by interventional radiologists demonstrated higher popularity compared with videos featuring other specialties (p = 0.04), but there was no difference in quality (p = 0.18).
    CONCLUSION: Educational videos about PAE on YouTube are of average quality. Clinicians should be aware of the general state of online information concerning PAE and guide patients towards high quality resources.
    Keywords:  DISCERN; PAE; Prostatic artery embolization; YouTube
    DOI:  https://doi.org/10.1007/s00261-024-04790-y
  24. JMIR Infodemiology. 2025 Jan 05.
       BACKGROUND: YouTube is an increasingly used platform for medical information. However, the reliability and validity of health-related information on celiac disease (CD) on YouTube has not been determined.
    OBJECTIVE: The aim of this study was to analyse the reliability and validity of CD-related YouTube videos.
    METHODS: On 15 November 2023, a search was performed on YouTube using the keyword "celiac disease". This search resulted in a selection of videos which were then reviewed by two separate evaluators for content, origin and specific features. The evaluators assessed the reliability and quality of these videos using a modified DISCERN score (mDISCERN), the Journal of the American Medical Association benchmark criteria score (JAMA), the usefulness score, the video power index (VPI) and the global quality scale score (GQS).
    RESULTS: In the analysis of 120 initially screened CD videos, 85 met the criteria for inclusion in the study after certain videos were excluded based on pre-defined criteria. While the duration of the videos uploaded by healthcare professionals was significantly longer than the other group (P=0.009), it was concluded that the median scores for mDISCERN (4 vs. 2, P<.001), GQS (4 vs. 3, P<.001), JAMA (4 vs. 2, P<.001) and usefulness (8 vs. 6, P<.001) of the videos from this group were significantly higher than those from non-healthcare professionals. Video interaction parameters including the median number of views, views per day, likes, dislikes, comments and VPI demonstrated no significant difference between the two groups.
    CONCLUSIONS: This study showed that YouTube videos about celiac disease vary significantly in reliability and quality depending on their source. Increasing the production of reliable videos by healthcare professionals may help to improve patient education and make YouTube a more reliable resource.
    DOI:  https://doi.org/10.2196/58615
  25. Cureus. 2024 Dec;16(12): e75419
      Background: Various studies have evaluated the quality of health-related information on TikTok (ByteDance Ltd., Beijing, China), including topics such as COVID-19, diabetes, varicoceles, bladder cancer, colorectal cancer, and others. However, there is a paucity of data on studies that examined TikTok as a source of quality health information on human papillomavirus (HPV). This study, therefore, evaluated the quality of health information on HPV on TikTok.
    Methods: The terms "HPV" and "human papillomavirus" were searched on TikTok on a single day in August 2024, and 200 videos were retrieved. Relevant user metrics were collected for each video, including the number of likes, shares, and followers, the video length, and the uploader type. Two independent raters assessed each video regarding the completeness of six types of content (the definition of HPV, symptoms, risk factors, evaluation, management, and outcomes). Then, the two raters independently assessed the quality of information in the videos using the DISCERN instrument.
    Results: Sixty-nine videos met inclusion criteria; 11 were created by general users, 44 by healthcare professionals, and 14 by organizations. Videos uploaded by general users and health professionals had a longer duration (p < 0.001) and more likes (p = 0.048) than those uploaded by organizations. More than 60% of the videos contained little or no content on the HPV topics assessed. Although the reliability and quality of treatment choices were higher among videos uploaded by healthcare professionals, the overall quality of HPV health information using the DISCERN instrument was "very poor" (24.2 ± 6.92).
    Conclusions: The overall quality of HPV videos uploaded on TikTok is very poor and not acceptable, thus failing to satisfy public health needs. Healthcare professionals must enhance their social media presence, produce reliable and substantive material, and collaborate with social media platforms and high-engagement accounts to facilitate users' access to high-quality data. TikTok users must recognize that material regarding HPV may lack medical accuracy and should consistently consult healthcare providers for medical guidance.
    Keywords:  health information; hpv; human papillomavirus; reliability; social media; tiktok
    DOI:  https://doi.org/10.7759/cureus.75419
  26. Arthrosc Sports Med Rehabil. 2024 Dec;6(6): 100983
       Purpose: To analyze the most frequently searched questions associated with shoulder labral pathology and to evaluate the source-type availability and quality.
    Methods: Common shoulder labral pathology-related search terms were entered into Google, and the suggested frequently asked questions were compiled and categorized. In addition, suggested sources were recorded, categorized, and scored for quality of information using JAMA (The Journal of the American Medical Association) benchmark criteria. Statistical analysis was performed to compare the types of questions and their associated sources, as well as the quality of sources.
    Results: In this study, 513 questions and 170 sources were identified and categorized. The most popular topics were diagnosis/evaluation (21.5%) and indications/management (21.1%). The most common website types were academic (27.9%), commercial (25.2%), and medical practice (22.5%). Multiple statistically significant associations were found between specific question categories and their associated source types. The average JAMA quality score for all sources was 1.56, and medical websites had significantly lower quality scores than nonmedical sites (1.05 vs 2.12, P < .001).
    Conclusions: Patients searching the internet for information regarding shoulder labral pathology often look for facts regarding the diagnosis and management of their conditions. They use various source types to better understand their conditions, with government sources being of the highest quality, whereas medical sites showed statistically lower quality. Across the spectrum of questions, the quality of readily available resources varies substantially.
    Clinical Relevance: The use of online resources in health care is expanding. It is important to understand the most commonly asked questions and the quality of information available to patients.
    DOI:  https://doi.org/10.1016/j.asmr.2024.100983
  27. Am J Otolaryngol. 2024 Dec 31. pii: S0196-0709(24)00381-8. [Epub ahead of print]46(1): 104595
       OBJECTIVE: To evaluate the quality and engagement of rhinology-related educational videos shared by healthcare providers on Instagram.
    METHODS: The top 150 videos on Instagram for #SinusSurgeryEducation, #TurbinateReductionEducation, and #SeptoplastyEducation were selected. Videos were categorized by provider's specialty and analyzed for engagement metrics (likes, comments, shares, views), video duration, and hashtags. The Patient Education Materials Assessment Tool Audio/Visual (PEMAT-A/V) was used to assess the understandability and actionability of the medical educational videos.
    RESULTS: Sixty-three videos were analyzed: septoplasty education (26 videos), turbinate reduction education (17 videos), and sinus surgery education (20 videos). Of these, 88% were classified as medical education content, while 12% focused on before-and-after surgery visualizations. Among the educational content, 38% were by otolaryngologists, 32% by plastic surgeons, and 30% by other providers such as anesthesiologists and chiropractors. Content created by plastic surgeons received higher engagement metrics compared to otolaryngologists. The average PEMAT-A/V scores were 75% for understandability and 37% for actionability.
    CONCLUSION: Our analysis reveals that plastic surgeons and otolaryngologists are using social media for medical education, with content demonstrating moderate engagement and quality understandability. As social media continues to evolve as a source for disseminating health-related information, providers should strive to understand its mechanisms and impacts.
    Keywords:  Medical education; Otolaryngology education; Social media
    DOI:  https://doi.org/10.1016/j.amjoto.2024.104595
  28. Health Info Libr J. 2025 Jan 06.
     BACKGROUND: The COVID-19 pandemic demanded an efficient and effective supply of information to the public to help reduce the rate of transmission.
    OBJECTIVES: This study aims to analyse Omanis' information behaviour during the COVID-19 pandemic, to help national authorities to prepare for future health crises or pandemics.
    METHODS: A self-administered online survey involving a structured open-ended questionnaire was conducted via the SurveyMonkey software. Snowball and convenience sampling methods were used to recruit potential participants from social media sites like Instagram, Twitter and Facebook. Non-parametric testing (Mann-Whitney U and Kruskal-Wallis H tests) assisted in analysis of demographic factors. Descriptive statistical analysis identified trends in information needs and seeking behaviour.
    RESULTS: Over 6000 responses were obtained. The results revealed that Oman nationals were seeking information on symptoms of COVID-19, global and national infection rates, preventive measures, treatment and vaccines. Primary sources of information were radio news, Oman TV, international TV, print media, healthcare professionals, international agencies and online news websites.
    DISCUSSION: There was little trust in local information sources, with many Omanis relying on international information sources such as the WHO and international TV networks.
    CONCLUSION: Public health agencies need to prepare for timely and reliable information provision during health crises.
    Keywords:  Middle East; health information needs; information sources; information‐seeking behaviour; pandemic
    DOI:  https://doi.org/10.1111/hir.12565
  29. Health Info Libr J. 2025 Jan 08.
     BACKGROUND: Much of the government response to improving vaccination uptake during the COVID-19 pandemic has focused on the problems of misinformation and disinformation. There may, however, be other signals within online health information that influence uptake of vaccination.
    OBJECTIVE: This study identified the influence of various health information signals within online information communities on the intention of receiving the vaccine.
    METHOD: A deductive approach was used to derive constructs from signalling theory. Constructs were validated by a convenience sample using a questionnaire. Structural equation modelling (SEM) was used to evaluate the measurement model, the structural model and the multigroup analysis.
    RESULTS: The analysis showed a significant impact of signals derived from past experience, information asymmetry and source credibility constructs on the perceived quality of the vaccine service. The perceived quality also had a significant impact on the intention to receive the vaccine.
    DISCUSSION: Signalling theory was able to explain the importance of health information signals perceived from online platforms on the intention of individuals to receive the vaccine.
    CONCLUSION: Information asymmetry between information provider and receiver, perceived credibility of sources and perceived quality of the vaccination service may influence decisions about vaccination.
    Keywords:  health information needs; information seeking behaviour; pandemic; public health; statistical models
    DOI:  https://doi.org/10.1111/hir.12564
  30. Pathogens. 2024 Dec 20. pii: 1125. [Epub ahead of print]13(12):
      Background: Urinary tract infections (UTIs) are among the most prevalent bacterial infections. With many patients turning to the Internet as a health resource, this study seeks to understand public engagement with online resources concerning recurrent UTIs (rUTIs), assess their reliability, and identify common questions/concerns about rUTIs.
    Methods: The social media analysis tool BuzzSumo was used to calculate online engagement (likes, shares, comments, views) with information on rUTIs. The reliability of highly engaged articles was evaluated using the DISCERN questionnaire. Highly engaged categories were entered as keywords in Google Trends to quantify search interest. To categorize patient-specific concerns, a database containing anonymously collected patient questions about rUTIs was created.
    Results: BuzzSumo revealed four search categories: general information, treatment, causes, and herbal remedies. DISCERN scores indicated moderate reliability overall; however, the "herbal remedies" category demonstrated poor reliability despite high engagement. Google Trends analysis highlighted "causes" and "treatment" searches as highest in relative interest. The 10 most popular categories of concern were antibiotics, microbiome, vaccines, prevention, pelvic pain, sex, testing, symptoms, diet/lifestyle, and hormones.
    Conclusions: People living with rUTIs demonstrate key concerns and often seek information online, yet articles with high engagement often contain unreliable information. Healthcare professionals may consider counteracting misinformation by providing evidence-based information online about rUTIs.
    Keywords:  health-related information; patient concerns; patient engagement; recurrent urinary tract infections; urinary tract infections
    DOI:  https://doi.org/10.3390/pathogens13121125
  31. PLoS One. 2025 ;20(1): e0315049
     BACKGROUND: The ability to access and navigate online sexual health information and support is increasingly needed in order to engage with wider sexual healthcare. However, people from underserved populations may struggle to pass through this "digital doorway". Therefore, using a behavioural science approach, we first aimed to identify barriers and facilitators to i) seeking online sexual health information and ii) seeking online sexual health support. Subsequently, we aimed to generate theory-informed recommendations to improve these access points.
    METHODS: The PROGRESSPlus framework guided purposive recruitment (15.10.21-18.03.22) of 35 UK participants from diverse backgrounds, including 51% from the most deprived areas and 26% from minoritised ethnic groups. Using semi-structured interviews and thematic analysis, we identified barriers and facilitators to seeking online sexual health information and support. A Behaviour Change Wheel (BCW) analysis then identified recommendations to better meet the needs of underserved populations.
    RESULTS: We found diverse barriers and facilitators. Barriers included low awareness of and familiarity with online information and support; perceptions that online information and support were unlikely to meet the needs of underserved populations; overwhelming volume of information sources; lack of personal relevancy; chatbots/automated responses; and response wait times. Facilitators included clarity about credibility and quality; inclusive content; and in-person assistance. Recommendations included: Education and Persuasion e.g., online and offline promotion and endorsement by healthcare professionals and peers; Training and Modelling e.g., accessible training to enhance searching skills and credibility appraisal; and Environmental Restructuring and Enablement e.g., modifications to ensure online information and support are simple and easy to use, including video/audio options for content.
    CONCLUSIONS: Given that access to many sexual health services is now digital, our analyses produced recommendations pivotal to increasing access to wider sexual healthcare among underserved populations. Implementing these recommendations could reduce inequalities associated with accessing and using online sexual health services.
    DOI:  https://doi.org/10.1371/journal.pone.0315049