bims-librar: Biomed News
on Biomedical Librarianship
Issue of 2025-11-23
28 papers selected by
Thomas Krichel, Open Library Society



  1. Intern Med J. 2025 Nov 17.
      The discovery of pathogenic bacteria in the late 19th century led librarians to ask whether their books might spread disease. Microbiological investigations over the following decades found that certain bacteria could be isolated at low levels from books used by persons carrying infections. UK public health legislation prohibited the return of library books from households where a person with a notifiable disease had resided, and this idea was taken up to varying degrees in Australian states and New Zealand. Libraries enthusiastically promoted themselves as hygienic, and many introduced formaldehyde fumigation of returned books, a measure that was likely ineffective. Each new pandemic reignites this concern, but no evidence has been found that library users or staff are at risk of acquiring infections from books.
    Keywords:  books; disease transmission; fumigation; libraries; pandemic response; public health legislation
    DOI:  https://doi.org/10.1111/imj.70270
  2. PLOS Digit Health. 2025 Nov;4(11): e0001090
      The global shift toward digital health communication presents both opportunities and challenges for older adults, whose population is expanding rapidly. This study explored how older adults and health content producers engage with health information across paper and digital formats, and assessed the potential of hybrid approaches such as augmented paper. Two qualitative studies were conducted in Surrey, UK: focus groups with older adults (n = 9) and interviews with public health professionals (n = 6). Data were analysed through content and thematic analysis to identify user requirements. Findings show that older adults continue to value printed materials for familiarity and reliability, but turn to digital formats for timeliness and convenience. Trust in online content, ease of use, and device compatibility emerged as central concerns shaping engagement. Content producers echoed these challenges, highlighting cost constraints and the need for accessible, multi-format materials. Both stakeholder groups favoured app-free connections between print and digital content, with QR codes preferred for their simplicity, familiarity, and avoidance of app installation. Participants also emphasised the importance of multimodal presentation (e.g., text, video, audio) and options to self-print key materials. While based on a small, UK-specific sample, the study highlights design implications for inclusive health communication. Hybrid solutions that combine print with carefully curated digital resources can reduce barriers linked to trust and usability, and extend access for older adults with varied levels of digital confidence. These insights provide actionable guidance for public health organisations and policymakers seeking to balance cost-effectiveness with accessibility. Broader testing in more diverse populations is recommended to refine these strategies and ensure equitable health communication worldwide.
These findings underline the importance of designing hybrid health communication strategies that are not only user-friendly but also equitable, supporting the goals of the WHO Decade of Healthy Ageing by promoting inclusive access to reliable health information for older adults worldwide.
    DOI:  https://doi.org/10.1371/journal.pdig.0001090
  3. JMIR Med Educ. 2025 Nov 20;11: e80084
       Background: Video-sharing sites such as YouTube (Google) and TikTok (ByteDance) have become indispensable resources for learners and educators. The recent growth in generative artificial intelligence (AI) tools, however, has resulted in low-quality, AI-generated material (commonly called "slop") cluttering these platforms and competing with authoritative educational materials. The extent to which slop has polluted science education video content is unknown, as are the specific hazards to learning from purportedly educational videos made by AI without the use of human discretion.
    Objective: This study aimed to advance a formal definition of slop (based on the recent theoretical construct of "careless speech"), to identify its qualitative characteristics that may be problematic for learners, and to gauge its prevalence among preclinical biomedical science (medical biochemistry and cell biology) videos on YouTube and TikTok. We also examined whether any quantitative features of video metadata correlate with the presence of slop.
    Methods: An automated search of publicly available YouTube and TikTok videos related to 10 search terms was conducted in February and March 2025. After exclusion of duplicates, off-topic, and non-English results, videos were screened, and those suggestive of AI were flagged. The flagged videos were subject to a 2-stage qualitative content analysis to identify and code problematic features before an assignment of "slop" was made. Quantitative viewership data on all videos in the study were scraped using automated tools and compared between slop videos and the overall population.
    Results: We define "slop" according to the degree of human care in production. Of 1082 videos screened (814 YouTube, 268 TikTok), 57 (5.3%) were deemed probably AI-generated and low-quality. From qualitative analysis of these and 6 additional AI-generated videos, we identified 16 codes for problematic aspects of the videos as related to their format or contents. These codes were then mapped to the 7 characteristics of careless speech identified earlier. Analysis of view, like, and comment rates revealed no significant difference between slop videos and the overall population.
    Conclusions: We find slop to be not especially prevalent on YouTube and TikTok at this time. These videos have comparable viewership statistics to the overall population, although the small dataset suggests this finding should be interpreted with caution. From the slop videos that were identified, several features inconsistent with best practices in multimedia instruction were defined. Our findings should inform learners seeking to avoid low-quality material on video-sharing sites and suggest pitfalls for instructors to avoid when making high-quality educational materials with generative AI.
    Keywords:  TikTok; YouTube; artificial intelligence; basic medical sciences education; biochemistry education; careless speech; cell biology education; generative AI; medical biochemistry; medical education; slop
    DOI:  https://doi.org/10.2196/80084
  4. PLoS One. 2025;20(11): e0330394
       RESULTS: Participants (n = 287; 31.1% male; mean age 41.6 years, SD = 13.5) reported accessing three main forms of communication: radio (84.7%), the internet (80.5%), and cable television (71.8%). Some participants (19.5%) reported having no internet access at home. Participants' main sources of COVID-19 information were websites (35.3%), social media (26.5%), and the news delivered via television or newspaper (48.4%). Only 11.0% of participants acquired information from healthcare workers. Some participants (9.9%) preferred to receive information in the traditional languages of the Dene (Dene Ked'e, Tłıchǫ) and Anishinaabe. Ensuring communication access by providing adequate internet access in all communities and producing information sources in preferred languages should be a priority. These results can inform future public health policy for the Northwest Territories.
    DOI:  https://doi.org/10.1371/journal.pone.0330394
  5. Med Ref Serv Q. 2025 Nov 19. 1-9
      Health sciences librarians are increasingly assuming the role of instructional partners, with a core responsibility to teach evidence-based practice (EBP) skills that go beyond basic information retrieval. Like information literacy, EBP skills are built on critical thinking. This column explains how health science librarians can use thinking routines to enhance critical thinking abilities. Thinking routines are simple, adaptable strategies created by Harvard University's Project Zero as part of the Visible Thinking initiative. These routines provide structured yet flexible methods to support essential EBP, information literacy, and critical thinking skills.
    Keywords:  Library instruction; thinking routines; visible thinking
    DOI:  https://doi.org/10.1080/02763869.2025.2588225
  6. Indian J Orthop. 2025 Nov;59(11): 1880-1886
       Aim: This study aimed to assess the reliability of orthopaedic information provided by ChatGPT in response to common patient inquiries and concerns related to Total Joint Replacement surgery, focusing on preventing the dissemination of potentially harmful medical advice.
    Method: This qualitative exploratory case study was conducted at a tertiary care centre hospital. Ten common questions patients pose to orthopaedic surgeons when considering knee arthroplasty were formulated and presented independently to an Orthopaedic Consultant, an Associate Consultant, a Fellow in Joint Replacement surgery, and ChatGPT. For review, the answers were submitted to a panel of three orthopaedic surgeons specializing in arthroplasty. They were scored based on accuracy, relevance, and usefulness, with a maximum possible score of 100.0 points, allowing for 0.5-point increments. The panel was also asked to identify which answers came from ChatGPT rather than from humans.
    Results: ChatGPT exhibited the highest total aggregate score of 232.0 points out of a maximum of 300.0 points, surpassing the scores of the human participants (Participant 1: 197.0 points; Participant 2: 227.5 points; Participant 3: 220.5 points). Furthermore, two out of three panel specialists rated ChatGPT the highest. When comparing the average scores for ChatGPT and the human participants for each question, ChatGPT outperformed the human participants in 8 out of 10 questions. Out of the 120 encounter instances, the evaluators correctly identified a response as coming from ChatGPT only 14 times (11.66%).
    Conclusion: This study highlights the utility and limitations of ChatGPT in the medical field: ChatGPT exhibits great potential in assisting doctors and surgeons in patient care by providing accurate and relevant information. The study also demonstrated that its answers were indistinguishable from human answers in most cases. In the current landscape of ChatGPT and other AI technologies, their integration in the medical field should be viewed as complementary to human expertise, which must be leveraged for the greater good.
    Keywords:  ChatGPT; Knee; Knee arthroplasty; Large Language Model (LLM); Turing test
    DOI:  https://doi.org/10.1007/s43465-025-01474-7
  7. Healthc Inform Res. 2025 Oct;31(4): 405-415
       OBJECTIVES: The integration of digital technology has greatly expanded public access to health and drug-related information through the Internet. However, the rapid proliferation of unverified content on websites targeting the general population raises serious concerns about health misinformation. This study aimed to evaluate the quality of Indonesian drug information websites accessible to the public and to design a verified, web-based drug information system.
    METHODS: A cross-sectional evaluation was conducted using the Quality Evaluation Scoring Tool (QUEST) to assess the quality of publicly available drug information websites in Indonesia. Development of the verified drug information platform followed the Rapid Application Development model, employing a prototyping approach.
    RESULTS: Among the 14 publicly accessible drug information websites evaluated, 5 (35.71%) were classified as low quality (QUEST score ≤9), 4 (28.57%) as moderate quality (score 10-18), and 5 (35.71%) as high quality (score >18). The drug information website developed by the Faculty of Pharmacy, Universitas Andalas, achieved a high-quality rating, with a QUEST score of 27 (96.43%), although it received the lowest subscore in the Complementarity domain. Higher QUEST scores indicate better information quality.
    CONCLUSIONS: The findings show that nearly half of the websites providing drug information to the Indonesian public are of low quality. The website developed by the Faculty of Pharmacy, Universitas Andalas, demonstrated strong overall quality, but improvements in the Complementarity domain are recommended to further strengthen user engagement and support.
    Keywords:  Consumer Health Information; Drug Information Services; Information Quality Websites; Internet
    DOI:  https://doi.org/10.4258/hir.2025.31.4.405
  8. J Med Internet Res. 2025 Nov 18.
       BACKGROUND: Precision health promotion, which aims to tailor health messages to individual needs, is hampered by the lack of structured metadata in vast digital health resource libraries. This bottleneck prevents scalable, personalized content delivery and exacerbates information overload for the public.
    OBJECTIVE: This study aimed to develop, deploy, and validate an automated tagging system using a large language model (LLM) to create the foundational metadata infrastructure required for tailored health communication at scale.
    METHODS: We developed a comprehensive, three-tier health promotion taxonomy (10 primary, 34 secondary, 90,562 tertiary tags) using a hybrid Delphi and corpus-mining methodology. We then constructed a hybrid inference pipeline by fine-tuning a Baichuan2-7B LLM with Low-Rank Adaptation (LoRA) for initial tag generation. This was then refined by a domain-specific named entity recognition (NER) model and standardized against a vector database. The system's performance was evaluated against manual annotations from non-expert staff on a test set of 1000 resources. We used a "no gold standard" framework, comparing the AI-Human (A-H) inter-rater reliability (IRR) with a supplemental Human-Human (H-H) IRR baseline and expert adjudication for cases where the AI provided additional tags ("AI Additive").
    RESULTS: The AI-Human (A-H) agreement was moderate (Cohen's kappa = 0.544, 95% CI 0.528 to 0.560; Jaccard similarity = 0.482, 95% CI 0.461 to 0.503). Critically, this was higher than the baseline non-expert Human-Human (H-H) agreement (Cohen's kappa = 0.323, 95% CI 0.294 to 0.352; Jaccard similarity = 0.347, 95% CI 0.266 to 0.428). A granular analysis of disagreements revealed that in 15.9% (159/1000) of cases the AI provided additional ("AI Additive") tags not identified by human annotators. Expert adjudication of these cases confirmed that the "AI Additive" tags were correct and relevant, with a precision of 90.0% (45/50, 95% CI 78.2% to 96.7%).
    CONCLUSIONS: A fine-tuned LLM, integrated into a hybrid pipeline, can function as a powerful augmentation tool for health content annotation. The system's consistency (A-H κ=0.544) was found to be superior to the baseline human workflow (H-H κ=0.323). By moving beyond simple automation to reliably identify relevant health topics missed by manual annotators with high, expert-validated accuracy, this study provides a robust technical and methodological blueprint for implementing AI to enhance precision health communication in public health settings.
    DOI:  https://doi.org/10.2196/83219
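    The two agreement measures this abstract compares can be illustrated with a minimal sketch; the tag data below are hypothetical, and this is not the paper's pipeline:

```python
# Illustrative sketch only (not the study's code): how Cohen's kappa and
# Jaccard similarity quantify agreement between two annotators.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag sets: |A & B| / |A | B|."""
    return len(tags_a & tags_b) / len(tags_a | tags_b)

# Hypothetical primary tags for four resources, AI vs. human annotator.
ai    = ["nutrition", "exercise", "nutrition", "sleep"]
human = ["nutrition", "exercise", "sleep", "sleep"]
k = cohens_kappa(ai, human)       # chance-corrected agreement, about 0.64
j = jaccard(set(ai), set(human))  # overlap of the tag vocabularies used
```

    Kappa corrects raw percent agreement for the agreement expected by chance from each annotator's label frequencies, which is why the paper reports it alongside the set-overlap Jaccard score.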
  9. Knee. 2025 Nov 17. pii: S0968-0160(25)00282-0. [Epub ahead of print]
       BACKGROUND: Large language models (LLMs) are increasingly used in the medical sector, raising questions about their reliability for patient education. With more LLMs becoming publicly available, it remains unclear whether meaningful performance differences exist between them. This is particularly relevant for anterior cruciate ligament (ACL) injuries, which mainly affect young, active individuals, those most likely to seek health advice from AI. This study aimed to evaluate and directly compare the accuracy of five leading LLMs in answering common patient questions about ACL tears.
    METHODS: Fourteen commonly asked patient questions were identified in a systematic online search. Each question was submitted to five LLMs: ChatGPT-4, Gemini 2.0, Llama 3.1, DeepSeek-V3, and Grok 3. Responses were assessed for accuracy by orthopedic consultants using a five-point Likert scale. Word count was recorded as a proxy for readability. Statistical analysis included ANOVA with Tukey's HSD post hoc test.
    RESULTS: All models achieved mean accuracy scores ≥3 (mostly accurate). DeepSeek (3.61) and Grok (3.59) demonstrated significantly higher mean accuracies than Llama (3.25; P < 0.05). ChatGPT and Gemini achieved mean scores of 3.48 and 3.52, respectively. Models generating longer responses, such as Grok and DeepSeek, tended to offer greater accuracy, whereas Llama produced the shortest and least accurate answers.
    CONCLUSIONS: All tested LLMs show promise for patient education regarding ACL injuries, but notable performance differences exist. Model choice is therefore critical. While all responses were evaluated by clinical experts, the lack of guideline-based validation highlights the need for further studies assessing both accuracy and patient comprehension.
    Keywords:  ACL injury; Artificial intelligence; Large language models; Orthopedics; Patient education
    DOI:  https://doi.org/10.1016/j.knee.2025.11.002
  10. BMC Infect Dis. 2025 Nov 20. 25(1): 1624
      
    Keywords:  DISCERN; HIV; Information quality; Inter-rater reliability (ICC); Large language models (LLMs); Readability
    DOI:  https://doi.org/10.1186/s12879-025-11621-y
  11. J Hum Nutr Diet. 2025 Dec;38(6): e70162
       OBJECTIVES: Large Language Models (LLM) like ChatGPT and Gemini have potential in nutrition applications, but recent studies suggest they provide inaccurate dietary advice. The aim of this study was to evaluate the most commonly used LLMs, ChatGPT and Gemini, for dietary recommendations for patients with irritable bowel syndrome (IBS).
    METHODS: Various tools were used to assess the LLMs' responses in this study. The Guideline Compliance Score was created using IBS guidelines. The quality of the responses provided by the LLMs was assessed using the Global Quality Score (GQS) and the Completeness, Lack of Misinformation, Evidence, Appropriateness, Relevance (CLEAR) tool. Understandability and actionability were assessed using the Patient Education Materials Assessment Tool (PEMAT). The readability of ChatGPT's and Gemini's responses was evaluated using Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL).
    RESULTS: This study found that most responses from ChatGPT (70%) and Gemini (57.5%) were compliant with the guidelines, but there was no significant difference in guideline compliance, quality, understandability, actionability, or readability scores (p > 0.05). The CLEAR tool showed a moderate positive correlation with PEMAT actionability (r = 0.467, p = 0.038) and understandability (r = 0.568, p = 0.009), a strong positive correlation with GQS (r = 0.611, p = 0.004). In addition, FRE and FKGL had a strong negative correlation (r = -0.784, p < 0.001), while the Guideline Compliance Score showed a moderate negative correlation with FRE (r = -0.537, p = 0.015).
    CONCLUSIONS: The study emphasizes the need for further model improvements before relying solely on LLMs in clinical nutrition practice, emphasizing the importance of dietitians' recommendations and the collaboration between AI models and healthcare teams.
    Keywords:  ChatGPT; Gemini; artificial intelligence; dietary recommendations; irritable bowel syndrome; large language models; patient education
    DOI:  https://doi.org/10.1111/jhn.70162
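    The two Flesch scores used here (and in several other entries in this issue) are simple functions of words per sentence and syllables per word. A rough sketch with the standard published constants follows; the syllable counter is a crude vowel-group heuristic, not the validated scorers these studies used:

```python
# Rough sketch of the Flesch formulas (standard constants assumed); the
# syllable counter is an approximation, not a validated implementation.
import re

def syllables(word):
    """Approximate syllables as vowel groups, with a silent-e adjustment."""
    n = len(re.findall(r"[aeiouy]+", word.lower()))
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fre_fkgl(text):
    """Flesch Reading Ease and Flesch-Kincaid Grade Level of a text."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    wps = len(words) / len(sents)                  # words per sentence
    spw = sum(map(syllables, words)) / len(words)  # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw       # higher = easier
    fkgl = 0.39 * wps + 11.8 * spw - 15.59         # US school grade level
    return fre, fkgl
```

    Because both scores are linear in the same two ratios with opposite signs, the strong negative FRE-FKGL correlation the study reports is expected by construction.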
  12. J Cancer Surviv. 2025 Nov 19.
    Turkish Urooncology Association, Bladder Cancer Working Group
       INTRODUCTION: Artificial intelligence (AI) is quickly transforming healthcare by improving patient and clinician access to and understanding of medical information. Generative AI models answer healthcare queries and provide tailored and quick responses. This research evaluates the readability and quality of bladder cancer (BC) patient information in 10 popular AI-enabled chatbots.
    MATERIALS AND METHODS: We used the latest versions of ten popular chatbots: OpenAI's GPT-4o, Microsoft's Copilot Pro, Claude-3.5 Haiku, Sonar Large, Grok 2, Gemini Advanced 1.5 Pro, Mistral Large, Google Palm 2 (Google Bard), Meta's Llama 3.3, and Meta AI v2. Prompts were developed to provide texts about BC, non-muscle-invasive BC, muscle-invasive BC, and metastatic BC. The modified Ensuring Quality Information for Patients (mEQIP), the Quality Evaluating Scoring Tool (QUEST), and DISCERN were used to assess quality. The Average Reading Level Consensus (ARLC), Flesch Reading Ease (FKRE), and Flesch-Kincaid Grade Level (FKGL) were used to evaluate readability.
    RESULTS: The ten chatbots exhibited statistically significant differences in mean mEQIP, DISCERN, and QUEST scores (p = 0.048, p = 0.025, and p = 0.021, respectively). Meta scored lowest on average mEQIP, DISCERN, and QUEST, while Llama attained the highest. Statistically significant differences were also seen in the chatbots' average ARLC, FKGL, and FKRE scores (p = 0.002, p = 0.001, and p = 0.002, respectively), with Google Palm producing the easiest-to-read texts and Llama the most difficult to understand.
    CONCLUSION: AI chatbots can produce information on BC that is of moderate quality and readability, while there is significant variability among platforms. Results should be evaluated with caution due to the single-query approach and the continuously advancing AI models. Clinicians can support safety in implementation by delivering structured feedback and incorporating content review stages into patient education processes. Continuous collaboration between healthcare practitioners and AI developers is crucial to maintain the accuracy, currency, and clarity of AI-generated content.
    Keywords:  Artificial intelligence; Bladder cancer; Chatbot; Claude; Copilot; GPT-4o; Gemini; Google Palm; Grok; Llama; Meta AI; Mistral; Sonar
    DOI:  https://doi.org/10.1007/s11764-025-01921-2
  13. BMC Oral Health. 2025 Nov 21. 25(1): 1812
       BACKGROUND: Artificial intelligence (AI)-based chatbots are increasingly used by parents as convenient and fast-access sources of information on health-related topics. This study aimed to assess the readability, accuracy and overall quality of responses provided by ChatGPT-4o, Google Gemini and Microsoft Copilot to questions concerning deleterious oral habits in children.
    METHODS: A total of 43 questions, derived from real-life discussions on the Reddit platform, were revised for clarity and demographic diversity. These were classified into seven categories of deleterious oral habits: thumb sucking, bruxism, pacifier use, tongue thrusting, lip sucking, nail biting, and mouth breathing. Responses from each AI chatbot were evaluated using multiple tools, including Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), the modified DISCERN tool (mDISCERN), the Global Quality Score (GQS), and a misinformation scoring system. Statistical analyses were performed using the Kruskal-Wallis test followed by Dunn's post hoc test for non-normally distributed variables, and one-way ANOVA with Tukey's post hoc test for normally distributed variables (p < .05).
    RESULTS: ChatGPT-4o generated responses with significantly lower readability and higher textual complexity compared to Gemini and Copilot, as reflected by its lower FRE (p = .0022) and higher FKGL (p = .0062) scores. ChatGPT-4o had 76.74% of its responses rated as excellent quality (GQS score of 5), compared to 44.19% for Gemini and 30.23% for Copilot. In terms of accuracy, ChatGPT-4o provided correct information for 93% of the questions (misinformation scores of 4 or 5), while Gemini and Copilot achieved 88.34% and 81.4%, respectively. Google Gemini achieved the highest mDISCERN score (34.1) due to better source referencing.
    CONCLUSION: AI chatbots may serve as supplementary tools for parental education on oral health, yet their performance varies by platform. ChatGPT-4o excelled in accuracy and structure, Gemini in transparency, and Copilot in simplicity. However, these tools should not substitute for professional dental guidance. Enhancing readability and source referencing remains essential for improving the reliability of AI-generated health information.
    Keywords:  Artificial intelligence; Chatbots; Deleterious oral habits
    DOI:  https://doi.org/10.1186/s12903-025-07298-z
  14. Community Dent Health. 2025 Nov 19. 265539X251379259
       OBJECTIVE: This study evaluated the readability level (RL) and textual content quality (TCQ) of Turkish websites providing information about "gingival recession" (GR) to understand their implications for public health, specifically concerning health literacy and access to care. Ensuring online health information is accessible is crucial for promoting informed decision-making and preventive health behaviors.
    BASIC RESEARCH DESIGN: Cross-sectional and assessment of RL and TCQ on Turkish websites.
    SUBJECTS: After excluding ineligible websites, a total of 46 websites about GR obtained from the first 10 search result pages on Google were analyzed.
    MAIN OUTCOME MEASURES: RL was assessed using Ateşman's and Bezirci-Yılmaz's formulas, both validated for Turkish. TCQ was scored using six criteria. Analyses compared mean values and examined associations between variables.
    RESULTS: The Bezirci-Yılmaz formula indicated a master's-degree RL (17.96 ± 4.95), suggesting high reading difficulty for the public. The Ateşman formula indicated moderate readability (57.08 ± 12.74). The mean TCQ was low (2.89 ± 1.08), indicating insufficient information content. RL was positively associated with word count (p < 0.001 for both formulas) but not with TCQ (p > 0.05 for both).
    CONCLUSION: Turkish websites on GR are significantly harder to read than recommended for public health materials and the average literacy level in Türkiye. This, combined with low content quality, creates a barrier to understanding essential dental health information, potentially worsening health disparities. Addressing this issue is a public health imperative to improve equitable access to crucial dental health information, empower individuals to take proactive steps for their oral health, and advance national oral health objectives.
    Keywords:  content analysis; gingival recession; health literacy; public health; readability; websites
    DOI:  https://doi.org/10.1177/0265539X251379259
  15. J Community Genet. 2025 Nov 21. 17(1): 10
      The impact of rare diseases, like Krabbe disease (KD), collectively affecting millions worldwide, is a public health genetics issue. Because disparities in management and prognosis are often associated with health literacy levels, patient education materials (PEMs) must be accessible to parents who frequent the internet to learn about diagnoses and follow-up. This study aimed to assess accessibility and suitability of online KD resources, using results to provide recommendations for resource improvement. A Google search was conducted utilizing common search terms to identify patient-centered KD resources. Resource content was compared against an author-developed list of essential information for families. Reviewers assessed readability, using Flesch-Kincaid (FK) and Simple Measure of Gobbledygook (SMOG) formulas, and suitability utilizing the Suitability Assessment of Materials (SAM) Tool and the Patient Education Materials Assessment Tool (PEMAT). All resources included a description, symptoms, and genetics of KD. Four resources discussed genetic counseling; two mentioned next steps. Most resources (10/12) had readability scores above the recommended sixth to eighth grade levels for PEMs. The average FK and SMOG scores were 10.6 and 12.5, respectively. Eleven of twelve resources rated 'adequate' or higher using the SAM Tool. PEMAT understandability and actionability scores ranged from 55.1% to 94.1% and 0% to 83.3%, respectively, due to lack of graphics and interactivity. No resource met all criteria. Although easy to navigate, resources struggled to use clear, common language, utilize graphics appropriately, promote interactivity, and present concrete next steps. Resource development should focus on implementing post-diagnosis action steps and improving understanding by using common terminology and graphics to promote better care of individuals with KD.
    Keywords:  Accessibility; Health literacy; Krabbe; Readability; Suitability
    DOI:  https://doi.org/10.1007/s12687-025-00845-9
  16. J Community Genet. 2025 Nov 17. 17(1): 5
      This study evaluates the readability and understandability of online resources on biotinidase deficiency, a metabolic disorder included in newborn screening programs. The aim is to determine whether these materials meet health literacy standards. Fifty online documents were initially identified via Google searches using "biotinidase deficiency." After excluding academic articles, duplicates, and inaccessible resources, 21 documents were analyzed. They were categorized by domain extension as non-profit (hosted on domains such as .org, .gov, or .edu, representing public institutions and academic organizations) (13) or private (hosted on commercial domains like .com, often linked to medical facilities) (8). Readability was assessed using Readable.io, providing Flesch Reading Ease scores and Flesch-Kincaid Grade Level. The Patient Education Materials Assessment Tool (PEMAT) was used to evaluate understandability and actionability, with scores averaged by four reviewers. Statistical analyses compared group differences. Private articles showed markedly higher Flesch-Kincaid Grade Level scores, indicating higher reading difficulty, than non-profit articles (mean ± SD: 13.9 ± 2.2 vs. 10.7 ± 2.0; p = 0.002). There was no statistically significant difference in PEMAT understandability (U) scores between private and non-profit articles (mean ± SD: 52.0 ± 10.5 vs. 42.3 ± 11.4; p = 0.060) or actionability (A) scores (mean ± SD: 29.1 ± 20.0 vs. 13.4 ± 18.0; p = 0.063). Furthermore, articles classified as having lower readability levels (D and E) exhibited markedly reduced actionability scores compared to those with higher readability levels (A to C), indicating a correlation between text complexity and practical use. The recommended health literacy standards for biotinidase deficiency are not met by most online sources. In particular, materials that are difficult to read are less applicable and of limited benefit to parents or caregivers.
Given that such readers are expected to take important actions such as conducting screenings or consulting healthcare professionals, the importance of making these materials more appropriate is significant. These findings highlight the importance of patient-centered, clear, and actionable health communication, particularly for conditions identified in newborn screening programmes.
    Keywords:  Biotinidase deficiency; Inherited metabolic disease; Readability; Understandability
    DOI:  https://doi.org/10.1007/s12687-025-00842-y
  17. Dig Dis Sci. 2025 Nov 19.
       PURPOSE: Endoscopic retrograde cholangiopancreatography (ERCP) is a procedure used to diagnose and treat hepato-biliary and pancreatic conditions. Many patients use internet search engines for educational purposes prior to procedures, such as ERCP. The American Medical Association (AMA) recommends patient education materials be written at a 5th-6th grade reading level. There is no prior literature examining the reading level of ERCP education materials. The primary objective of this study is to assess the readability of online patient education materials regarding ERCP.
    METHODS: Using the Google search engine, the top 75 search results for 5 ERCP-related keywords (total 375 websites) were screened. After exclusion, readability was assessed using the Simple Measure of Gobbledygook (SMOG) and three other tools (Flesch-Kincaid, Coleman-Liau and Automated Readability Index). Websites were categorized as academic or non-academic. A word substitution analysis assessed readability after replacing complex words with simpler alternatives.
    RESULTS: A total of 129 websites met inclusion criteria. The mean SMOG reading grade level was 9.43 (n = 129). Academic websites (n = 77) had a statistically significantly lower mean reading level than non-academic websites (n = 52) (SMOG 9.17 vs. 9.83, d = 0.49, p < 0.01). Similar trends were found with the other readability tools. Substituting easier words for complex words resulted in a significantly lower mean reading level (SMOG 8.97, d = 1.52, p < 0.01).
    CONCLUSION: Online ERCP patient education materials are at a 9th-11th grade reading level, exceeding the AMA recommendations. Word substitution analysis significantly decreased the mean reading level, but not to the recommended levels. Medical organizations should prioritize accessible health information to improve patient understanding.
    Keywords:  ERCP; Internet Uses; Patient education; Readability
    DOI:  https://doi.org/10.1007/s10620-025-09571-1
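      The SMOG measure reported above estimates a reading grade from the density of polysyllabic words, normalized to a 30-sentence sample; a minimal sketch with illustrative counts:

```python
import math

def smog_grade(polysyllables: int, sentences: int) -> float:
    # SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    # where polysyllables = words of three or more syllables in the sample
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# Example: 45 polysyllabic words in a 30-sentence sample
print(round(smog_grade(45, 30), 1))  # 10.1
```

Because the score grows with the square root of the polysyllable count, replacing complex words with shorter synonyms (as in the word substitution analysis) lowers the grade level only gradually, which is consistent with the modest drop reported (9.43 to 8.97).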
  18. Sci Rep. 2025 Nov 18. 15(1): 40578
      Persistent Idiopathic Facial Pain (PIFP), also known as atypical facial pain, is a poorly understood condition, often leading patients to seek information online. The accuracy, readability, and usability of such information are critical for informed decision-making. This study aimed to assess and compare the readability, quality, and usability of online content about PIFP retrieved from both conventional web searches and generative AI platforms. A cross-sectional comparative analysis was conducted on January 26th 2025 using three sources: Google, ChatGPT, and Gemini. Google was searched using the terms "Atypical Facial Pain" and "Persistent Idiopathic Facial Pain," and the first 100 results for each term were screened. ChatGPT and Gemini were prompted to generate the ten most frequently asked questions about PIFP. Selected content was evaluated using the Patient Education Materials Assessment Tool (PEMAT) for understandability and actionability, the Journal of the American Medical Association (JAMA) benchmarks for content quality, and two readability measures: the Flesch Reading Ease and the Simple Measure of Gobbledygook (SMOG) formulae. A total of 43 websites were included. Only 44.1% of websites met at least one JAMA benchmark, while 55.9% met none. AI-generated content failed to meet any JAMA criteria. However, AI content was significantly more understandable (mean score: 83.31%) than traditional websites (mean: 64.96%, p < 0.001). In contrast, websites had higher actionability scores (25.25%) compared to AI-generated content (11%, p < 0.001). All sources were rated as "difficult" to read. Online information regarding PIFP is often difficult to read and generally lacks both quality indicators and actionable guidance. While AI-generated content is more understandable, it lacks practical advice.
These findings underscore the need for improved online patient education materials that are both high in quality and easy to understand, especially for complex pain conditions like PIFP. Health professionals and digital content developers should collaborate to ensure that online resources meet standards for clarity, reliability, and usefulness.
    Keywords:  Chronic pain; Facial pain; Online information; Quality; Readability
    DOI:  https://doi.org/10.1038/s41598-025-24426-5
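      The PEMAT percentages reported in entries like this one are conventionally computed as the share of applicable checklist items rated "agree", with not-applicable items excluded from the denominator; a minimal sketch (the ratings are illustrative):

```python
def pemat_score(item_ratings: list) -> float:
    # Ratings per PEMAT item: 1 = agree, 0 = disagree,
    # None = not applicable (excluded from the denominator)
    applicable = [r for r in item_ratings if r is not None]
    return 100 * sum(applicable) / len(applicable)

# Example: 13 understandability items, one marked not applicable
ratings = [1, 1, 0, 1, 1, None, 1, 0, 1, 1, 1, 0, 1]
print(round(pemat_score(ratings), 1))  # 75.0
```

The same calculation is applied separately to the understandability and actionability item sets, which is why a source can score high on one dimension and low on the other, as the AI-generated content did here.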
  19. Stem Cell Reports. 2025 Nov 20. pii: S2213-6711(25)00323-6. [Epub ahead of print] 102719
      This Forum article explores 4,481 YouTube videos about stem cells to map how medical knowledge is shaped online. By analyzing content and user metrics, the article identifies key mediators and influential creators, revealing a complex discourse dominated by celebrity influencers in the promotion and discussion of putative stem cell treatments.
    DOI:  https://doi.org/10.1016/j.stemcr.2025.102719
  20. Anat Sci Educ. 2025 Nov 20.
      YouTube is increasingly used by medical and health science students as a supplementary learning tool. However, the quality and educational value of surface anatomy videos on YouTube remain underexplored. This study aimed to systematically evaluate the quality, reliability, and educational usefulness of YouTube videos focusing on human surface anatomy. A structured YouTube search was conducted (December 2024-January 2025), targeting the seven primary body regions with specific keywords (e.g., "surface anatomy," "bone landmarks," and "dermatomes"). The top 30 videos per search term were selected. Two anatomists independently assessed each video using the Anatomy Content Score (ACS), Global Quality Scale (GQS), modified DISCERN (mDISCERN), and Journal of the American Medical Association (JAMA) benchmarks. Inter-observer agreement was evaluated via Kappa coefficient. Associations between video quality scores and YouTube metrics (view count, like ratio, interaction index) were examined using nonparametric tests. Among 1050 retrieved videos, 85 (8%) met inclusion criteria; 48 (56.5%) were classified as "useful" (ACS ≥ 13, GQS ≥ 4). Longer video duration was significantly (p < 0.001) associated with higher usefulness, whereas view count, like ratio, and interaction index did not correlate with usefulness. ACS strongly correlated with GQS (rs = 0.754) and both correlated moderately with mDISCERN. No significant differences in video quality were observed across body regions, search rankings, presented material type, or upload period (pre- vs. post-COVID-19). YouTube offers a moderate-quality resource for learning surface anatomy, with approximately 60% of evaluated videos deemed useful. Popularity metrics are unreliable indicators of video educational quality, underscoring the need for peer-reviewed, high-quality digital resources.
    Keywords:  YouTube; anatomy instruction; educational videos; medical education; surface anatomy
    DOI:  https://doi.org/10.1002/ase.70160
  21. BMC Urol. 2025 Nov 17. 25(1): 286
       BACKGROUND: Urodynamic testing plays an important role in assessing lower urinary tract function. Due to the increasing use of the internet for medical information, platforms like YouTube have gained popularity as educational tools for medical procedures. However, the quality and reliability of this information are often uncertain. This study aims to evaluate the quality and reliability of YouTube videos as a source of current information and education on urodynamics.
    METHODS: A search for "urodynamics" and "urodynamics test" on YouTube was conducted on March 8, 2024. The first 100 videos were screened, and 29 relevant videos were evaluated by two urologists. Videos were assessed for duration, number of views, likes, comments, and source, and scored for content quality using the British Association of Urological Surgeons (BAUS) criteria, Global Quality Score (GQS), and DISCERN tool.
    RESULTS: Videos were categorized into "useful" and "non-useful" groups based on BAUS scores. While there was no significant difference in video popularity metrics between the groups, GQS and DISCERN scores were significantly higher in the "useful" group (p = 0.003 and p = 0.002, respectively). Videos from healthcare professionals, health information websites, and advertisements showed no significant differences in quality metrics, except for the number of subscribers, with advertisement channels having more subscribers (p = 0.026).
    CONCLUSION: YouTube contains useful information on urodynamics, but the overall quality varies. Efforts should be made to improve the content quality to better serve patients seeking information on this platform.
    Keywords:  Discern; Patient information; Source; Urodynamics; Youtube videos
    DOI:  https://doi.org/10.1186/s12894-025-01983-5
  22. Surg Endosc. 2025 Nov 19.
       BACKGROUND: To qualitatively assess surgical technique in robotic ileocecal resection (ICR) for Crohn's disease (CD) and evaluate the educational quality of procedural videos.
    METHODS: A systematic search was conducted across surgical video platforms (YouTube, Medtube, AIS Channel, WebSurg) using the terms "Robotic ileocecal resection" and "Robotic Crohn's disease." Inclusion criteria were (1) robotic ICR for CD; (2) uploaded between 2005 and 2025; (3) produced by medical professionals; (4) English language. Collected data included surgical indication, port placement, extraction site, anastomotic technique, and extent of mesenteric excision. Video content quality and training value were assessed with the Global Evaluative Assessment of Robotic Skills (GEARS) and the LAParoscopic surgery Video Educational GuidelineS (LAP-VEGaS) video assessment tool. A LAP-VEGaS score of ≥ 11 was defined as the educational pass mark.
    RESULTS: Twenty-eight videos were included. Surgical indication was reported in 17 videos (60.7%): stricture (n = 7, 41.2%), fistula (n = 9, 52.9%), both (n = 1, 5.9%). Port positioning and extraction site were shown in 13 (46.4%) and 14 (50.0%) videos, respectively. Mesenteric excision was described in 19 videos (67.9%): extended (n = 5, 26.3%), limited (n = 10, 52.6%) and neither extended nor limited (n = 4, 21.1%). Anastomoses included Kono-S (n = 7, 29.2%), side-to-side (n = 16, 66.7%), side-to-end (n = 1, 4.1%). All the demonstrated anastomoses were performed intracorporeally. LAP-VEGaS scores ranged from 2 to 17 (median = 11), with only 57.14% receiving a score of 11 or higher, whilst GEARS showed novice (n = 7, 25.0%), intermediate (n = 15, 53.6%), expert (n = 6, 21.4%). Video quality was good in 7.1%, moderate in 64.3% and poor in 28.6%. Educational value for surgeons in training was ranked as good in 7.1%, moderate in 25.0% and poor in 67.9%.
    CONCLUSION: Videos of robotic ICR for Crohn's disease can support surgical education, although currently only half meet minimum quality standards. Greater standardisation and adherence to validated frameworks are required to optimise their educational value.
    Keywords:  Crohn’s disease; Educational quality; GEARS score; LAP-VEGaS score; Robotic ileocecal resection; Surgical video assessment
    DOI:  https://doi.org/10.1007/s00464-025-12385-x
  23. Colorectal Dis. 2025 Nov;27(11): e70313
       INTRODUCTION: Exposure to proctology during post-graduate colorectal training is often variable. Videos of proctological procedures can benefit surgical trainees' self-directed learning. The aim of this study is to evaluate the quality of freely available online video material on proctological procedures using a modified colorectal video assessment framework.
    METHODS: PubMed and the YouTube™ platform were searched using terms related to proctological procedures for haemorrhoids, anal fissure and fistula. The retrieved videos were assessed (cross-sectional study) for quality using a modified video-assessment checklist validated by three colorectal surgeons who regularly perform proctology cases. The resulting 9-item evaluation tool was designed to capture the extent to which videos provide the concise and structured information typically required for peer review.
    RESULTS: A total of 98 surgical videos were assessed, comprising 65 from peer-reviewed journals and 35 from YouTube™ only. The median total score for peer-reviewed videos was 16.0 (interquartile range [IQR] 13.0-17.0) compared to 10.0 (IQR 8.0-12.0) for the non-peer-reviewed videos. This difference was statistically significant (Mann-Whitney U = 2024.0, p < 0.001). In particular, journal videos provided significantly more contextual information about the case, including presenting symptoms and outcomes.
    CONCLUSION: As might be expected, the quality of YouTube™ videos from the perspective of proctology training was inferior to those released online by peer-reviewed journals. This provides further evidence for the validity of using modified checklists to assess the quality of training materials. Given the findings of this study, trainees should be encouraged to prioritise journal-related over other freely available material for self-directed learning.
    Keywords:  YouTube™; proctology; recorded procedural videos; self‐directed learning
    DOI:  https://doi.org/10.1111/codi.70313
  24. Knee Surg Sports Traumatol Arthrosc. 2025 Nov 16.
       PURPOSE: To evaluate the overall quality and extent of viewership of primary anterior cruciate ligament (ACL) injury prevention content on TikTok.
    METHODS: The social media platform TikTok was queried using ACL injury prevention terms, and the 89 most-viewed English-language videos demonstrating primary preventative ACL injury exercises were included. This sample size was determined a priori via power analysis to detect a moderate effect. Two authors independently extracted video characteristics and engagement metrics and scored content quality using DISCERN, Principles for Health-related Information on Social Media (PRHISM) and ACL Exercise Education Score (ACLEES) scoring systems. Disputes were resolved via author consensus. Interrater reliability was assessed using Cohen's κ. Statistical analyses included linear regression, t-tests and Fisher's exact tests.
    RESULTS: Collectively, the included posts garnered 5,988,018 views, 569,486 likes and 28,385 shares. Median scores were 30 (interquartile range [IQR] 24-38) for DISCERN, 13 (IQR 10-16) for PRHISM and 10 (IQR 6-15) for ACLEES, all indicating poor overall quality. Domain analysis revealed that content was generally accessible and relevant yet lacked citation of evidence and follow-up information. Post length significantly correlated with all three scoring systems. Self-identified healthcare professionals achieved significantly higher PRHISM scores than general users in authorship, authority and financial disclosure domains. No differences were found in engagement metrics or in DISCERN and ACLEES scores between the two groups.
    CONCLUSION: Primary ACL injury prevention content on TikTok is widely viewed, accessible and relevant, but generally of poor quality among both healthcare professionals and general users. Both post author groups inadequately provided condition-specific guidance, treatment rationale and references. TikTok may represent an important avenue for orthopaedic professionals to establish a presence and disseminate high-quality, evidence-based information regarding ACL injury prevention to youth and adolescent athletes in hopes of decreasing injury rates.
    LEVEL OF EVIDENCE: N/A.
    Keywords:  ACL; injury prevention; public health; social media; youth sports
    DOI:  https://doi.org/10.1002/ksa.70128
  25. Front Public Health. 2025 ;13 1670106
       Background: With the overwhelming volume of online health information and the high prevalence of health misinformation, it is vital to understand how individuals use such information and what factors shape that use. This study aims to explore college students' online health information-seeking behavior and their preferences among its influencing factors.
    Methods: We used the best-worst scaling (BWS) approach to determine college students' preferences among factors influencing online health information-seeking behavior. A total of 11 attributes of online health information seeking were identified through literature review and focus groups, and a balanced incomplete block design was used to create 11 choice tasks for the BWS survey. An online survey using the BWS questionnaire was conducted from March 2023 to May 2023.
    Results: Both the BWS scores and the mixed logit model results indicate that "verified by professional institutions or health professionals" (mean BW = 1.938; coefficient = 3.096), "information source from a trustworthy and authoritative website" (mean BW = 1.921; coefficient = 3.015), "privacy and security guaranteed" (mean BW = 1.234; coefficient = 2.637), and "consistency of information" (mean BW = 0.803; coefficient = 2.313) were the most important factors and were valued more positively than negatively by respondents. A medical education background had positive effects of 0.410 and 0.279 on the preferences for "writing and language" and "professional interface design," respectively, and a negative effect of -0.307 on the preference for "disclosure of author information."
    Conclusion: We recommend that concerned authorities consider interventions targeting the accuracy, credibility, privacy, and consistency of online health information management for college students.
    Keywords:  accuracy; best-worst scaling; college students; credibility; online health information seeking; preference
    DOI:  https://doi.org/10.3389/fpubh.2025.1670106
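      The mean BW scores above come from the standard count-based best-worst calculation: for each attribute, the number of times it was chosen worst is subtracted from the number of times it was chosen best, then averaged over respondents and the number of tasks in which the attribute appears. A minimal sketch with illustrative counts (the exact normalization depends on the study's balanced incomplete block design):

```python
def mean_bw_score(best_count: int, worst_count: int,
                  n_respondents: int, n_appearances: int) -> float:
    # Count-based best-worst score, averaged over respondents and
    # over the number of choice tasks in which the attribute appears
    return (best_count - worst_count) / (n_respondents * n_appearances)

# Example: an attribute chosen best 520 times and worst 70 times
# by 150 respondents, appearing in 5 of the choice tasks
print(mean_bw_score(520, 70, 150, 5))  # 0.6
```

A positive mean BW score means the attribute was chosen best more often than worst, which is why the top four attributes in the results all have large positive values.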
  26. JMIR Cancer. 2025 Nov 19. 11 e76187
       Background: Clinical trials are important for all stages of the cancer control continuum, including cancer survivorship.
    Objective: The purpose of this study was to evaluate correlates of general clinical trial knowledge among US adult cancer survivors.
    Methods: We conducted a cross-sectional analysis of the National Cancer Institute's 2021 Health Information National Trends Survey. Cancer survivors were recruited from 3 Surveillance, Epidemiology, and End Results registries: Iowa Cancer Registry, Greater Bay Area Cancer Registry, and New Mexico Tumor Registry. Data collection occurred from January 11 to August 20, 2021. Eligible participants had a cancer diagnosis prior to 2018. The primary outcome was self-reported knowledge of clinical trials, assessed by the question: "How would you describe your level of knowledge about clinical trials?" Responses were dichotomized as knowing "a lot" or "a little bit" versus "don't know anything." Independent variables included sociodemographic characteristics, patient-centered communication, health information seeking (including watching health-related videos on YouTube), and confidence in obtaining cancer-related information. We used survey-weighted logistic regression to examine univariable and multivariable associations with clinical trial knowledge. A total of 2 a priori hypotheses were specified: (1) cancer survivors with a higher perceived quality of patient-centered communication would have greater knowledge of clinical trials than those with a lower perceived quality of patient-centered communication and (2) cancer survivors who were "completely confident" in their ability to obtain cancer-related information would have greater knowledge of clinical trials than those less confident. Odds ratios (ORs), 95% CIs, and P values were estimated using SAS (version 9.4; SAS Institute Inc, Cary, NC, USA).
    Results: Among cancer survivors (N=1207) included in the analysis, 269 (22.3%) reported that they did not know anything about clinical trials, while 938 (77.7%) reported knowing "a lot" or "a little." Neither of the 2 a priori hypotheses was supported. In the multivariable weighted logistic regression model, greater knowledge of clinical trials was significantly associated with non-Hispanic White race compared with all other races (OR 2.55, 95% CI 1.59-4.08; P<.001), having a college degree compared with less than a college degree (OR 3.50, 95% CI 2.25-5.46; P<.001), seeking cancer information from any source compared with not (OR 3.04, 95% CI 2.10-4.40; P<.001), and having ever watched health-related videos on YouTube compared with never having watched (OR 2.71, 95% CI 1.49-4.94; P=.002). In contrast, female sex assigned at birth was associated with lower odds of clinical trial knowledge compared with male sex assigned at birth (OR 0.57, 95% CI 0.41-0.80; P<.001).
    Conclusions: Sociodemographic characteristics and health-seeking behaviors including watching health-related videos on YouTube were associated with clinical trial knowledge among cancer survivors. These findings highlight opportunities to leverage YouTube as a platform to promote clinical trial awareness and to strengthen survivors' cancer-specific information-seeking skills to improve access to clinical trial information.
    Keywords:  cancer survivors; clinical trials as topic; cross-sectional studies; health literacy; information-seeking behavior; social media; sociodemographic factors
    DOI:  https://doi.org/10.2196/76187
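      Odds ratios and their confidence intervals in logistic regression results like these are obtained by exponentiating the model coefficient and its Wald interval; a minimal sketch, using illustrative coefficient and standard-error values chosen to roughly reproduce the first reported estimate (OR 2.55, 95% CI 1.59-4.08):

```python
import math

def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    # OR = exp(beta); 95% Wald CI = exp(beta +/- z * SE)
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Illustrative values (not taken from the study's published model output)
or_, lo, hi = odds_ratio_with_ci(0.936, 0.24)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 2.55 1.59 4.08
```

Note the asymmetry of the interval around the point estimate: it is symmetric on the log-odds scale, not the odds-ratio scale.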
  27. J Med Internet Res. 2025 Nov 18. 27 e80497
       BACKGROUND: Health disparities are closely associated with socioeconomic inequalities. Although this relationship is well recognized in the context of traditional health care access, its influence on online health-seeking behaviors such as posting questions on patient forums and seeking peer responses remains poorly understood, particularly in the context of resource-limited regions. Furthermore, it is unclear what types of questions are most frequently asked online and to what extent these questions receive helpful responses.
    OBJECTIVE: This study aims to examine how socioeconomic status influences online health-seeking behavior by analyzing regional disparities in forum participation and their correlation with economic development. In addition, it aims to identify unmet informational needs among patients with lymphoma through large language model (LLM)-based forum thread classification and expert evaluation of forum responses by using data from the largest online blood cancer forum in China.
    METHODS: We analyzed over 110,000 patient-initiated forum threads posted between 2012 and 2023, covering all the provinces of mainland China. Regional trends in forum participation rates were examined and correlated with economic development, as measured by gross regional product per capita. Second, an LLM was used to classify the threads into 6 predefined topics based on their semantic content, thereby providing an overview of the topics that users cared about. Additionally, an expert manual review was conducted based on relevance, accuracy, and comprehensiveness to assess whether users' questions were adequately addressed within the forum discussions.
    RESULTS: Regional forum participation rates were significantly associated with levels of regional economic development (Wilcoxon rank-sum test; P<.001), with the highest participation rates in the East Coast regions. Participation rates in less-developed regions steadily increased, reflecting the growing public demand for accessible health information. LLM-based analysis revealed that most discussions centered on medical concerns such as interpreting reports and selecting treatment plans across all regions. However, only 37% (117/316) of the user questions received useful responses, underscoring persistent gaps in access to reliable information.
    CONCLUSIONS: To our knowledge, this study represents the most comprehensive real-world investigation to date of spontaneous online forum participation and information needs among patients with cancer. Our findings highlight the necessity for government and health care providers to implement initiatives such as artificial intelligence-driven information platforms and region-specific health education campaigns to bridge information gaps, reduce regional disparities, and improve patient outcomes across China.
    Keywords:  AI; artificial intelligence; digital health; health-seeking behavior; large language model; online patient forum; regional inequities; socioeconomic factors
    DOI:  https://doi.org/10.2196/80497