bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-10-19
37 papers selected by
Thomas Krichel, Open Library Society



  1. BMJ Evid Based Med. 2025 Oct 17. pii: bmjebm-2024-113613. [Epub ahead of print]
     OBJECTIVES: First, to investigate whether a long abstract, compared with a short one, decreases readers' attention. Second, to investigate differences in perceptions of informativeness, accuracy, attractiveness and conciseness.
    DESIGN: Two-arm, single-blinded, parallel-group, superiority randomised controlled trial with 1:1 allocation.
    SETTING/PARTICIPANTS: Researchers worldwide who indexed any type of systematic review in PubMed with an English abstract between 1 January 2024 and 26 March 2024.
    INTERVENTIONS: Researchers were randomly assigned to two groups. Both groups received the same cover letter by email with a link to our survey, which was assigned to either the short (277 words) or long abstract (771 words) of the same systematic review published in two different journals.
    MAIN OUTCOME MEASURES: Primary outcome was the proportion of trial participation after reading the abstract, indicating readers' attention. Secondary outcomes were researchers' perceptions of four indicators of a well-written abstract (informativeness, accuracy, attractiveness, conciseness), and general abstract characteristics.
    RESULTS: A total of 5397 authors were randomly assigned to the short (n=2691) or long abstract (n=2706). Trial participation did not differ between groups (37.8% vs 35.0%; p=0.1935). While the short abstract was considered more attractive (60.5% vs 46.6%; p=0.0034) and concise (82.3% vs 37.9%; p<0.0001), the length had no impact on its informativeness (85.5% vs 91.2%; p=0.0594) and accuracy (80.2% vs 86.3%; p=0.0868). Regarding general abstract characteristics, 76.0% preferred a maximum length of 250-300 words, nearly all a structured format and about half supported reporting funding and registration information.
    CONCLUSIONS: Abstract length had no impact on readers' attention, but short abstracts were considered more attractive and concise. Guidelines like PRISMA-A should recommend a range of 250-300 words for abstracts, allowing authors to include key information while prioritising clarity and precision. With authors considering information on funding and registration as important, journals should update their author guidelines to include these by default.
    TRIAL REGISTRATION NUMBER: NCT06525805. FUNDING: None.
    Keywords:  Evidence-Based Practice; Methods; Systematic Reviews as Topic
    DOI:  https://doi.org/10.1136/bmjebm-2024-113613
  2. Health Informatics J. 2025 Oct-Dec;31(4): 14604582251388860
      This study aims to combat health misinformation by enhancing the retrieval of credible health information using effective fusion-based techniques. It focuses on clustering-based subset selection to improve data fusion performance. Five clustering methods - two K-means variants, Agglomerative Hierarchical (AH) clustering, BIRCH, and Chameleon - are evaluated for selecting optimal subsets of information retrieval systems. Experiments are conducted on two health-related datasets from the TREC challenge. The selected subsets are used in data fusion to boost retrieval quality and credibility. AH and BIRCH outperform other methods in identifying effective IR subsets. Using AH-based fusion of up to 20 systems results in a 60% gain in MAP and over a 30% increase in NDCG_UCC, a credibility-focused metric, compared to the best single system. Clustering-based fusion strategies significantly enhance the retrieval of trustworthy health content, helping to reduce misinformation. These findings support incorporating advanced data fusion into health information retrieval systems to improve access to reliable information. The source code of this research is publicly available at https://github.com/Gary752752/DataFusion.
    Keywords:  clustering; credible information retrieval; data fusion; health misinformation; subset selection
    DOI:  https://doi.org/10.1177/14604582251388860
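    A minimal Python sketch of the clustering-based subset selection and fusion idea described in item 2, assuming each retrieval system is represented by a vector of normalized relevance scores over a shared document pool and that the selected subset is fused with CombSUM; the function names and toy data are illustrative, and the authors' actual pipeline is in the linked GitHub repository.

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering

        def select_subset(score_matrix, n_clusters):
            # Agglomerative (AH) clustering of systems; keep the member closest to each cluster centroid.
            labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(score_matrix)
            subset = []
            for c in range(n_clusters):
                members = np.where(labels == c)[0]
                centroid = score_matrix[members].mean(axis=0)
                distances = np.linalg.norm(score_matrix[members] - centroid, axis=1)
                subset.append(int(members[np.argmin(distances)]))
            return subset

        def combsum(score_matrix, subset):
            # CombSUM fusion: sum the selected systems' scores for each document.
            return score_matrix[subset].sum(axis=0)

        rng = np.random.default_rng(0)
        scores = rng.random((6, 5))            # toy data: 6 systems x 5 documents
        fused = combsum(scores, select_subset(scores, n_clusters=3))
        print(np.argsort(-fused))              # fused document ranking, best first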
  3. iScience. 2025 Oct 17. 28(10): 113559
      Systematic reviews require substantial time and effort. This study compared the results of reviews conducted by human reviewers with those conducted with Artificial Intelligence (AI). We identified 11 AI tools that could assist in conducting a systematic review. None of the AI tools could retrieve all articles that were detected with a manual search strategy. We identified tools for deduplication but did not evaluate them. AI screening tools assist the human reviewer by presenting the most relevant articles first, which could reduce the number of articles that need to be screened on title and abstract and on full text. Inter-rater reliability between the AI tools and the human reviewers for risk-of-bias assessment was poor. Summary tables created by AI tools differed substantially from manually constructed summary tables. This study highlights the potential of AI tools to support systematic reviews, particularly during the screening phases, but not to replace human reviewers.
    Keywords:  Artificial intelligence; Medical research
    DOI:  https://doi.org/10.1016/j.isci.2025.113559
  4. Healthcare (Basel). 2025 Oct 08. pii: 2537. [Epub ahead of print]13(19):
      Background/Objectives: The internet has unquestionably altered how people acquire health information. Instead of consulting a medical professional, anyone with a smartphone can access billions of pages of information. Women's health issues have been historically and culturally taboo in many cultures globally; therefore, internet searches may be particularly useful when researching these topics. Methods: As an exercise in scientific information evaluation, we chose 12 non-cancer topics specific to women's health and developed a scoring metric based on quantifiable webpage attributes to answer: What topics generate the highest and lowest scores? Does the quality of information (mean score) vary across topics? Does the variation (score deviation) differ among topics? Data were collected following systematic searches after filtering with advanced features of Google and analyzed in a Bayesian framework. Results: The mean score per topic was significantly correlated with the number of sources cited within an article. There were significant differences in quality scores across topics; "pregnancy" and "sleep" scored the highest and had more sources cited per page than all other topics. The greatest variation in scores was for "cortisol" and "weight". Conclusions: A purposeful, systematic internet search of 12 critical women's health topics suggests that scrutiny is necessary when this information is obtained by a typical internet user. Future work should include review by medical professionals, informed by their interactions with patients who self-report what they know or think about a presenting condition, and should respect, while educating, patients' own internet searching.
    Keywords:  Google; internet; self-diagnosis; systematic search; women’s health
    DOI:  https://doi.org/10.3390/healthcare13192537
  5. Front Public Health. 2025;13: 1627916
      The credibility of health-related information sources may influence parental decisions regarding childhood vaccinations. This study examined whether the type of health information source used by parents (verified vs. unverified) was associated with their child's vaccination outcomes. A national sample of 887 parents (55.1% male; mean age = 36.42 years, standard deviation = 12.29) completed an anonymous online survey. Participants reported demographic characteristics, primary health information sources, and whether their child had received the 12 vaccines recommended by the Centers for Disease Control and Prevention (CDC). Health information sources were categorized as primarily verified (healthcare providers, scientific journals) or unverified (family/friends, social media, opinion blogs). Compared to those using unverified sources, verified source users had significantly higher odds of vaccinating their child across most vaccines, including DTaP, Hib, Hepatitis A, Hepatitis B, influenza, MMR, MCV4, pneumonia, IPV, and RV (all ps < 0.01). Verified source users also reported significantly higher overall vaccination rates (p < 0.001). These associations remained significant after adjusting for key demographic covariates (e.g., age, household size, number of children). No significant differences were found for the chickenpox or COVID-19 vaccines. Results underscore the importance of health information credibility in promoting vaccine adherence and suggest that targeted efforts to improve access to verified health information may help address childhood vaccination gaps.
    Keywords:  parental vaccine concerns; pediatric immunization; unverified sources; vaccine misinformation; verified health information
    DOI:  https://doi.org/10.3389/fpubh.2025.1627916
  6. BMJ Health Care Inform. 2025 Oct 15. pii: e101632. [Epub ahead of print]32(1):
       OBJECTIVES: Antimicrobial resistance is a critical public health threat. Large language models (LLMs) show great capability for providing health information. This study evaluates the effectiveness of LLMs in providing information on antibiotic use and infection management.
    METHODS: Using a mixed-methods approach, responses to healthcare expert-designed scenarios from ChatGPT 3.5, ChatGPT 4.0, Claude 2.0 and Gemini 1.0, in both Italian and English, were analysed. Computational text analysis assessed readability, lexical diversity and sentiment, while content quality was assessed by three experts using the DISCERN tool.
    RESULTS: 16 scenarios were developed. A total of 101 outputs and 5454 Likert-scale (1-5) scores were obtained for the analysis. A general positive performance gradient was found from ChatGPT 3.5 and 4.0 to Claude to Gemini. Gemini, although producing only five outputs before self-inhibition, consistently outperformed the other models across almost all metrics, producing more detailed, accessible, varied content and a positive overtone. ChatGPT 4.0 demonstrated the highest lexical diversity. A difference in performance by language was observed. All models showed a median score of 1 (IQR=2) regarding the domain addressing antimicrobial resistance.
    DISCUSSION: The study highlights a positive performance gradient towards Gemini, which showed superior content quality, accessibility and contextual awareness, although acknowledging its smaller dataset. Generating appropriate content to address antimicrobial resistance proved challenging.
    CONCLUSIONS: LLMs offer great promise to provide appropriate medical information. However, they should play a supporting role rather than representing a replacement option for medical professionals, confirming the need for expert oversight and improved artificial intelligence design.
    Keywords:  Access to Information; Artificial intelligence; Infectious Disease Medicine; Large Language Models; Public Health
    DOI:  https://doi.org/10.1136/bmjhci-2025-101632
  7. PLoS One. 2025;20(10): e0334250
       BACKGROUND: The use of artificial intelligence for creating medical information is on the rise. Nonetheless, the accuracy and reliability of such information require thorough assessment. As a language model capable of generating text, ChatGPT needs a detailed examination of its effectiveness in the healthcare domain.
    OBJECTIVE: This research sought to evaluate the precision of medical data produced by ChatGPT-4o (https://chat.openai.com/chat, accessed Mar. 12, 2025), concentrating on its capability to handle the top five ophthalmic issues that pose the greatest global health challenges. Furthermore, the investigation compared the AI's answers to recognized medical guides.
    METHODS: This research employed an adapted version of the Ensuring Quality of Information for Patients (EQIP) instrument to evaluate the quality of ChatGPT's replies. The guidelines for the five conditions were rephrased into pertinent queries. These queries were then fed into ChatGPT; the resulting answers were benchmarked against established ophthalmology clinical guidelines and independently scrutinized for precision and consistency by two investigators. Consistency among raters was evaluated using Cohen's kappa.
    RESULTS: The median EQIP score across the five conditions was 18 (IQR 18-19). The modified EQIP instrument revealed a robust consensus between the two evaluators when assessing ChatGPT's responses, as indicated by a Cohen's kappa value of 0.926 (95% CI 0.875-0.977, P<0.001). The alignment between the ChatGPT responses and the guideline recommendations was 84% (21/25), as indicated by a Cohen's kappa value of 0.658 (95% CI 0.317-0.999, P<0.001).
    CONCLUSIONS: ChatGPT demonstrates robust quality and guideline compliance in producing medical content. Nevertheless, improvements are necessary to enhance the accuracy of quantitative data and ensure a more comprehensive coverage, thereby offering valuable insights for the advancement of medical information generation.
    DOI:  https://doi.org/10.1371/journal.pone.0334250
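    Item 7 reports inter-rater agreement with Cohen's kappa. A minimal sketch of such a computation, assuming two raters assign categorical judgements to the same set of responses; the ratings below and the use of scikit-learn are illustrative, not the authors' actual setup.

        from sklearn.metrics import cohen_kappa_score

        # kappa = (p_observed - p_chance) / (1 - p_chance); 1 = perfect agreement, 0 = chance level
        rater_1 = ["consistent", "consistent", "inconsistent", "consistent", "inconsistent"]
        rater_2 = ["consistent", "consistent", "inconsistent", "inconsistent", "inconsistent"]
        print(f"Cohen's kappa: {cohen_kappa_score(rater_1, rater_2):.3f}")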
  8. JMIR AI. 2025 Oct 17. 4 e70604
       BACKGROUND: Mpox (monkeypox) outbreaks since 2022 have emphasized the importance of accessible health education materials. However, many Japanese online resources on mpox are difficult to understand, creating barriers for public health communication. Recent advances in artificial intelligence (AI) such as ChatGPT-4o show promise in generating more comprehensible and actionable health education content.
    OBJECTIVE: The aim of this study was to evaluate the comprehensibility, actionability, and readability of Japanese health education materials on mpox compared with texts generated by ChatGPT-4o.
    METHODS: A cross-sectional study was conducted using systematic quantitative content analysis. A total of 119 publicly available Japanese health education materials on mpox were compared with 30 texts generated by ChatGPT-4o. Websites containing videos, social media posts, academic papers, and non-Japanese language content were excluded. For generating ChatGPT-4o texts, we used 3 separate prompts with 3 different keywords. For each keyword, text generation was repeated 10 times, with prompt history deleted each time to prevent previous outputs from influencing subsequent generations and to account for output variability. The Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was used to assess the understandability and actionability of the generated text, while the Japanese Readability Measurement System (jReadability) was used to evaluate readability. The Journal of the American Medical Association benchmark criteria were applied to evaluate the quality of the materials.
    RESULTS: A total of 119 Japanese mpox-related health education web pages and 30 ChatGPT-4o-generated texts were analyzed. AI-generated texts significantly outperformed web pages in understandability, with 80% (24/30) scoring ≥70% in PEMAT-P (P<.001). Readability scores for AI texts (mean 2.9, SD 0.4) were also higher than those for web pages (mean 2.4, SD 1.0; P=.009). However, web pages included more visual aids and actionable guidance such as practical instructions, which were largely absent in AI-generated content. Government agencies authored 90 (75.6%) out of 119 web pages, but only 31 (26.1%) included proper attribution. Most web pages (117/119, 98.3%) disclosed sponsorship and ownership.
    CONCLUSIONS: AI-generated texts were easier to understand and read than traditional web-based materials. However, web-based texts provided more visual aids and practical guidance. Combining AI-generated texts with traditional web-based materials may enhance the effectiveness of health education materials and improve accessibility to a broader audience. Further research is needed to explore the integration of AI-generated content into public health communication strategies and policies to optimize information delivery during health crises such as the mpox outbreak.
    Keywords:  AI-generated health education; Japan; artificial intelligence; health communication; mpox; patient education; readability
    DOI:  https://doi.org/10.2196/70604
  9. Eur J Pediatr. 2025 Oct 11. 184(11): 676
      ChatGPT-4 is a widely used large language model that provides instant answers to a variety of health-related questions in different medical fields. This study aims to evaluate the reliability, quality, accuracy, and readability of ChatGPT's responses to frequently asked questions regarding physical activity in children with cystic fibrosis (CF). The responses of ChatGPT-4 to 60 frequently asked questions related to physical activity and its effects on the condition of children with CF were categorized into five thematic groups (S1-S5). These responses were then evaluated for reliability, quality, accuracy, and readability using the modified DISCERN (mDISCERN) tool, the Global Quality Scale (GQS), a five-point Likert scale, and the Flesch Reading Ease Scale (FRE), respectively. The mean scores for mDISCERN, GQS, and accuracy ranged from 3.38 (S2) to 3.82 (S4), 3.91 (S2, S4) to 4.25 (S5), and 4.27 (S1, S4) to 4.78 (S3), with overall means of 3.5, 3.98, and 4.38, respectively. The readability mean scores varied from 29.99 (S5) to 46.31 (S3), with a total mean of 38.07. The ICC values for mDISCERN, GQS, and accuracy were 0.746, 0.666, and 0.665, respectively.
    CONCLUSION: This study revealed that ChatGPT-4 provides moderate to high levels of reliability, quality and accuracy in responses about physical activity in children with CF. Low FRE scores showed most responses were "difficult" for the target age group. Although ChatGPT-4 serves as a useful supplementary tool for patients with CF, professional supervision and further validation are essential for safe and effective use in clinical contexts.
    WHAT IS KNOWN: • Physical activity benefits children with cystic fibrosis (CF), yet access to reliable, understandable educational materials is limited. AI tools like ChatGPT-4 are increasingly used in health communication, but their reliability, accuracy, and readability remain uncertain.
    WHAT IS NEW: • This study systematically evaluates ChatGPT-4 responses to CF-related physical activity questions, showing moderate-to-high reliability, quality, and accuracy, but low readability, highlighting the need for adaptation for pediatric use.
    Keywords:  Artificial intelligence; ChatGPT4; Counseling; Cystic fibrosis; Physical activity
    DOI:  https://doi.org/10.1007/s00431-025-06488-9
  10. J Clin Nurs. 2025 Oct 13.
       AIMS: To evaluate the artificial intelligence-assisted lymphedema education material in patients undergoing breast cancer surgery.
    DESIGN: A methodological design was used to develop and evaluate AI-supported lymphedema education material for patients undergoing breast cancer surgery. The study was reported in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement (see Data S1 for the completed checklist). To prepare the material, patients' educational needs were first determined; prompts were then entered into ChatGPT-4 to generate the educational content. The readability of the generated content was evaluated, and expert opinion was obtained on the final version of the draft.
    METHODS: While preparing the AI-assisted lymphedema education material in the study, expert opinions were obtained, and the educational needs of the patients were determined by scanning the literature. Then, 12 commands were given in the ChatGPT-4 program to create the educational content. Formulas were used to evaluate the readability of the created educational content in Turkish and the readability of the health literature. The validity of the lymphedema education material was presented to 10 experts. The experts evaluated the understandability and actionability of the educational material using the Patient Education Materials Evaluation Tool and the Global Quality Scale, which evaluates the quality of the educational material.
    RESULTS: It was concluded that the readability index of the lymphedema education material for Turkish was 67.3, and the Turkish readability level was 'easily understandable'. The readability index of health literature was found to be 11.28, 9.68, 10.58, 39.0, and 11.26, respectively. When the internal consistency coefficient between the experts was examined, it was found to be 0.74. It was determined that the Patient Education Materials Evaluation Tool understandability score average was 92.10 ± 9.03, and the actionability score average was 81.60 ± 18.47. The Global Quality Scale score average, which evaluates the suitability and quality of the content of the AI-supported educational material, was found to be 4.10 ± 0.87.
    CONCLUSION: At the end of the study, it was determined that the educational material was reasonable regarding understandability and actionability. The Turkish readability level was also reasonable and easily understandable.
    IMPLICATION FOR THE PROFESSION: This study is one of the proactive attempts to use AI in preparing educational materials for nurses and healthcare professionals.
    PATIENT OR PUBLIC CONTRIBUTION: No patient or public contribution.
    Keywords:  artificial intelligence; chatGPT; educational material; lymphedema; patient education
    DOI:  https://doi.org/10.1111/jocn.70123
  11. Health Informatics J. 2025 Oct-Dec;31(4): 14604582251388879
      Introduction: Patients increasingly use chatbots to obtain medical information, a trend that has provoked both optimism and pessimism. Numerous studies have evaluated the quality and readability of these outputs. This study synthesizes these findings through a cross-sectional meta-synthesis. Methods: We identified studies that evaluated responses using the DISCERN instrument, designed to assess the quality of written material. Additionally, we only included studies that also evaluated readability. We recorded the chatbot used, DISCERN scores, the number of words in each question, the number of questions asked, the number of DISCERN evaluators, the readability of responses, and the year the study was conducted. We also assessed the influence of each publication's journal ranking using the Journal Citation Indicator. Results: We identified 42 studies that conducted 86 tests. Chatbot response readability decreased as response quality increased. Forty-nine tests produced responses ranked "good" or better, and only 10 scored below college-level readability. We significantly increased readability by adding the phrase "write responses at sixth-grade reading level" to prompts that previously produced post-graduate reading level responses in published studies. Discussion: Variable quality and poor readability of chatbot responses reinforce pessimism about their utility. Nevertheless, appropriate "prompt engineering" provides scope to enhance response quality and readability.
    Keywords:  ChatGPT; DISCERN; chatbots; large language model; patient engagement; prompt engineering; readability
    DOI:  https://doi.org/10.1177/14604582251388879
  12. Arch Orthop Trauma Surg. 2025 Oct 17. 145(1): 477
       INTRODUCTION: The purpose of this study was to identify the most frequent questions a patient might encounter in an internet search about robotic-assisted total knee arthroplasty (RATKA), and to identify and categorize the answers to these questions to assess the suitability of Chat Generative Pre-Trained Transformer (ChatGPT) and Google search engine as an online health information source for patients.
    METHODS: The 20 most frequently asked questions (FAQs) were identified by entering the search term "Robot-Assisted Total Knee Replacement" into both Google Search and ChatGPT-4. For Google, a clean search was performed, and the 20 FAQs were extracted from the "People also ask" section. For ChatGPT-4, a specific prompt was used to generate the 20 most frequently asked questions. All identified questions, along with the corresponding answers and cited references, were systematically recorded. A modified version of the Rothwell system was used to categorize questions into 10 subtopics. Each reference was categorized into the following groups: commercial, academic, medical practice, single surgeon personal, or social media. The questions and sources obtained from ChatGPT and Google were compared using Fisher's exact test.
    RESULTS: The percentage distribution of questions by category between Google and ChatGPT was as follows: indications/management (15% vs. 25%), technical details (35% vs. 30%), evaluation of surgery (0% vs. 0%), risks/complications (5% vs. 5%), restrictions (10% vs. 0%), specific activities (15% vs. 5%), timeline of recovery (10% vs. 20%), pain (0% vs. 5%), longevity (0% vs. 0%), and cost (10% vs. 10%). Answers to questions were more frequently sourced from academic websites in ChatGPT compared to Google (70% vs. 20%; p-value = 0.0025).
    CONCLUSION: ChatGPT offers a promising alternative to traditional search engines for patient education, particularly in the context of preparing for RATKA. Compared to Google, ChatGPT provided significantly fewer references to commercial content and offered responses that were more aligned with academic sources.
    LEVEL OF EVIDENCE: Level IV, survey study of Internet sources.
    Keywords:  Artificial intelligence; ChatGPT; Google; RATKA; Robotic-assisted total knee arthroplasty; Total knee arthroplasty
    DOI:  https://doi.org/10.1007/s00402-025-06085-3
  13. Neurooncol Pract. 2025 Oct;12(5): 901-911
       Background: Patients with primary brain tumors navigate a devastating diagnosis and cognitive and physical decline. Available educational materials should be easily comprehensible, informative, reliable, culturally sensitive, and patient oriented.
    Methods: We assessed websites of major brain tumor centers in the United States and patient organizations for readability using multiple calculators, quality and reliability using DISCERN and JAMA tools, and cultural sensitivity using the Cultural Sensitivity Assessment Tool scale. We determined whether sites addressed practical, emotional, social, and spiritual needs of a patient. Brain tumor centers were categorized based on NCI-designation and fulfillment of Guiding Principles developed by the American Brain Tumor Association.
    Results: Websites of 91 brain tumor centers and 8 patient organizations were examined. Fewer than 10% of brain tumor centers' websites were readable at an eighth-grade level. There was no significant difference in readability between brain tumor centers and patient organizations. Patient organizations outperformed brain tumor centers on both quality measures, with no differences seen based on the category of centers. Only 48% of brain tumor centers and 63% of patient organizations scored at recommended levels on all cultural sensitivity scales. Most patient organizations, but few brain tumor centers, addressed practical, social, emotional, and spiritual needs.
    Conclusions: Publicly available brain tumor education materials are frequently at a high reading level. Quality and cultural sensitivity can be improved by citing sources, describing treatment risks, describing outcomes without treatment, addressing quality of life during treatment, addressing myths, and visually representing more patients. Patient organizations can provide models for addressing patient needs.
    Keywords:  brain tumors; cultural sensitivity; patient education; quality; readability
    DOI:  https://doi.org/10.1093/nop/npaf035
  14. J Spine Surg. 2025 Sep 30. 11(3): 450-462
       Background: With the increasing use of artificial intelligence (AI) chatbots like ChatGPT for online patient education, Generative Pre-trained Transformer 4 (GPT-4) has emerged as a significant tool for providing accurate health information. This study aims to compare Google and GPT-4 in terms of (I) question types, (II) initial response readability, (III) ChatGPT's ability to modify responses for increased readability, and (IV) numerical response accuracy for the top 10 most frequently asked questions (FAQs) related to cervical disc arthroplasty (CDA).
    Methods: "Cervical disc arthroplasty" was searched on Google and GPT-4 on December 18, 2023. The top 10 FAQs were recorded and analyzed using the Rothwell system for categorization and Journal of the American Medical Association (JAMA) criteria for source quality. Readability was assessed by Flesch Reading Ease and Flesch-Kincaid grade level. GPT-4 was prompted to revise text for low-literacy readability. We used Student's t-tests for a comparative analysis between GPT-4 and Google, setting significance at P<0.05.
    Results: FAQs from Google predominantly related to technical details and evaluation of surgery, paralleling GPT-4's focus, which also included indications/management. No significant differences were found in readability between GPT-4 and Google, displaying a similar Flesch-Kincaid grade level (13.06 vs. 12.24, P=0.41) and Flesch Reading Ease score (36.87 vs. 40.05, P=0.53). Upon prompting GPT-4 to improve the readability of its responses, GPT-4 showed a lower Flesch-Kincaid grade level (6.58 vs. 13.06 vs. 12.24, P<0.001) and a higher Flesch Reading Ease score (76.20 vs. 36.87 vs. 40.05, P<0.001). Numerically, 60% of responses differed, with GPT-4 suggesting a broader recovery period for CDA.
    Conclusions: GPT-4 has the potential to enhance patient education about CDA by customizing complex information for users with lower health literacy levels. This highlights GPT-4's ability to address existing gaps in online resources, benefiting those with lower health literacy.
    Keywords:  GPT-4; Google; cervical disc arthroplasty (CDA); patient education
    DOI:  https://doi.org/10.21037/jss-25-47
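    Item 14 (and several later entries, e.g. items 15, 17 and 24) relies on the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas. A compact Python sketch using the standard published coefficients; the syllable counter is a rough vowel-group heuristic (an assumption), so scores will differ slightly from dedicated readability tools.

        import re

        def count_syllables(word):
            # crude heuristic: count groups of consecutive vowels
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def flesch_scores(text):
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            syllables = sum(count_syllables(w) for w in words)
            wps = len(words) / sentences    # words per sentence
            spw = syllables / len(words)    # syllables per word
            fre = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch Reading Ease
            fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level
            return fre, fkgl

        fre, fkgl = flesch_scores("Cervical disc arthroplasty replaces a damaged disc in the neck.")
        print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")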
  15. J Exp Orthop. 2025 Oct;12(4): e70457
       Purpose: The study aimed to evaluate the accuracy, comprehensiveness, and readability of responses generated by ChatGPT 4.0 to 30 common patient questions about the Bernese periacetabular osteotomy (PAO).
    Methods: Two fellowship-trained orthopaedic surgeons specializing in hip preservation selected thirty questions from a prior study identifying common PAO questions on social media. Each question was entered into ChatGPT 4.0, and the surgeons independently graded responses. Responses were evaluated using an established grading system. Accuracy and comprehensiveness were assessed based on the concordance of response content with current literature. Readability was analysed by calculating the Flesch-Kincaid Grade Level and Flesch-Kincaid Reading Ease.
    Results: Regarding accuracy and comprehensiveness, 98.3% of responses were graded as "excellent" or "satisfactory, requiring minimal clarification." Readability analysis revealed an average Flesch-Kincaid Grade Level corresponding to an 11th-grade reading level (11.09 ± 1.47) and a mean Reading Ease score requiring college level reading comprehension (39.12 ± 8.25) for original responses, 8th-grade reading level (8.16 ± 1.46) requiring high school to college level reading comprehension (51.53 ± 9.62) for simplified responses, and 7th-grade reading level (7.09 ± 1.23) requiring high school level reading comprehension (62.46 ± 7.48) for 6th grade responses.
    Conclusion: ChatGPT 4.0 offered excellent or satisfactory answers to the most common questions surrounding PAO. Asking ChatGPT 4.0 to simplify or respond at a specific reading level may increase the readability of responses. The 4.0 model has shown the potential to be a valuable adjunct for patient education, though the readability may need to be improved via simplified responses.
    Level of Evidence: Level N/A.
    Keywords:  ChatGPT; artificial intelligence; periacetabular osteotomy
    DOI:  https://doi.org/10.1002/jeo2.70457
  16. J Robot Surg. 2025 Oct 14. 19(1): 687
       INTRODUCTION: With increasing accessibility to Artificial Intelligence (AI) chatbots, the precision and clarity of medical information provided require rigorous assessment. Urologic telesurgery represents a complex concept that patients will investigate using AI. We compared ChatGPT and Google Gemini in providing patient-facing information on urologic telesurgical procedures.
    METHODS: 19 questions related to urologic telesurgery were generated using general information from the American Urologic Association (AUA) and European Robotic Urology Section (ERUS). Questions were organized into 4 categories (Prospective, Technical, Recovery, Other) and directly typed into ChatGPT 4o and Google Gemini 2.5 (non-paid versions). For each question, a new chat was started to prevent any continuation of answers. Three reviewers independently reviewed the responses using two validated healthcare tools: DISCERN (quality) and Patient Education Material Assessment Tool (understandability and actionability).
    RESULTS: Mean DISCERN scores (out of 80) were higher for Gemini than ChatGPT in all domains except "Other": prospective 49.2 versus 39.1; technical 52.3 versus 44.3; recovery 53.7 versus 45.4; other 54.3 versus 56.5; overall 52.4 versus 45.8 (Fig. 1). PEMAT-P understandability uniformly exceeded 70% for both platforms: prospective 80.0% versus 71.7%; technical 80.1% versus 79.8%; recovery 79.2% versus 80.1%; other 79.2% versus 81.3%; overall 79.7% versus 78.1% (Fig. 2). Actionability was uniformly low; only Gemini met the 70% threshold in the prospective domain (Fig. 3).
    CONCLUSION: ChatGPT and Gemini deliver relevant and understandable information related to urologic telesurgery, with Gemini more consistently providing sources. However, neither chatbot reliably offers actionable responses, limiting their utility as a standalone gateway for patient decision-making.
    Keywords:  AI chatbots; DISCERN; PEMAT; Urologic telesurgery
    DOI:  https://doi.org/10.1007/s11701-025-02871-8
  17. Urol Int. 2025 Oct 09. 1-15
       BACKGROUND AND OBJECTIVE: Patient education materials (PEMs) play a vital role in ensuring patients understand their medical conditions and treatment options. In prostate cancer, complex medical terminology can hamper comprehension and informed decision-making. This study evaluates the readability of prostate cancer PEMs to determine if they meet recommended standards for lay audiences.
    METHODS: A selection of standardized prostate cancer PEMs, including standard surgical consent forms and patient brochures from major German cancer organizations, was analyzed. Readability was assessed using established metrics, including the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Score (GFS), Simple Measure of Gobbledygook (SMOG) Index, Coleman-Liau Index (CLI), and Automated Readability Index (ARI). Layperson readability was defined as a FRES of ≥70 (at or below a seventh-grade reading level) and all other readability indexes ≤7, following European Union recommendations.
    RESULTS: The readability of prostate cancer PEMs of both surgical consent forms and patient brochures did not meet the recommended thresholds set by the European Union for layperson summaries. The median FRES for consent forms was 25.9 (SD: 1.52), ranging from 24.3 (prostate biopsy) to 28.0 (open RPx). Patient brochures showed a median FRES of 23.2 (SD: 2.87), with scores of 23.2 (German Cancer Aid), 22.5 (DKFZ), and 28.9 (S3-Guidelines). Section-specific values varied, with the highest FRES observed in the "Basic Explanation and Screening" section of the S3-Guidelines (39.0, SD: 7.09) and the lowest in the "Follow-Up" section of the German Cancer Aid brochure (15.8, SD: 10.35). All grade-level metrics (FKGL, GFS, SMOG, CLI, ARI) exceeded the recommended level of grade 7.
    CONCLUSION: The readability of prostate cancer PEMs in Germany falls short of recommended thresholds for lay comprehension. To enhance clarity and accessibility, the use of automated readability tools and standardized benchmarks (e.g., FRES ≥70, grade level ≤7) is recommended. Involving multidisciplinary teams may further support the development of patient-centered content. Future research should combine readability metrics with patient feedback to evaluate real-world comprehension and usability.
    DOI:  https://doi.org/10.1159/000548884
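    A short sketch of the layperson-readability check described in item 17, assuming the European Union-derived thresholds stated there (FRES ≥ 70 and every grade-level index ≤ 7); the grade-level scores below are placeholders, apart from the consent-form FRES of 25.9 quoted in the abstract.

        def meets_lay_thresholds(fres, grade_levels):
            # EU-style criterion: FRES at least 70 and all grade-level indices at most grade 7
            return fres >= 70 and all(g <= 7 for g in grade_levels.values())

        consent_form = {"FKGL": 14.2, "GFS": 15.1, "SMOG": 13.0, "CLI": 12.4, "ARI": 13.8}
        print(meets_lay_thresholds(fres=25.9, grade_levels=consent_form))   # False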
  18. J Clin Neurosci. 2025 Oct 16. pii: S0967-5868(25)00672-1. [Epub ahead of print]142 111699
      This correspondence comments on the recent article by Shukla and Sun (2025), which compared the readability of online and artificial intelligence-generated patient education materials in neuro-oncology. While the study provides meaningful insights, several methodological aspects deserve further consideration. The exclusive use of zero-shot prompts may have limited the model's adaptive capability, potentially contributing to low readability scores. Employing structured prompting strategies, such as one-shot or few-shot methods, and patient-centered instructions like "Explain in simple terms," could yield more accessible content. Moreover, since ChatGPT has gained internet browsing functionality as of February 2025, future studies integrating this feature may produce different outcomes regarding accuracy, readability, and clinical relevance. Together, these refinements could enhance the educational utility of Artificial Intelligence-generated health information.
    Keywords:  ChatGPT; Patient education; Prompt engineering; Role-based prompting
    DOI:  https://doi.org/10.1016/j.jocn.2025.111699
  19. BMJ Open. 2025 Oct 15. 15(10): e101030
     OBJECTIVES: To determine the quality and readability of drug manufacturers' patient fact sheets for the COVID-19 therapeutics baricitinib, convalescent plasma, anakinra, molnupiravir, nirmatrelvir/ritonavir, remdesivir, tocilizumab and vilobelimab.
    DESIGN: Cross-sectional document analysis.
    SETTING: Fact sheets on COVID-19 drugs approved by the US Food and Drug Administration from 2020 to 2023.
    PRIMARY AND SECONDARY OUTCOME MEASURES: Quality was assessed with the 16-item DISCERN tool (scored 16-80 points) and the 36-item Ensuring Quality of Information for Patients (EQIP) tool (scored 0-36), where lower scores indicate lower-quality information. We assessed readability with the Flesch-Kincaid Reading Ease score (ranges from 0 to 100, where higher scores correspond to greater reading ease) and six grade-level indices, where higher grades indicate harder-to-read information: Flesch-Kincaid grade level (grades 0 to 18 (college graduate)), Gunning-Fog score (grades 0 to 20 (college graduate)), Coleman-Liau index (grade 4 to college graduate), automated readability index (grades 5 to 22 (college graduate)), Dale-Chall readability (grade 4 to college graduate) and Simple Measure of Gobbledygook (grade 3 to college graduate). Secondary outcomes were word, syllable and sentence counts. We reported percentages and the median (IQR).
    RESULTS: We found 18 fact sheets that described 11 (63.5%) anti-virals (remdesivir (n=4), molnupiravir (n=4) and nirmatrelvir/ritonavir (n=3)) and 7 (37.5%) immune modulators (tocilizumab (n=2), baricitinib (n=2), convalescent plasma (n=1), anakinra (n=1) and vilobelimab (n=1)). DISCERN (median (IQR)) reliability was 4 (IQR 3-4) and 5 (1-5), while DISCERN treatment information was 3 (1-5) and 5 (1-5) for anti-virals and immune modulators, respectively. EQIP (median (IQR)) content was 12 (11-13) and 11 (11-13), identification of information was 4 (3-4) and 3 (3-3) and structure was 9 (8-9) and 9 (9-9) for anti-virals and immune modulators, respectively. Overall, fact sheets had median readability grade levels that ranged from 6.2 to 12.4. Anti-viral and immune modulator fact sheets had median readability grade levels from 6.1 to 12.5. Median (IQR) word, >4 syllable words and sentence counts were 1646.5 (1318.3-1934.8), 25.0 (21.3-29.8) and 118.0 (92.0-152.5) overall; 1758.00 (1200.0-2181.0), 23.0 (15.0-27.0) and 134.0 (82.0-185.0) for anti-virals; and 1461.0 (1341.0-1776.0), 29.0 (23.0-46.0) and 107.0 (105.0-122.0) for immune modulators, respectively.
    CONCLUSIONS: Although the fact sheets were of fair quality, their reading level was high and the transparency of sources used was low. Regulatory officials should require readable resources from drug manufacturers to guide patients' decision-making surrounding COVID-19 therapeutics.
    Keywords:  COVID-19; Health Equity; MEDICAL ETHICS; THERAPEUTICS
    DOI:  https://doi.org/10.1136/bmjopen-2025-101030
  20. Int J Hyg Environ Health. 2025 Oct 16. pii: S1438-4639(25)00162-2. [Epub ahead of print]271 114680
     BACKGROUND: It is important for environmental health professionals to inform the public about potential chemical risks. Factsheets are a common way to disseminate information to the public; however, there has been little evaluation of whether these materials are fit for purpose.
    OBJECTIVES: This study evaluated the readability of factsheets about per- and polyfluoroalkyl substances (PFAS), contaminants of emerging concern that have impacted communities worldwide.
    METHODS: Using grey literature searches, we identified 36 PFAS fact sheets published by government agencies in countries where PFAS contamination events had occurred. Factsheets were evaluated using the Simple Measure of Gobbledygook (SMOG) readability formula, language complexity, and the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P).
    RESULTS: The factsheets had an average reading grade level of 15.1 with no factsheets meeting the recommended reading grade range of 5-8. On average, almost one quarter of the words used in the factsheets were complex or uncommon words. Only 8 of the factsheets scored above 70% on PEMAT-P, which is the threshold at which factsheets are categorised as understandable.
    CONCLUSION: This study demonstrates that PFAS factsheets are typically not written at an appropriate reading level. We identify several areas for improvement such as using health literacy tools to reduce the complexity of language, incorporating infographics and pop out boxes, and providing concise summaries of information. To increase environmental health literacy, environmental health communicators should draw on the learnings of health communication and utilise existing tools to improve readability.
    Keywords:  Communication; Environmental health literacy; Health education; PFAS; per- and polyfluoroalkyl substances
    DOI:  https://doi.org/10.1016/j.ijheh.2025.114680
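    Item 20 grades factsheets with the SMOG formula. A minimal Python version using the standard formula 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291; the polysyllable counter reuses a rough vowel-group heuristic (an assumption), so results will differ slightly from the validated SMOG calculators the authors used.

        import math
        import re

        def smog_grade(text):
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            # polysyllables = words with three or more vowel groups (approximation of syllable count)
            polysyllables = sum(1 for w in words if len(re.findall(r"[aeiouy]+", w.lower())) >= 3)
            return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

        sample = ("Per- and polyfluoroalkyl substances persist in the environment. "
                  "Exposure pathways include contaminated drinking water.")
        print(round(smog_grade(sample), 1))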
  21. Musculoskelet Sci Pract. 2025 Oct 04. pii: S2468-7812(25)00178-X. [Epub ahead of print]80 103430
       BACKGROUND: Patients have unhelpful beliefs about low back pain (LBP), which are associated with worse outcomes. Education may modify these beliefs, but patients with LBP rarely receive education in practice. Patient education materials (PEMs) are a quick, inexpensive intervention to support information provision.
    OBJECTIVES: To assess PEMs for understandability, actionability, quality, readability, accuracy, comprehensiveness, and coverage of information about patients' needs, in order to identify the best PEMs for practice.
    METHODS: We searched published literature for PEMs tested in randomized trials or recommended in clinical guidelines. We used the Patient Education Materials Assessment Tool (PEMAT) to assess understandability and actionability, DISCERN to assess quality, the Patient Information and Education Needs Checklist for Low Back Pain (PINE-LBP) to assess information need coverage, and the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade-Level (FKGL) algorithms to assess readability. We assessed accuracy (proportion of treatment recommendations aligning with guidelines) and comprehensiveness (proportion of correctly covered guideline recommendations), and qualitatively synthesized PEM content relating to 21 information and education needs about LBP.
    RESULTS: Nineteen PEMs were included. None were actionable or comprehensive, and many had inaccurate treatment recommendations. There was considerable variation and conflicting information in the content provided across PEMs. Only the My Back Pain website met acceptable standards for more than half (4/7) of the outcomes.
    CONCLUSIONS: Educational messaging for LBP varies substantially and PEMs require improvement in various areas. The My Back Pain website met acceptable standards across most outcomes and may be the best available option for practice.
    Keywords:  Education materials; Low back pain; Patient education
    DOI:  https://doi.org/10.1016/j.msksp.2025.103430
  22. J Clin Med. 2025 Oct 01. pii: 6968. [Epub ahead of print]14(19):
      Background/Objectives: Patient education materials (PEMs) in ophthalmology often exceed recommended readability levels, limiting accessibility for many patients. While organizations like the American Academy of Ophthalmology (AAO) provide relatively easy-to-read resources, the topics covered remain limited, and other associations' PEMs are too complex. AI chatbots could help clinicians create more comprehensive, accessible PEMs to improve patient understanding. This study compares the readability of PEMs written by the AAO with those generated by large language models (LLMs), including ChatGPT-4o, Microsoft Copilot, and Meta-Llama-3.1-70B-Instruct. Methods: LLMs were prompted to generate PEMs for 15 common diagnoses relating to the cornea and anterior chamber, followed by a follow-up readability-optimized (FRO) prompt asking the model to reword the content at a 6th-grade reading level. The readability of these materials was evaluated using nine different readability-analysis Python libraries and compared to existing PEMs found on the AAO website. Results: For all 15 topics, ChatGPT, Copilot, and Llama successfully generated PEMs, though all exceeded the recommended 6th-grade reading level. Initially prompted ChatGPT, Copilot, and Llama outputs had grade levels of 10.8, 12.2, and 13.2, respectively; FRO prompting significantly improved readability to 8.3 for ChatGPT, 11.2 for Copilot, and 9.3 for Llama (p < 0.001). Although readability improved, AI-generated PEMs were, on average, not statistically easier to read than AAO PEMs, which averaged an 8.0 Flesch-Kincaid Grade Level. Conclusions: Properly prompted AI chatbots can generate PEMs with improved readability, nearing the level of AAO materials. However, most outputs remain above the recommended 6th-grade reading level. A subjective analysis of a representative subtopic showed less nuance than the AAO material, especially in areas of clinical uncertainty. By providing a blueprint that can be utilized in human-AI hybrid workflows, AI chatbots show promise as tools for ophthalmologists to increase the availability of accessible PEMs in ophthalmology. Future work should include a detailed qualitative review by ophthalmologists using a validated tool (such as DISCERN or PEMAT) to score accuracy, bias, and completeness alongside readability.
    Keywords:  anterior segment; artificial intelligence; ophthalmology; patient education; patient readability
    DOI:  https://doi.org/10.3390/jcm14196968
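    A hedged sketch of the generate-then-reword workflow described in item 22: score a generated PEM and, if it exceeds the 6th-grade target, issue the follow-up readability-optimized (FRO) prompt. The textstat package is one commonly used readability library (an assumption; the study does not name its nine libraries), and generate_pem is a stand-in for whatever LLM call is used.

        import textstat

        FRO_PROMPT = "Reword the following patient education material at a 6th-grade reading level:\n\n"

        def needs_fro(pem_text, target_grade=6.0):
            # a Flesch-Kincaid grade above the target triggers the readability-optimized reprompt
            return textstat.flesch_kincaid_grade(pem_text) > target_grade

        def readability_pipeline(topic, generate_pem):
            draft = generate_pem(f"Write a patient education handout about {topic}.")
            if needs_fro(draft):
                draft = generate_pem(FRO_PROMPT + draft)
            return draft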
  23. BJU Int. 2025 Oct 13.
       OBJECTIVE: To evaluate the quality (DISCERN), understandability and actionability (Patient Education Materials Assessment Tool for Printable Materials [PEMAT-P]), readability (Flesch-Kincaid), and misinformation of patient-facing information on interstitial cystitis generated by four publicly available artificial intelligence (AI) chatbots: ChatGPT-4.0, Perplexity, ChatSonic, and Bing AI.
    METHODS: A total of 10 queries derived from Google Trends and Hopkins Medicine content were submitted to each chatbot. Responses were evaluated by two blinded reviewers using validated tools: the DISCERN instrument (reliability/quality), PEMAT-P (understandability/actionability), and Flesch-Kincaid Grade Level (readability). Word count and citation inclusion were also recorded.
    RESULTS: Across chatbots, information quality was moderate with a median (interquartile range [IQR]) DISCERN score of 3/5 (2-3), with Perplexity performing best and Bing AI worst. Understandability was moderate (median [IQR] PEMAT-P score 75% [66.7-83.3%]), highest for ChatSonic with Hopkins Medicine-derived prompts and lowest for ChatGPT with Google Trends inputs. Actionability was consistently poor (median [IQR] score 40% [20-60%]), with ChatSonic performing best and Bing AI lowest. Responses averaged 256 words and college-level readability (median [IQR] Flesch-Kincaid score 25.4 [20.89-28.50]) across all platforms, limiting accessibility. Misinformation was minimal across all platforms. Chatbots referencing clinically curated prompts (Hopkins Medicine) scored higher in understandability and completeness than those responding to public search trends.
    CONCLUSION: Artificial intelligence chatbots offer generally accurate and understandable information about interstitial cystitis but lack actionable guidance and generate content at reading levels above typical patient comprehension. Enhancing readability, actionability, and personalisation may increase their utility as adjunct tools for patient education in functional urology.
    Keywords:  artificial intelligence; chatbot; interstitial cystitis; large language model; patient education; patient information; patient‐centred care; quality of information
    DOI:  https://doi.org/10.1111/bju.70035
  24. J Spine Surg. 2025 Sep 30. 11(3): 430-437
       Background: Many patients refer to internet-based patient education materials (PEMs) to learn about lumbar disc replacement. The purpose of this study is to assess the readability of PEMs on lumbar disc replacement.
    Methods: The Google search engine was queried with the phrase "lumbar disc replacement patient information". Readability scores were calculated for the initial 25 websites that met inclusion criteria by copying the PEM to http://www.readabilityformulas.com. SPSS version 28.0.0 was used to calculate descriptive statistics for each measure.
    Results: The mean reading level was 12.08±1.73. The mean Flesch-Kincaid Reading Ease score was 45.60±9.16. Additional scores included: Gunning Fog, 14.50±2.06; Flesch-Kincaid Grade Level (FKGL), 10.94±2.14; Coleman-Liau Index, 12.82±1.50; Simple Measure of Gobbledygook (SMOG) Index, 10.51±1.56; Automated Readability Index, 11.81±2.46; Linsear Write Formula, 11.08±3.49. No PEMs were found to be below the 6th-grade or 8th-grade reading level.
    Conclusions: PEM readability is a crucial part of the patient care experience, and the current readability of lumbar disc replacement PEMs is not at an acceptable level. Given their current state, PEMs can make it difficult for a sizable proportion of the general population to comprehend the nature of their medical condition and how to appropriately treat it.
    Keywords:  Readability; lumbar disc replacement; patient education materials (PEMs); spine
    DOI:  https://doi.org/10.21037/jss-25-50
  25. Expert Opin Biol Ther. 2025 Oct 15.
       BACKGROUND: Fecal microbiota transplantation (FMT) is increasingly used in geriatric medicine, including intestinal decolonization of antimicrobial-resistant bacterial pathogens and the treatment of inflammatory bowel disease, graft versus host disease and autism spectrum disorders. The aim of this study was to examine readability of patient-facing FMT information.
    RESEARCH DESIGN AND METHODS: Readability was calculated using Readable software, examining (i) Flesch Reading Ease (FRE), (ii) Flesch-Kincaid Grade Level (FKGL), (iii) Gunning Fog Index and (iv) SMOG Index and two text metrics [words/sentence, syllables/word] for 234 sources of FMT information, from four categories (abstracts/hospital information/patient-facing information/clinical trials).
    RESULTS: Mean readability scores of FMT information were 22.2 ± 1.2 (SEM) for FRE (target > 60) and 14.8 ± 0.2 for FKGL (target < 8), with mean words/sentence and syllables/word of 19.2 ± 0.4 and 2.0, respectively. There was no significant difference in readability between scientific abstracts and lay summaries. No information was found with a readability below the 7th-grade level (12-13 year olds).
    CONCLUSION: Readability of FMT information for patients is poor, not reaching readability reference standards. Authors of FMT information should consider using readability calculators when preparing FMT information, so that the final material is within recommended readability reference parameters, to support the health literacy and treatment adherence of readers.
    Keywords:  Clostridioides difficile; Clostridium difficle; fecal microbiota transplantation; fecal transplant; health literacy; readability
    DOI:  https://doi.org/10.1080/14712598.2025.2576509
  26. BMC Med Educ. 2025 Oct 14. 25(1): 1418
       BACKGROUND: Social media has become a prominent source of health educational short videos (HESVs), yet their quality varies significantly, with many being inaccurate, incomplete, or poorly presented. Both healthcare professionals and the public lack clear criteria to evaluate HESVs quality when seeking health information on social media.
    METHODS: This study aims to evaluate the quality of HESVs on YouTube and Douyin based on Lasswell's 5 W communication model, and analyze the key factors for selecting high-quality HESVs. 200 videos were selected from YouTube and Douyin, respectively, between October 1 and November 30, 2024. Four independent reviewers analyzed the quality of HESVs using the Lasswell's Video Quality scale (LassVQ), modified DISCERN (M-DISCERN), and Global Quality Score (GQS). Comparative analysis, correlation analysis, and multiple linear regression were conducted using R 4.3.2 software.
    RESULTS: The median video length was 189 s on YouTube and 88 s on Douyin. Douyin videos received significantly more likes (152190 vs. 116354, P < 0.001) and comments (10496 vs. 2726, P < 0.001) than YouTube videos. The median quality scores on YouTube and Douyin were: LassVQ (3.77 vs. 3.66, P = 0.0254), M-DISCERN (3.12 vs. 2.41, P < 0.001), and GQS (4 vs. 4, P = 0.906). Except for like volume on Douyin (r = 0.25-0.4, P < 0.001), no significant correlation was found between engagement metrics and the quality of HESVs. Multiple linear regression analysis revealed that key factors for high-quality video selection included visible reference information (YouTube: P = 0.002, Douyin: P = 0.014), necessary relevant information (Douyin: P < 0.001), clear dubbing, louder than background music (YouTube: P = 0.005, Douyin: P = 0.021), perception of knowledge acquisition (Douyin: P = 0.021), perception of action necessity (Douyin: P = 0.031), motivation to share (YouTube: P < 0.001), and being recommended as a trending video (Douyin: P = 0.033).
    CONCLUSIONS: The quality of HESVs on YouTube and Douyin is inadequate. Users should prioritize trending HESVs produced by experts, with clear dubbing, visible references, essential content, and a clear sense of knowledge acquisition, action necessity, and sharing motivation.
    Keywords:  Health educational short videos (HESVs); Social media; Video quality
    DOI:  https://doi.org/10.1186/s12909-025-08018-5
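    Item 26 identifies predictors of video quality with multiple linear regression (run in R 4.3.2). An equivalent, purely illustrative sketch in Python with statsmodels; the toy data and column names such as visible_refs and clear_dubbing are hypothetical placeholders, not the study's variables.

        import pandas as pd
        import statsmodels.formula.api as smf

        videos = pd.DataFrame({
            "lassvq":        [3.8, 3.2, 4.1, 2.9, 3.6, 4.3],   # hypothetical LassVQ quality scores
            "visible_refs":  [1, 0, 1, 0, 1, 1],
            "clear_dubbing": [1, 0, 1, 1, 0, 1],
            "log_likes":     [9.1, 7.4, 10.2, 6.8, 8.9, 11.0],
        })

        model = smf.ols("lassvq ~ visible_refs + clear_dubbing + log_likes", data=videos).fit()
        print(model.params)   # regression coefficients for each predictor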
  27. Cancers (Basel). 2025 Oct 02. pii: 3222. [Epub ahead of print]17(19):
      Introduction: YouTube™ is a widely accessible platform with unfiltered medical information. This study aimed to evaluate the educational value and reliability of YouTube™ videos on Hyperthermic Intraperitoneal Chemotherapy (HIPEC) for advanced epithelial ovarian cancer treatment. Methods: YouTube™ videos were searched using the keywords "ovarian cancer", "debulking surgery", "hyperthermic", and "HIPEC". Patient Education Materials Assessment Tool for Audiovisual Content (PEMAT A/V) score, DISCERN, Misinformation Scale, and the Global Quality Scale (GQS) were employed to assess the clarity, quality, and reliability of the information presented. Results: Of the 150 YouTube™ videos screened, 71 were suitable for analysis and categorized by target audience (general public vs. healthcare workers). Most (57, 80.2%) were uploaded after the "Ov-HIPEC" trial (18 January 2018), with a trend toward more videos for healthcare workers (p = 0.07). Videos for the general public were shorter (p < 0.001) but received more views (p = 0.06) and likes (p = 0.09), though they were of lower quality. The DISCERN score averaged 50 (IQR: 35-60), with public-targeted videos being less informative (p < 0.001), a trend mirrored by the Misinformation Scale (p < 0.001) and GQS (p < 0.001). The PEMAT A/V scores showed 80% Understandability (IQR: 62-90) and 33% Actionability (IQR: 25-100), with no significant difference between groups (p = 0.15, p = 0.4). Conclusions: While YouTube™ provides useful information for healthcare professionals, it cannot be considered a reliable source for patients seeking information on HIPEC for ovarian cancer. Many videos contribute to misinformation by not properly explaining treatment indications, timing, adverse effects, multimodal approaches, or clinical trial findings.
    Keywords:  DISCERN; HIPEC; PEMAT; YouTube™; hyperthermic; ovarian cancer
    DOI:  https://doi.org/10.3390/cancers17193222
  28. J Alzheimers Dis Rep. 2025 Jan-Dec;9: 25424823251368887
       Background: YouTube is increasingly used by patients and caregivers as a source of health information. However, the quality and reliability of content on Alzheimer's disease dementia (ADD) remain uncertain.
    Objective: This study aimed to determine whether YouTube videos on ADD provide reliable and high-quality information for caregivers and to assess whether the most popular videos are also the most trustworthy.
    Methods: In December 2024, YouTube was systematically searched for ADD-related videos. Two independent physicians reviewed each video, scoring it using modified DISCERN (mDISCERN) for reliability and the Global Quality Scale (GQS) for content quality. Videos were categorized by goal and assessed for quality, accuracy, comprehensiveness, and specific content.
    Results: A total of 117 videos were included in the study. On the mDISCERN scale, 70 videos (59.8%) were rated as having good reliability, 33 (28.2%) moderate reliability, and 14 (12%) poor reliability. Using the GQS, 61 videos (51.1%) were rated as high quality, 16 (28%) as excellent quality, 34 (29%) as moderate quality, and 7 (6%) as low quality. Videos from academic institutions, news agencies and physicians exhibited higher mDISCERN and GQS scores compared with other groups, and a significant correlation was observed between mDISCERN and GQS scores (p < 0.001).
    Conclusions: The videos on ADD produced by healthcare professionals and academic institutions have high quality and good reliability, covering disease properties, treatment choices, and patient experiences. However, video popularity does not significantly correlate with content reliability and quality.
    Keywords:  Alzheimer's disease; YouTube; dementia; educational resource
    DOI:  https://doi.org/10.1177/25424823251368887
  29. Obes Surg. 2025 Oct 16.
       BACKGROUND: Obesity is a chronic disease with a rising global prevalence, representing a significant public health burden. Patients increasingly utilize short-video platforms such as TikTok and Bilibili to obtain health information regarding bariatric surgery. The quality and reliability of this content have not been thoroughly evaluated, raising concerns about the potential for misinformation to influence patient decision-making.
    METHODS: A cross-sectional content analysis was conducted on the top 100 videos retrieved from both TikTok and Bilibili using the keyword "bariatric surgery" in Chinese. After excluding irrelevant and duplicate content, a total of 200 videos were included for analysis. Videos were systematically categorized by uploader type and content. Two senior bariatric surgeons independently assessed the videos for quality and reliability using the Global Quality Score (GQS) and a modified DISCERN instrument.
    RESULTS: TikTok videos demonstrated significantly higher user engagement, with greater median likes, collections, shares, and comments compared to Bilibili (p < 0.001). Conversely, Bilibili videos had a significantly longer median duration (p < 0.001). The overall quality of videos on both platforms was suboptimal. However, TikTok videos received modestly higher GQS and DISCERN scores from both reviewers (p < 0.05). Content uploaded by professional institutions achieved the highest quality scores across both platforms (p < 0.001). Professional individuals were the predominant uploaders, accounting for 79.0% of the videos. A strong positive correlation was observed among user engagement metrics (likes, saves, shares, comments; r > 0.9), but these metrics showed no significant correlation with GQS or DISCERN quality scores.
    CONCLUSION: The quality and reliability of bariatric surgery-related educational content on both TikTok and Bilibili are largely inadequate. While TikTok videos demonstrated slightly superior quality scores, professional institutions represent the most reliable source of information. User engagement metrics are poor indicators of video quality. These findings underscore the need for healthcare professionals to guide patients in navigating online health information and for platforms to implement more stringent quality control measures.
    Keywords:  Bariatric surgery; Bilibili; DISCERN; Health information quality; Patient education; TikTok
    DOI:  https://doi.org/10.1007/s11695-025-08317-2
  30. Gynecol Oncol Rep. 2025 Oct;61: 101964
       Objective: To evaluate TikTok videos with the hashtag #PapSmear and analyze their educational quality, themes, tone, creator, and engagement metrics.
    Methods: A cross-sectional analysis of the top 150 TikTok videos with #PapSmear was conducted in September 2023. Videos were evaluated by four independent reviewers for engagement metrics, video topic, intent, tone, creator, and themes. The educational quality of videos was assessed using the brief DISCERN tool. Statistical analyses were performed to examine differences and relationships across groups.
    Results: Among the 150 videos reviewed, 75.3% focused on Pap smears, while 24.7% discussed pelvic exams. Patients created 60.7% of videos, while healthcare providers contributed 39.3%. Videos created by healthcare providers were more likely to be neutral or positive in tone and to focus on educational content. Videos created by patients tended to have a negative tone, often focusing on personal experiences or comedy. Negative tones were significantly associated with higher engagement by "likes" and "shares" compared with neutral and positive tones. DISCERN scores were low across all videos, and healthcare provider videos did not score significantly higher than patient-generated content (p > 0.05). Fifteen videos (10%) addressed trauma related to a Pap smear or pelvic exam.
    Conclusion: Our analysis of the top 150 TikTok videos with #papsmear shows low educational quality across all creators, highlighting the difficulty of providing accurate health information on social media. To improve the quality and impact of health communication on social media, healthcare professionals might consider integrating relatable storytelling and appropriate humor while still prioritizing medical accuracy and trauma-informed messaging.
    DOI:  https://doi.org/10.1016/j.gore.2025.101964
  31. Contraception. 2025 Oct 10. pii: S0010-7824(25)00447-0. [Epub ahead of print] 111256
     OBJECTIVES: This study aims to describe the content and assess the information quality of top TikTok videos tagged #depo and/or #depoprovera.
    STUDY DESIGN: We evaluated the top 100 most-viewed videos tagged #depo or #depoprovera.
    RESULTS: Many creators were young, with 44% between 20 and 29 years old. Most videos described a creator's personal experience (52%) or were educational (35%). About one-third (36%) of creators were medical professionals, and their videos had higher mean accuracy scores on the DISCERN scale compared with videos created by laypeople (p=0.005).
    CONCLUSIONS: #DepoProvera TikTok videos often depict negative personal experiences shared by laypeople, influencing patient perceptions of this contraceptive method.
    Keywords:  Contraception; Depo Provera; Health information; Medroxyprogesterone Acetate; Social Media; TikTok
    DOI:  https://doi.org/10.1016/j.contraception.2025.111256
  32. J Cancer Educ. 2025 Oct 15.
    People with cancer and their families are often provided with a range of complex written and verbal information to help them manage treatment and side effects at home. This study explored the health information needs of patients and family members and investigated the influence of video-assisted health education on their understanding of the information. A co-design framework with health consumers and clinicians was used to identify concepts and create videos. Qualitative interviews with a thematic analysis explored their health information needs and the influence of the videos. The sample comprised people affected by brain, head and neck, and gastrointestinal cancers. Eleven interviews were conducted with patients and family members, aged between 39 and 82 years. The health literacy levels reported by participants highlighted the need for help with medical information and forms. Four themes were developed: sorting through information, acceptability of videos, information presentation and balance of caring. Providing health information in multiple formats and tailoring it to individuals' health literacy levels can reinforce key messages from health professionals and contribute to improved health outcomes. Video-assisted health education enhances patients' and families' understanding and supports informed decision-making about cancer treatment and self-care at home. While digital resources offer a promising avenue for improving comprehension, access and usability are influenced by varying levels of digital literacy, an area that warrants further investigation.
    Keywords:  Cancer; Health literacy; Health professionals; Information; Videos
    DOI:  https://doi.org/10.1007/s13187-025-02753-5
  33. Healthcare (Basel). 2025 Oct 08. pii: 2539. [Epub ahead of print]13(19):
    Background: Online health information seeking emerged as a critical form of public health behavior during the COVID-19 pandemic, generating substantial research interest. However, empirical studies examining health information-seeking patterns among Korean populations and their behavioral outcomes during the pandemic remain limited. Grounded in the information-motivation-behavior skills model, this study investigates online health information-seeking behaviors, including information sources, search terms, and engagement patterns, while also exploring their association with actual health behaviors during the COVID-19 pandemic. Methods: A structured survey was conducted with 1014 adults aged 19 years or older, based on the 2021 Korean version of the Health Information National Trends Survey (K-HINTS), to obtain nationally representative data. We adopted a structural equation model and analyzed the data using SPSS 25.0 and the WordArt site. Results: Of the respondents, 74.2% sought health information online, with vaccine details being the most widely searched topic. Mobile phones were the most commonly used devices (75.8%), and 98% searched for health information online via mobile devices at least once a week. Information (β = 0.230, p < 0.001), motivation (β = 0.117, p < 0.01), and behavior skills (β = 0.117, p < 0.01) positively influenced consumers' behavioral changes regarding health. Behavioral skills also mediated the influences that information seeking and motivation had on behavioral changes. Conclusions: This study examines four aspects of online health information seeking through nationally representative COVID-19 data in South Korea. Exploring the relationship between information-seeking and actual health behaviors provides crucial insights for predicting post-pandemic consumer behavior and developing effective public health communication strategies for future crises.
    Keywords:  COVID-19; E-health; behavior; consumer health information
    DOI:  https://doi.org/10.3390/healthcare13192539
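    Entry 33 applies the information-motivation-behavioral skills (IMB) model, in which behavioral skills mediate the effects of information and motivation on behavior change. The Python sketch below illustrates that mediation structure with two ordinary least squares regressions over hypothetical survey columns; it is a simplified stand-in for the authors' SPSS-based structural equation model, not a reproduction of it.

      # Simplified illustration of the IMB mediation structure described in entry 33:
      # behavioral skills are modeled as a mediator between information/motivation
      # and health behavior change. File name and column names are hypothetical.
      import pandas as pd
      import statsmodels.formula.api as smf

      survey = pd.read_csv("khints_2021.csv")  # hypothetical extract of the survey data

      # Path a: information and motivation -> behavioral skills (the mediator)
      mediator_model = smf.ols("behavior_skills ~ information + motivation", data=survey).fit()

      # Paths b and c': mediator plus direct effects -> behavior change (the outcome)
      outcome_model = smf.ols(
          "behavior_change ~ behavior_skills + information + motivation", data=survey
      ).fit()

      print(mediator_model.params)  # effects of information/motivation on skills
      print(outcome_model.params)   # direct effects and the mediator's effect on behavior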
  34. J Clin Med. 2025 Sep 25. pii: 6795. [Epub ahead of print]14(19):
    Background: Cyberchondria is characterized by heightened health anxiety resulting from excessive online health information seeking, and studies on this topic in the field of hematology are limited. The aim of this study was to examine the levels of cyberchondria and health anxiety among patients attending the hematology outpatient clinic without a diagnosis of malignancy, and to evaluate the relationship between these two factors. Methods: This prospective cross-sectional study was conducted at the hematology outpatient clinic of Recep Tayyip Erdogan University School of Medicine in Rize, Turkey. The 400 patients included in the study were divided into groups according to their reasons for visiting the outpatient clinic: hemoglobin disorders, leukocyte disorders, and platelet disorders. The severity of cyberchondria was assessed using the Cyberchondria Severity Scale-12 (CSS-12), and health anxiety level was assessed using the Short Health Anxiety Inventory (SHAI). Results: The mean age of the 400 patients (255 female, 145 male) was 37.7 ± 11.2 years (18-60 years). The mean SHAI score for patients was 16.1 ± 6.6, and the mean CSS-12 score was 28.7 ± 7.4. Patients presenting with platelet disorders had the highest SHAI scores (18.4 ± 5.6), followed by patients presenting with leukocyte disorders (16.7 ± 6.4) and hemoglobin disorders (15.5 ± 6.8) (p = 0.009). In terms of CSS-12 scores, the highest values were found in patients presenting with leukocyte disorders (31.8 ± 8.5), followed by platelet disorders (30.1 ± 7.7) and hemoglobin disorders (27.6 ± 6.7) (p < 0.001). There was a positive relationship between health anxiety level and the severity of cyberchondria (r = 0.413, p < 0.001). Conclusions: The positive correlation observed between cyberchondria severity and health anxiety level underscores the need to consider psychological effects in hematology patients. This clinical condition may increase the burden of disease and should not be overlooked by physicians.
    Keywords:  cyberchondria; health anxiety; hematology
    DOI:  https://doi.org/10.3390/jcm14196795
  35. Acta Paediatr. 2025 Oct 14.
     AIM: This study explores trends in healthcare professionals' searches for children's oral antibiotic suspensions, alongside antibiotic prescriptions and COVID-19 diagnoses in children under 5-6 years of age.
    METHODS: In Finland, the Physician's Databases serve as an online medical information source for healthcare professionals. Searches for seven oral suspensions of antibiotics (amoxicillin, amoxicillin/clavulanic acid, phenoxymethylpenicillin, cephalexin, azithromycin, clarithromycin and erythromycin) were analysed in 2017-2023, alongside children's antibiotic prescriptions from the Social Insurance Institution, as well as children's COVID-19 diagnoses from the register of the Finnish Institute for Health and Welfare.
    RESULTS: There were 1 231 383 antibiotic searches, 492 407 antibiotic prescriptions and 42 293 COVID-19 diagnoses in children. Amoxicillin was searched most often (498 276), with clear seasonal patterns, and a decrease appeared in both searches and prescriptions during March 2020-August 2021. Amoxicillin searches then increased, with two major peaks during September 2021-August 2023.
    CONCLUSION: Healthcare professionals' online information-seeking for children's oral antibiotic suspensions decreased markedly after the start of the COVID-19 pandemic but increased rapidly 2 years later. Information-seeking behaviour may be used to monitor the clinical use of antibiotics in children.
    Keywords:  COVID‐19; anti‐bacterial agents; bacterial infections; health personnel
    DOI:  https://doi.org/10.1111/apa.70352