bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-03-29
28 papers selected by
Thomas Krichel, Open Library Society



  1. Free Neuropathol. 2026;7:7
      The terms "Alzheimer's disease" and "Alzheimer disease" are often used interchangeably in the biomedical literature. Yet this seemingly minor grammatical difference carries implications that extend beyond style: the possessive form, marked by the 's eponym, may imply ownership of a disease by an individual, a notion discouraged by several authoritative medical style guides and international health organizations [1]. In this article, we examine the historical emergence of the term "Alzheimer's disease", analyze the trajectories of the possessive and non-possessive eponyms in PubMed-indexed article titles from 1950 to 2025, and assess how the choice of terminology influences literature retrieval. Our analysis indicates that the possessive form has overwhelmingly dominated the literature for decades. However, searches using "Alzheimer's disease" or "Alzheimer disease" retrieve non-identical, only partially overlapping sets of records in PubMed. We argue that adopting the non-possessive form "Alzheimer disease" would improve conceptual clarity, terminological consistency, and the completeness of literature retrieval, particularly in systematic reviews and meta-analyses.
    Keywords:  AMA style guide; All fields; Alzheimer disease; Alzheimer's disease; Down syndrome; Exact phrase; ICD-11; MeSH; NIH editorial style guide; Non-possessive eponym; Possessive eponym; Tourette syndrome
    DOI:  https://doi.org/10.17879/freeneuropathology-2026-9132
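    As an illustration of the retrieval point above, the following minimal sketch (not the authors' code) compares PubMed title-phrase searches for the two eponyms via NCBI E-utilities, assuming the Biopython Entrez client; the e-mail address is a placeholder and the [Title] queries are illustrative, not the study's search strategy.

        from Bio import Entrez

        Entrez.email = "you@example.org"  # placeholder; NCBI requests a contact address

        def title_search(term):
            # esearch caps retmax; larger result sets would need paging via retstart
            handle = Entrez.esearch(db="pubmed", term=term, retmax=10000)
            record = Entrez.read(handle)
            handle.close()
            return set(record["IdList"]), int(record["Count"])

        poss_ids, poss_count = title_search('"Alzheimer\'s disease"[Title]')
        nonposs_ids, nonposs_count = title_search('"Alzheimer disease"[Title]')

        print("possessive hits:", poss_count, "| non-possessive hits:", nonposs_count)
        print("overlap among retrieved PMIDs:", len(poss_ids & nonposs_ids))

    Only partial overlap between the two retrieved sets would mirror the abstract's finding that the phrase variants return non-identical records.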
  2. J Am Pharm Assoc (2003). 2026 Mar 20. pii: S1544-3191(26)00067-1. [Epub ahead of print] 103082
       OBJECTIVES: To evaluate the quality, reliability, and thematic content of user-generated videos on TikTok concerning glucagon-like peptide-1 (GLP-1) receptor agonists.
    METHODS: This cross-sectional study analyzed the top 400 most popular videos using the hashtags #Ozempic, #Semaglutide, #Mounjaro, and #Tirzepatide. A dual approach was employed, incorporating qualitative thematic analysis and quantitative scoring. The JAMA Internet Health Information Criteria and the DISCERN Consumer Health Information Evaluation Tool were utilized to assess the quality and reliability of the health information provided.
    RESULTS: Of the analyzed content, 98% was produced by individual users or influencers. The reliability-based information quality score for these accounts was significantly lower than that of healthcare professionals (34.53 vs. 52.31). No significant correlation was observed between engagement metrics (views and likes) and information quality. A significant proportion of the content contained risky or misleading information, particularly regarding side effects and safe usage.
    CONCLUSION: The findings suggest that the TikTok algorithm prioritizes engagement over content quality. There is a substantial communication deficit regarding GLP-1 receptor agonists on social media, necessitating better public health strategies to combat misinformation.
    Keywords:  DISCERN score; GLP-1 receptor agonists; Journal of the American Medical Association score; TikTok; Weight loss
    DOI:  https://doi.org/10.1016/j.japh.2026.103082
  3. Aesthetic Plast Surg. 2026 Mar 25.
       BACKGROUND: Large language models (LLMs) are becoming a common source of medical information for patients.
    OBJECTIVE: This study aimed to evaluate and compare the quality and readability of ChatGPT and Google's Gemini in answering frequently asked questions (FAQs) about augmentation mammaplasty (AM).
    METHODS: Ten AM FAQs were submitted to ChatGPT (GPT-4.1 mini) and Gemini (2.5 Flash). Responses were de-identified and independently rated by two board-certified plastic surgeons and one senior resident using the Global Quality Score (GQS). Readability was assessed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL). Paired comparisons used the Wilcoxon signed-rank test for per-question median GQS, inter-rater agreement used Kendall's W, and readability used paired tests as appropriate.
    RESULTS: Across 60 individual ratings (3 raters × 10 items × 2 models), per-question median GQS was 5 for 9/10 ChatGPT answers and 10/10 Gemini answers; the paired comparison showed no significant difference (Wilcoxon Z = -1.00; p = 0.317; effect size r = 0.32). Inter-rater agreement was W = 0.24 (ChatGPT, p = 0.091) and W = 0.60 (Gemini, p = 0.002). ChatGPT produced more readable outputs (FRE: 46.53 vs 43.70, p = 0.243; FKGL: 9.71 vs 11.43, p = 0.002), indicating approximately two US grade levels of easier reading.
    CONCLUSION: ChatGPT and Gemini both generated high-quality answers to common AM FAQs, with no difference in quality based on GQS. ChatGPT's responses were significantly easier to read according to FKGL. LLMs may support patient education when implemented with clinician oversight to mitigate limitations and prevent misinformation.
    LEVEL OF EVIDENCE IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
    Keywords:  Artificial intelligence; Augmentation mammaplasty; ChatGPT; Gemini; Large language models; Patient education; Plastic surgery
    DOI:  https://doi.org/10.1007/s00266-026-05766-7
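    For context on the two readability indices reported above, the standard Flesch and Flesch-Kincaid formulations (general background, not taken from this study) are:

        \mathrm{FRE} = 206.835 - 1.015\left(\tfrac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\tfrac{\text{total syllables}}{\text{total words}}\right)

        \mathrm{FKGL} = 0.39\left(\tfrac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\tfrac{\text{total syllables}}{\text{total words}}\right) - 15.59

    Higher FRE indicates easier text, while FKGL is expressed directly as a US school grade, which is why the 9.71 versus 11.43 FKGL difference reads as roughly two grade levels.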
  4. J Back Musculoskelet Rehabil. 2026 Mar 17. 10538127261433272
      Background: Artificial intelligence (AI)-based chatbots are increasingly used as sources of medical information. Given the high prevalence of neck pain as a musculoskeletal symptom, patients may commonly consult such tools for health-related guidance.
    Objective: To evaluate and compare the performance of ChatGPT 4.0 and Google Gemini in addressing commonly asked patient questions and clinical case scenarios related to neck pain, focusing on their accuracy, quality, understandability, readability, reliability, and usability.
    Methods: Twenty-four patient-oriented questions and four clinical case scenarios regarding neck pain were submitted to ChatGPT 4.0 and Google Gemini. Responses were evaluated using validated tools: modified DISCERN (mDISCERN) for reliability, Global Quality Scale (GQS) for quality, PEMAT-P for understandability and actionability, and Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) for readability. Case-based responses were assessed for accuracy, safety, and usability on a 7-point Likert scale by two experienced physicians.
    Results: Gemini demonstrated significantly higher reliability (mDISCERN, p < 0.001), whereas ChatGPT 4.0 had slightly higher, though statistically insignificant, GQS and PEMAT-P scores. Readability metrics were similar: ChatGPT's FRE was 48.78 and FKGL 9.08; Gemini's FRE was 47.12 and FKGL 9.11. Both models' outputs were considered difficult to read. In clinical scenarios, both chatbots showed comparable accuracy, safety, and usability, with minor omissions noted.
    Conclusion: ChatGPT 4.0 and Google Gemini provided similar performance in addressing neck pain-related queries. While both may support patient.
    Keywords:  Neck; artificial intelligence; pain
    DOI:  https://doi.org/10.1177/10538127261433272
  5. J Food Sci. 2026 Apr;91(4): e71001
      Consumers increasingly turn to artificial intelligence (AI) systems, including search engines and large language models (LLMs), for immediate food safety guidance. However, the reliability and accessibility of this information for critical public health issues, such as food poisoning, remain unassessed. This study benchmarks the performance of major AI systems: Google, ChatGPT, DeepSeek, and Mistral, by simultaneously evaluating the readability and information quality of their responses to frequently asked questions on food poisoning. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI) indices. Information quality was evaluated by independent experts using the validated DISCERN instrument and Global Quality Scale (GQS). Our analysis revealed a critical divergence in platform performance. Google produced the most readable text (FKGL: 9.05) but the lowest quality information (DISCERN: 30-34; GQS: only 3% of ratings were top-score). Conversely, LLMs provided high-quality information (DeepSeek DISCERN: 70-75; ChatGPT: 62) but at significantly higher reading levels (FKGL: 10.01-11.32), exceeding the recommended sixth-grade level. This demonstrates a fundamental trade-off: search engines optimize for brevity and accessibility, whereas dedicated LLMs prioritize comprehensive, reliable content. This forces consumers to choose between understandable but potentially misleading information and accurate but inaccessible guidance. Our findings highlight an urgent need to bridge this gap between readability and quality, calling for the development of AI systems that deliver authoritative, comprehensible food safety advice to protect public health.
    Keywords:  food safety; information quality; large language models; public health; readability
    DOI:  https://doi.org/10.1111/1750-3841.71001
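    For reference, the other two grade-level indices used in this benchmark have the following standard formulations (general background, not study-specific), where polysyllabic/complex words are those of three or more syllables and SMOG is conventionally computed over a 30-sentence sample:

        \mathrm{SMOG} = 1.0430\sqrt{\text{polysyllabic words} \times \tfrac{30}{\text{sentences}}} + 3.1291

        \mathrm{GFI} = 0.4\left(\tfrac{\text{words}}{\text{sentences}} + 100 \times \tfrac{\text{complex words}}{\text{words}}\right)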
  6. J Pers Med. 2026 Feb 27. pii: 129. [Epub ahead of print]16(3):
      Background: Multimodal large language model (MLLM)-based systems capable of generating health-related information and diagnostic suggestions are increasingly used for health information retrieval; however, their accuracy, readability, and quality in oral healthcare remain unclear. Oral mucosal diseases comprise a heterogeneous group of conditions affecting the oral lining, ranging from benign and reactive lesions to potentially malignant and malignant disorders. Objective: This study evaluated and compared the diagnostic performance, readability, and information quality of MLLMs with traditional search engines included as comparator platforms, in diagnosing oral mucosal diseases. Methods: A cross-sectional observational study was conducted using 100 validated oral mucosal case scenarios representing benign, malignant, potentially malignant, infectious, and reactive oral lesions. Each scenario was entered into ChatGPT 3.5, ChatGPT 4.5 (Plus), Microsoft Copilot (smart), Grok (xAI), Claude (Sonnet 4.5), DeepSeek v3.1, and search engines Google, Bing, and Yahoo. Diagnostic accuracy, Positive Predictive Value (PPV), and Negative Predictive Value (NPV) were compared against reference diagnoses. Information quality was assessed using the DISCERN tool, and readability was evaluated using Flesch-Kincaid Reading Ease (FRES) and Grade Level (FKGL) scores. Statistical analyses included Cochran's Q and McNemar tests (p < 0.05). Results: ChatGPT 4.5 demonstrated the highest overall diagnostic accuracy (88.5%), PPV (92%), and NPV (88%), followed by DeepSeek v3.1 and Claude (Sonnet 4.5). Traditional search engines performed poorly (accuracy 18-55%). MLLMs achieved higher DISCERN scores (2.84-3.20) but lower readability (FKGL = 11-14) than search engines (FKGL = 6-7). No platform met the recommended sixth-grade reading level for consumer health information. Conclusions: MLLMs, particularly ChatGPT Plus (GPT-4.5), outperformed conventional search engines in diagnostic accuracy and content quality but produced complex, less-readable text. Future AI development should prioritise improving clinical accuracy alongside readability and transparency to ensure equitable access to reliable oral health information.
    Keywords:  ChatGPT; artificial intelligence; diagnostic accuracy; health information quality; oral mucosal diseases; readability
    DOI:  https://doi.org/10.3390/jpm16030129
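    For reference, the diagnostic metrics reported above follow the usual confusion-matrix definitions (TP/FP/TN/FN = true/false positives and negatives):

        \text{Accuracy} = \tfrac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{PPV} = \tfrac{TP}{TP + FP}, \qquad \mathrm{NPV} = \tfrac{TN}{TN + FN}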
  7. Digit Health. 2026 Jan-Dec;12:20552076261435836
     Objective: This study aimed to systematically evaluate five leading large language models (LLMs), ChatGPT, DeepSeek, Copilot, Gemini, and Perplexity, in providing maintenance hemodialysis (MHD)-related health information. The primary objectives were to determine (1) the reliability of MHD-related information generated by LLMs and (2) whether its readability meets the recommended standards for patient educational materials.
    Methods: A cross-sectional comparative design was adopted. The approximate timeframe during which the responses were generated was October 2025. Seventeen frequently asked MHD-related questions were identified using Google Trends and two online patient-caregiver forums. Each query was input into the five LLMs (ChatGPT-4o, Copilot, Gemini 2.5 Pro, Perplexity Pro, and DeepSeek-V3.2-Exp), and their responses were assessed using DISCERN, EQIP, JAMA, and GQS criteria for reliability, alongside FKGL, FRES, SMOG, CLI, ARI, and LWF readability indices. A heatmap analysis was also conducted to evaluate intra-model response variability.
    Results: High inter-rater reliability was confirmed between the two experts (ICC for average measures ranged from 0.851 to 0.879, all P < .001). Significant differences were observed among the five LLMs in both reliability and readability. Overall reliability scores were relatively low; however, Perplexity consistently achieved higher DISCERN, EQIP, and JAMA scores compared with Gemini, ChatGPT, Copilot, and DeepSeek (P < .001). In terms of readability, all models produced texts exceeding the sixth-grade reading level. Their ARI, GFI, FKGL, CLI, and SMOG scores were notably higher than recommended, while FRES scores were substantially below the 80-90 range. Heatmap analysis further demonstrated that although Perplexity and ChatGPT maintained relatively stable mean scores, they exhibited higher variability across different queries.
    Conclusions: Current large language models (LLMs) exhibit significant variability in delivering maintenance hemodialysis information. While all five evaluated models demonstrated limitations in information quality, transparency, and readability, Perplexity performed relatively better overall. However, persistent deficiencies in source attribution, language accessibility, and response consistency limit their immediate clinical and educational utility. Future LLM development should prioritize readability optimization and context-aware customization to better support patient education.
    Keywords:  Large language models; health communication; maintenance hemodialysis; readability; reliability
    DOI:  https://doi.org/10.1177/20552076261435836
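    The average-measures ICC reported above can be reproduced from long-format rating data; below is a minimal sketch assuming the pingouin package and a small hypothetical table of two raters' DISCERN totals (not the study's data).

        import pandas as pd
        import pingouin as pg

        ratings = pd.DataFrame({
            "item":  [1, 1, 2, 2, 3, 3, 4, 4],          # hypothetical questions
            "rater": ["A", "B"] * 4,                      # two expert raters
            "score": [52, 55, 61, 58, 47, 50, 63, 60],    # hypothetical DISCERN totals
        })

        icc = pg.intraclass_corr(data=ratings, targets="item",
                                 raters="rater", ratings="score")
        # the "Average raters" rows (e.g. ICC2k/ICC3k) correspond to average-measures ICC
        print(icc[["Type", "Description", "ICC", "pval"]])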
  8. JMIR Cancer. 2026 Mar 26;12:e84234
       Background: Melanoma, a highly aggressive form of skin cancer, is the second most common type of cancer for adolescent and young adult (AYA, ages 15-39 years) patients. AYA patients with melanoma may turn to internet sources, especially artificial intelligence (AI) chatbots, to manage uncertainty about prognosis and treatment.
    Objective: This study aims to evaluate the quality, empathy, and readability of responses generated by leading AI chatbots when addressing the top unmet needs of AYA patients with melanoma receiving treatment.
    Methods: Our research team recently surveyed 152 AYA patients with melanoma using the Needs Assessment Service Bridge, a validated instrument that assesses psychosocial needs for AYA patients with cancer. The survey identified the top 5 needs for advanced AYA patients with melanoma receiving treatment. Each need was reframed into a question and brief clinical history, then entered into each chatbot by 5 individuals who cleared their prequestion and postquestion history. Chatbot responses were evaluated to assess information quality (Global Quality Score [GQS] and DISCERN), accessibility and readability (GQS, Flesch Kincaid Grade Level, Flesch Reading Ease), and perceived empathy (Perceived Empathy of Technology Scale [PETS], including domains of Emotional Responsiveness [PETS-ER], Understanding and Trust [PETS-UT]).
    Results: Across 75 chatbot responses, ChatGPT achieved the highest average quality (mean GQS 4.42, SD 0.32; mean DISCERN 3.24, SD 0.31) and empathy (mean PETS-ER 5.35, SD 1.85; mean PETS-UT 6.36, SD 1.83), though with greater variability. Copilot produced the lowest quality and empathy scores, while Gemini responses were consistently midrange. PETS-UT exceeded PETS-ER across all models, suggesting stronger cognitive empathy than emotional responsiveness. Readability analysis showed outputs exceeded the average US reading level (mean Flesch Kincaid Grade Level 11.82, SD 1.44; mean FRE 38.60, SD 9.00), limiting accessibility. The most readable responses were found in question 2, which also scored higher in quality and empathy, whereas questions 4 and 5 produced the most complex, difficult-to-read responses corresponding with lower quality and empathy ratings.
    Conclusions: AI chatbots can provide moderately accurate and supportive responses to needs of AYA patients with melanoma, but outputs are inconsistent, written above the recommended reading level for health information, and limited in empathy. Question framing strongly influenced chatbot performance, with more emotional prompts drawing greater empathy, and readability aligning with both quality and empathy. Chatbot use in this population should remain adjunctive, with further research needed to standardize quality, improve readability, and enhance empathetic communication.
    Keywords:  adolescent; artificial intelligence; empathy; melanoma; natural language processing; readability; young adult
    DOI:  https://doi.org/10.2196/84234
  9. Cancers (Basel). 2026 Mar 11. pii: 906. [Epub ahead of print]18(6):
       BACKGROUND: Artificial intelligence chatbots are increasingly used by patients to obtain health information, including for prostate cancer. While these platforms offer accessible and conversational responses, concerns remain regarding the quality, usability, and clinical relevance of AI-generated content. This study comparatively evaluated patient-directed prostate cancer information generated by commonly used AI chatbots.
    METHODS: Standardised prostate cancer-related prompts were developed using Google Trends and authoritative healthcare resources. Identical queries were submitted to five publicly accessible AI chatbots: ChatGPT 5.2, Google Gemini, Claude AI, Microsoft Copilot, and Perplexity. Responses were independently assessed by two blinded reviewers using the DISCERN instrument for information quality and the Patient Education Materials Assessment Tool for printable materials (PEMAT-P) for understandability and actionability. Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs). Readability was evaluated using the Flesch-Kincaid Reading Ease score. Descriptive statistics were used for comparative and pooled analyses.
    RESULTS: Overall information quality was moderate, with a pooled median (interquartile range [IQR]) DISCERN score of 56.5 (53.0-61.0). Higher mean DISCERN scores were observed for ChatGPT 5.2 and Microsoft Copilot, whereas lower scores were observed for Claude and Perplexity. PEMAT-P understandability was consistently high across platforms, with a pooled median (IQR) score of 91.7% (83.3-91.7%). In contrast, PEMAT-P actionability was uniformly poor, with a pooled median (IQR) score of 0% (0-0%). Readability analysis demonstrated moderate complexity, with a pooled median (IQR) Flesch-Kincaid Reading Ease score of 50.4 (49.2-52.5) and a median word count of 666 (657-1022). Inter-rater reliability was good for PEMAT understandability (ICC 0.841) and moderate for DISCERN (ICC 0.712).
    CONCLUSIONS: AI chatbots provide highly understandable but only moderately high-quality patient-directed prostate cancer information, with a consistent lack of actionable guidance. Although variation in content quality was observed across platforms, significant limitations remain in evidence transparency and practical patient support. Future development should prioritise integration of evidence-based resources and actionable decision-support tools to enhance the role of AI chatbots in prostate cancer education.
    Keywords:  artificial intelligence; chatbots; health information quality; patient education; prostate cancer
    DOI:  https://doi.org/10.3390/cancers18060906
  10. Health Informatics J. 2026 Jan-Mar;32(1):14604582261428308
      Objectives: This study aimed to assess the readability of online information about semaglutide while also assessing understandability and quality.
    Methods: Ozempic, Wegovy, and 'semaglutide' were individually searched. The non-sponsored results on the first five pages for each search were screened. The text from the included links was evaluated by two researchers for readability using SMOG and Flesch Reading Ease (FRE), for understandability using the Patient Education Materials Assessment Tool (PEMAT), and for quality using DISCERN. A statistician ran reports for medians, interquartile ranges, and frequency statistics.
    Results: Sixty-one links met evaluation criteria. Median scores for SMOG and FRE were 13th-grade level and college level, respectively. Fewer than 10% were at or below the recommended reading grade level. The median PEMAT score was 62%. The median overall DISCERN score was 4 out of 5.
    Conclusions: Most education available online about semaglutide medications is not written at the recommended reading level. Patient education on semaglutide needs to be rewritten to be at the recommended 8th-grade reading level.
    Keywords:  health literacy; patient education; readability; semaglutide; understandability
    DOI:  https://doi.org/10.1177/14604582261428308
  11. Clin Spine Surg. 2026 Mar 03.
       STUDY DESIGN: Prospective survey study.
    SUMMARY OF BACKGROUND DATA: Patients frequently utilize Internet-based resources to seek information. Cervical laminoplasty is extensively marketed on the Internet, and patients may research their condition for the treatment of cervical spinal stenosis. Previous literature has recommended that the readability of patient education materials (PEM) should not exceed the 6th grade reading level to optimize health literacy.
    OBJECTIVE: This study aims to evaluate the readability of online PEM concerning cervical laminoplasty.
    METHODS: A Google search query was performed using the term "Cervical Laminoplasty patient information." The first 25 websites meeting study inclusion criteria were analyzed for readability using Flesch-Kincaid, average reading level consensus, Gunning Fog, Coleman-Liau, Simplified Measure of Gobbledygook (SMOG), and Linsear Write indices. Descriptive statistics were reported.
    RESULTS: The mean average reading level was 11.1 (1.96). The mean Flesch Kincaid Reading Ease score was 49 (12.6). The mean Gunning Fog Score was 12.2 (2.15), Flesch Kincaid grade level 10.6 (2.62), Coleman Liau SMOG 11.6 (1.92), Automated Readability Index 10.6 (3.13), Linsear Write 68.2 (9.2). One of the twenty-five PEMs included was evaluated to be below the recommended sixth-grade reading level. Five of the PEMs were considered general health information (GHI), and twenty were considered clinical practice (CP). No differences were found between CP and GHI websites (P>0.05).
    CONCLUSIONS: Creating appropriate PEM is integral to achieving optimal health literacy. The current readability of the most accessible PEMs related to cervical laminoplasty is inadequate. As it stands, many patients may not appropriately comprehend the description of their anticipated surgery.
    Keywords:  cervical laminoplasty; online health information; patient education material; readability
    DOI:  https://doi.org/10.1097/BSD.0000000000002045
  12. J Vasc Surg. 2026 Mar 20. pii: S0741-5214(26)00649-X. [Epub ahead of print]
     OBJECTIVE: To assess the readability, quality, transparency, and clinical content of online patient education materials related to major lower extremity amputation (LEA) due to vascular disease.
    METHODS: We conducted a descriptive audit of online patient education materials on major LEA due to vascular disease. Using Google Chrome Incognito mode, we searched 8 search terms ("leg amputation", "below knee amputation", "above knee amputation", "amputation for poor circulation", "amputation for peripheral artery disease", "vascular surgery leg amputation", "diabetic leg amputation", "foot amputation surgery") and reviewed the first 30 results per term to identify relevant websites and patient education. Readability was assessed using 5 validated tools (Flesch-Kincaid, Gunning Fog, SMOG, Coleman-Liau, Automated Readability index). Quality was assessed using JAMA and DISCERN criteria. Content was analyzed for key clinical elements.
    RESULTS: Of 240 websites screened, 58 were included. Mean readability scores exceeded the recommended sixth-grade level: Flesch-Kincaid (10.47), Gunning fog (12.55), SMOG (12.95), Coleman Liau (10.83), Automated readability index (9.86). The average JAMA Benchmark score was 1.54/4, with 59% reporting currency, 36% disclosure, 33% reporting authorship, and 26% attribution. DISCERN scores averaged 46.8/80 ("fair" quality). Content coverage of key clinical concepts was low: 26% reported alternative procedures, 33% reported surgical risks, 41% operation description, 50% quality of life, 67% prosthetics/mobility aids, 74% post-op care, and 74% indications. DISCERN and content scores were moderately correlated (r=0.5), Flesch-Kincaid and SMOG were highly correlated (r=1.0). Quality and readability scores were uncorrelated, suggesting reliable sources may remain hard to understand.
    CONCLUSION: Despite the burden of major LEA, online patient education resources often have low readability and quality and lack critical content. There is an urgent need to develop high quality, accessible materials to support education and shared decision making.
    Keywords:  Lower extremity amputation; Patient education; online health information; peripheral arterial disease; website
    DOI:  https://doi.org/10.1016/j.jvs.2026.03.432
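    Audits like this one can be approximated programmatically; the sketch below, assuming the textstat package and a hypothetical saved web-page text file, computes the same five readability indices used here.

        import textstat

        text = open("amputation_page.txt", encoding="utf-8").read()  # hypothetical file

        scores = {
            "Flesch-Kincaid grade":        textstat.flesch_kincaid_grade(text),
            "Gunning Fog":                 textstat.gunning_fog(text),
            "SMOG":                        textstat.smog_index(text),
            "Coleman-Liau":                textstat.coleman_liau_index(text),
            "Automated Readability Index": textstat.automated_readability_index(text),
        }
        for name, value in scores.items():
            print(f"{name}: {value:.1f}")  # grade levels; sixth grade or below is the usual target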
  13. Ophthalmic Epidemiol. 2026 Mar 25. 1-7
       PURPOSE: To compare the accuracy and readability of ChatGPT-4.0 Mini, Gemini 1.5 Flash and Microsoft Copilot generated responses to fifth year medical school ophthalmology exam questions.
    METHODS: A total of 442 multiple-choice questions were submitted individually to each chatbot. Responses were marked correct or incorrect based on an answer key. Readability was assessed using Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease (FRE) and Simple Measure of Gobbledygook (SMOG) indices.
    RESULTS: There was a statistically significant difference in the overall accuracy rates among the chatbots (p < 0.001). Copilot had the highest accuracy rate (89.4%), followed by ChatGPT (84.2%) and Gemini (76.7%). Readability indices also showed a significant difference (p < 0.001 for all). Copilot demonstrated the lowest linguistic complexity, with the lowest FKGL and SMOG scores and the highest FRE values. In contrast, ChatGPT generated responses with the highest linguistic complexity across all evaluated metrics. Intraclass correlation coefficients for readability metrics ranged between 0.68 and 0.83, with the highest agreement between ChatGPT and Gemini. Despite statistically significant differences in readability, all responses generally required a high school to early college-level reading ability.
    CONCLUSION: Large language model-based chatbots demonstrated variable performance in answering ophthalmology questions at the medical school level. Among the models evaluated, Microsoft Copilot performed best in both accuracy and readability. These findings suggest that model choice may influence the usefulness of AI-generated content in educational settings.
    Keywords:  Artificial intelligence; ChatGPT-4.0 Mini; Gemini 1.5 Flash; Microsoft Copilot; chatbots; large language models; ophthalmic education
    DOI:  https://doi.org/10.1080/09286586.2026.2651197
  14. BMC Health Serv Res. 2026 Mar 27.
      
    Keywords:  BPPV; Internet search; Patient information; Quality; Readability
    DOI:  https://doi.org/10.1186/s12913-026-14441-1
  15. Plast Reconstr Surg Glob Open. 2026 Mar;14(3): e7542
       Background: Online resources about breast reconstruction are critical to guiding patients through decisions surrounding breast cancer treatment. Accordingly, the objective of this study was to assess the suitability of online materials for breast reconstruction.
    Methods: A comprehensive Google search for "breast reconstruction" was performed, and the resulting sites were evaluated. The readability of each website was evaluated using 6 different readability metrics, and the quality of the websites was assessed with the Journal of the American Medical Association (JAMA) benchmark criteria. The percentage of sites with visual patient representations was ascertained, and the Fitzpatrick scale was used to assess the diversity of these representations. One-way analysis of variance was used to analyze the readability metrics and JAMA benchmark criteria across the various website categories.
    Results: From the 78 individual websites identified, 50 were from academic hospitals, 11 were government/nonprofit organizations, 7 were third-party informational websites, and 10 were private hospital/physician group websites. Across all categories, online breast reconstruction materials were written at an average reading level greater than the 10th grade. Academic hospital center websites had the lowest average JAMA benchmark scores (1.20 ± 0.53, P < 0.0001). Fifty percent of the identified breast reconstruction educational sites included some type of illustrations/visualization, and only 18.0% included representations other than White.
    Conclusions: The current online patient education materials for breast reconstruction exceed the complexity appropriate for the dissemination of medical information to the general public. Concerted efforts should be made to enhance the readability, with special attention toward the incorporation of diverse illustrations/visualizations.
    DOI:  https://doi.org/10.1097/GOX.0000000000007542
  16. Br J Neurosurg. 2026 Apr;40(2): 321-330
       INTRODUCTION: Patients use online videos to learn about their condition and potential treatments. Operative techniques in Deep Brain Stimulation (DBS) vary significantly between institutions. This poses challenges to ensuring patients are adequately and accurately informed. We performed a comprehensive review of YouTube videos describing Deep Brain Stimulation.
    METHODS: Text searches for DBS-related search strings were performed on YouTube. The top 25 de-duplicated videos per search were included. Each video was assessed for differences in procedural technique, educational quality using the JAMA benchmark and DISCERN tools, and audio-visual or editing quality.
    RESULTS: We identified 91 DBS-related YouTube videos, with 44% of videos uploaded by academic institutions and 15% by hospitals. Parkinson's disease was the most frequently described condition, featuring in 65% of videos. Variations in procedure impacting patient experience and expectations were discussed in varying proportions: head shaving in 14.3% of videos, potential complications in 23.1%, number of stages in 33.0%, and awake vs asleep surgery in 46.2%. The JAMA benchmark criteria were fulfilled in 12% of videos, and the median total DISCERN score was 46, an 'average' quality rating. High-quality images (N = 69, 75.8%), audio/music (N = 73, 80.2%), accessible language (N = 84, 92.3%), and professional production quality (N = 72, 79.1%) were present in most videos.
    DISCUSSION AND CONCLUSION: YouTube videos describing DBS are visually appealing but lack scientific quality and present potentially misleading content for future DBS recipients and caregivers. They should be viewed with caution as a source of medical communication or information for patients.
    Keywords:  Deep Brain Stimulation; Informed Consent; Neurosurgery; Patient Education; Webcasts
    DOI:  https://doi.org/10.1080/02688697.2025.2538488
  17. J Laparoendosc Adv Surg Tech A. 2026 Mar 28. 10926429261435688
       BACKGROUND: Video-based learning is a central tool in minimally invasive surgical training; however, the educational/reporting quality and reliability of online content must be evaluated using objective criteria. This study aimed to compare the educational quality of transanal total mesorectal excision (TaTME) videos published on YouTube and WebSurg and to test the validity of the TaTME-specific StepScore scale.
    METHODS: A cross-sectional content analysis was performed across platforms (total n = 30; YouTube = 15, WebSurg = 15). Videos were scored using Laparoscopic Surgery Video Educational Guidelines (LAP-VEGaS) (0-18), Journal of the American Medical Association (JAMA) (0-4), modified DISCERN (mDISCERN) (5-25), and total mesorectal excision (TME) StepScore (14 steps, 0-28). Correlations were examined using Spearman's ρ with false discovery rate adjustment. Logistic regression and ROC/AUC were used to predict LAP-VEGaS ≥ 11 adequacy; the optimal threshold was determined using the Youden index.
    RESULTS: Total scores and LAP-VEGaS ≥ 11 rates were similar across platforms (YouTube 66.7%; WebSurg 60.0%). StepScore showed a strong correlation with LAP-VEGaS and mDISCERN and a moderate correlation with JAMA. Each +1 point increase in StepScore increased the odds of a LAP-VEGaS score ≥ 11. According to the Youden analysis, a StepScore ≥ 15 was found to be the best threshold.
    CONCLUSION: Popular TaTME videos on YouTube and WebSurg appear similar in terms of educational/reporting quality. The procedure-specific StepScore is consistent with general quality measures and can predict LAP-VEGaS ≥ 11 adequacy with a practical ≥15 point target. Using StepScore for video assessment and as a step-by-step instructional checklist may contribute to improving TaTME training standards.
    Keywords:  LAP-VEGaS; WebSurg; YouTube; surgical education; transanal total mesorectal excision; video-based learning
    DOI:  https://doi.org/10.1177/10926429261435688
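    To illustrate how a Youden-index cut-off such as "StepScore ≥ 15" is derived from ROC analysis, here is a minimal sketch with hypothetical scores and adequacy labels (not the study's data), assuming scikit-learn.

        import numpy as np
        from sklearn.metrics import roc_curve, roc_auc_score

        step_score = np.array([8, 10, 12, 14, 15, 16, 18, 20, 22, 25])  # hypothetical StepScores
        adequate   = np.array([0,  0,  0,  1,  0,  1,  1,  1,  1,  1])  # LAP-VEGaS >= 11 (1 = yes)

        fpr, tpr, thresholds = roc_curve(adequate, step_score)
        youden_j = tpr - fpr                          # Youden's J = sensitivity + specificity - 1
        best_cutoff = thresholds[np.argmax(youden_j)]

        print("AUC:", roc_auc_score(adequate, step_score))
        print("Optimal cut-off by Youden index:", best_cutoff)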
  18. Digit Health. 2026 Jan-Dec;12:20552076261432019
       Aim: To systematically assess the quality, reliability, and content attributes of YouTube videos pertaining to transperineal prostate biopsy (TPB), an innovative method for diagnosing prostate cancer.
    Materials and methods: A systematic search was performed on YouTube on May 18, 2025, using the terms "transperineal prostate biopsy," "TP biopsy," and "perineal prostate needle biopsy." The search was conducted in incognito mode to reduce personalized bias. Videos in the English language uploaded from April 2019 to April 2025 were included. Videos that were duplicates, irrelevant, non-English, or lacking sound were eliminated. Sixty videos, sorted by relevance, were reviewed, resulting in 50 videos that satisfied the inclusion criteria for analysis. Two senior urologists independently evaluated each video using the Global Quality Scale (GQS) and the modified DISCERN (mDISCERN) tool to measure reliability and quality. Video metadata, including time, views, likes, upload date, and creator category (academic professional, non-academic professional, patient, medical company, or others), was documented.
    Results: The average duration of videos was nearly 7 min, with an average of 8950 views per video. Healthcare workers created 60% of the videos, whereas 10% included advertising material. The predominant subjects addressed were procedural methodologies, anesthesia, and a comparative analysis of infection risk associated with the transrectal technique. The average GQS score was 2.96 ± 0.77, indicating moderate quality, whereas the average mDISCERN score was 2.1 ± 0.9, suggesting low to moderate reliability. Several videos referenced peer-reviewed literature, while numerous others displayed commercial bias. Inter-rater reliability was significant, with intraclass correlation coefficients of 0.893 for GQS and 0.861 for mDISCERN.
    Conclusion: YouTube videos related to TPB show significant range in quality and reliability. Despite being created by healthcare experts, the absence of peer-reviewed references and the prevalence of promotional bias reduce their educational value. There is an immediate necessity for standardized, evidence-based, and unbiased educational resources to enhance patient understanding and help with informed decision-making in prostate cancer diagnosis.
    Keywords:  Transperineal prostate biopsy; YouTube health videos; medical education; patient information; video quality assessment
    DOI:  https://doi.org/10.1177/20552076261432019
  19. Surg Innov. 2026 Mar 23. 15533506261438206
      Background: Tube thoracostomy is a critical procedure commonly performed in emergency and trauma settings to manage life-threatening conditions such as pneumothorax and hemothorax. As digital platforms become increasingly integrated into procedural education, YouTube has become a widely used supplementary learning resource. However, the educational quality and reliability of its content remain variable. This study evaluates YouTube tube thoracostomy training videos on educational quality, reliability, and popularity.
    Methods: This cross-sectional observational study evaluated YouTube videos related to tube thoracostomy using predefined search terms. After screening and applying inclusion and exclusion criteria, 79 videos were included in the analysis. Data collected included video duration, view count, like count, subscriber count, uploader type (institutional or individual), country of upload, and presence of spoken narration. Educational quality and reliability were evaluated using the Global Quality Scale (GQS), DISCERN, and Journal of the American Medical Association (JAMA) criteria. Video popularity was assessed using the adapted Video Power Index (VPI).
    Results: Of the analyzed videos, 67.1% were uploaded by individuals and 32.9% by institutional sources. Videos featuring spoken narration, longer duration, and institutional origin demonstrated significantly higher GQS and DISCERN scores and greater popularity indicators (P < .05). Institutional videos exhibited higher educational quality than those uploaded by individuals. However, a direct correlation between video popularity and educational quality was not consistently observed.
    Conclusion: The educational quality of tube thoracostomy training videos on YouTube varies considerably. Institutional videos of longer duration with spoken narration provide greater educational value. These resources should serve only as supplementary tools and not replace hands-on training. Establishing quality standards for medical training videos and promoting institutional content production are recommended.
    Keywords:  chest tube insertion; surgical education; tube thoracostomy; video-based learning; youtube
    DOI:  https://doi.org/10.1177/15533506261438206
  20. Digit Health. 2026 Jan-Dec;12:20552076261434141
       Background: Attention-Deficit/Hyperactivity Disorder (ADHD) is a complex neurodevelopmental disorder requiring professional diagnosis. Recently, short-video platforms such as TikTok and Bilibili have seen a surge in ADHD-related content, driving a trend of self-diagnosis among the public, particularly young adults. The scientific quality and potential risks of this content have not been systematically evaluated. This study aimed to systematically evaluate the quality and reliability of ADHD content on TikTok and Bilibili, analyze its content characteristics, and specifically investigate the prevalence of content encouraging self-diagnosis and its association with user engagement.
    Methods: The top 100 videos from each platform were retrieved using the keywords "ADHD" and "." After a screening process, a total of 164 videos were included for analysis. Two senior clinical psychologists independently assessed the videos using the modified DISCERN (mDISCERN) tool and the Global Quality Score (GQS). Videos were classified by uploader type (e.g., healthcare professionals, patients/influencers) and content theme (e.g., symptom education, self-tests). A novel Self-Diagnosis Risk Scale (SDRS) was also applied. Nonparametric statistical methods were used for data analysis.
    Results: A total of 164 videos were analyzed (88 from TikTok, 76 from Bilibili). Significant platform differences emerged, with Bilibili videos demonstrating superior quality scores (GQS: 3.05 ± 0.91 vs. 2.45 ± 0.88; mDISCERN: 2.62 ± 0.85 vs. 1.88 ± 0.72; both p < 0.001) but TikTok videos showing higher self-diagnosis risk (SDRS: 1.71 ± 0.51 vs. 1.30 ± 0.69; p < 0.001). Healthcare professionals produced the highest quality content (GQS: 3.65 ± 0.68; mDISCERN: 3.15 ± 0.81) with lowest diagnostic risk (SDRS: 0.75 ± 0.49), while patients/influencers created content with the lowest quality and highest risk scores. Critically, a "quality-engagement paradox" was identified: videos with higher self-diagnosis risk received significantly more user engagement (likes: r = 0.45, p < 0.001; shares: r = 0.42, p < 0.001), while quality metrics showed no significant correlation with user engagement measures.
    Conclusions: This study reveals concerning patterns in ADHD-related content on major Chinese short-video platforms, where potentially harmful content encouraging self-diagnosis receives preferential algorithmic promotion over scientifically rigorous material. The inverse relationship between content quality and user engagement suggests current platform mechanisms may inadvertently amplify misleading health information while marginalizing evidence-based content. These findings underscore the urgent need for collaborative interventions involving platform operators, healthcare professionals, and public health educators to develop content guidelines, improve algorithmic curation of health information, and support healthcare professionals in creating engaging, evidence-based content. As social media platforms continue serving as primary health information sources, ensuring quality and safety of mental health content must become a priority for platform governance and public health policy.
    Keywords:  ADHD; Bilibili; TikTok; content analysis; health information quality; self-diagnosis; social media
    DOI:  https://doi.org/10.1177/20552076261434141
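    The engagement-quality correlations described above are plain Spearman rank correlations; the sketch below shows the computation on hypothetical per-video values (not the study's data), assuming SciPy.

        from scipy.stats import spearmanr

        likes = [120, 4500, 300, 89000, 760, 15000, 40, 2300]   # hypothetical like counts
        sdrs  = [0.5, 2.0, 1.0, 2.5, 1.5, 2.0, 0.0, 1.5]         # hypothetical self-diagnosis risk
        gqs   = [4, 2, 3, 1, 3, 2, 5, 3]                          # hypothetical quality scores

        rho_risk, p_risk = spearmanr(likes, sdrs)
        rho_quality, p_quality = spearmanr(likes, gqs)
        print(f"likes vs self-diagnosis risk: rho={rho_risk:.2f}, p={p_risk:.3f}")
        print(f"likes vs quality (GQS):       rho={rho_quality:.2f}, p={p_quality:.3f}")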
  21. Digit Health. 2026 Jan-Dec;12:20552076261420887
       Objective: Myocardial infarction (MI) is one of the leading causes of death and disability worldwide. Short-video platforms play an increasingly important role in disseminating health information; however, the quality and reliability of MI-related short videos remain unclear.
    Methods: Using "myocardial infarction" as the keyword, we analyzed 228 MI-related videos from TikTok and Bilibili. After extracting basic characteristics, we evaluated video quality, reliability, and transparency using the Global Quality Score (GQS), modified DISCERN (mDISCERN), JAMA benchmark, and the Video Information and Quality Index (VIQI). Nonparametric statistics were used for group comparisons, and Spearman's rank correlation was applied to assess associations between engagement metrics and quality scores.
    Results: Video content primarily addressed clinical presentation, etiology, and treatment, with relatively little on epidemiology and prevention. Topic distribution was as follows: clinical presentation (22.06%), etiology (22.71%), treatment (19.93%), diagnosis (16.01%), prevention (15.03%), and epidemiology (4.25%). Overall video quality was moderate: GQS 3.0 (IQR: 2-3), mDISCERN 2.0 (IQR: 2-3), JAMA 2.0 (IQR: 2-3), and VIQI 11.0 (IQR: 10-13). Videos uploaded by cardiologists received the highest quality scores (p < .05). No significant correlations were observed between engagement metrics and quality scores.
    Conclusions: MI-related short videos on TikTok and Bilibili demonstrate moderate overall quality with incomplete content coverage. Future efforts should encourage greater participation of cardiologists in health communication, enhance inclusion of epidemiology and prevention content, and support platforms in developing quality accreditation systems and optimizing recommendation algorithms to improve the scientific accuracy, transparency, and communicative value of health information.
    Keywords:  Bilibili; Myocardial infarction; TikTok; cross-sectional study; information reliability; public health; short-video platforms; video quality
    DOI:  https://doi.org/10.1177/20552076261420887
  22. Digit Health. 2026 Jan-Dec;12:20552076261433087
       Objective: To evaluate the quality, reliability, and user engagement of endometriosis-related videos on TikTok and Bilibili, identifying variations by platform, uploader type, and content category to inform digital health strategies.
    Methods: The top 100 videos per platform were retrieved using the Chinese keyword for "endometriosis." After excluding irrelevant or promotional content, 195 videos (99 TikTok, 96 Bilibili) were analyzed. Categorization included uploader type (professional individuals, nonprofessionals, institutions) and content (disease knowledge, treatment, Traditional Chinese Medicine, other). Quality was assessed via Global Quality Score (GQS), modified DISCERN (mDISCERN), JAMA benchmarks, and Video Information and Quality Index (VIQI). Engagement (likes, collections, comments, shares) and duration were recorded. Analyses used Wilcoxon rank-sum, Kruskal-Wallis, and Fisher's exact tests, and Spearman correlations.
    Results: Professionals uploaded 83.6% of videos; disease knowledge dominated (64.1%). Bilibili videos were longer (median 281.5 vs. 64.67 s; P < .0001) with higher GQS (3.29 vs. 3.04; P = .0123), mDISCERN (3 vs. 2; P < .0001), and JAMA (1 vs. 0; P < .0001). TikTok excelled in engagement (e.g., likes 355 vs. 18.5; P < .0001). Professional sources scored higher (P < .001-.003). Treatment content was most engaging but shorter (P < .001). Engagement metrics correlated strongly with one another (ρ > .7) but only weakly with quality (ρ < .3).
    Conclusions: Videos show moderate quality, with Bilibili emphasizing reliability and TikTok virality. Professional content is superior, but the popularity-quality disconnect highlights needs for verification and education to reduce misinformation.
    Keywords:  Bilibili; Endometriosis; TikTok; health information quality; short-video platforms
    DOI:  https://doi.org/10.1177/20552076261433087
  23. Digit Health. 2026 Jan-Dec;12:20552076261433814
       Background: Osteosarcoma is a rare and aggressive bone malignancy, yet public awareness remains insufficient. As social media platforms have become key sources of health information, this study aims to evaluate the quality and reliability of osteosarcoma-related videos on these platforms.
    Methods: This study collected 100 osteosarcoma-related videos from each platform, TikTok and Bilibili, based on their default ranking, resulting in 200 videos initially screened and 183 included after exclusions. Video characteristics were collected, including duration, likes, saves, comments, and shares. The Global Quality Scale (GQS) and modified DISCERN (mDISCERN) were used to assess video quality and reliability. The completeness score (CS) was applied to evaluate five key aspects of the disease: etiology, clinical manifestations, diagnosis, treatment, and prognosis. Finally, correlation analysis was carried out to explore the relationship among video features, audience engagement indicators, and video quality.
    Results: A total of 183 osteosarcoma-related videos were included. Clinical manifestations, treatment, and diagnosis were the most frequently addressed topics, whereas etiology and prognosis received comparatively less attention. TikTok videos had a median GQS of 2 (Q1 = 2.00, Q3 = 3.00), a median mDISCERN of 3 (Q1 = 2.00, Q3 = 4.00), and a median CS of 4 (Q1 = 2.00, Q3 = 4.00). In contrast, Bilibili videos demonstrated higher quality, with a median GQS of 3 (Q1 = 2.00, Q3 = 3.00), a median mDISCERN of 3 (Q1 = 3.00, Q3 = 4.00), and a median CS of 4 (Q1 = 3.00, Q3 = 6.00). Videos produced by healthcare professionals achieved significantly higher scores compared to those uploaded by non-professionals (p < 0.01). Spearman correlation analysis revealed no significant association between video features and quality scores.
    Conclusion: The overall quality and reliability of osteosarcoma-related videos on short video platforms were low. Videos uploaded by healthcare professionals and those on the Bilibili platform demonstrated relatively higher quality. These findings highlight the necessity of strengthening the regulation of health-related content on short video platforms and promoting greater involvement of healthcare professionals.
    Keywords:  Bilibili; Osteosarcoma; TikTok; health information quality; short video platforms
    DOI:  https://doi.org/10.1177/20552076261433814
  24. Front Public Health. 2026;14:1783552
       Objective: To systematically compare the quality of educational videos about anxiety and depression among university students on YouTube and Bilibili, and to provide evidence-based guidance for cross-cultural digital mental-health education.
    Methods: Before 20 November 2025, we searched YouTube and Bilibili with English and Chinese keywords and collected the first 100 videos returned by default ranking on each platform. After applying inclusion and exclusion criteria, the remaining videos were evaluated by a third assessor in a double-blind manner using the Video Information and Quality Index (VIQI), the Global Quality Score (GQS), and the modified DISCERN (mDISCERN) scales to assess scientific accuracy, safety and educational value. Platform differences were analyzed with non-parametric tests and correlation analyses.
    Results: The final sample comprised 80 YouTube and 77 Bilibili videos. Median views, likes, and comments were markedly higher on Bilibili (p < 0.05). Verified accounts supplied 43.75% of YouTube content but only 28.57% of Bilibili content; licensed mental-health professionals appeared in fewer than 6% of videos on either platform. YouTube favoured television-style or documentary formats, whereas Bilibili relied heavily on single-speaker narratives and animations. YouTube outperformed Bilibili on overall VIQI, GQS, and mDISCERN scores (p < 0.01). On Bilibili, high user engagement correlated moderately to strongly with quality, yet absolute quality scores remained low.
    Conclusion: Platform architecture, not popularity, drives content quality. YouTube's longer, institution-produced videos set the benchmark, whereas Bilibili trades scientific rigor for real-time chat and high engagement. Both sites remain short of licensed professionals. To prevent digital platforms from amplifying student anxiety, we recommend (a) embedding a quality-weighted algorithmic boost and (b) a sustained "verified expert + student co-creation" pipeline that disseminates evidence-based content at scale.
    Keywords:  Bilibili; YouTube; anxiety; cross-cultural; depression; online platforms; quality assessment; university students
    DOI:  https://doi.org/10.3389/fpubh.2026.1783552
  25. Sci Rep. 2026 Mar 21.
      
    Keywords:  Depression; Douyin; Patient education; Short videos; Social media; TikTok; Video quality
    DOI:  https://doi.org/10.1038/s41598-026-45237-2
  26. J Multidiscip Healthc. 2026;19:581865
       Background: Deep venous thrombosis (DVT) is a prevalent, life-threatening condition with inadequate public awareness. Social media platforms are significant sources of health information. However, they often suffer from variable content quality. This study systematically evaluated DVT-related videos on TikTok and Bilibili, which represent the dominant short-video and medium-to-long video platforms in China.
    Methods: The top 150 DVT-related videos from each platform, TikTok and Bilibili, sorted by default, were retrieved and screened. The basic characteristics of the included videos, as well as the user engagement metrics, were recorded. The quality of the included videos was assessed using the Global Quality Scale (GQS), the modified DISCERN (mDISCERN) score, and the Journal of the American Medical Association (JAMA) benchmark criteria.
    Results: TikTok videos had higher engagement (all P < 0.001) and shorter duration (median 88s vs 233.5s, P < 0.001). Quality-wise, except for a good GQS rating for Bilibili, quality scores across both platforms were generally moderate or poor. Specifically, Bilibili scored higher on GQS (median 4 vs 3, P < 0.001), while TikTok performed better on mDISCERN and JAMA (both P < 0.001). Furthermore, user engagement was negatively correlated with GQS scores but positively correlated with both mDISCERN and JAMA scores, revealing a potential mismatch between popularity and professional quality.
    Conclusion: Despite high user engagement, DVT videos on both platforms demonstrate deficient informational quality and reliability, underscoring a significant gap in accessible public health education. Therefore, multi-stakeholder collaboration is imperative to enhance content standards and facilitate effective health dissemination.
    Keywords:  deep venous thrombosis; public health education; quality; reliability; social media
    DOI:  https://doi.org/10.2147/JMDH.S581865
  27. Digit Health. 2026 Jan-Dec;12:20552076261431847
       Objective: Understanding the factors that shape public support for tobacco control policies is essential for effective legislation. This study aims to examine how online and offline health information seeking behaviors (HISB) among chronic disease patients influence their support for tobacco control policies.
    Methods: Using data from a national survey in China (N = 745), this study developed and empirically tested a parallel mediation model examining the direct associations between online and offline HISB and support for tobacco control policies, as well as the indirect paths through perceived social disapproval of smoking and negative smoking outcome expectancies.
    Results: Results indicated that online HISB was positively associated with support for tobacco control policies, both directly and indirectly through increased perceived social disapproval of smoking and negative smoking outcome expectancies. In contrast, offline HISB showed no direct association with policy support and exhibited an indirect negative pathway through reduced negative smoking outcome expectancies.
    Conclusions: Findings highlight the positive role of online HISB, and the potential negative role of offline HISB, in shaping support for tobacco control policies. We therefore recommend promoting online channels for health information seeking, especially among adults aged 40 to 70 and those from lower socioeconomic backgrounds, who rely more on offline media and face higher chronic disease risk. Online campaigns should emphasize the social unacceptability of smoking and negative smoking outcome expectancies. In parallel, stricter regulation of pro-tobacco content in offline media, especially subtle promotional exposure, is needed, along with increased frequency and depth of antitobacco coverage in traditional media.
    Keywords:  Health communication; information seeking behavior; outcome expectations; social norms; tobacco control
    DOI:  https://doi.org/10.1177/20552076261431847
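    Schematically, the two-mediator parallel mediation model described above takes the following form (generic path coefficients, not the study's estimates), with X = health information seeking, M_1 = perceived social disapproval of smoking, M_2 = negative smoking outcome expectancies, and Y = policy support:

        M_1 = a_1 X + e_1, \qquad M_2 = a_2 X + e_2

        Y = c' X + b_1 M_1 + b_2 M_2 + e_Y

    The indirect effects are a_1 b_1 and a_2 b_2, and the total effect decomposes as c = c' + a_1 b_1 + a_2 b_2.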