bims-librar Biomed News
on Biomedical librarianship
Issue of 2026–05–17
39 papers selected by
Thomas Krichel, Open Library Society



  1. J Can Health Libr Assoc. 2025 Dec;46(3): 87-103
       Introduction: The initial introduction of LibGuides created a seismic shift in bioinformatic information dissemination. This study reviewed bioinformatics LibGuides from Canadian universities to establish a consistent overlap of frequently recommended materials that would spell out a canon of bioinformatics resources.
    Methods: We undertook a manual review of the 11 bioinformatics-specific LibGuides created by Canadian university libraries.
    Results: We found four (n=4) LibGuides focused solely on bioinformatics and seven (n=7) guides with subsections dedicated to bioinformatics. Overall, there were 566 resources distributed across 11 LibGuides, with little overlap across guides. Most (n=440) were distinct resources. The most common resource type was database and the most frequently appearing resources (n=5) were BLAST Database, Protein Data Bank, and PubMed.
    Discussion: We identified clear variations in the intended audience for these subject guides, as well as commonalities across all 11. The diversity of listed resources reflects the diversity and interdisciplinary nature of bioinformatics. Despite this variety of resources, we observed a uniformity in those resources' funding or hosting, as they are often American in origin. This now represents a concern for librarians and other information professionals when it comes to using and teaching bioinformatics resources.
    DOI:  https://doi.org/10.29173/jchla29878
  2. Res Synth Methods. 2026 May 12. 1-13
      To explore the perceptions of, and barriers to, grey literature searching among medical researchers and journal editors. A cross-sectional survey of authors of systematic reviews and the editors of the journals in which the reviews were published. Systematic reviews indexed in MEDLINE, spanning a 4-week period in 2019. We excluded protocols. We asked whether the reviewers were performing a grey literature search. If they were, we asked about their approach to the grey literature search and relevant guidance. If they were not, we asked about their rationale for this. We elucidated understandings of grey literature from all reviewers. The survey to journal editors asked about their perceptions towards grey literature. A consecutive sample of 1,229 systematic reviews was included. A total of 155 authors responded, and a total of 46 journal editors responded. The majority (57%) of reviewers reported performing a grey literature search. However, there was no consensus on types of grey literature items or sources among the reviewers. The most frequent barrier to grey literature searches was concern about the quality/detail of the grey literature items. Editors expressed negative perceptions towards grey literature, rooted in suspicion of its quality. While many reviewers reported performing a grey literature search, there was a diverse understanding of the term grey literature. Concerns about the quality of grey literature exist among reviewers and editors alike. There is a marked discrepancy between the best-practice guidelines and the gatekeepers of journals.
    Keywords:  grey literature; review methodology; search strategy
    DOI:  https://doi.org/10.1017/rsm.2026.10087
  3. Behav Genet. 2026 May 15.
      Preprint servers and open science platforms have revolutionized the scientific process. A fundamental feature of these platforms is a lack of peer review: virtually anyone with an internet connection can upload their research in a few clicks. Although this setup has facilitated rapid dissemination of results and open access to research, it has also enabled fringe researchers to post and share pseudoscientific, genetically informed studies of differences in behavior that often advance racial hereditarian and eugenic claims. Because preprint archives are now routinely used by mainstream academics, preprints grant a degree of legitimacy to fringe research that otherwise may have been relegated to a blog post or fringe publication. Previous studies have documented individual examples of pseudoscientific, genetic studies of group differences being posted on preprint archives, but the scope of this problem remains unclear, making it difficult to formulate responses and potential solutions. The present study quantified and characterized pseudoscientific studies of group differences in behavior (including studies that used genetic methods) housed on popular preprint servers and open science collaboration platforms. Dozens of such preprints were identified. Preprinted studies on group differences often analyzed controversial phenotypes, most frequently intelligence and related traits, and furthered classical, widely rejected hereditarian and eugenic theories. Genetically informed analyses rested on fundamentally flawed assumptions about heritability and polygenic scores. The Preprint Problem is indicative of a broader effort to weaponize mainstream academic research and its mechanisms, including Open Science, and a recent resurgence of scientific racism and eugenics. Potential responses to these challenges are introduced.
    Keywords:  Eugenics; Open science; Racial hereditarianism; Scientific racism; Weaponization
    DOI:  https://doi.org/10.1007/s10519-026-10260-6
  4. J Can Health Libr Assoc. 2025 Dec;46(3): 104-114
       Objective: Funding bodies such as Canada's Tri-Agency have implemented requirements for grant recipients to encourage improved research data management (RDM) practices and data sharing. Consequently, RDM and data sharing have become a higher priority for researchers and stakeholders supporting the research process, including librarians. Health sciences research can present special challenges to those wishing to share and use research data, as access to sensitive data must be restricted. This study examines the data sharing practices of researchers funded by the Canadian Institutes of Health Research (CIHR) in recent years.
    Methods: We ran a search of PubMed Central to identify papers funded by CIHR that were published between 2020 and 2023 and had associated data. From the resulting records, we drew a sample of 368 articles. Using Qualtrics for each article, we recorded if and how data was shared and what types of documentation were provided alongside the data. Results were exported to and analyzed using Microsoft Excel.
    Results: We found that 69% of papers included a data availability statement. 34% of articles made at least some data readily accessible, while 31% indicated that some data was available via request or application. Only 9% of articles supplied the kinds of documentation that would support reuse of the data.
    Conclusion: Those seeking to reuse Canadian health sciences research data continue to face significant hurdles. We offer ideas for health sciences librarians looking to support researchers in their efforts to make data available and usable while respecting restrictions required due to ethical considerations.
    DOI:  https://doi.org/10.29173/jchla29830
  5. Patient Educ Couns. 2026 May 02. pii: S0738-3991(26)00206-5. [Epub ahead of print]149 109673
       OBJECTIVE: Despite extensive guidelines, the readability of health information remains poor. This study addresses a persistent explanatory gap by shifting the focus from recipient-side limitations to producer-side cognitive frameworks, specifically the "curse of knowledge."
    METHODS: Using a critical interpretive synthesis and inference to the best explanation, we reexamined recurring patterns of low readability identified in the existing literature (e.g., use of jargon, abstract statistics, and insufficient procedural context). We then comparatively assessed the explanatory scope of the curse of knowledge relative to traditional frameworks such as health literacy and the information deficit model.
    RESULTS: Existing frameworks explain why patients struggle to understand health information but provide limited insight into why experts consistently produce complex and inaccessible content. Our analysis suggests that the curse of knowledge (experts' structurally constrained inability to simulate a pre-knowledge state) operates as a central, generative mechanism alongside institutional and normative factors underlying this persistence. This manifests in the misattribution of obviousness to specialist terminology and concepts and the invisibility of tacit knowledge in procedural guidance. We also identify a paradox whereby experts, motivated by professional responsibility for accuracy, accumulate information and inadvertently increase recipients' cognitive load.
    CONCLUSION: Persistent limited readability of health information is not merely a technical or stylistic failure but a structural phenomenon rooted in the cognitive constraints of expertise. Therefore, readability challenges should be repositioned from individual efforts to the design of organizational information production systems.
    PRACTICE IMPLICATIONS: Improving readability requires a systemic redesign rather than isolated linguistic adjustments. Key strategies include: (1) institutionalizing patient and public involvement (PPI) as an external cognitive audit that surfaces experts' implicit assumptions, (2) implementing standardized templates for numerical and procedural information to reduce reliance on individual expert judgment, and (3) leveraging AI as a cognitive bridge to retranslate specialized knowledge into lay-accessible frameworks.
    Keywords:  Curse of knowledge; Health literacy; Information deficit model; Organizational health literacy; Patient and public involvement; Readability
    DOI:  https://doi.org/10.1016/j.pec.2026.109673
  6. Ann Afr Med. 2026 May 14.
       INTRODUCTION: Multiple sclerosis (MS) is a complex neurological disorder requiring effective patient education. With the increasing use of artificial intelligence (AI) in healthcare, evaluating the readability of AI-generated content compared to evidence-based resources is essential to ensure patient accessibility.
    AIMS: This study aims to compare the readability of patient education guides on the diagnosis and management of MS generated by the AI tool Google Gemini with a standard clinical reference, UpToDate.
    METHODOLOGY: Content was analyzed using metrics including word count, sentence count, difficult word count/percentage, Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Simple Measure of Gobbledygook (SMOG) index. The Wilcoxon signed-rank test was used for statistical comparison.
    RESULTS: Statistically significant differences (P < 0.05) were found for word count, sentence count, and difficult word count. UpToDate had a significantly higher median word count (6332.5 vs. 800.5) and sentence count (199.5 vs. 44.0) than Google Gemini, indicating it was more verbose and complex. Google Gemini produced content that was relatively easier to read based on FRE, FKGL, and SMOG Index, although these differences were not statistically significant (P > 0.05). Importantly, neither source met the recommended 6th to 8th-grade readability levels for patient health materials.
    CONCLUSIONS: The overall readability scores (FRE, FKGL, and SMOG) were similar, and neither platform delivers information at the comprehension level recommended for the average patient. AI tools such as Google Gemini may serve as a useful adjunct for brief, patient-oriented information, but further refinement is needed to improve accessibility.
    Keywords:  Artificial intelligence; Google Gemini; UpToDate; multiple sclerosis; readability
    DOI:  https://doi.org/10.4103/aam.aam_775_25
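The three readability indices named in this entry (and in several later ones) are simple functions of text counts. The following is a minimal sketch of the standard published formulas, not the study's own tooling; the function names are hypothetical, and the counts are assumed to be supplied by the caller, because syllable counting itself is heuristic and varies between tools.

```python
import math


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables, sentences):
    """SMOG index from the count of words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for illustration only (not taken from the study).
fre = flesch_reading_ease(words=100, sentences=5, syllables=150)
fkgl = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
smog = smog_index(polysyllables=10, sentences=30)
```

Because FRE falls as FKGL and SMOG rise, a text can be compared against the 6th to 8th grade target directly through the two grade-based indices.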
  7. Vavilovskii Zhurnal Genet Selektsii. 2026 Apr;30(2): 293-298
      The imperative to re-analyze existing public sequencing data is central to modern biology, driven by new hypotheses and advanced analytical methods. However, this effort is critically hampered by the profound heterogeneity of repository data, particularly the non-standardized, free-text descriptions of biological experiments. This lack of structural and semantic homogeneity prevents systematic search, integration, and comparative analysis, effectively locking away the full potential of accumulated datasets. Advances in Natural Language Processing (NLP) offer a pivotal pathway to overcome this bottleneck by transforming unstructured text into computable, homogeneous information. The integrated Entrez database system, maintained by the National Center for Biotechnology Information (NCBI), provides sophisticated programmatic access via an API to primary sequencing data and its associated metadata, including detailed experimental descriptions. This interface enables researchers to identify and retrieve relevant data through keyword searches, including those based on gene names, and to apply modern NLP techniques to transform textual metadata into structured information. The output is formatted data ready for integration into local databases, accompanied by a systematic list of links for downloading primary files. The Alembic software package offers a comprehensive and automated solution for the entire workflow. Designed as a locally deployable client-server system, Alembic incorporates state-of-the-art transformer-based AI algorithms for analyzing the biomedical text that accompanies sequencing data. Its core utilizes the openly available AIONER platform, which is built upon the PubMedBERT model trained on the PubMed repository, to ensure efficient and accurate recognition of biomedical named entities (e.g., genes, diseases). This provides users with structured and meaningful keyword search results. By delivering a curated list of datasets, Alembic streamlines the path from search to analysis. Researchers can efficiently identify high-value targets and obtain a complete package of metadata and primary data to construct a tailored local repository. This positions Alembic as a universal solution that overcomes the fragmented approach of existing tools, offering an integrated workflow for diverse public sequencing data.
    Keywords:  biomedical text mining; data harmonization; natural language processing; omics data integration; semantic annotation
    DOI:  https://doi.org/10.18699/vjgb-26-33
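The keyword search over Entrez described in this entry can be pictured as a request to the NCBI E-utilities ESearch endpoint. The sketch below only builds such a request URL and does not contact the network. It is an illustration under stated assumptions, not Alembic's implementation: the helper name `build_esearch_url`, the choice of the `sra` database, and the search term are all hypothetical.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="sra", retmax=20):
    """Build an ESearch URL for a keyword query (hypothetical helper)."""
    params = {
        "db": db,           # Entrez database to search (assumed: SRA)
        "term": term,       # keyword or gene name
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative gene-name query; the term is an assumption.
url = build_esearch_url("TP53")
```

The returned record IDs would then be passed to a fetch or summary call to obtain the free-text metadata that an NLP step could structure.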
  8. Int J Med Inform. 2026 May 06. pii: S1386-5056(26)00203-0. [Epub ahead of print]216 106463
       BACKGROUND: Systematic Literature Reviews (SLRs) are essential in biomedical research, particularly for informing public health policy and clinical decision-making. However, the manual generation of Boolean queries for literature searches is resource-intensive, prone to errors, and difficult to scale. Recent advances in large language models (LLMs) have demonstrated potential, yet most existing approaches rely on zero-shot prompting of commercial models, overlooking the cost-efficiency and domain adaptability of fine-tuned open-source alternatives.
    METHODS: This study proposes a novel, three-stage framework that employs medium-sized, open-source generative models, specifically BioGPT and BioT5, for automated Boolean query generation over PubMed. We develop and release datasets comprising PubMed article titles, MeSH terms, and keywords, and fine-tune the models using both title-only and title-plus-metadata prompts. We evaluate performance on two benchmark datasets: CLEF TAR and FASS-BSLR. Our experiments include comparisons with state-of-the-art baselines, prompt-based large language models, and ablation studies exploring the effects of training data size, metadata inclusion, and post-processing with PubMed's Automatic Term Mapping.
    RESULTS: Fine-tuned BioGPT outperforms both traditional TAR models and commercial LLMs across key retrieval metrics. On the CLEF TAR dataset, it achieves a Precision of 0.2544, F1 of 0.2392, MAP@1000 of 0.1424, and NDCG@1000 of 0.2490, which surpasses all baselines. On the FASS dataset, it reaches a Recall of 0.1801 and NDCG@1000 of 0.0900, again outperforming all competing models. While slightly behind BioGPT, BioT5 still outperforms most baselines. Notably, BioGPT's Recall of 0.1801 on FASS is more than twice that of PubMed-Title and PubMed-Keyword, and exceeds GPT-3.5 Turbo, GPT-4, Gemini-2, and Llama-3.
    CONCLUSION: This work demonstrates that fine-tuned, open-source, medium-sized generative models can match or exceed the performance of much larger commercial LLMs in Boolean query generation for biomedical SLRs. These models offer a cost-effective, privacy-preserving, and scalable alternative for structured retrieval of biomedical scholarly texts.
    Keywords:  Automated boolean query generation; Biomedical systematic literature reviews; Clinical decision-making; Medium-sized open-source generative models
    DOI:  https://doi.org/10.1016/j.ijmedinf.2026.106463
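The set-based retrieval metrics reported in this entry (Precision, Recall, F1) compare the records a Boolean query retrieves with the records judged relevant for a review topic. The sketch below shows the standard definitions for a single topic; the function name and the document IDs are hypothetical, and benchmark figures such as those above are typically averaged over many topics, so they cannot be reproduced from one another.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and F1 for one query (one review topic)."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical document IDs: the query returns four records, of which two
# are among the three relevant ones.
p, r, f = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
```

Rank-aware measures such as MAP@1000 and NDCG@1000 additionally depend on the order of the results and are not covered by this sketch.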
  9. STAR Protoc. 2026 May 13. pii: S2666-1667(26)00186-3. [Epub ahead of print]7(2): 104533
      We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to establish truth over content generated by other LLMs and expose hallucination.
    Keywords:  Computer sciences; Health sciences; genetics
    DOI:  https://doi.org/10.1016/j.xpro.2026.104533
  10. Front Psychiatry. 2026 ;17 1782288
       Background: Pediatric depression shows age-specific symptoms that hinder recognition and delay care, while parents and adolescents increasingly turn to online sources, including large language models, for mental health information and guidance. The quality of such information depends on readability, factual accuracy, completeness, and emotional tone. This study compared responses from 3 contemporary large language models (LLMs) to frequently asked questions about pediatric depression to assess their suitability as informational tools.
    Methods: A cross-sectional analytical study design was used. Fifteen standardized frequently asked questions covering definition, causes, clinical features, diagnosis, prevention, treatment, and prognosis of pediatric depression were submitted to ChatGPT-5, Microsoft Copilot GPT-5 in Smart Research mode, and DeepSeek 3.1V. Responses were collected verbatim. Readability was assessed using seven established indices. Accuracy and completeness were independently scored on a 0 to 6 scale using a predefined rubric. Sentiment was measured with sentiment scores. One-way analysis of variance (ANOVA) with Tukey post hoc statistical analysis was performed.
    Results: Readability differed among the models. DeepSeek 3.1V achieved the highest Flesch Reading Ease Score of 54 to 55 and the lowest Flesch-Kincaid Grade Level of about 9.5, indicating easier comprehension. ChatGPT-5 showed intermediate readability with scores of 49 to 50 and grade level about 10.5. Copilot-5 had the lowest Reading Ease score of 43 to 44 and the highest grade level near 10.8. Accuracy on a 0 to 6 scale was highest for Copilot-5. ChatGPT-5 showed the greatest completeness, whereas other models had variable coverage in detailed clinical items.
    Conclusion: Large language models (LLMs) provide information on pediatric depression but show varying levels of readability, accuracy, and completeness. DeepSeek 3.1V provides greater linguistic accessibility, Microsoft Copilot GPT-5 shows stronger factual consistency, and ChatGPT-5 provides more comprehensive coverage. These artificial intelligence (AI) chatbot systems require human understanding before use in pediatric mental health education or guidance.
    Keywords:  Adolescent depression; ChatGPT; Copilot GPT; DeepSeek; emotional tone analysis; large language models; readability assessment; sentiment analysis
    DOI:  https://doi.org/10.3389/fpsyt.2026.1782288
  11. Neurogastroenterol Motil. 2026 May;38(5): e70344
       BACKGROUND: Patients increasingly consult artificial intelligence (AI) chatbots for health information, yet the reliability and accessibility of AI-generated content for complex conditions like dysphagia remain unvalidated. We conducted the first head-to-head comparison of leading large language models for dysphagia patient education, evaluated by an international expert panel.
    METHODS: Forty-six validated questions across four clinical domains were submitted to ChatGPT-4.0 (OpenAI) and Claude 3.7 (Anthropic) in March 2025. Ten blinded experts from six countries rated responses for scientific accuracy (5-point Likert), clarity (5-point Likert), and misinformation (binary). Readability was assessed using Flesch Reading Ease, Flesch-Kincaid Grade Level, and SMOG Index. Between-model comparisons used Wilcoxon signed-rank tests with Cohen's d effect sizes.
    KEY RESULTS: No significant differences emerged for scientific accuracy (ChatGPT: 3.87 ± 0.36 vs. Claude: 3.93 ± 0.35; p = 0.26; d = 0.16), clarity (4.12 ± 0.34 vs. 4.15 ± 0.27; p = 0.67; d = 0.11), or mean misinformation rates (both 2.15; p = 0.96). Strong inter-model correlation existed for accuracy (rs = 0.678; p < 0.001). Critically, both models produced content far exceeding recommended readability levels: SMOG indices of 14.95 ± 2.40 years (ChatGPT) and 17.37 ± 2.67 years (Claude) required extensive education (p < 0.001; d = 0.95) versus the recommended 6-7 years. Categorical analysis showed Claude generated three times more misinformation-free responses (19.6% vs. 6.5%; p = 0.077).
    CONCLUSIONS AND INFERENCES: Leading AI chatbots demonstrate equivalent, acceptable accuracy for dysphagia information but produce content inaccessible to most patients due to excessive complexity. The strong inter-model correlation suggests shared limitations in medical training data. Before clinical implementation, AI-generated patient education requires mandatory readability optimization to address the substantial health literacy gap identified in this study.
    Keywords:  artificial intelligence; chatbots; deglutition disorders; health literacy; patient education as topic; readability
    DOI:  https://doi.org/10.1111/nmo.70344
  12. Digit Health. 2026 Jan-Dec;12:12 20552076261450741
       Objective: Artificial intelligence (AI) chatbots are increasingly used by patients seeking medical information. However, the accuracy and educational quality of such tools in the context of anesthesia remain unclear. This study aimed to evaluate and compare the appropriateness of responses generated by three widely accessible AI platforms (ChatGPT, Gemini, and Copilot) regarding frequently asked questions about general anesthesia.
    Methods: Fifty anesthesia-related questions were developed by two anesthesiologists and categorized into four domains: General Information and Process; Safety and Risks; Pain, Comfort, and Recovery; and Preoperative Preparation. Each question was entered in English into the free, publicly available versions of ChatGPT, Gemini, and Copilot. Ten blinded anesthesiologists rated the responses using a 5-point Likert scale (1 = very inappropriate to 5 = very appropriate). Mean scores were compared using one-way ANOVA with Tukey's post-hoc tests, and inter-rater reliability was assessed using Cronbach's α.
    Results: ChatGPT achieved the highest overall mean score (4.68 ± 0.50), followed by Gemini (4.22 ± 0.63) and Copilot (3.28 ± 0.50), with significant differences among all platforms (p < 0.001). ChatGPT consistently outperformed the others across all four domains. Qualitative observations from evaluator comments suggested that ChatGPT's concise summaries improved readability, Gemini provided more structured responses with more scholarly-style references, and Copilot was clear but often less detailed. Inter-rater reliability was high (Cronbach's α = 0.89).
    Conclusion: Among free-access AI chatbots, ChatGPT provided the most accurate and comprehensive explanations regarding general anesthesia. While Gemini and Copilot offered partial value, professional oversight remains essential to ensure safe and contextually accurate patient education in preoperative care.
    Keywords:  AI chatbot; anesthesia education; artificial intelligence; chatgpt; copilot; gemini; patient information; preoperative counseling
    DOI:  https://doi.org/10.1177/20552076261450741
  13. Front Oral Health. 2026 ;7 1813936
       Background: Large Language Models (LLMs) are increasingly used by caregivers to obtain pediatric health information. However, concerns persist regarding the accuracy, reliability, and readability of AI-generated content, especially in pediatric dentistry, where caregiver comprehension is crucial.
    Objective: To conduct an exploratory feasibility assessment of evaluating accuracy, quality, reliability, and readability of responses generated by ChatGPT-4, Google Gemini, and DeepSeek to common pediatric dentistry queries.
    Methods: This exploratory comparative cross-sectional feasibility study utilized 15 patient-oriented pediatric dentistry questions identified through structured searches and expert screening. Each question was submitted verbatim to ChatGPT-4, Gemini, and DeepSeek under standardized conditions. Responses were independently evaluated by three calibrated pediatric dentistry experts using the Global Quality Scale (GQS), a modified DISCERN tool, and the Accuracy of Information Index (AOI). Readability was assessed using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL). Inter-examiner reliability was assessed using intraclass correlation coefficients (ICC). Statistical comparisons between LLMs were performed using a fixed-effects model with post-hoc pairwise analysis. Inter-examiner agreement was further evaluated using Bland-Altman analysis. A p-value of <0.05 was considered statistically significant.
    Results: Overall scoring was consistent across examiners, with minor variability observed across domains. A linear mixed-effects model conducted separately for each domain demonstrated that LLM type significantly influenced GQS scores (F = 7.90, p = 0.00), with Gemini and DeepSeek outperforming ChatGPT. No significant differences were observed for AOI (p = 0.44) and DISCERN (p = 0.06). Bland-Altman analysis indicated minimal inter-examiner bias; however, the limits of agreement were relatively wide considering the scale range, reflecting variability between individual ratings. Single-measure ICC demonstrated poor agreement (ICC = 0.26), while higher reliability was observed when scores were averaged (ICC = 0.90).
    Conclusion: This study offers an exploratory feasibility assessment of LLM evaluation in pediatric dentistry. While the models generally produced high-quality outputs, variations in accuracy, readability, and significant inter-examiner variability highlight important methodological challenges. These findings represent preliminary groundwork and require validation in larger, clinically diverse, real-world settings. LLMs may serve as supportive informational tools; however, their outputs should be interpreted cautiously and used to complement, not replace, professional clinical judgment.
    Keywords:  ChatGPT; accuracy; artificial intelligence; assessment; health information; large language models; pediatric dentistry; quality
    DOI:  https://doi.org/10.3389/froh.2026.1813936
  14. J Paediatr Child Health. 2026 May 12.
       AIM: Families increasingly turn to artificial intelligence (AI) tools for information about autism spectrum disorder (ASD), often during early childhood when concerns about development, diagnosis and intervention first emerge. From a paediatric perspective, these tools increasingly function as informal sources of health information alongside primary care and developmental services. This study aimed to evaluate the quality of autism-related information generated by widely used AI platforms.
    METHODS: Using a descriptive research design, responses generated by six freely accessible AI platforms (ChatGPT, Gemini, Microsoft Copilot, Perplexity, Brave and Grok) were examined. Each platform was asked 15 autism questions with well-established scientific answers. Responses were evaluated across five dimensions relevant to paediatric health communication: accuracy, readability, language framing, actionability and reference quality.
    RESULTS: Substantial variability was observed across AI platforms. Several tools produced generally accurate explanations of ASD; however, readability levels consistently exceeded recommended guidelines for paediatric health materials (i.e., the 6th-8th grade reading level). Most responses relied primarily on medicalized language rather than neurodiversity-affirming framing. Actionable guidance was limited, with only a minority of responses offering concrete next steps for families navigating early paediatric decision-making. Reference practices varied widely, with some platforms providing numerous credible sources and others offering few or none.
    CONCLUSIONS: Although AI tools can support parental understanding of ASD, differences in clarity, tone, usability and transparency may shape families' expectations prior to or during paediatric consultations. These findings highlight the need for thoughtful use of AI-generated autism information and suggest that families may benefit from guidance from paediatric professionals when interpreting AI-based responses.
    Keywords:  ChatGPT; Gemini; Grok; artificial intelligence; autism; brave; copilot; perplexity
    DOI:  https://doi.org/10.1111/jpc.70433
  15. J ISAKOS. 2026 May 07. pii: S2059-7754(26)00068-4. [Epub ahead of print] 101132
       INTRODUCTION/OBJECTIVES: Large language models (LLMs) such as ChatGPT are increasingly used to generate patient education materials; however, default ChatGPT responses often exceed the readability levels recommended by the American Medical Association (AMA) and the National Institutes of Health (NIH). The purpose of this study was to evaluate the readability and educational quality of ChatGPT-generated patient education on meniscal surgery and to determine whether a standardized plain-language prompt could improve readability without compromising accuracy, relevance, or depth.
    METHODS: Sixteen standardized patient-focused questions regarding diagnosis, management, and prevention of meniscus tears were submitted to ChatGPT-5 and ChatGPT-4o, with three replicates per question to ensure standardization. Responses were assessed for accuracy against the American Academy of Orthopaedic Surgeons (AAOS) OrthoInfo and scored for relevance and depth using 5-point Likert scales. Readability was assessed using Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES). All baseline responses were subsequently rewritten using a plain-language prompt targeting a sixth to eighth grade reading level. Pre- and post-prompt readability metrics were compared using paired t-tests. Inter-rater reliability was measured with Cohen's kappa.
    RESULTS: Both models demonstrated 100% factual accuracy across all baseline responses compared with OrthoInfo. Mean relevance and depth scores were high for ChatGPT-5 (4.49 ± 0.22; 4.39 ± 0.26) and ChatGPT-4o (4.55 ± 0.15; 4.53 ± 0.08). Baseline readability exceeded recommendations (FKGL 11.4-12.1; FRES 37-40). The plain-language prompt significantly improved readability for both models, reducing FKGL by approximately 5 grade levels and increasing FRES by 30-38 points (P < 0.001), with no loss of accuracy, relevance, or depth.
    CONCLUSION: ChatGPT generates accurate and relevant patient-directed content on meniscal surgery; however, readability frequently exceeded established health literacy standards. A simple, reproducible plain-language prompt reliably reduced reading level into the target range, offering a practical strategy for sports medicine surgeons to enhance informed consent discussions and patient education materials.
    LEVEL OF EVIDENCE: IV.
    DOI:  https://doi.org/10.1016/j.jisako.2026.101132
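    The two readability metrics named in the abstract above have standard published formulas. As a minimal sketch, assuming word, sentence, and syllable counts are already available (the counts below are hypothetical and are not data from the study), they can be computed as:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (illustration only).
words, sentences, syllables = 100, 5, 150
fres = flesch_reading_ease(words, sentences, syllables)
fkgl = flesch_kincaid_grade(words, sentences, syllables)
print(f"FRES = {fres:.1f}, FKGL = {fkgl:.1f}")
```

    Both formulas depend only on average sentence length and average syllables per word, which is why a plain-language rewrite that shortens sentences and words moves both scores at once. Counting syllables reliably is the hard part in practice and is deliberately left out of this sketch.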
  16. PRiMER. 2025 ;9 54
       Introduction: ChatGPT, a large language model created by OpenAI, has emerged as a new source of online medical information. This study aimed to evaluate the appropriateness, readability, and educational value of ChatGPT's responses to frequent patient internet queries regarding 10 common primary care diagnoses.
    Methods: The responses generated by ChatGPT regarding the 10 most frequently encountered primary care diagnoses were assessed for appropriateness and readability by two primary care physicians. Responses were judged based on educational value in four categories: basic knowledge, diagnosis, treatment, and prevention. We used a 5-point Likert scale based on accuracy, comprehensiveness, and clarity to determine appropriateness. ChatGPT responses that received ratings of 4-5 in all three criteria were considered appropriate. Conversely, if the outputs received ratings of 1-3 in any category, they were deemed inappropriate. We performed readability assessments using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKGL) formulas to determine whether the responses were at the recommended seventh to eighth grade level, the reading level of the average American.
    Results: Most (92.5%) responses were deemed appropriate unanimously by both reviewers. ChatGPT provided more appropriate responses regarding basic knowledge compared to diagnosis, treatment, and prevention. The ChatGPT responses demonstrated a college graduate reading level, as indicated by the mean FRE score of 25.64 and the median FKGL score of 12.61.
    Conclusion: Our comprehensive analysis found that ChatGPT's responses were appropriate most of the time. These findings suggest that ChatGPT has potential to be a supplementary educational tool for patients seeking health information online.
    DOI:  https://doi.org/10.22454/PRiMER.2025.412807
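    The appropriateness rule described in the abstract above (ratings of 4-5 on all three criteria, otherwise inappropriate) is simple enough to state as code. This is an illustrative sketch only; the function names and the example ratings are hypothetical and do not come from the paper.

```python
def is_appropriate(accuracy: int, comprehensiveness: int, clarity: int) -> bool:
    """Apply the 4-5-on-every-criterion rule to one reviewer's ratings."""
    scores = (accuracy, comprehensiveness, clarity)
    if any(score < 1 or score > 5 for score in scores):
        raise ValueError("Likert ratings must be between 1 and 5")
    return all(score >= 4 for score in scores)


def unanimously_appropriate(ratings_by_reviewer) -> bool:
    """True only if every reviewer's ratings pass the rule."""
    return all(is_appropriate(*ratings) for ratings in ratings_by_reviewer)


# Hypothetical ratings from two reviewers: (accuracy, comprehensiveness, clarity).
print(unanimously_appropriate([(5, 4, 4), (4, 4, 5)]))  # both reviewers pass
print(unanimously_appropriate([(5, 4, 4), (4, 3, 5)]))  # second reviewer gave a 3
```

    Under this rule a single rating of 3 or lower from either reviewer is enough to keep a response out of the unanimously appropriate group.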
  17. Brain Inj. 2026 May 09. 1-11
       BACKGROUND: Online health-information gathering has become a hallmark of modern society and has the potential to positively impact health outcomes. This study examines the accuracy and quality of online resources for adults with concussion.
    METHODS: A scoping review of online Canadian and American concussion websites was conducted to identify adult concussion resources, which were then screened for accuracy, according to the Living Concussion Guideline. All eligible resources were then assessed using the SMOG tool, for readability, and the PEMAT tool, for understandability and actionability.
    RESULTS: Out of 256 resources sourced, 193 were determined to be accurate and underwent quality assessments. The mean scores were as follows: total PEMAT (71%), understandability (76%), actionability (57%), and SMOG (school-grade 11.25). Five print resources were considered high quality for both tools, and four videos were identified as high quality based on understandability and actionability.
    CONCLUSIONS: The accuracy, quality, and readability of online concussion resources remain poor in general, with some notable exceptions. Highest quality resources had a clear purpose, information presented in small sections, active voice, and less than grade 10 reading levels. It is the responsibility of those creating online resources for patients to ensure they use current information and adhere to specific design elements.
    Keywords:  Concussion; adult; online resources; patient resources; quality; readability
    DOI:  https://doi.org/10.1080/02699052.2026.2670686
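    The SMOG readability grade used in the abstract above follows a published formula based on the number of words with three or more syllables. A minimal sketch, assuming the polysyllable and sentence counts have already been obtained (the counts below are hypothetical, not study data):

```python
import math


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade from the count of words with 3+ syllables.

    The count is normalized to a 30-sentence sample, as in the
    standard formula.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical sample: 64 polysyllabic words across 30 sentences.
print(round(smog_grade(64, 30), 2))
```

    Because the polysyllable count sits under a square root, halving the number of long words lowers the grade by less than half, which may help explain why readability targets are hard to reach with light editing alone.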
  18. BMC Geriatr. 2026 May 14.
       BACKGROUND: Functional decline and high disease prevalence increase medication needs among older adults, making medication safety a pressing concern. Yet older adults often face limited access, uneven quality, and comprehension barriers. This study investigates medication information sources and associated factors among older adults in Nanjing, China, to guide improved dissemination and safety strategies.
    METHODS: A questionnaire survey based on the Comprehensive Model of Information Seeking (CMIS) was conducted among older adults in Nanjing, China. Descriptive statistics assessed use of seven medication information sources, and binary logistic regression identified factors associated with source selection.
    RESULTS: Among 165 participants, 90.3% used multiple sources in the past six months, averaging 4.7 per person. Doctors (86.7%), non-professional interpersonal networks (68.5%), and professional medical materials (66.1%) were most common, whereas internet use was lowest (32.1%). Higher perceived information-seeking ability significantly increased use of the internet (OR = 2.90), professional materials (OR = 2.68), doctors (OR = 4.46), pharmacists (OR = 2.21), nurses (OR = 2.40), and traditional media (OR = 2.37). Source characteristics also influenced choices: information quality affected doctors (OR = 19.33), pharmacists (OR = 2.39), and nurses (OR = 3.16); comprehensibility influenced pharmacists (OR = 2.67), nurses (OR = 3.16), traditional media (OR = 3.01), non-professional interpersonal networks (OR = 4.75), and professional medical materials (OR = 5.40); accessibility was associated with traditional media (OR = 3.33) and non-professional interpersonal networks (OR = 5.61); and credibility strongly predicted non-professional interpersonal networks use (OR = 19.69). Medication experience and perceived utility were additional predictors.
    CONCLUSION: Older adults in China rely on doctors and professional medical materials, and heavily on non-professional interpersonal networks, with limited use of internet sources. Enhancing physician-family communication, improving source content, and strengthening health information literacy may improve medication information access and safety.
    Keywords:  Comprehensive Model of Information Seeking (CMIS); Information acquisition; Information source selection; Medication information; Older adults; Rational drug use
    DOI:  https://doi.org/10.1186/s12877-026-07514-7
  19. J Community Genet. 2026 May 14. pii: 61. [Epub ahead of print]17(3):
      Patients who are referred for genetic counseling and/or genetic testing may conduct web searches to try to gather more information prior to their appointment. However, little is known about the reading level of such resources or the suitability of the information they provide. This study aims to determine the readability and suitability of top-ranked webpages after general searches about genetic counseling and whether these metrics differ depending on which type of organization (i.e., government, non-profit) authored the webpage. Twenty webpages were identified using Google. Searches of the questions "What is a genetic counselor?", "What is genetic testing?", "Why do I need genetic testing?", and "What happens at a genetic counseling appointment?" were completed and the top 5 pages were taken from each. These webpages were then analyzed using the readability tools of Flesch-Kincaid (FK) and Simple Measure of Gobbledygook (SMOG). Both FK and SMOG assess readability as a grade level, and the goal of this study was to find resources below grade level 8. Additionally, the webpages were analyzed using the Suitability Assessment of Materials (SAM) on 6 categories. To complete the SAM analysis, two reviewers completed the tool for each webpage. Sponsor type was determined based on the primary goal of the group that supported or published the webpage. When comparing between questions, the average FK scores for the webpages were between 8th and 12th grade and the average SMOG scores were between 11th and 14th grade. Most webpages rated as adequate on the SAM scale. The results of this study highlight the continued need for evaluation of patient resources, especially those on the internet, to ensure they are meeting the needs of the rising number of individuals being referred to genetic services.
    Keywords:  Genetic counseling; Genetic testing; Health literacy; Public health; Readability; Suitability
    DOI:  https://doi.org/10.1007/s12687-026-00896-6
  20. Am J Surg. 2026 May 05. pii: S0002-9610(26)00213-8. [Epub ahead of print] 117030
       BACKGROUND: This study assessed the quality, content, and readability of patient-centered online health information about diverticulitis.
    METHODS: Five common diverticulitis-related terms were searched on Google. The first 50 websites for each term were reviewed. Popularity, quality, core content, and readability were evaluated using Google Trends, the DISCERN instrument, core content domains based on American Society of Colon and Rectal Surgeons patient education, and an online readability calculator.
    RESULTS: Of 250 search results, 176 met inclusion criteria, yielding 98 unique websites. The median total DISCERN score was 48.5 ("fair"), and only 5.1% of websites were rated excellent. 73.5% exceeded the recommended eighth-grade reading level. More than 85% lacked information on recurrence risk or emerging research, and approximately 40% failed to describe diagnostic testing.
    CONCLUSIONS: Online health information on diverticulitis demonstrates considerable variability in content, quality and readability. To better support patient-centered care and informed decision-making, the development of accessible, high-quality online resources should be prioritized.
    Keywords:  DISCERN; Diverticulitis; Online; Quality; Readability
    DOI:  https://doi.org/10.1016/j.amjsurg.2026.117030
  21. Diabetes Metab Res Rev. 2026 May;42(4): e70178
       BACKGROUND: Social media has become a common source of educational information on type 2 diabetes. However, evidence regarding the quality of this content remains limited. This study evaluated the quality of short Spanish-language videos about type 2 diabetes published on Facebook, Instagram, TikTok, and YouTube.
    METHODS: A cross-sectional study analysed 400 short videos, including the top-ranked videos identified per platform (n = 100 each), collected between March 4 and 25, 2026. Spanish-language short-form videos (≤ 10 min) on type 2 diabetes intended for a general audience were included, while promotional or humorous content was excluded. Informational quality was assessed using the Journal of the American Medical Association (JAMA) benchmark criteria and the Global Quality Score (GQS).
    RESULTS: Overall, quality was low, with a mean GQS score of 2.5 ± 0.82 (on a 5-point scale) and low adherence to JAMA criteria, including authorship (35.5%), attribution (5.8%), and disclosure (1.0%). In multivariable analysis, videos on YouTube (β: 0.6; 95% CI: 0.3-0.8) and Instagram (β: 0.3; 95% CI: 0.1-0.5) had higher GQS scores compared with TikTok. Videos featuring a health professional were also associated with higher GQS scores (β: 0.4; 95% CI: 0.2-0.6). Compared with videos shorter than 30 s, videos lasting 31-60 s (β: 0.3; 95% CI: 0.0-0.6), 61-120 s (β: 0.5; 95% CI: 0.2-0.8), and more than 120 s (β: 0.7; 95% CI: 0.4-1.0) had higher GQS scores.
    CONCLUSION: Spanish-language type 2 diabetes-related videos on social media generally showed low quality. Higher quality was associated with content featuring healthcare professionals, longer video duration, and platforms such as YouTube and Instagram.
    Keywords:  health communication; social media; type 2 diabetes mellitus
    DOI:  https://doi.org/10.1002/dmrr.70178
  22. J Child Orthop. 2026 May 07. 18632521261438642
       Background: The Pavlik harness is a widely accepted first-line treatment for developmental dysplasia of the hip in infants. Given the increasing use of online video platforms by caregivers seeking medical information, this study aimed to evaluate the content, quality, and reliability of the most-viewed YouTube™ videos related to the Pavlik harness.
    Methods: A YouTube™ search was conducted using the terms "Pavlik harness," "Pavlik harness treatment," "Pavlik harness overview," "Pavlik harness how to apply," "Pavlik harness application," and "Pavlik harness tips." 48 videos were included for analysis. Data collected included upload source, video length, date of upload, number of views, likes, dislikes, comments, and the interaction index. Video quality and reliability were evaluated using the Global Quality Scale (GQS), Journal of the American Medical Association (JAMA) benchmark criteria, and DISCERN instrument.
    Results: Of the 48 videos analyzed, 26 (54.2%) were classified as high quality, 10 (20.8%) as intermediate, and 12 (25%) as low quality. Videos uploaded by healthcare professionals and academic institutions had significantly higher GQS, JAMA, and DISCERN scores compared to those uploaded by non-medical sources (p < 0.001). High-quality videos also had a higher number of likes per day and views per day (p = 0.001 and p = 0.001, respectively).
    Conclusion: Nearly half of the most-viewed YouTube™ videos on this topic were of intermediate or low quality. Pediatric orthopedic specialists and professional societies should be encouraged to contribute high-quality, evidence-based videos to guide caregivers appropriately. Parents should be advised to rely on videos uploaded by reputable academic sources to ensure accurate and safe application of the Pavlik harness.
    Keywords:  Pavlik harness; YouTube™; developmental dysplasia of the hip; parent education; pediatric orthopedics
    DOI:  https://doi.org/10.1177/18632521261438642
  23. Int J Dent Hyg. 2026 May 12.
       BACKGROUND: Oral hygiene is essential for maintaining oral and general health, yet its practice remains suboptimal in many developing countries. YouTube has emerged as a major source of health-related information, though the quality and reliability of content are often uncertain.
    AIM: The study aimed to assess the quality and reliability of oral hygiene-related YouTube videos using the Global Quality Scale (GQS) and Modified Quality Criteria for Consumer Health Information (mDISCERN) tool, with a secondary objective to examine and correlate viewer engagement metrics, including viewing rate and interaction index, in relation to video quality and reliability.
    METHODS: A cross-sectional qualitative content analysis was performed on 100 YouTube videos uploaded between February 2020 and February 2025. Relevant keywords were identified via Google Trends, and videos were selected in incognito mode. Videos were assessed for quality and reliability using GQS and mDISCERN scores, while engagement metrics included views, likes, subscribers, viewing rate, and interaction index.
    RESULTS: Videos uploaded by dental professionals scored significantly higher on both GQS and mDISCERN compared to other health professionals or laypersons (p = 0.0001). Longer video duration was positively associated with higher quality and reliability scores (p = 0.02 and p = 0.01). Conversely, videos from channels with more subscribers tended to have lower quality and reliability (p = 0.05 and p = 0.01). Viewer engagement was greater for higher-quality videos despite lower subscriber counts.
    CONCLUSION: Most YouTube oral hygiene videos exhibited moderate to excellent quality, with dental professionals producing the most reliable content. High-quality videos tend to engage viewers more effectively, highlighting the need for authoritative content creation to improve public oral health education.
    Keywords:  YouTube; health education; internet; oral hygiene; social media
    DOI:  https://doi.org/10.1111/idh.70090
  24. Thromb J. 2026 May 11.
       BACKGROUND: Deep vein thrombosis (DVT) is a form of venous thromboembolism, occurring in approximately 1.6 out of every 1000 people annually. Social media platforms have become influential tools for disseminating health information. This study aimed to comprehensively evaluate the content, quality, and reliability of DVT-related videos on TikTok, Bilibili, and YouTube.
    METHODS: A comprehensive search was conducted on Bilibili, TikTok (Douyin), and YouTube using three keywords - "", "Deep vein thrombosis", and "DVT" - across all platforms. Two reviewers independently assessed video characteristics, categorized content (definition, etiologies and causations, symptoms, treatment, prevention, and complications), and evaluated quality using validated tools [Global Quality Scale (GQS), The Journal of the American Medical Association (JAMA), and modified DISCERN (mDISCERN) scores].
    RESULTS: Analyzing 300 DVT-related videos revealed distinct content distribution patterns across platforms. On TikTok, prevention (24%), etiologies and causations (23%), and symptoms (21%) predominated; on Bilibili, treatment (30%), prevention (29%), and complications (16%) were most common; while on YouTube, treatment (25%), symptoms (24%), and etiologies and causations (18%) were prioritized. Quality assessment showed moderate overall scores: median GQS of 4 (Q1-Q3: 3-4) for TikTok and YouTube, compared to 3 (Q1-Q3: 3-4) for Bilibili (p = 0.02). TikTok videos had significantly higher modified DISCERN scores (4, Q1-Q3: 3-4) compared to Bilibili (3, Q1-Q3: 3-4) and YouTube (3, Q1-Q3: 2-3) (p < 0.0001). TikTok and Bilibili videos had higher JAMA scores (2, Q1-Q3: 2-2) than YouTube (2, Q1-Q3: 1-2) (p < 0.0001).
    CONCLUSIONS: Although DVT-related videos on social media offer valuable information on prevention and treatment, content on diagnosis and epidemiology is lacking. Videos created by medical professionals showed higher quality across all assessment metrics. Efforts should focus on improving the comprehensiveness and reliability of health information on social media platforms to enhance public awareness of DVT.
    Keywords:  Cross-sectional study; Deep vein thrombosis; Health information quality; Social media
    DOI:  https://doi.org/10.1186/s12959-026-00834-z
  25. Digit Health. 2026 Jan-Dec;12:12 20552076261443720
       Background: Gallbladder cancer is a highly invasive malignant tumor characterized by challenging early diagnosis and poor prognosis. With the widespread adoption of short-video platforms in China, the public increasingly accesses health information through channels such as TikTok (Chinese version) and Bilibili. However, the quality and reliability of gallbladder cancer-related videos on these platforms have not been systematically evaluated.
    Objective: This study aims to evaluate content characteristics, information quality, and reliability of gallbladder cancer-related videos on TikTok and Bilibili in China, thereby providing evidence-based guidance for optimizing health information dissemination through short video content.
    Methods: A total of 158 videos (99 from TikTok, 59 from Bilibili) were included in the final analysis. We extracted basic information and user interaction data from these videos. Video quality, reliability, and information coverage were assessed using the Global Quality Scale (GQS), modified DISCERN tool (mDISCERN), and Content Completeness Score (CS). Nonparametric statistical methods and chi-square tests were used for data analysis.
    Results: Regarding general information, Bilibili videos are notably longer (170 seconds vs. 82 seconds, p < 0.001), while TikTok videos achieve higher scores across all engagement metrics (likes: 391 vs. 10, p < 0.001; collections: 110 vs. 9, p < 0.001; comments: 67 vs. 1, p < 0.001; shares: 74 vs. 7, p < 0.001). Regarding uploader types, TikTok predominantly featured specialist physicians (66.67%), while Bilibili primarily showcased knowledge disseminators (33.90%). Regarding quality scores, TikTok videos demonstrated significantly higher mDISCERN scores than Bilibili videos (p = 0.049), while Bilibili videos achieved significantly higher CS scores (p = 0.030). Additionally, the identity of content creators is a key determinant of video quality. Video engagement metrics bear no relation to video quality scores (GQS, mDISCERN and CS).
    Conclusion: In summary, TikTok videos are more interactive, whereas Bilibili videos tend to be longer and offer more comprehensive content. However, videos on both platforms suffer from insufficient information completeness and inconsistent quality, with popularity failing to reflect scientific accuracy. It is recommended that platforms, healthcare professionals and content creators collaborate to collectively enhance the overall quality and dissemination effectiveness of health information.
    Keywords:  Bilibili; TikTok; gallbladder cancer; information quality; mDISCERN; reliability; short videos
    DOI:  https://doi.org/10.1177/20552076261443720
  26. Digit Health. 2026 Jan-Dec;12:12 20552076261443766
       Background: Myocardial infarction (MI) is a leading cause of global mortality, making public education on its prevention, early recognition, and pre-hospital first aid essential. In China, short-video platforms like TikTok (Douyin) and Bilibili have become primary channels for health information dissemination. However, the quality and reliability of MI-related content on these platforms have not been systematically evaluated. This study aimed to assess and compare the quality, reliability, and accuracy of the most widely viewed (high-visibility) short videos about MI on TikTok and Bilibili and to identify factors associated with video quality.
    Methods: This study conducted a cross-sectional content analysis of the top 100 highest-ranking MI-related Chinese short videos retrieved via each platform's default ranking algorithm from both TikTok and Bilibili. Two independent cardiologists evaluated the videos using the Global Quality Score (GQS), the DISCERN instrument, and a newly developed Medical Information Accuracy Score (MIAS). Basic video characteristics, uploader information, and user engagement metrics were extracted.
    Results: Overall, the quality of high-visibility MI-related videos on both platforms was suboptimal. Bilibili videos demonstrated significantly higher quality, reliability, and medical accuracy than those on TikTok (median GQS: 3 vs. 2, P<.001; median DISCERN: 5 vs. 3, P<.001; median MIAS: 7 vs. 4, P<.001). Videos posted by medical professionals and healthcare institutions were of significantly higher quality than those from non-professionals or media/commercial outlets (P<.001). Content created by cardiologists was of the highest quality. Critically, there was no significant correlation between video popularity (i.e., likes and comments) and informational quality or reliability (P>.05). However, video duration showed a moderate positive correlation with higher quality and reliability scores (GQS: ρ=0.45, P<.001; DISCERN: ρ=0.41, P<.001).
    Conclusion: The quality of information in high-visibility myocardial infarction-related videos on China's major short-video platforms is concerningly poor, with Bilibili providing more reliable and accurate content than TikTok. The source of a video is a key determinant of its quality, but engagement metrics such as "likes" do not reliably signal informational quality. A collaborative effort among healthcare professionals, platform regulators, and the public is urgently needed to improve the quality of this critical health information.
    Keywords:  Bilibili; TikTok; information quality; myocardial infarction; short videos; social media
    DOI:  https://doi.org/10.1177/20552076261443766
  27. Medicine (Baltimore). 2026 May 08. 105(19): e48723
      Nasopharyngeal cancer is a cancer originating from the nasopharyngeal epithelium. Health education serves as an effective measure for prevention and treatment. With the rapid development of short videos, platforms such as TikTok and Bilibili have become primary sources for health information. However, the quality of nasopharyngeal cancer-related videos on these platforms remains unexplored. This study aims to evaluate the information quality of short videos related to nasopharyngeal cancer on the TikTok and Bilibili platforms. A cross-sectional study was conducted on October 1, 2025; the top 100 short videos related to nasopharyngeal cancer were collected from TikTok and Bilibili through searches in Chinese. After extracting basic information, the global quality score (GQS) and the modified DISCERN (mDISCERN) tool were used to assess each video's quality and reliability. The GQS rating scale ranges from 1 (poor quality) to 5 (high quality). The modified DISCERN scores range from 0 to 5, with higher scores indicating greater reliability. Additionally, Spearman correlation analysis was applied to examine relationships between video characteristics, GQS, and DISCERN scores. A total of 200 videos were included in the analysis, with 58.5% of the videos uploaded by healthcare professionals, 19.5% by science communicators, 14.0% by general users, and 8.0% by organizational users. The median GQS and mDISCERN scores for nasopharyngeal carcinoma videos were 3 and 2, respectively. TikTok videos showed significantly higher engagement compared to Bilibili, with median likes of 1018 vs 16, collections of 313 vs 14, comments of 206 vs 2, and shares of 282 vs 5 (all P < .001). No statistically significant differences were observed between TikTok and Bilibili in GQS or mDISCERN scores. The main content themes on both platforms were clinical manifestations and treatment, while prognosis was rarely discussed. Correlation analysis revealed strong correlations among interactive data points, but weak or negative correlations between interactive data and GQS or mDISCERN scores. The informational content and quality of nasopharyngeal cancer-related videos on TikTok and Bilibili require improvement. Enhanced management of health science popularization videos on short video platforms is needed to ensure the dissemination of accurate and reliable health information.
    Keywords:  global quality score; modified DISCERN; nasopharyngeal cancer; quality assessment; short videos
    DOI:  https://doi.org/10.1097/MD.0000000000048723
  28. BMC Oral Health. 2026 May 11.
       BACKGROUND: Short-video platforms have become major channels for disseminating public health information. This study evaluated the quality, reliability, and content completeness of videos on impacted wisdom teeth on TikTok and Bilibili. It examined whether platform type, uploader category, and user engagement metrics were associated with these outcomes.
    METHODS: A cross-sectional analysis was conducted on November 20, 2025. The top 150 videos from each platform were retrieved using the standard Chinese clinical term for impacted wisdom teeth. After screening, 199 videos were included. Quality and reliability were assessed using the Global Quality Score (GQS), modified DISCERN (mDISCERN), and Journal of the American Medical Association (JAMA) benchmark criteria. Content completeness was assessed using an 8-item clinical checklist. Uploaders were categorized as specialized healthcare professionals (SHCPs), non-specialized healthcare professionals (NSHCPs), and individual users (IUs). Non-parametric tests and Spearman's correlation analysis were used.
    RESULTS: Overall video quality and reliability were moderate, with median GQS, mDISCERN, and JAMA scores of 3.00. An "indication-heavy, contraindication-light" pattern was observed: indications (76.38%) and definitions (68.34%) were frequently covered, whereas contraindications (5.53%) were rarely addressed. Videos uploaded by SHCPs had significantly higher quality and reliability scores than those uploaded by IUs (all P < 0.001). Engagement metrics, including likes and shares, were not positively correlated with the established quality measures.
    CONCLUSIONS: Short videos on impacted wisdom teeth on Chinese short-video platforms were of moderate quality and provided limited coverage of contraindications and other risk-related information. Videos uploaded by specialized healthcare professionals tended to provide more reliable and complete information than those from non-professional sources. In contrast, user engagement metrics were not reliable indicators of informational quality. These findings may inform efforts to strengthen evidence-based dental communication on short-video platforms.
    Keywords:  Health communication; Impacted wisdom teeth; Information quality; Short-video platforms; Social media
    DOI:  https://doi.org/10.1186/s12903-026-08553-7
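    Several of the video-quality studies in this issue, including the one above, relate engagement metrics to quality scores using Spearman's rank correlation. As an illustrative sketch only (the like counts and quality scores below are made up, and the helper names are hypothetical), the coefficient can be computed by ranking both variables, averaging ranks for ties, and taking the Pearson correlation of the ranks:

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: likes per video and a 1-5 quality score.
likes = [10, 200, 35, 4000, 90]
quality = [3, 2, 4, 3, 5]
print(round(spearman_rho(likes, quality), 3))
```

    Working on ranks rather than raw counts keeps a handful of viral videos from dominating the result, which suits heavily skewed engagement data such as likes and shares.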
  29. Digit Health. 2026 Jan-Dec;12:12 20552076261447941
       Background: Chikungunya fever (CHIKF) is an arboviral disease caused by the Chikungunya virus, which has increasingly emerged as a global public health concern, particularly in regions like China. Short video platforms such as TikTok and Bilibili have become vital channels for disseminating health information. Given the significant reach of these platforms, it is essential to evaluate the quality and reliability of videos related to CHIKF, ensuring that the public receives accurate and credible information.
    Methods: This cross-sectional study collected the top 100 videos from each platform, ranked by comprehensive sorting. Video quality was assessed using four validated tools: the Global Quality Scale (GQS), modified DISCERN (mDISCERN), the Journal of the American Medical Association (JAMA) benchmarks, and the Video Information and Quality Index (VIQI), which evaluate educational, informational, and audiovisual quality.
    Results: A total of 166 videos were analyzed. Overall scores were at a moderate level: GQS 3.00 (IQR: 2.00-4.00), mDISCERN 3.00 (IQR: 2.00-4.00), JAMA 2.00 (IQR: 2.00-2.00), and VIQI 10.00 (IQR: 8.00-13.00). Compared with Bilibili, TikTok videos demonstrated significantly higher user engagement, including likes, comments, and shares (all p<0.05). In terms of quality, TikTok videos showed better information flow and significantly higher VIQI scores (p<0.05). Videos uploaded by medical professionals achieved significantly higher GQS, mDISCERN, and JAMA scores compared with those uploaded by general users (p<0.05).
    Conclusions: This study found that the overall quality, reliability, and transparency of CHIKF-related videos remain suboptimal. Videos uploaded by medical professionals performed best across multiple quality indicators. These findings highlight the need to strengthen content oversight on platforms and encourage medical professionals to actively participate in the creation of CHIKF-related science videos to improve the accuracy of public health information.
    Keywords:  bilibili; chikungunya fever; public health communication; social media; tiktok
    DOI:  https://doi.org/10.1177/20552076261447941
  30. J Craniofac Surg. 2026 May 12.
       BACKGROUND: Upper eyelid ptosis is a commonly encountered ophthalmological disease that reduces both visual performance and quality of life. Short-video platforms have become popular sources for health information. The objective of this study was to evaluate the reliability, accuracy, and content features of short-form videos addressing upper eyelid ptosis available on social media platforms TikTok and Bilibili.
    METHODS: We searched "upper eyelid ptosis" and collected the top 150 videos from each platform using default rankings. Assessment utilized mDISCERN, Global Quality Score (GQS), and Journal of the American Medical Association (JAMA) benchmark for video quality evaluation. We assembled and examined basic video properties, content scope, and user engagement metrics. Platform and uploader category comparisons addressed general characteristics and video quality parameters.
    RESULTS: Analysis included 214 videos. Median video length was 63.50 seconds (IQR: 43.75-104.75). Most videos covered etiology (74.8%), clinical manifestations (84.6%), diagnosis (79.4%), and treatment (81.8%). However, epidemiology, prevention, and prognosis were rarely discussed. Video quality proved inadequate: median GQS reached 3.00 (IQR: 2.00-3.00), mDISCERN scored 2.00 (IQR: 1.00-4.00), and JAMA achieved 2.00 (IQR: 2.00-3.00). Content from professional healthcare providers attained superior quality (median GQS and mDISCERN = 3.00, P < 0.001), whereas individual users generated the highest engagement levels. User interaction metrics showed no significant association with video quality scores.
    CONCLUSIONS: Overall quality of ptosis-related short videos on TikTok and Bilibili remained inadequate, displaying systematic deficiencies in content organization. Professional healthcare provider videos showed enhanced quality but reduced engagement rates. Our findings also emphasize the need to encourage healthcare professionals to become more involved in health-related social media dissemination to enhance the effectiveness of reliable medical content sharing on short-video platforms.
    Keywords:  Content quality; health information; short videos; social media; upper eyelid ptosis
    DOI:  https://doi.org/10.1097/SCS.0000000000012766
  31. Orthop J Sports Med. 2026 May;14(5): 23259671261428460
       Background: TikTok videos on orthopaedic topics receive high engagement, but the quality of content on knee osteotomies is unclear.
    Purpose: To assess the quality, reliability, and educational value of TikTok videos on knee osteotomy.
    Study Design: Cross-sectional study.
    Methods: TikTok was searched for "knee osteotomy," "high tibial osteotomy," and "distal femoral osteotomy," yielding 789 videos. A total of 191 met inclusion criteria. Video metrics (duration, views, likes, shares), uploader type (private user, physical therapist, physician, researcher), and content type (patient experiences, physical therapy and rehabilitation, anatomy, surgical technique) were recorded. Quality was assessed using the DISCERN instrument, Journal of the American Medical Association (JAMA) benchmark criteria, and Global Quality Score (GQS). Associations between video metrics and quality scores were analyzed using Spearman rank correlation, and Mann-Whitney U tests evaluated differences in scores by uploader type and content type.
    Results: Most videos were posted by private users (145; 75.9%) and focused on patient experiences (128; 67.0%). Mean duration was 34.4 ± 40.8 seconds (range, 4-317 seconds). Videos received a mean of 3122.3 ± 16,845.1 likes (range, 0-197,400 likes), 223.9 ± 1891.3 shares (range, 0-25,900 shares), and 166,863.0 ± 1,002,969.9 views (range, 70-12,700,000 views). Mean DISCERN, JAMA, and GQS scores were 32.1 ± 18.4, 1.7 ± 1.1, and 2.6 ± 0.9, respectively. Video duration, shares, and views correlated with all quality scores (P < .05), while likes correlated weakly with DISCERN only (P < .05). Videos from health care professionals (physicians, physical therapists, researchers) achieved significantly higher quality scores than those from private users (DISCERN, 56.0 ± 14.2 vs 24.5 ± 11.9; GQS, 3.6 ± 0.8 vs 2.2 ± 0.5; JAMA, 3.2 ± 1.0 vs 1.3 ± 0.7; all P < .001). Educational videos (anatomy, physical therapy/rehabilitation, surgical technique) scored significantly higher than patient experience videos (DISCERN, 52.8 ± 16.8 vs 21.9 ± 6.9; GQS, 3.6 ± 1.0 vs 2.1 ± 0.3; JAMA, 2.9 ± 1.2 vs 1.1 ± 0.3; all P < .001).
    Conclusion: TikTok videos related to knee osteotomy demonstrated overall low quality. Although videos produced by health care professionals achieved higher quality scores, overall content quality remained limited.
    Keywords:  TikTok; distal femoral osteotomy; high tibial osteotomy; knee osteotomy; patient education; social media
    DOI:  https://doi.org/10.1177/23259671261428460
  32. Can J Surg. 2026 May-Jun;69(3): E243-E250
       BACKGROUND: Anterior cruciate ligament (ACL) tear is a common injury, and accurate and inaccurate medical information is readily accessible on social media platforms. We sought to assess the quality of rehabilitation exercise videos available on TikTok regarding ACL tears.
    METHODS: We used the keywords "ACL rehab exercises" to search the TikTok database, and we reviewed the first 113 videos. We retrieved information such as the number of views and likes for statistical analysis. We considered 2 types of users: general users and health care professionals. We included English-language videos about rehabilitation exercises for ACL tears. We excluded videos that were duplicates or that followed a TikTok trend. Two raters assessed the quality of videos with the Modified DISCERN (mDISCERN) instrument and the ACL Exercise Education Score (AEES). A third author resolved any disagreement. We calculated descriptive statistics and used the Mann-Whitney test for nonparametric samples to compare the quality of content between groups.
    RESULTS: We included 106 TikTok videos in the analysis. Fifty-five videos were created by general users and 51 videos by health care professionals. Videos by general users had significantly more likes (57 700 v. 788, p = 0.02). The overall quality of videos was poor, according to the mDISCERN instrument (score 1.99 ± 1.18), although health care professionals had significantly better-quality videos (score 3.10 ± 0.04) than general users (score 0.96 ± 0.09) (p < 0.001). Significance was similar with the mean AEES (score 7.62 v. 4.85, p = 0.002).
    CONCLUSION: Although health care professionals produced better-quality videos than general users, they were not the main producers of content in this study. Moreover, the overall quality and reliability of all the videos included in this study was poor. We encourage health care professionals to create a greater number of videos that contain quality information and that fit the TikTok format.
    DOI:  https://doi.org/10.1503/cjs.006625
  33. Medicine (Baltimore). 2026 May 08. 105(19): e48541
      Vascular cognitive impairment (VCI) severely affects patients' quality of life and imposes a substantial burden on families and society. Health education on VCI is an effective approach for preventing and slowing disease progression. Currently, short videos and WeChat official account articles demonstrate significant potential in disseminating health information. However, no evaluation of the quality and content of VCI-related information on these platforms currently exists. Therefore, we conducted a cross-sectional assessment of VCI-related information quality on the Chinese platforms TikTok, Bilibili, and WeChat official accounts. A total of 72 Chinese short videos from short video platforms (TikTok and Bilibili) and 61 articles from WeChat official accounts were screened for inclusion. The quality and reliability of the videos and articles were evaluated using the Global Quality Scale (GQS), Journal of the American Medical Association (JAMA) benchmark criteria, and Discern Instrument for Systematically Critical Evaluation of Reliable Health Information. Content completeness was assessed on the basis of established VCI guidelines. Comparative analyses were conducted on information from various platforms and publishers. The scores of WeChat official account articles were significantly greater than those of short videos across all the metrics: Discern Instrument for Systematically Critical Evaluation of Reliable Health Information score (48.57 ± 7.33 vs 39.10 ± 6.35, P < .001), GQS score (median 4.00 [IQR 3.00-4.00] vs 3.00 [2.00-3.00], P < .001), JAMA score (median 1.00 [1.00-2.00] vs 1.00 [1.00-1.00], P < .001), and total content score (6.52 ± 1.79 vs 4.38 ± 1.91, P < .001). Short videos on TikTok had higher total content scores than those on Bilibili did (4.94 ± 1.98 vs 3.69 ± 1.59, t = 2.90; P = .004). Videos and articles published by doctors, medical institutions, and healthcare vertical media achieved significantly higher total content and JAMA and GQS scores than those published by general individuals did. The overall quality of VCI-related short videos and articles was moderate. Compared with short videos, WeChat official account articles demonstrated higher quality and superior content completeness. Short videos on TikTok outperformed those on Bilibili in content scoring. Information from doctors, medical institutions, and healthcare vertical media publishers was of higher quality than that from ordinary individual publishers.
    Keywords:  WeChat official account; quality; short videos; vascular cognitive impairment; vascular dementia
    DOI:  https://doi.org/10.1097/MD.0000000000048541
  34. Urogynecology (Phila). 2026 May 06.
       IMPORTANCE: Interstitial cystitis/bladder pain syndrome (IC/BPS) is a common, stigmatizing, and poorly understood condition. With more people turning to TikTok for health information, it is important to assess the quality of IC/BPS-related content on this platform, which has not been studied to date.
    OBJECTIVES: The objectives of this study were to evaluate how IC/BPS is discussed on TikTok, assess the quality of the information provided, and determine the utility of these videos as patient education tools.
    STUDY DESIGN: We conducted a cross-sectional content analysis study using web scraping software to identify the top 100 most-played TikTok videos under 4 IC/BPS-related hashtags. A codebook was developed to document video qualities and content. The DISCERN scale was applied to evaluate health information quality, and the Patient Education Materials Assessment Tool (PEMAT) was used to evaluate understandability/actionability.
    RESULTS: The top 100 videos had a median of 71,400 plays, 116 comments, and a duration of 9 seconds. Half were created by patients, and half by health care workers. Most shared personal experiences with neutral or negative tones. More videos demonstrated distrust in health care than trust. Two thirds mentioned symptoms, and nearly half mentioned treatments. Videos scored poorly on the DISCERN scale and the Actionability PEMAT, but highly on the Understandability PEMAT.
    CONCLUSIONS: TikTok is a platform where patients actively engage with IC/BPS-related content. While videos are widely accessible and understandable, they often lack high-quality and actionable information. Clinicians should consider further engaging with TikTok to improve the accuracy, quality, and utility of educational content on IC/BPS.
    DOI:  https://doi.org/10.1097/SPV.0000000000001837
  35. Front Psychiatry. 2026;17: 1817890
       Background: Late-life depression is common in older adults and is often under-recognized. Short-video platforms have become a major source of mental health information. However, content quality and transparency remain uncertain.
    Methods: We conducted a cross-sectional assessment of highly viewed videos on late-life depression on three Chinese platforms. We searched each platform using the Chinese keyword for "late-life depression". We selected the top 200 videos by view count on each of Douyin, Rednotes (Xiaohongshu), and BiliBili. After exclusions, 562 videos were included (Douyin, n=188; Rednotes, n=188; BiliBili, n=186). Two medically trained raters scored videos using the Global Quality Score (GQS), modified DISCERN (mDISCERN), and JAMA benchmark criteria. We also coded content categories and creator types. We assessed platform differences using non-parametric tests. We examined associations between a limited engagement proxy, defined as the comment-to-view ratio, and quality scores using Spearman correlation.
    Results: Video duration differed across platforms (p<0.001). Engagement indicators were higher on Douyin and Rednotes than on BiliBili. Symptoms were the most common topic on all platforms. Prevention and intervention ranked second on Douyin and Rednotes. On BiliBili, causes and case-based analysis were also common. Overall quality was moderate. Mean GQS ranged from 2.96 to 3.05. Transparency was limited. Mean JAMA ranged from 1.91 to 2.04. Reliability was slightly higher on BiliBili based on mDISCERN. Creator type was strongly associated with scores. Expert and institutional videos scored higher than general and marketing-oriented accounts. Correlations between visible audience interaction and quality were weak.
    Conclusion: Highly viewed late-life depression videos on major Chinese platforms show moderate quality and limited transparency. Exposure does not reliably signal higher-quality information. Platforms and health authorities should strengthen source disclosure and promote evidence-based content from qualified creators.
    Keywords:  health communication transparency; information quality; late-life depression; short-video; social media health information
    DOI:  https://doi.org/10.3389/fpsyt.2026.1817890
  36. Acta Ortop Bras. 2026;34(2): e297030
       Objective: To assess the quality and reliability of information on adhesive capsulitis provided to the public on Instagram.
    Method: The 100 most relevant posts under each of the hashtags #adhesivecapsulitis and #frozenshoulder were analyzed on a single day. The evaluation was carried out by three professionals at different levels - a general practitioner, an orthopedist, and a shoulder and elbow surgeon - using the Global Quality Score (GQS), the modified DISCERN, and the Frozen Shoulder Specific Score (FSSS). In addition, interobserver agreement was checked.
    Results: The majority (65.5%) of the 200 posts analyzed presented incomplete information, scoring 1 or 2, representing low scientific foundation and limited clinical applicability. The average scores on the three instruments indicated unsatisfactory quality of the content. However, agreement between the evaluators was high, demonstrating the reliability of the methods used.
    Conclusion: Although Instagram is widely used as a source of health information, it has poor-quality content on adhesive capsulitis. These findings reinforce the importance of medical guidance and the production of more accurate, evidence-based information materials. Level of evidence IV; observational, descriptive, cross-sectional.
    Keywords:  Adhesive capsulitis; Frozen shoulder; Internet; Social Media
    DOI:  https://doi.org/10.1590/1413-785220263402e297030
  37. BMJ Open. 2026 May 12. 16(5): e102637
       OBJECTIVE: To explore how Chinese patients with precancerous ear, nose and throat (ENT) lesions experience using artificial intelligence (AI)-driven chatbots for health-related information seeking, with particular attention to perceived benefits, challenges and influences on information-seeking practices.
    DESIGN: Descriptive qualitative study.
    SETTING: Department of Otolaryngology, West China Hospital, Sichuan University, China.
    PARTICIPANTS: 12 adult patients with clinically diagnosed precancerous ENT lesions who had used AI-driven chatbots at least three times in the previous 30 days to seek health-related information about their condition were purposively recruited. Face-to-face, semistructured interviews were conducted between 11 October 2024 and 10 November 2024.
    RESULTS: Interviews were analysed using Colaizzi's method. Four themes were identified. First, participants described AI chatbots as an immediate and accessible source of health information, particularly when questions arose outside clinical encounters or during periods of uncertainty. Second, many reported moving away from conventional search engines towards conversational information seeking, valuing direct and synthesised responses over link-based retrieval. Third, participants emphasised that obtaining useful answers depended on learning to ask clear and specific questions, suggesting that effective prompting was an important user skill. Fourth, some participants perceived chatbot interaction as emotionally safer than asking healthcare professionals certain questions, particularly when they felt embarrassed, worried about asking repetitive questions or feared being judged.
    CONCLUSIONS: Among this sample of Chinese patients with precancerous ENT lesions, AI-driven chatbots were perceived as a convenient and conversational supplementary source of health information. Participants valued their accessibility and interactional ease, but also indicated that their usefulness depended partly on users' ability to formulate effective prompts. Some participants additionally perceived chatbots as a more comfortable channel for asking sensitive or basic questions. The findings suggest that AI chatbots may have a complementary role in patient information support, but further research is needed to evaluate response accuracy, safety and appropriate integration into clinician-led care.
    Keywords:  Artificial Intelligence; Digital Technology; ONCOLOGY; OTOLARYNGOLOGY; QUALITATIVE RESEARCH
    DOI:  https://doi.org/10.1136/bmjopen-2025-102637
  38. Health Commun. 2026 May 10. 1-11
      For an increasing share of individuals, the Internet is a primary source for health information seeking. While prior research has largely depended on self-reports or aggregated search data, both approaches face limitations in accuracy and granularity. Building on recent advances in user-centric tracking, this study examines behavioral manifestations of online health information-seeking behaviors (HISB) using 3 months of tracking data from 728 German Internet users. We contribute to the evidence on the prevalence of online HISB and test whether psycho-motivational predictors from the Planned Risk Information Seeking Model (PRISM) are transferable to actual online behavior. Results indicate a limited explanatory power of PRISM variables. Specifically, the model's predictive relevance appears tied to user interest and engagement with health information rather than subsequent behavioral manifestations. These findings suggest boundary conditions of existing HISB models and emphasize the need to integrate observable behavior and downstream outcomes for a more comprehensive understanding of online health information seeking.
    DOI:  https://doi.org/10.1080/10410236.2026.2670540
  39. Br J Pain. 2026 May 06. 20494637261447443
       Background: Patients with fibromyalgia syndrome (FMS) often report prolonged diagnostic pathways and inadequate care, prompting reliance on self-management and online health information. This study aimed to quantify the association between specific clinical experiences and the sources and extent of health information-seeking behaviours in people with FMS.
    Methods: A cross-sectional online survey was completed by adults who self-reported a diagnosis of FMS. Measures assessed symptom severity, diagnostic and treatment experiences, frequency and duration of symptom flares, perceived stigma and caring from healthcare professionals, and engagement with health information sources, including traditional and digital platforms. Associations between clinical experiences and health information-seeking behaviours were examined using non-parametric tests and hierarchical regression analyses.
    Results: A total of 384 adults completed the survey (75.3% female; median age 41 years). Most participants reported experiencing symptom flares (88.0%), occurring approximately every 2-3 weeks and lasting a median of 3 days. Participants reported significantly more negative than neutral experiences across multiple diagnostic and treatment variables, including diagnostic difficulty and challenges accessing specialist care. Most respondents (84.1%) actively sought health information, most commonly from healthcare professionals, websites, and online network platforms. Nearly half reported difficulties accessing satisfactory health information. Greater diagnostic difficulty, difficulty finding a specialist, and higher perceived caring from healthcare professionals independently predicted engagement with a wider range of health information sources.
    Conclusions: Patients with FMS frequently report dissatisfaction with their clinical experiences. Positive and negative diagnostic and treatment experiences are associated with the extent of health information-seeking among people with FMS. These findings highlight the importance of clinical experiences in shaping how patients seek health information and underscore the role of supportive clinical relationships in fostering informed and collaborative care.
    Keywords:  chronic pain; clinical pathway; diagnosis; fibromyalgia syndrome; health information; online support; treatment
    DOI:  https://doi.org/10.1177/20494637261447443