bims-librar 2026-05-24 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2026–05–24
37 papers selected by
Thomas Krichel, Open Library Society

Making Medicine Material: Objects and Materiality in the History of Medicine.
Medical Departments in Public Libraries.
VectorSage: enhancing PubMed article retrieval with advanced semantic search.
[Science and library in one life - János Marton (1943-2018)].
Curriculum collection alignment in pharmacy education: insights from Namibia for global practice.
Evaluation of AI Citation Accuracy in Anterior Segment Research.
Development of a Validated Lay Checklist (Info Without Side Effects) for Assessing Health Information on Websites: Mixed Methods Study.
Exploring health information needs and preferences of culturally and linguistically diverse (CALD) women during pregnancy: A mixed methods study.
Pharmacy students' use of e-learning resources: a repeated cross-sectional study in 2021 and 2026.
Influence of smart device use, online health information seeking, and self-perceived health on health-promoting behaviors among university students.
Structural Inequalities in Online Health Information Seeking: Cross-National Multilevel Study.
Evaluation of AI-generated versus registered dietitian-authored nutrition responses: a cross-sectional study.
Trust and Technology Acceptance: Comparing Traditional Search Engines and Artificial Intelligence for Colorectal Cancer Information Seeking.
Quality of Responses Generated by Artificial Intelligence Chatbots for Frequently Asked Questions by Caregivers of Presurgical Nasoalveolar Molding Therapy.
Large language models and child mortality: opportunities and challenges in answering public queries on under-5 causes.
Accuracy, readability, and content coverage of AI-generated responses to questions on functional appliances.
Assessing ChatGPT's responses to office ergonomics and spine health questions.
A bilingual evaluation of chatbot performance in bruxism-related information: accuracy and readability across two models.
An Assessment of ChatGPT Responses to Common Postoperative Rhinoplasty Questions.
Letter: Evaluation of the accuracy and readability of large language model responses on menopause and hormone therapy.
Digital Experts in Lymphedema: Assessing the Quality and Readability of Responses from ChatGPT and Gemini.
Quality of online heatstroke information across four countries: a quantitative content analysis.
Assessment of online information about osteomyelitis of jaw: An infodemiologic study.
Educational quality of YouTube videos on robot-assisted thymectomy: a LAP-VEGaS-based evaluation.
Quality and educational value of youtube videos on inguinal hernia surgery: a cross-sectional study.
Quality and educational value of TikTok videos on rehabilitation exercises for triangular fibrocartilage complex injuries: A cross-sectional study.
Quality and guideline adherence of child-oriented toothbrushing videos on YouTube: a comparative study of Turkish and English content.
Quality of information about potentially malignant oral disorders on TikTok: a cross-sectional analysis in the context of social media.
Short-form video sharing platforms as a source of information for children's growing pains: A cross-sectional content analysis study.
Assessing the quality of general anesthesia-related short videos on TikTok: a cross-sectional study.
Stratified and combined analysis of the quality of lumbar spinal stenosis-related videos on major Chinese short video platforms.
Reliability and quality of cognitive impairment educational content on Douyin and Bilibili: A cross-sectional content analysis.
A cross-sectional study on the quality of pediatric autism-related videos on short video platforms.
Analysis of content, quality, and reliability of acute cholecystitis-related Chinese videos on TikTok and Bilibili: A cross-sectional study.
Accuracy and Actionability of TikTok Content on Cosmetic Limb Lengthening: A Comparison between Healthcare Professional and Non-professional Sources.
Quality and reliability of chest Pain-Related short-form health videos on social media: A cross-sectional content analysis.
Attitudes and Practices of Adults Regarding the Use of TikTok for Health Information: A Cross-sectional Study.

Bull Hist Med. 2026 ;100(1): 1-30

Making Medicine Material: Objects and Materiality in the History of Medicine.

Claire L Jones.

This positioning paper surveys recent historical scholarship on objects and material culture relating to health and medicine. Analyzing scholarship under four main themes (professional medicine; everyday health; spaces, places, and environments; and consumption, globalism, and colonialism), the paper demonstrates the vibrancy and breadth of the field. It outlines how scholarship across these themes has not only viewed a wide range of objects and forms of materiality as integral to medical knowledge and health practices in the past but has often uncovered alternative, subversive, and multiple ways of knowing and doing that are inaccessible through the written word alone. The paper ends by providing possible future directions for scholars keen to further explore the role of materiality in medical knowledge making and practice. In particular, it suggests how scholars might fruitfully expand the material categories they work with, adopt "material time" and include more critical self-awareness in their research.
JAMA. 2026 May 21.

Medical Departments in Public Libraries.

DOI: https://doi.org/10.1001/jama.2025.15808
Bioinform Adv. 2026 ;6(1): vbag116

VectorSage: enhancing PubMed article retrieval with advanced semantic search.

Yasas Wijesekara, Rahul Brahma, Mehdi Lotfi, Marcus Vollmer, Lars Kaderali.

Motivation: The exponential growth of academic literature has presented unprecedented opportunities. However, it also underscores the need for advanced search methodologies to support efficient knowledge discovery. While effective for structured queries, traditional keyword-based search engines often struggle with the inherent variability of language, where the same concept can be expressed in many ways, leading to incomplete or imprecise retrieval of relevant research. Another issue that must be considered is that of lexical ambiguity, such as polysemy or homonymy, whereby several words and abbreviations can have multiple meanings. This results in items placed in the results list that are irrelevant to the search context. Recent advances in natural language processing have enabled semantic similarity techniques that move beyond basic text matching toward context-aware search.
Results: We developed VectorSage (https://vectorsage.nube.uni-greifswald.de), an advanced biomedical search system for retrieving PubMed abstracts using a hybrid approach that combines term relevance scoring with embedding-based semantic similarity. VectorSage employs a global ranking mechanism to further enhance search relevance by sorting the retrieved documents, ensuring a balance between semantic relevance and keyword specificity. This method enables efficient literature exploration and knowledge discovery.
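The hybrid idea described in this abstract (a term relevance score combined with an embedding-based similarity, then one global sort) can be illustrated with a minimal sketch. This is not VectorSage's implementation: the abstract does not specify the scoring functions, so BM25, min-max normalization, the weight `alpha`, the toy documents, and the two-dimensional "embeddings" below are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for the query terms."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n
    doc_freq = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_terms, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return document indices sorted by alpha*lexical + (1-alpha)*semantic."""
    lexical = min_max(bm25_scores(query_terms, docs_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


# Hypothetical toy corpus: document 1 is a synonym match, document 2 a homonym.
docs = [
    ["heart", "attack", "treatment"],
    ["myocardial", "infarction", "therapy"],
    ["attack", "on", "titan", "anime"],
]
doc_vecs = [[0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]  # made-up embeddings
query_terms, query_vec = ["heart", "attack"], [1.0, 0.0]

print(hybrid_rank(query_terms, query_vec, docs, doc_vecs, alpha=0.5))
```

In this toy setup, keyword-only ranking (`alpha=1.0`) places the off-topic homonym above the synonym match, while the blended score moves the synonym match back up without discarding exact keyword hits.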

DOI: https://doi.org/10.1093/bioadv/vbag116
Orv Hetil. 2026 May 17. 167(20): 805-808

[Science and library in one life - János Marton (1943-2018)].

Lívia Vasas.

DOI: https://doi.org/10.1556/650.2026.HO2882
Med Ref Serv Q. 2026 May 18. 1-19

Curriculum collection alignment in pharmacy education: insights from Namibia for global practice.

Menete N Shatona.

This study evaluated the alignment of the University of Namibia Library collection with the updated Bachelor of Pharmacy curriculum to assess its adequacy and responsiveness to competency-based education. Through curriculum mapping and a policy-based 1:20 adequacy ratio, the review identified a strong alignment across anatomy, pharmacology, microbiology, and pharmaceutical practice. While two core microbiology texts surpassed the adequacy thresholds, applied pharmaceutical practice as well as pharmaceutical compounding and dispensing fell slightly short. However, gaps remain in pharmaceutics, toxicology, pharmacokinetics, and pharmaceutical calculations. Addressing these gaps requires closer collaboration between librarians and faculty, as well as increased investment in digital and open-access resources.

Keywords:  Competency-based education; Namibia; curriculum alignment; digital access; library collection development; pharmacy education

DOI:  https://doi.org/10.1080/02763869.2026.2657364
Cesk Slov Oftalmol. 2026 ;82(Ahead of Print): 1-5

Evaluation of AI Citation Accuracy in Anterior Segment Research.

Mustafa Civelekler, Mehmet Çıtırık.

   AIMS: To conduct a pilot evaluation of the citation accuracy of four contemporary artificial intelligence (AI) models - ChatGPT (OpenAI GPT-5.1), Copilot (Microsoft Copilot 4.2), DeepSeek (DeepSeek-R1), and Gemini (Google Gemini Ultra 2.5) - in generating PubMed-style references for corneal, conjunctival, and eyelid disease research, and to identify common error patterns.
MATERIAL AND METHODS: Thirty-five standardized clinical paragraphs were selected from The Review of Ophthalmology (4th edition). Each AI model was prompted to generate AMA 11-style references relevant to the provided text, simulating a literature retrieval task. Generated citations were assessed for accuracy, DOI matching, and clinical relevance. In a second validation phase, citations were independently reviewed by two ophthalmology experts and classified as fully cited, partially cited, or not cited. Statistical comparisons of accuracy proportions among models were performed using chi-squared tests.
RESULTS: DeepSeek demonstrated the highest citation accuracy (78.6%, 22/35), followed by ChatGPT (51.4%, 18/35), and Copilot (51.4%, 18/35). Gemini showed the lowest accuracy (12.9%, 5/35). Differences in accuracy rates across models were statistically significant (χ² = 19.0, df = 3, p < 0.001). Expert validation confirmed DeepSeek's relative advantage, with 42.9% (15/35) of its references classified as fully cited, compared with Copilot (20.0%, 7/35), ChatGPT (11.4%, 4/35), and Gemini (11.4%, 4/35). The most frequent error types were DOI mismatches and the generation of irrelevant or unverifiable references.
CONCLUSION: This pilot study indicates that contemporary AI models, particularly those like DeepSeek, show potential in assisting with citation generation. However, the observed error rates, including instances of hallucination, remain substantial. These findings underscore that rigorous human verification is indispensable when using AI for academic referencing in specialized medical fields, and highlight the need for continuous, version-specific benchmarking as these tools evolve.
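The chi-squared comparison described in the methods can be sketched from the accurate-citation counts printed in the results (22, 18, 18, and 5 out of 35). This is an illustrative recomputation, not the authors' analysis script: it assumes a plain Pearson test on a 4 x 2 table (accurate vs. not accurate) without any correction, and the helper names are hypothetical.

```python
import math


def chi_square_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi2_upper_tail_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Accurate-citation counts as printed in the abstract, each out of 35 paragraphs.
accurate = {"DeepSeek": 22, "ChatGPT": 18, "Copilot": 18, "Gemini": 5}
table = [[k, 35 - k] for k in accurate.values()]

statistic, dof = chi_square_independence(table)
p_value = chi2_upper_tail_df3(statistic)  # closed form is valid only because dof == 3
print(f"chi2 = {statistic:.2f}, df = {dof}, p = {p_value:.5f}")
```

Under these assumptions the statistic comes out close to the reported χ² = 19.0 with df = 3.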

Keywords:  artificial intelligence; citation accuracy; corneal disease; conjunctival disorders; eyelid diseases; large language models

DOI:  https://doi.org/10.31348/2026/21
J Med Internet Res. 2026 May 20. 28 e69529

Development of a Validated Lay Checklist (Info Without Side Effects) for Assessing Health Information on Websites: Mixed Methods Study.

Ursula Griebler, Irma Klerings, Christina Koscher-Kien, Benedikt Lutz, Eva Krczal, Dominic Ledinger, Iris Mair, Robert Emprechtinger, Filiz Keser Aschenberger, Bernd Kerschner.

Background: The internet has become a major source of health information; yet, the quality of health information on websites varies considerably. Users' ability to evaluate either the factual accuracy or the trustworthiness of health information on websites is limited, as around half of Europeans have limited health literacy. Existing checklists and tools are either prepared for research purposes or to be used by health care professionals. They do not account for the lay user perspective, since they are too long and complicated to be used by laypersons, or were developed for printed health information only.
Objective: The aim of the study was to develop and validate a checklist that enables laypersons to evaluate the trustworthiness of health information on websites without requiring prior training.
Methods: We used a multistage mixed methods approach including (1) a comprehensive literature review to identify existing tools and quality criteria, (2) an expert Delphi study with 6 specialists in patient communication and health information, (3) 2 rounds of cognitive interviews with 19 lay users, (4) application testing on 15 selected web pages with information about health interventions with 20 additional lay users, (5) a determination of the factual correctness of 100 web pages with health information by assessing the difference between the claimed and factual strength of the evidence on these web pages, and (6) validation testing by research team members on these 100 web pages using a Bayesian logistic regression model to analyze the predictive validity. In the final step, we integrated all quantitative and qualitative results to select the final checklist items.
Results: From an initial pool of 1740 items extracted from 73 documents, we systematically reduced the list through multiple evaluation and testing rounds. To ensure the checklist is user-friendly, we involved a diverse group of potential users. The final product, the Info Without Side Effects (iWISE) checklist, contains seven items that assess key aspects of health information trustworthiness, including the absence of advertising, balanced presentation of information, the limited use of professional jargon, origination from an independent organization, citation of sources, mention of scientific validation, and the presence of a publication date. The checklist demonstrated the ability to distinguish between evidence-based and nonevidence-based health information web pages in the German language: the validation testing showed that when all the items were marked with yes, there was a nearly 100% probability that the health information was also factually correct.
Conclusions: The iWISE checklist represents a user-friendly, validated tool for evaluating the trustworthiness of health information about interventions on websites. With only 7 items, it is easy to remember and could significantly improve critical health literacy. Future research should test its reliability for social media posts and health information videos.

Keywords:  checklist; critical health literacy; health information; health information websites; online health information

DOI:  https://doi.org/10.2196/69529
Patient Educ Couns. 2026 May 16. pii: S0738-3991(26)00226-0. [Epub ahead of print] 149 109693

Exploring health information needs and preferences of culturally and linguistically diverse (CALD) women during pregnancy: A mixed methods study.

Nitya Nagesh, Bonnie R Brammall, Thara Govindaraju, Rebecca L Madill, Gretchen Coombs, Daphne Flynn, Myra Thiessen, Helena J Teede, Cheryce L Harrison.

   OBJECTIVES: Pregnancy is associated with increased health information needs. Migrant, culturally and linguistically diverse (CALD) women face increased barriers accessing credible, timely and culturally appropriate health information. We explored health information preferences and navigation of digital information to identify current gaps and opportunities to better meet the needs of this population during pregnancy.
METHODS: A mixed-methods design explored pregnancy-related information needs, preferences and engagement with digital health resources. Semi-structured interviews were analysed using inductive thematic analysis, and findings were triangulated with quantitative findings to explore common and emerging themes.
RESULTS: Overall, 17 participants, with a mean age of 32.2 (4.5) years, born in South, East and Southeast Asia, were recruited. While primary healthcare providers were considered the most trustworthy source of information, health information was primarily sourced digitally for reasons including ease of readability (77%) and interactive and engaging features (59%). Thematic analysis of interviews identified five themes: 1/ blended use of informal and formal sources for health information, 2/ cultural preferences in maternal health information, 3/ the influence of cultural beliefs and practices on maternal health behaviours and decision making, 4/ visual and interactive digital resources as a learning preference and 5/ accessing information between antenatal care interactions.
CONCLUSIONS: Migrant women of CALD background prefer easily accessible, visual, interactive, and culturally tailored online health information during pregnancy. Results emphasise the role of healthcare providers as a conduit to credible and reliable digital information. Co-designing digital health tools is critical in this population to promote culturally responsive, relatable and engaging health information during pregnancy.
PRACTICE IMPLICATIONS: Findings reinforce the need for health systems and practitioners to co-design and co-produce equitable and accessible pregnancy health resources that meet the needs of all women during pregnancy. This includes ensuring that resources not only consider language translation but also ensure that information is culturally appropriate, inclusive, relatable, trustworthy and engaging.

Keywords:  Culturally and linguistically diverse; Digital health; Digital health engagement; Digital health resources; Maternal health; Pregnancy health information

DOI:  https://doi.org/10.1016/j.pec.2026.109693
BMC Res Notes. 2026 May 20.

Pharmacy students' use of e-learning resources: a repeated cross-sectional study in 2021 and 2026.

Seeba Zachariah, Alaa Fouda, Zahraa Altamimi, Asma Qayyum, Fatimaalzhra Hayder, Juny Sebastian.

   OBJECTIVE: Understanding pharmacy students' engagement with e‑learning resources is essential for supporting evidence‑based learning. The COVID‑19 pandemic temporarily intensified reliance on digital materials, creating a unique opportunity to examine how usage patterns evolve once mandatory digital access ends. This study compared pharmacy students' use and preferences for e‑learning resources in 2021, when digital access was required, and in 2026, when usage was voluntary within a more mature digital ecosystem.
RESULTS: A repeated cross‑sectional survey was administered to all students enrolled in a five‑year entry‑to‑practice pharmacy program at a medical university in the United Arab Emirates. Data were collected in November-December 2021 and January-February 2026. A total of 171 students participated in 2021 and 134 in 2026. While core reading attitudes remained stable, digital engagement increased significantly over time (p = 0.01). Students in 2026 reported more frequent e‑library use, greater electronic reading time, and substantially higher utilization of AccessPharmacy (22.2% to 67.2%; p < 0.001), ClinicalKey (25.1% to 42.5%; p = 0.001), and UpToDate (36.3% to 81.3%; p < 0.001). Perceptions of accessibility and learning effectiveness also shifted toward e‑books. These findings indicate sustained and growing reliance on digital learning resources well beyond the pandemic period.

Keywords:  Digital learning; Digital library; E-learning; E-resources; Pharmacy education; Student preferences

DOI:  https://doi.org/10.1186/s13104-026-07878-4
Digit Health. 2026 Jan-Dec;12:12 20552076261452415

Influence of smart device use, online health information seeking, and self-perceived health on health-promoting behaviors among university students.

In-Kyoung Kim.

Objectives: This study investigated the association of smart device usage (particularly smartwatch use), online health information utilization, and subjective health status with health-promoting behaviors among university students in South Korea. As digital technologies are increasingly integrated into daily routines, an individual's capacity to access and apply health information using mobile and wearable devices represents an important factor related to healthy behaviors.
Methods: An online survey was administered to university students in urban settings to evaluate the associations among smart device usage patterns, utilization of online health information, subjective health status, and health-promoting behaviors. Collected data were subjected to descriptive statistics, correlation analysis, and multiple regression to determine the key factors associated with health-promoting behaviors.
Results: The study found that increased utilization of online health information and more favorable subjective health status correlated with greater participation in health-promoting behaviors. Furthermore, both the frequency and duration of health information searches on smartphones, together with the main smartwatch functions utilized, demonstrated significant associations with these behaviors. The results underscore that proactive use of digital health tools is associated with enhanced personal health management.
Conclusions: The findings of this study underscore the significance of advancing digital health literacy and fostering efficient utilization of wearable devices such as smartwatches in relation to self-care and healthy living among university students. This evidence may support the development of targeted digital health promotion initiatives and policies designed for the modern digital environment.

Keywords:  health behaviors; health information seeking behavior; self perceived health; smart device; students

DOI:  https://doi.org/10.1177/20552076261452415
J Med Internet Res. 2026 May 19. 28 e88110

Structural Inequalities in Online Health Information Seeking: Cross-National Multilevel Study.

Petra Raudenská, Elena Link.

   Background: Online health information-seeking behavior (OHISB) has become an increasingly common component of contemporary health self-management. Individuals use a wide range of digital sources, including websites, social media platforms, and mobile apps, to obtain health-related information. However, substantial disparities persist in who seeks health information online, and which populations benefit from digital health resources. While previous research has largely focused on individual-level determinants, cross-national evidence on structural influences remains limited.
Objective: This study aims to assess between-country variation in OHISB, examine associations between individual-level characteristics and OHISB, and investigate how country-level structural conditions are associated with cross-national differences in OHISB, net of individual-level characteristics.
Methods: Data were drawn from the Health and Health Care II module of the International Social Survey Programme (ISSP 2021-2024; n=35,592; 32 countries). OHISB was measured as any use of the Internet to search for health-related information during the past 12 months. Multilevel logistic regression models were estimated. Country-level indicators were reduced using principal component analysis into 4 composite indices. Robustness checks included analyses excluding respondents without internet access and models incorporating survey weights.
Results: OHISB varied substantially across countries (intraclass correlation coefficient=0.177). At the individual level, younger age, higher education, being female, recent health problems, doctor visits, unmet medical needs, and perceived usefulness of the internet were associated with higher odds of OHISB. At the macro level, the socioeconomic and health development index showed the strongest association (odds ratio=1.52 per SD; P=.003) and explained a substantial share of between-country variation. Cultural hierarchy-individualism was associated with OHISB in separate models but attenuated when adjusted for development. Cross-level interactions indicated that the gender gap and the role of perceived usefulness were more pronounced in higher-development contexts, although these findings were exploratory.
Conclusions: OHISB is associated with both individual characteristics and broader structural conditions. Socioeconomic and health development appear to play a key contextual role in shaping cross-national differences in digital health engagement, highlighting the importance of addressing both individual and structural dimensions of digital health inequalities.
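To give a sense of what an intraclass correlation coefficient of 0.177 means in a multilevel logistic model, the sketch below uses the common latent-variable definition, in which the individual-level residual variance is fixed at π²/3 (the variance of the standard logistic distribution). Whether the authors used this definition is an assumption; the abstract does not say, and the function names are hypothetical.

```python
import math

# Variance of the standard logistic distribution, used as the level-1
# residual variance in the latent-variable ICC for logistic models.
LEVEL1_VARIANCE = math.pi ** 2 / 3


def icc_from_variance(country_variance):
    """Share of latent-scale variance that lies between countries."""
    return country_variance / (country_variance + LEVEL1_VARIANCE)


def variance_from_icc(icc):
    """Country-level random-intercept variance implied by a given ICC."""
    return icc * LEVEL1_VARIANCE / (1 - icc)


implied_variance = variance_from_icc(0.177)
print(f"implied country-level variance: {implied_variance:.3f}")
print(f"round trip ICC: {icc_from_variance(implied_variance):.3f}")
```

Under this definition, the reported ICC corresponds to a country-level random-intercept variance of roughly 0.71 on the logit scale, which is to say about 18% of the latent variation in OHISB is attributable to differences between countries.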

Keywords:  ISSP; International Social Survey Programme; cross-national comparison; digital divide; digital health; health inequalities; multilevel analysis; online health information seeking; socioeconomic development; structural determinants

DOI:  https://doi.org/10.2196/88110
Mhealth. 2026 ;12 17

Evaluation of AI-generated versus registered dietitian-authored nutrition responses: a cross-sectional study.

Kathryn Ayres, Maryam Nadery, Pia Henfridsson.

   Background: Artificial intelligence (AI) has emerged as a potential tool in nutrition counseling, but its performance compared with registered dietitians (RDs) remains unclear. This study aimed to compare the clinical quality, empathy, and readability of nutrition responses by a large language model (LLM, ChatGPT-4o) with those provided by licensed RDs, as assessed by RD evaluators.
Methods: In this cross-sectional study, 100 nutrition-related questions were selected from public online forums where RDs had provided answers. Each question was paired with an AI-generated response. Licensed RDs (n=8), blinded to the source, rated responses for quality and empathy (5-point Likert scales) and overall performance (0-100). Readability was assessed using the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), syllables per word, and words per sentence. Statistical analyses included independent two-tailed t-tests, z-tests for proportions meeting a threshold for "acceptable" (≥4), Pearson correlations, and sensitivity analyses by response length.
Results: AI-generated responses scored higher than RD-authored responses for quality (4.48±0.31 vs. 2.56±0.76; P<0.001), empathy (4.62±0.37 vs. 3.21±0.62; P<0.001), and overall performance (91.10±5.38 vs. 66.83±14.71; P<0.001). AI scores clustered at the upper end, while RD scores were more variable. Quality and empathy were not correlated for AI (r=-0.10, P=0.32) but showed a moderate positive correlation for RDs (r=0.37, P<0.001). Nearly all AI responses met the ≥4 threshold for quality (96%) and empathy (97%), compared with few RD responses (3% and 14%; P<0.001). Word count did not differ, and longer RD responses were not associated with higher ratings. RDs' responses were more readable, with higher FRES (53.5±13.5 vs. 46.2±12.5; P<0.001), and simpler vocabulary (1.60±0.1 vs. 1.73±0.1 syllables/word, P<0.001), though both groups averaged a 10th-grade level on the FKGL, exceeding Centers for Disease Control and Prevention (CDC) and National Institutes of Health (NIH) recommendations.
Conclusions: AI-generated nutrition responses demonstrated consistently high perceived quality and empathy, independent of length, while RD-authored responses displayed greater variability but higher readability. These findings highlight both the promise and the limitations of LLMs in nutrition counseling, suggesting that AI may complement, but not replace, human expertise, provided that accuracy, transparency, and professional oversight are maintained.
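The two readability indices named in the methods are simple linear functions of words per sentence and syllables per word. The sketch below uses the standard published coefficients for the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level. The syllables-per-word values are taken from the abstract, but the sentence length of 15 words is a hypothetical value, since the abstract does not report words per sentence.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch Reading Ease Score: higher values mean easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Syllables per word as reported (RD 1.60, AI 1.73); 15 words per sentence
# is an assumed, illustrative sentence length, not a figure from the study.
ASSUMED_WORDS_PER_SENTENCE = 15
for label, syllables in (("RD", 1.60), ("AI", 1.73)):
    fres = flesch_reading_ease(ASSUMED_WORDS_PER_SENTENCE, syllables)
    fkgl = flesch_kincaid_grade(ASSUMED_WORDS_PER_SENTENCE, syllables)
    print(f"{label}: FRES = {fres:.1f}, FKGL = {fkgl:.1f}")
```

Because syllables per word carries a large coefficient in both formulas, even the small vocabulary difference reported between the two groups shifts the reading ease score by several points at a fixed sentence length.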

Keywords:  Artificial intelligence (AI); chatbot; empathy; nutrition counseling; registered dietitian (RD)

DOI:  https://doi.org/10.21037/mhealth-2025-70
Cancer Control. 2026 Jan-Dec;33:33 10732748261454474

Trust and Technology Acceptance: Comparing Traditional Search Engines and Artificial Intelligence for Colorectal Cancer Information Seeking.

Brad Love, Charulata Ghosh, Weijia Shi, Karly Quaack, Michael Mackert.

Introduction: The integration of artificial intelligence (AI) into health information seeking is transforming health promotion. Understanding how users accept and trust these communication technologies is critical for health communication and cancer control. This study examined how the Technology Acceptance Model II (TAM II) applies to colorectal cancer information seeking, comparing link-based search (e.g., Google search) versus generative response paradigms (e.g., ChatGPT/AI) while examining trust, perceived threat, and contextual factors in technology use decisions.
Methods: A prospective, randomized 2×2 factorial experiment was conducted with 764 Texas adults randomly assigned to conditions to view either Google search results or ChatGPT responses for colorectal cancer symptoms, presented in either high-concern or low-concern scenarios. Participants completed validated measures including TAM II constructs adapted from Davis (1989) and Kamal et al (2020), multidimensional trust scales, Extended Parallel Process Model threat measures (Witte, 1992), and technology-related stress items, all demonstrating acceptable reliability (α > .77). Data analysis included two-way ANOVAs, correlation analysis, and stepwise regression modeling.
Results: Google search received significantly higher ratings than AI across all Technology Acceptance Model II constructs. Technology preferences appeared to reflect multiple factors including interface familiarity, trust in information sources, and usability expectations, with traditional search benefiting from established user mental models and transparent source attribution. Trust emerged as the strongest predictor of behavioral intention. No significant main effects were found for concern level, and no interaction effects emerged between technology type and concern level, indicating that technology preferences remained consistent regardless of symptom severity.
Conclusions: For cancer control and prevention, these findings suggest that patients seeking colorectal cancer symptom information may be more likely to trust and act upon traditional search results than AI-generated responses, focusing on technology use intentions for health information seeking that directly inform cancer screening and care-seeking behaviors, potentially affecting screening behaviors and care-seeking timing. Current AI implementations may not optimally serve health information needs with lower acceptance potentially related to limited source transparency and increased cognitive demands compared to familiar search interfaces, as suggested by preference patterns. Cancer control professionals should anticipate that the growing integration of AI into health information seeking may influence the public's cancer symptom evaluation and screening behaviors.

Keywords:  Artificial Intelligence (AI); Colorectal Cancer (CRC); Technology Acceptance Model (TAM); digital health literacy; health information seeking; trust

DOI:  https://doi.org/10.1177/10732748261454474
Cleft Palate Craniofac J. 2026 May 20. 10556656261451939

Quality of Responses Generated by Artificial Intelligence Chatbots for Frequently Asked Questions by Caregivers of Presurgical Nasoalveolar Molding Therapy.

Madhanraj Selvaraj, J Monisha, Bhaskar Nivethitha, Aditi Bedi, Tamal Das, Balasubramanian Madhan.

Objective: The study compared the quality of responses generated by three artificial intelligence chatbots (AICs) for frequently asked questions (FAQs) by caregivers of presurgical nasoalveolar molding (PNAM) therapy.
Material and Methods: Twenty-three FAQs on PNAM were posed to WhatsApp Meta AI (Llama 4), ChatGPT-4o, and Gemini 2.5 Flash, under the same conditions. Their responses were evaluated and compared for accuracy, completeness, reliability (Modified DISCERN Score), readability (Flesch-Kincaid readability ease [FKRE] score, simple measure of Gobbledygook [SMOG] index), and global quality score (GQS) by three orthodontists.
Results: The responses from Gemini and ChatGPT were more accurate than Meta (medians of 5.67, 5.33, and 5, respectively; P < .001). While Gemini outperformed others in completeness (median of 3 vs 2.33, P < .001) and reliability (means of 3.41 ± 0.27, 3.13 ± 0.24, and 2.98 ± 0.58, P < .001), Meta's responses were more readable (mean FKRE of 43.2 ± 7.97, 39.9 ± 10.3, 36.7 ± 9.17, and SMOG of 9.99 ± 1.32, 11.24 ± 2, 12.46 ± 1.4). For global quality, Gemini fared best, followed by ChatGPT and Meta (median GQS of 4.67, 4.33, and 3.67, respectively; P < .001).
Conclusions: All the AICs performed well in terms of accuracy, moderately in completeness and reliability, and sub-optimally in readability. Meta AI showed comparatively lower accuracy and completeness, but better readability than the other two AICs. These findings highlight the potential use of AI chatbots as adjunct tools for caregiver education on PNAM and the need to optimize the content before clinical use.

Keywords:  ChatGPT; Gemini; WhatsApp; artificial intelligence; nasoalveolar molding

DOI:  https://doi.org/10.1177/10556656261451939
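
The SMOG index reported in the abstract above estimates a reading grade from the density of polysyllabic words. The following is a minimal Python sketch of McLaughlin's published formula, assuming the polysyllable and sentence counts have already been obtained by some other means; the function name and the sample counts are illustrative and are not taken from the study.

```python
import math


def smog_index(polysyllable_words: int, sentences: int) -> float:
    """SMOG grade from the number of words with 3+ syllables and the sentence count.

    The count is normalised to a 30-sentence sample before the square root,
    as in the standard formula.
    """
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    if polysyllable_words < 0:
        raise ValueError("polysyllable_words must not be negative")
    return 1.0430 * math.sqrt(polysyllable_words * 30 / sentences) + 3.1291


# Hypothetical counts for one chatbot response (not data from the study).
print(round(smog_index(polysyllable_words=30, sentences=30), 2))
```

Lower values indicate easier text, which is how the SMOG means in the abstract are ordered relative to the reading-ease scores.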
Front Public Health. 2026 ;14 1646475

Large language models and child mortality: opportunities and challenges in answering public queries on under-5 causes.

Yi Yang, Tingxi Zhu, Hongju Chen, Yao Zhang, Chao Zhang, Dengjun Liu, Yanxia Mao, Jiaxi Wu, Tao Xiong.

   Background: Reducing under-5 mortality remains a global health priority. Large language models (LLMs) are increasingly used by the public to access medical information. However, current evidence evaluating LLMs' performance in public-facing child health communication is scarce.
Methods: We selected the top five search terms related to each of the five leading causes of under-5 mortality (prematurity, pneumonia, birth asphyxia, malaria, and diarrhoea) using Google Trends, generating 25 representative public queries. Responses were collected from four LLMs (ChatGPT-4.0, Claude 3.5 Sonnet, Bing AI, and Gemini) and independently evaluated by four pediatricians. We used the DISCERN instrument for information reliability; 5-point Likert scales for accuracy, completeness, and comprehensibility; Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) indices for readability; and the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) for understandability and actionability. Differences among models were evaluated with Kruskal-Wallis and ANOVA tests, with statistical significance set at p < 0.05.
Results: We found significant performance variations among the four models across most evaluation metrics. Bing AI achieved the highest total DISCERN score (median 42) and the highest reliability subscore (Section A median 28). Claude consistently underperformed across multiple domains. Notably, readability was poor for all models, with high language complexity (mean FKGL score 12.4). Critically, actionability scores were near zero for all models on the PEMAT-P scale, reflecting a universal lack of clear and practical behavioral guidance.
Conclusion: While LLMs can generally provide accurate health information, limitations in readability and actionability restrict their practical application in public health communication. Future development should prioritize language simplification and clearer behavioral guidance to enhance their value in public-facing child health communication.

Keywords:  accuracy; actionability; child mortality; comprehensiveness; large language models; pediatrics; public health communication; quality

DOI:  https://doi.org/10.3389/fpubh.2026.1646475
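
The two readability indices named in the Methods above, Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL), are both simple functions of average sentence length and average syllables per word. A minimal Python sketch of the standard formulas follows, assuming the word, sentence, and syllable counts are already available; the function names and sample counts are illustrative and do not come from the study.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed to read the text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for one model response (not data from the study).
fre = flesch_reading_ease(words=120, sentences=6, syllables=210)
fkgl = flesch_kincaid_grade(words=120, sentences=6, syllables=210)
print(f"FRE={fre:.2f} FKGL={fkgl:.2f}")
```

Because both indices use the same two ratios with opposite signs, a text that scores low on FRE will generally score high on FKGL, which is consistent with the mean FKGL of 12.4 being described as poor readability.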
BMC Oral Health. 2026 May 22.

Accuracy, readability, and content coverage of AI-generated responses to questions on functional appliances.

Serene A Badran, Snigdha Pattanaik, Sarah Jumaah, Abdulrahman Salmeh, Meena Al-Saadi, Taiba Al-Mizban, Nadia Al-Zaidi, Abdullah Hanifa, Mahmoud K Al-Omiri.

   BACKGROUND: Patients increasingly rely on Large Language Models (LLMs) for health information, yet the accuracy and readability of AI-generated dental advice remain variable across different clinical domains and Artificial Intelligence (AI) models. This study therefore aimed to compare the readability, accuracy, and comprehensiveness of responses generated by four leading AI models (ChatGPT-4o Mini, ChatGPT-5, Google Gemini 2.5 Flash, and DeepSeek V3) to patient questions on functional appliances.
METHODS: Thirty-eight frequently asked questions were identified using a structured Google search and categorized into three domains: "treatment fundamentals and general information," "lifestyle and practical concerns," and "appointments and long-term results". Each question was independently answered by the four AI models. Readability was assessed using the Flesch-Kincaid tools. Accuracy and comprehensiveness were independently rated by two blinded orthodontists.
RESULTS: AI-generated responses were generally accurate and comprehensive but difficult to read, requiring college-level literacy. ChatGPT-5 produced the lowest readability scores (most difficult-to-read responses; P < .001). Although Gemini 2.5 Flash achieved the highest comprehensiveness scores across all three domains, these differences were not statistically significant. Treatment-related questions yielded lower readability scores than lifestyle-related queries across all models (P < .001). No single model demonstrated superior performance across all evaluated domains.
CONCLUSION: AI-generated information on functional appliances was generally accurate and comprehensive but often exceeded recommended patient literacy thresholds. Readability must be considered alongside informational quality when deploying AI tools for patient education.

Keywords:  Accuracy; Artificial intelligence; ChatGPT; Functional Appliances; Gemini; Health Literacy; Large Language Models; Orthodontics; Readability

DOI:  https://doi.org/10.1186/s12903-026-08616-9
Int J Occup Saf Ergon. 2026 May 17. 1-9

Assessing ChatGPT's responses to office ergonomics and spine health questions.

Caglayan Pinar Ozturk, Tahir Keskin, Ferdi Baskurt.

  The objective of this study was to assess ChatGPT's responses to common office ergonomics and spine health questions. ChatGPT was asked the 50 most frequent questions about 'office ergonomics and spine health', and its responses were evaluated by 10 experts on a scale of 1-4 points. The consistency between the experts' ratings was calculated with the intraclass correlation coefficient (ICC). The Flesch-Kincaid reading level was used to evaluate the readability of answers. The results indicated that ChatGPT demonstrated the capacity to provide generally satisfactory information concerning ergonomics and spinal health in office workers, with good inter-rater reliability (ICC = 0.838) as well as a readability level equivalent to a middle school level. This study indicated that, although the responses were considered adequate, they cannot replace experts' clinical judgment, particularly in the personalization of exercise prescriptions and the provision of individualized and comprehensive approaches.

Keywords:  ChatGPT; ergonomics; generative artificial intelligence; office worker; spine

DOI:  https://doi.org/10.1080/10803548.2026.2668216
BMC Oral Health. 2026 May 16.

A bilingual evaluation of chatbot performance in bruxism-related information: accuracy and readability across two models.

Merve Cennet Altuntaş, Gülbahar Erdinç Akyol.

   BACKGROUND: Artificial intelligence (AI) chatbots have increasing applications in healthcare; however, their accuracy and readability across different languages remain unclear. Therefore, this study aimed to compare the performance of ChatGPT-5 and DeepSeek-V3 on bruxism-related questions in English and Turkish.
METHODS: Responses generated by ChatGPT-5 and DeepSeek-V3 to 20 questions were evaluated in Turkish and English, yielding four chatbot-language conditions. Accuracy was independently scored by two prosthodontists using a Modified Global Quality Score, and inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). Readability was defined as the ease of reading and was assessed using the Flesch Reading Ease (FRE) scale for English texts and the Ateşman readability formula for Turkish texts. Descriptive statistics were calculated, differences in accuracy across conditions were analyzed using the Friedman test, and the relationship between accuracy and readability was examined using Spearman's rank correlation coefficient.
RESULTS: Accuracy scores of responses generated by ChatGPT-5 and DeepSeek-V3 were generally high across both Turkish and English. No statistically significant differences in accuracy were observed among the four model-language combinations based on the Friedman test (χ² (3) = 3.204, p = 0.361). Inter-rater reliability analysis demonstrated moderate agreement for single measures and good agreement for average measures between the two evaluators (ICC = 0.623 and 0.767, respectively). Comparison of Ateşman readability scores and FRE-based English readability analyses revealed no statistically significant differences between the two chatbots, as confirmed by Mann-Whitney U tests (Turkish: p = 0.646; English: p = 0.745).
CONCLUSION: ChatGPT-5 and DeepSeek-V3 demonstrated comparable accuracy in bruxism-related responses across Turkish and English. However, high readability levels limit patient accessibility, and no association was found between accuracy and readability, indicating that simplification is necessary for effective patient education.

Keywords:  Artificial intelligence; Bruxism; Conversational agents; Health communication; Multilingualism

DOI:  https://doi.org/10.1186/s12903-026-08581-3
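
The Ateşman formula used above for the Turkish texts has the same structure as Flesch Reading Ease, with coefficients recalibrated for Turkish. The sketch below uses the coefficients commonly cited for the formula; the abstract does not state them, so they are an assumption here and should be checked against the original publication. The function name and sample counts are illustrative.

```python
def atesman_readability(words: int, sentences: int, syllables: int) -> float:
    """Ateşman readability score for Turkish text: higher scores mean easier text.

    Coefficients are the commonly cited ones (assumed, not given in the abstract).
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences)


# Hypothetical counts for one Turkish response (not data from the study).
print(round(atesman_readability(words=100, sentences=10, syllables=260), 2))
```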
J Craniofac Surg. 2026 May 18.

An Assessment of ChatGPT Responses to Common Postoperative Rhinoplasty Questions.

Nagehan Dilşad Erdoğmuş Küçükcan, Rana Kapukaya, Hülya Binokay.

  Rhinoplasty is a commonly performed aesthetic and functional procedure, and patients frequently seek postoperative information beyond routine follow-up visits. With the increasing use of artificial intelligence-based tools for health-related information, ChatGPT has emerged as a widely accessed conversational platform; however, evidence regarding the quality, reliability, and clinical appropriateness of its responses in the postoperative rhinoplasty setting remains limited. This methodological, cross-sectional study evaluated ChatGPT responses to 15 standardized postoperative rhinoplasty questions submitted in Turkish using a single publicly available model version. Responses were recorded verbatim and independently assessed by a multidisciplinary panel of 10 experienced clinicians using a structured scoring framework evaluating accuracy, completeness, safety, and patient-friendly language. Interrater reliability was analyzed using intraclass correlation coefficients. Overall, ChatGPT responses were generally rated as patient-friendly and conservative in tone. The mean total score indicated moderate overall quality, with higher scores observed for safety and patient-friendly language, while greater variability was noted in accuracy and completeness, particularly for questions related to activity resumption and procedure-specific postoperative precautions. Single-measure interrater reliability was low, whereas average-measure reliability improved substantially when ratings from all evaluators were considered. These findings suggest that ChatGPT may serve as an accessible and patient-friendly source of postoperative information following rhinoplasty; however, variability in accuracy and completeness underscores the importance of clinician oversight. ChatGPT should therefore be regarded as a supplementary informational resource rather than a substitute for individualized surgeon-patient communication.

Keywords:  Artificial intelligence; ChatGPT; patient education; postoperative care; rhinoplasty

DOI:  https://doi.org/10.1097/SCS.0000000000012905
Menopause. 2026 Jun 01. 33(6): 757

Letter: Evaluation of the accuracy and readability of large language model responses on menopause and hormone therapy.

Aya Mudrik.

DOI: https://doi.org/10.1097/GME.0000000000002791
Lymphology. 2026 ;59(1): 27-34

Digital Experts in Lymphedema: Assessing the Quality and Readability of Responses from ChatGPT and Gemini.

C S Pirincci, E Cihan, E Uzelpasaci, A B Pacaci.

This study evaluated the quality and readability of responses provided by ChatGPT and Gemini to frequently asked questions related to lymphedema. Ten frequently asked questions about lymphedema were selected by expert therapists for submission to ChatGPT and Gemini. The initial responses from ChatGPT and Gemini were recorded without follow-up queries. Five independent experts (therapists) specialized in this field evaluated responses from ChatGPT and Gemini using a four-point rating scale. Readability levels were analyzed using the Flesch-Kincaid Grade Level through WordCalc software. Results indicated that except for questions 3 and 6, ChatGPT and Gemini provided similar responses, with significant differences observed in those two questions (p = 0.014). ChatGPT's answers to questions 1, 2, 3, 4, 8, 9, and 10 were found to be more readable than Gemini's, while Gemini's answers to questions 5, 6, and 7 were more readable. Overall, approximately 70% of ChatGPT's responses were found to be easier to read compared to Gemini. The results of this study indicate that both AI search engines offer comparable responses to research questions. However, when evaluating clarity and accessibility of the answers, ChatGPT was found to be more understandable and user-friendly than Gemini.

Keywords: Artificial intelligence; Cancer; ChatGPT; Gemini; Lymphedema
Health Educ Res. 2026 Mar 31. pii: cyag015. [Epub ahead of print]41(3):

Quality of online heatstroke information across four countries: a quantitative content analysis.

Shinya Ito, Emi Furukawa.

Effective heatstroke prevention depends not only on access to information but also on whether online health materials support understanding and actionable decision-making. However, evidence comparing the quality and presentation of online heatstroke information across countries remains limited. We conducted a quantitative content analysis of 120 publicly available heatstroke-related webpages from Japan, the United States, the United Kingdom, and Australia (30 per country), with one World Health Organization webpage included as a reference. Readability, information volume, credibility, and information presentation were assessed using established metrics, including the Patient Education Materials Assessment Tool for Print Materials. Cross-country comparisons and sensitivity analyses focusing on webpages targeting the general public were performed. UK webpages showed higher readability, whereas Japanese webpages more frequently incorporated visual aids and tools that facilitate preventive actions, resulting in higher understandability and actionability scores. However, actionability did not reach adequate levels in any country. Online heatstroke information differs substantially across countries. While readability improves accessibility, greater integration of visual and action-supporting design elements is needed to better promote preventive action.

DOI: https://doi.org/10.1093/her/cyag015
Digit Health. 2026 Jan-Dec;12:12 20552076261452401

Assessment of online information about osteomyelitis of jaw: An infodemiologic study.

Muath Saad Alassaf, Jawan Alyenbaawi, Zainab Alghamdi, Taif Shadi Alahmadi, Shatha Ahmed Alhujaily, Ashraf Abdelfattah, Ayyob Aboalkhair, Hasan Albeshir, Mahmood Samman, Omar Mohamed Mansour.

   Background: Osteomyelitis is a serious inflammatory disease affecting the bone and bone marrow that requires early recognition and management. As the internet and digital platforms increasingly become the primary sources of health-related information for patients, the quality and readability of online content are critical in influencing patient knowledge and decision-making. Despite this, the reliability and accessibility of osteomyelitis related online content remains insufficiently explored.
Objective: The aim of this study was to assess the quality and readability of osteomyelitis of the jaw related English websites and to classify them by their institutional affiliation.
Methods: A systematic online search was performed using the Google search engine. The first 100 search results for 2 different terms were reviewed according to predefined inclusion and exclusion criteria. Eligible websites were analyzed using quality assessment tools, including the DISCERN instrument and JAMA benchmarks. Readability was evaluated using the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Grade Level (FKGL), and the SMOG index. Websites were then classified according to their institutional affiliation: commercial, non-profit, medical/dental center, or governmental/university based.
Results: Among the 200 websites screened, 27 fulfilled the inclusion criteria. Most were affiliated with medical or dental centers, whereas governmental and university websites represented the lowest proportion. The readability analysis showed that most of the content was written at a complex level, exceeding the commonly recommended 6th to 8th grade level of reading. Additionally, more than half of the analyzed websites did not meet JAMA standards, particularly in terms of transparency and disclosure.
Conclusion: The study's findings revealed significant limitations in the quality and readability of web-based information addressing osteomyelitis. Considering the expanding dependence of patients on online health resources, healthcare providers and academic organizations should actively contribute to creating and promoting reliable, accessible, and evidence-based digital content. Implementing standardized evaluation instruments, including the DISCERN tool and JAMA benchmarks, can help to enhance both credibility and comprehensibility of online health information for public audiences.

Keywords:  DISCERN; JAMA benchmarks; infodemiology; jaw infections; online health information; osteomyelitis of the jaw; readability; web-based

DOI:  https://doi.org/10.1177/20552076261452401
BMC Med Educ. 2026 May 18.

Educational quality of YouTube videos on robot-assisted thymectomy: a LAP-VEGaS-based evaluation.

Okan Karataş, Ayşegül Güler, Nilay Çavuşoğlu Yalçın, Muharrem Özkaya.

   BACKGROUND: Video-based learning has become an integral component of surgical training, especially with the growing implementation of minimally invasive techniques such as robot-assisted thoracic surgery (RATS). However, the educational value of freely accessible online videos remains unclear. We aimed to assess the educational quality of the most-viewed YouTube videos on RATS thymectomy using a structured evaluation framework and to analyze the relationship between video-related parameters and educational performance.
METHODS: A structured search of YouTube was conducted on March 11, 2026, using the predefined search term "robotic thymectomy." The 100 most-viewed videos were screened, and 38 videos met the inclusion criteria. Videos were assessed by two experienced thoracic surgeons using the 9-item LAP-VEGaS (Laparoscopic Surgery Video Educational Guidelines) scoring system, which was applied as a structured framework for evaluating general surgical video educational quality. Video characteristics were recorded and analyzed in relation to educational quality.
RESULTS: The mean LAP-VEGaS score was 9.76 ± 3.42 (range: 4-16), and only 17 videos (44.7%) achieved a score ≥ 11, indicating acceptable educational quality. Most videos were uploaded by individual surgeons and were predominantly of 720p resolution. View count was positively correlated with likes and time since upload but showed no significant association with LAP-VEGaS score.
CONCLUSION: Among highly viewed English-language YouTube videos on RATS thymectomy, a considerable proportion did not meet expected standards for structured surgical education. Common popularity metrics, particularly view count, are not reliable indicators of educational value. Structured and standardized approaches to video creation should be promoted to enhance the overall quality and trustworthiness of online surgical education materials.

Keywords:  Educational quality; LAP-VEGaS; Online video platforms; Robotic surgery; Thymectomy

DOI:  https://doi.org/10.1186/s12909-026-09429-8
Hernia. 2026 May 20. pii: 223. [Epub ahead of print]30(1):

Quality and educational value of youtube videos on inguinal hernia surgery: a cross-sectional study.

Volkan Sayur, Can Uc, Bahadır Baki, Erkan Guler, Taylan Ozgur Sezer.

   BACKGROUND AND AIM: With the increasing use of digital platforms in surgical education, YouTube has become a widely accessible resource for trainees. However, the absence of peer review raises concerns regarding the reliability and educational value of its content. This study aimed to evaluate the educational quality, reliability, and instructional value of inguinal hernia repair videos on YouTube using multiple validated scoring systems.
MATERIALS AND METHODS: A systematic search was conducted on YouTube, and 50 videos meeting predefined inclusion criteria were analyzed. Videos were independently assessed by two blinded reviewers using the Global Quality Scale (GQS), Journal of the American Medical Association (JAMA) criteria, Video Power Index (VPI), Laparoscopic Video Educational Guidelines and Scoring (LAP-VEGaS), DISCERN, and Health on the Net (HONcode) criteria. Videos were also categorized according to their source.
RESULTS: Of the 50 videos, 42% were uploaded by individual users and 36% by academic or institutional sources. Most videos demonstrated laparoscopic or robotic procedures. Median scores indicated moderate educational quality (GQS 3, JAMA 3, LAP-VEGaS 11). Videos categorized as originating from academic or institutional sources tended to achieve higher scores; however, these findings should be interpreted with caution. No significant correlation was found between video popularity (VPI) and educational quality.
CONCLUSION: YouTube provides a widely accessible but variable educational resource for inguinal hernia surgery, with overall moderate quality even among selected videos. Video popularity alone does not reliably indicate educational value. Instead, viewers may benefit from prioritizing videos with structured step-by-step narration, clear visualization of key anatomical landmarks, transparent source identification, and inclusion of complication management. While YouTube may support learning as a supplementary tool, it may not adequately replace structured surgical training.

Keywords:  Inguinal hernia; LAP-VEGaS; Quality assessment; Surgical education; YouTube

DOI:  https://doi.org/10.1007/s10029-026-03721-8
J Back Musculoskelet Rehabil. 2026 May 16. 10538127261450317

Quality and educational value of TikTok videos on rehabilitation exercises for triangular fibrocartilage complex injuries: A cross-sectional study.

Shijie Fan, Sehrish Noor, Dheerav Praveen, Zhaodong Bi, Longguo Zhang.

  Background: While TikTok's surge in health-related material creates valuable opportunities for patient education, the platform's absence of structured peer-review mechanisms raises concerns regarding the quality of available content. Increasingly, patients rely on social media for guidance on self-managing conditions such as Triangular Fibrocartilage Complex (TFCC) injuries; however, the quality of this information has not yet been rigorously evaluated.
Objective: The aim of this research was to systematically assess TikTok videos demonstrating rehabilitation exercises for TFCC injuries, specifically evaluating their quality, reliability, and educational value.
Methods: A cross-sectional study searched TikTok for "TFCC rehab", "TFCC exercises", and "TFCC training". A final 123 videos were analyzed. Quality and reliability were assessed using DISCERN, Global Quality Scale (GQS), and JAMA criteria. Educational content was evaluated using an adapted TFCC Exercise Education Score (TFCCEES). Video characteristics, engagement metrics, and uploader types were recorded.
Results: Overall quality was low (median DISCERN 28.00; 88.6% "poor" or "very poor"). Most videos (64.23%) were from non-health professionals. Health professionals scored higher on JAMA criteria (P < 0.001), but no significant differences were found in DISCERN, GQS, or TFCCEES between groups. Engagement metrics showed negligible correlations with quality scores. Video duration moderately correlated with DISCERN (ρ = 0.45).
Conclusions: TFCC rehabilitation content on the Chinese version of TikTok is predominantly poor, regardless of uploader status. Popularity is not an indicator of quality, which may limit patients' ability to identify reliable rehabilitation guidance. Healthcare professionals should direct patients to validated resources and create high-quality social media content.

Keywords:  TFCC; TikTok; quality assessment; rehabilitation; social media; triangular fibrocartilage complex

DOI:  https://doi.org/10.1177/10538127261450317
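
The ρ reported above for video duration and DISCERN is a rank correlation. As a rough illustration of how such a coefficient is obtained, the sketch below computes Spearman's ρ as the Pearson correlation of average ranks, which handles ties. It is a generic implementation, not the authors' analysis code, and the durations and scores in the example are invented.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented example: video durations in seconds and DISCERN totals.
durations = [15, 32, 48, 60, 95]
discern = [22, 30, 26, 35, 41]
print(spearman_rho(durations, discern))
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed distributions typical of engagement metrics such as views and likes.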
BMC Oral Health. 2026 May 21.

Quality and guideline adherence of child-oriented toothbrushing videos on YouTube: a comparative study of Turkish and English content.

Merter Güçlü, Hatice Selin Güçlü.

   BACKGROUND: Early childhood caries remains a major global public health concern. As parents and children increasingly rely on digital platforms for health information, the quality and reliability of online educational resources become critical. Despite the abundance of general oral health content, studies specifically evaluating child-oriented toothbrushing instructions remain scarce. To address this gap, the present study evaluated and compared the quality, reliability, and guideline adherence of child-oriented toothbrushing videos in Turkish and English, highlighting potential disparities between global and localized digital health information ecosystems.
METHODS: A cross-sectional analysis of YouTube videos was conducted using predefined Turkish and English keywords. After applying exclusion criteria (e.g., entertainment-only and non-instructional videos), 51 videos (27 Turkish, 24 English) were included. Two independent reviewers assessed videos using the Global Quality Scale (GQS), DISCERN, and a 10-item guideline-based content compliance checklist. Engagement metrics (views, likes, viewing rate, and interaction index) were recorded, and associations between engagement and evaluation scores were examined.
RESULTS: English-language videos showed higher median GQS and DISCERN scores than Turkish-language videos (3.25 vs. 1.50, p = 0.042; and 3.00 vs. 2.00, p = 0.037, respectively). Guideline-based compliance scores were also higher in English videos (5.50 vs. 3.00, p = 0.018). Videos uploaded by animation channels demonstrated significantly lower quality and compliance scores compared with educational and health-related channels (p < 0.001). No significant correlations were observed between engagement metrics and GQS, DISCERN, or compliance scores (all p > 0.05).
CONCLUSION: The overall quality, reliability, and guideline adherence of child-oriented toothbrushing videos were limited, and Turkish-language content demonstrated lower evaluation scores than English-language content. The lack of association between engagement and evaluation scores indicates that popularity does not reflect informational quality, underscoring the need for greater involvement of dental professionals in producing evidence-based, age-appropriate digital educational materials. Actionable strategies, such as clinician-guided video recommendations and the future development of digital quality-labeling systems, may help guide families toward more reliable content.

Keywords:  Health information quality; Pediatric dentistry; Social media; Toothbrushing; YouTube

DOI:  https://doi.org/10.1186/s12903-026-08607-w
BMC Oral Health. 2026 May 18.

Quality of information about potentially malignant oral disorders on TikTok: a cross-sectional analysis in the context of social media.

López-Jornet Pia, Bennouna Layla, Parra-Perez Francisco, Pons-Fuster Eduardo.

   OBJECTIVE: Potentially malignant oral disorders (PMODs) can progress to oral cancer, making the quality of available health information especially relevant. This research aimed to analyze TikTok content related to potentially malignant oral disorders (PMOD), considering aspects such as source, duration, and content quality.
MATERIALS AND METHODS: A cross-sectional observational study was conducted to analyze TikTok videos related to potentially malignant oral disorders (PMODs), using the hashtags #leukoplakia, #lichen planus, #actinic cheilitis, #oral lichenoid lesions, and #erythroplasia. English- and Spanish-language videos were included; duplicates, unrelated, advertising, or silent videos were excluded. Video characteristics, engagement (views, likes, comments, saves, shares), viewing and interaction indices, and type of uploader were recorded. Two evaluators scored the content with DISCERN and the Global Quality Scale (GQS).
RESULTS: A total of 96 videos were analyzed. The largest group of creators were nano-influencers (1-10,000 followers; 58.3%). The DISCERN score had a median of 25 (IQR: 22-29), and most of the videos obtained very low Global Quality Scale (GQS) scores (median 2; IQR: 1-2). A very strong correlation was observed between DISCERN and GQS (r = 0.890, p < 0.001). Videos addressing actinic cheilitis obtained slightly higher scores, although the differences were not statistically significant.
CONCLUSION: The overall quality of TikTok videos on potentially malignant oral disorders was low. DISCERN and GQS scores showed a strong correlation, and quality was not related to popularity or engagement metrics. These findings highlight the importance of healthcare professionals contributing accurate information to social media, and the need to adapt or develop evaluation tools for video-based content.

Keywords:  Discern; Global Quality Scale (GQS); Leucoplakia; Oral lichen planus; Tiktok

DOI:  https://doi.org/10.1186/s12903-026-08636-5
Digit Health. 2026 Jan-Dec;12:12 20552076261443867

Short-form video sharing platforms as a source of information for children's growing pains: A cross-sectional content analysis study.

Dan Zhou, Zhuqing Ren, Jiayi Jiang, Haisu Li, Chuan Zhong.

   Objective: To evaluate the quality, reliability, and educational value of short-form videos pertaining to children's growing pains on popular social media platforms (TikTok, Rednote, Bilibili, and YouTube).
Methods: A cross-sectional analysis of 200 short-form videos (50 per platform) was conducted using standardized search terms. Video quality was assessed using four validated instruments: modified DISCERN (mDISCERN), the Global Quality Scale (GQS), the Video Information and Quality Index (VIQI), and the Patient Education Materials Assessment Tool (PEMAT). Metadata and user engagement metrics were collected, and statistical analyses included descriptive statistics, group comparisons, and correlation analyses.
Results: TikTok demonstrated superior performance compared to other platforms in reliability (mDISCERN: 3.00 (3.00, 4.00); GQS: 4.00 (3.00, 4.00); median (IQR); p < 0.001), educational value (PEMAT-Understandability: 80.00 (66.70, 92.22); PEMAT-Actionability: 80.00 (60.00, 80.00); median (IQR); p < 0.001), and content comprehensiveness (VIQI-Total score: 14.00 (12.25, 15.00); median (IQR); p < 0.001). Videos created by healthcare professionals showed significantly higher quality scores and more comprehensive clinical content coverage. User engagement metrics such as likes, comments, video duration, and followers showed positive correlations with several video quality scores (r: 0.12-0.56, p < 0.05). However, engagement alone should not be considered a definitive indicator of quality.
Conclusion: While short-form videos represent a valuable educational resource for parents, their quality varies significantly across platforms and creators. Content from healthcare professionals, particularly on TikTok, was found to be more reliable and robust. This underscores the critical role of platform algorithms in quality curation. Future initiatives should therefore encourage professional creator participation and optimize recommendation systems to prioritize informational accuracy.

Keywords:  content quality; growing pains; health communication; short-form videos; social media

DOI:  https://doi.org/10.1177/20552076261443867
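
The PEMAT understandability and actionability values reported above are percentages. Under the usual PEMAT scoring convention, each item is rated agree (1), disagree (0), or not applicable, and the score is the share of applicable items rated "agree". A minimal Python sketch of that calculation follows; the function name, the use of None for "not applicable", and the sample ratings are assumptions for illustration and are not taken from the study.

```python
def pemat_score(ratings):
    """Percentage of applicable PEMAT items rated 'agree'.

    ratings: iterable of 1 (agree), 0 (disagree), or None (not applicable).
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(r not in (0, 1) for r in applicable):
        raise ValueError("ratings must be 1, 0, or None")
    return 100.0 * sum(applicable) / len(applicable)


# Hypothetical ratings for one video: four 'agree', one 'disagree', one N/A.
print(pemat_score([1, 1, 0, 1, None, 1]))
```

Excluding not-applicable items from the denominator means scores remain comparable between videos that differ in which items apply.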
BMC Anesthesiol. 2026 May 18.

Assessing the quality of general anesthesia-related short videos on TikTok: a cross-sectional study.

Yongbo Duan, Jun Kang, Chengjian Wang, Wenjun Yan.

   BACKGROUND: TikTok has become a significant source of health information for the public, yet misinformation on the platform may adversely affect health decision-making. To date, no study has systematically evaluated the quality of general anesthesia-related short videos on TikTok.
OBJECTIVE: This study aims to assess the quality and reliability of general anesthesia-related videos on TikTok using validated instruments and to examine correlations among video source, duration, user engagement, and content quality.
METHODS: We conducted a cross-sectional analysis of TikTok videos retrieved using the keyword "" (general anesthesia). Video quality was evaluated using the Global Quality Scale (GQS), modified DISCERN (mDISCERN), JAMA benchmark criteria, and a custom composite score assessing risk disclosure, procedural completeness, and accuracy. Uploaders were categorized by professional background. Statistical analyses included Holm-Bonferroni corrected non-parametric tests, Spearman correlations, and multivariable ordinal logistic regression adjusting for video duration.
RESULTS: Of 150 retrieved videos, 127 met the inclusion criteria. Overall quality was low-to-moderate: median GQS 2.0 (IQR: 1.0-3.0), JAMA 2.0 (1.0-2.0), mDISCERN 2.0 (2.0-2.0), and custom score 4.0 (3.0-4.0), with notably poor risk disclosure (median 0.0). Videos from anesthesia professionals scored highest across all metrics. Engagement metrics did not correlate with video quality. Video duration correlated positively only with GQS (r = 0.339, P < 0.001). Multivariable analysis confirmed that uploader background independently predicted quality; patients and the general public had significantly lower odds of producing high-quality content compared with anesthesia professionals (P < 0.001).
CONCLUSIONS: General anesthesia-related videos on TikTok exhibit substantial quality deficiencies, particularly in risk communication. Video popularity does not reflect scientific reliability. Given that professional background independently predicts content quality, healthcare professionals should actively create evidence-based content, and platforms must algorithmically prioritize authoritative sources to mitigate misinformation.

Keywords:  Content Quality Assessment; Digital Health; General Anesthesia; Health misinformation; Social Media; TikTok

DOI:  https://doi.org/10.1186/s12871-026-03920-x
Front Digit Health. 2026;8: 1769121

Stratified and combined analysis of the quality of lumbar spinal stenosis-related videos on major Chinese short video platforms.

Wenhui Zhao, Xinwei Chen, Zeda Wang, Zhehao Xiao, JiLin Yang, Zhipei Huang, Yanwei Jiang, Risheng Liang, Rui Wang.

   Background: Lumbar spinal stenosis (LSS) is a degenerative disorder in which narrowing of the spinal canal compresses neural elements, causing pain, numbness, and limited mobility. With the rapid growth of Chinese short video platforms (TikTok, Bilibili, Xiaohongshu, Kwai, and WeChat), the public increasingly relies on short videos for LSS-related health information. However, the quality of such content has not been systematically evaluated.
Methods: This cross-sectional content analysis searched each of five platforms using "lumbar spinal stenosis," screened the top 100 results, and included 412 videos after applying predefined criteria. Basic characteristics and engagement metrics were extracted. Analyses were stratified by platform, uploader type, and video category. Video quality and reliability were assessed using the Global Quality Score (GQS), modified DISCERN (mDISCERN), and JAMA benchmark criteria. Spearman correlation analysis examined associations between video characteristics and quality scores.
Results: High-quality content was concentrated on Bilibili and WeChat, particularly scientific explanations and professional course videos uploaded by healthcare professionals, whereas Kwai showed consistently low GQS scores across uploader types and categories. Video duration was moderately and positively correlated with GQS (r = 0.423), mDISCERN (r = 0.340), and JAMA scores (r = 0.357; all p < 0.001). Follower count and most engagement metrics (likes, saves, shares) showed only weak correlations with quality.
Conclusions: Overall, the quality and reliability of LSS-related short videos on major Chinese platforms are suboptimal, with marked inter-platform variation. Content from non-professional uploaders and personal experience-focused videos tended to be of lower quality. Healthcare professionals and medical institutions should actively disseminate evidence-based LSS information via short video platforms, and viewers should preferentially seek credible, verifiable sources.

Keywords:  information quality; lumbar spinal stenosis; patient education; quality and reliability; short video

DOI:  https://doi.org/10.3389/fdgth.2026.1769121
Medicine (Baltimore). 2026 May 22. 105(21): e48941

Reliability and quality of cognitive impairment educational content on Douyin and Bilibili: A cross-sectional content analysis.

Yuzhang Liang, Yifeng Xie, Haiyan Song, Xiaoxuan Fan, Yike Liu, Ya Na, Jiamin Gao, Ying Ao, Chunyan Chen.

  This study aimed to evaluate the content characteristics, quality, and reliability of cognitive impairment educational videos on Douyin and Bilibili and examine whether video duration and user engagement are associated with information quality. This cross-sectional content analysis searched the Chinese domestic version of TikTok (Douyin; https://www.douyin.com/) and Bilibili (https://www.bilibili.com/) through their official web interfaces on March 12, 2026. Searches were performed in logged-out mode using the keyword "cognitive impairment" via a desktop browser. The first 150 results from each platform were screened, and 250 eligible videos were included (133 from Douyin and 117 from Bilibili). Video quality and reliability were evaluated with the Global Quality Score, modified DISCERN, Journal of the American Medical Association benchmark criteria, and Video Information and Quality Index. Overall educational quality was moderate. Across both platforms, content focused primarily on clinical manifestations and treatment, whereas epidemiology, diagnosis, and prognosis were insufficiently covered. Douyin videos had significantly higher Global Quality Score, modified DISCERN, Journal of the American Medical Association, and Video Information and Quality Index scores than Bilibili videos (all P < .001), despite being substantially shorter. Videos uploaded by doctors or other health professionals showed the highest quality and reliability, whereas videos uploaded by individual users generated stronger engagement. Correlations between engagement indicators and quality scores were weak, indicating that popularity did not reliably reflect educational value. Cognitive impairment videos on Douyin and Bilibili showed substantial variability in quality and incomplete content coverage. Professional participation, clearer source disclosure, and platform-level governance may improve the accuracy and practical utility of short-video health education on cognitive impairment.

Keywords:  Bilibili; Douyin; TikTok; cognitive impairment; information quality; short video

DOI:  https://doi.org/10.1097/MD.0000000000048941
Sci Rep. 2026 May 19.

A cross-sectional study on the quality of pediatric autism-related videos on short video platforms.

Jiayi Ou, Caixia Sun, Liwei Zhang.

  This study evaluated the information quality, content characteristics, and distribution patterns of short videos concerning pediatric autism on three major Chinese platforms: TikTok, Bilibili, and Rednote. Employing a cross-sectional design, we retrieved videos using the keyword "pediatric autism" and analyzed 279 eligible entries. We collected basic video characteristics and engagement metrics, while video quality and reliability were assessed using the Global Quality Scale (GQS), the modified DISCERN (mDISCERN) instrument, the Journal of the American Medical Association (JAMA) criteria, and the Video Information and Quality Index (VIQI). Overall video quality was moderate, with a median GQS score of 3, but information reliability was poor, indicated by a median JAMA score of 1. Significant inter-platform differences emerged (p < 0.05): Bilibili videos were the longest, TikTok had the highest proportion of uploaders who were healthcare professionals (69.89%) and exhibited greater user engagement, whereas Rednote was dominated by individual users and contained a higher proportion of low-quality videos. Information on treatment costs was notably insufficient across all platforms. In conclusion, the quality of pediatric autism-related health information on short-video platforms varies substantially by platform, revealing a mismatch between available information and user needs. Enhancing professional content review mechanisms and encouraging greater participation by healthcare professionals could improve the dissemination of practical and reliable health information.

Keywords:  Autism Spectrum Disorder; Cross-Sectional Study; Health Communication; Information Quality; Short Video

DOI:  https://doi.org/10.1038/s41598-026-53838-0
Medicine (Baltimore). 2026 May 22. 105(21): e48894

Analysis of content, quality, and reliability of acute cholecystitis-related Chinese videos on TikTok and Bilibili: A cross-sectional study.

Weiming Yu, Junjie Zhang, Nianyong Yuan, Guowei Li, Qunfeng Xia.

  The incidence of acute cholecystitis has risen markedly in recent years. With the growing use of platforms like TikTok and Bilibili for health information dissemination, the quality and reliability of their content on this condition remain unclear. This study aimed to evaluate the reliability and quality of educational short videos related to acute cholecystitis on TikTok and Bilibili using the modified DISCERN (mDISCERN) and Global Quality Score tools, and to analyze correlations with video characteristics and user engagement. On August 25, 2025, the top 100 videos from each platform were retrieved using the keyword "acute cholecystitis." After exclusions, 175 videos were analyzed. Basic features and engagement metrics were recorded. Nonparametric statistics and Spearman correlation were applied. Bilibili had significantly more high-quality videos. Specifically, 22.0% scored Global Quality Score ≥ 4 compared to 9.0% on TikTok (P < .05). Additionally, 47.0% scored mDISCERN ≥ 4 versus 7.0% on TikTok (P < .05). On TikTok, 95% of creators had medical qualifications and primarily focused on disease knowledge and surgery. User engagement metrics - including likes, comments, and shares - were significantly inter-correlated but showed no significant association with quality scores. The overall quality and reliability of educational short videos related to acute cholecystitis are poor. The public should critically evaluate such content. Creators are urged to produce evidence-based information, and platforms should enhance moderation to limit misinformation.

Keywords:  acute cholecystitis; bilibili; health communication; information quality; social media; tiktok

DOI:  https://doi.org/10.1097/MD.0000000000048894
Strategies Trauma Limb Reconstr. 2025 May-Aug;20(2): 90-93

Accuracy and Actionability of TikTok Content on Cosmetic Limb Lengthening: A Comparison between Healthcare Professional and Non-professional Sources.

Akram Al Ramlawi, Munir Sidani, Chelsea Missi, Michael Assayag, Philip K McClure.

   Introduction: In the modern digital era, social media platforms, such as TikTok, have become significant sources of medical information for the general public. This study explores the content of TikTok videos related to cosmetic limb lengthening to understand the quality and accuracy of information shared.
Methods: The 50 most-viewed English-language videos tagged with #Heightsurgery and #lengtheningsurgery on TikTok were analysed by two independent reviewers. The analysis covered creator demographics, video format, and predominant themes. Each video was also evaluated for medical accuracy and scored for clarity and actionability using the patient education materials assessment tool (PEMAT).
Results: The selected videos amassed 186.5 million views, 7.9 million 'likes,' 152,000 'shares,' and 67,000 comments. Nearly half (47%) were produced by healthcare professionals (HCPs), with orthopaedic surgeons accounting for 70% of the HCP contributors (32.9% of total creators). Medically accurate content was found in 60% of all videos, with 80% of HCP videos featuring accurate information. Educational content dominated (72%), with the remainder being anecdotal (28%). Tone varied, with videos presenting either positive (50%), negative (32%), or neutral (18%) perspectives. Common themes included post-operative experiences (30%), medical education (15%), surgical techniques (12%), risks vs benefits (10%), and treatments for achondroplasia (10%). Concerns about pain, fear, permanent injury (24%), cost (10%), and a lack of understanding of procedures (10%) were also frequently mentioned. Notably, there was no focus on racial or socio-economic barriers in any of the videos. The average understandability and actionability scores, according to the PEMAT, were 67.5 and 67.15%, respectively.
Conclusions: The widespread popularity of these videos underscores the growing role of social media in disseminating medical information. This analysis highlights the need for HCPs to leverage platforms like TikTok to provide accurate, reliable information and address common concerns, misconceptions, and fears surrounding cosmetic limb lengthening.
How to cite this article: Ramlawi AA, Sidani M, Missi C, et al. Accuracy and Actionability of TikTok Content on Cosmetic Limb Lengthening: A Comparison between Healthcare Professional and Non-professional Sources. Strategies Trauma Limb Reconstr 2025;20(2):90-93.

Keywords:  Cosmetic limb lengthening; Femur; Online health information; Patient perspective; Social media

DOI:  https://doi.org/10.5005/jp-journals-10080-1648
Digit Health. 2026 Jan-Dec;12: 20552076261450409

Quality and reliability of chest pain-related short-form health videos on social media: A cross-sectional content analysis.

Ren Cheng-Han Fan, Qi-Bin Chen, Xin-Xin Zheng, Lu-Jie Huang, Cheng-Lv Hong.

   Background & Aims: Chest pain is one of the most common reasons for emergency medical visits. Increasingly, individuals seek preliminary explanations through short-form video (SFV) platforms before consulting healthcare professionals. In China, TikTok and Bilibili are major sources of public health information; however, the quality and reliability of symptom-focused chest pain content on these platforms remain poorly characterized. Unlike prior studies that primarily examine disease-specific videos, real-world health information-seeking often begins with symptoms rather than diagnoses. This study aimed to evaluate the quality, reliability, and content characteristics of chest pain-related SFVs in a real-world digital health context.
Methods: This cross-sectional study analyzed 200 Chinese-language short-form videos on adult chest pain from TikTok and Bilibili. Video quality and reliability were assessed using the JAMA benchmark criteria, Global Quality Scale, and modified DISCERN instrument. Content characteristics, uploader type, and engagement metrics were systematically evaluated, with high inter-rater reliability.
Results: Overall informational quality was suboptimal on both platforms, with uniformly low JAMA scores reflecting limited transparency regarding authorship and sources. TikTok videos demonstrated higher mean GQS and DISCERN scores and a greater proportion of medical professional uploaders compared with Bilibili. Content predominantly focused on symptom descriptions, differential diagnosis, and general management advice, while information on etiology, treatment options, and prevention was frequently incomplete. Videos produced by non-medical professional uploaders generated higher user engagement despite lower informational quality.
Conclusion: Chest pain-related SFVs on major Chinese platforms show substantial gaps in quality and reliability. Strengthening medical professional engagement, improving platform-level content governance, and promoting evidence-based symptom education may enhance digital health literacy and reduce risks associated with delayed care-seeking.

Keywords:  Bilibili; DISCERN; GQS; JAMA; TikTok; chest pain; information quality; reliability; short-form videos

DOI:  https://doi.org/10.1177/20552076261450409
Ann Afr Med. 2026 May 14.

Attitudes and Practices of Adults Regarding the Use of TikTok for Health Information: A Cross-sectional Study.

Mohammed Zaid Aljulifi, Abdulmohsen Othman Alhaqal, Abdullah Fahad Abahussain, Abdulelah Dohais Almutairy, Saud Saad Alanazi, Waleed Abdulaziz AlKulaibi, Fahad Mohammad Alfhaid, Fatima Bassam AlAlqam, Ahmed Saleh Alsaleh, Manar Mohammed Abutaki, Lama Essam Mohiddin.

   BACKGROUND: TikTok has rapidly grown as a source of health-related content, especially after the COVID-19 pandemic. However, concerns persist regarding the credibility and reliability of medical information shared on the platform. This study examined the attitudes and practices of adults in Saudi Arabia regarding the use of TikTok for obtaining health information.
METHODOLOGY: A cross-sectional online survey was conducted among 895 adults (aged ≥18 years) in Saudi Arabia between January and June 2024 using purposive sampling. A structured questionnaire assessed sociodemographic characteristics and perceptions about health information on TikTok. Data were analyzed using descriptive statistics and chi-square tests to determine the associations between demographic factors and TikTok health-information use, with P < 0.05 considered statistically significant.
RESULTS: TikTok use was most common among younger adults and females. Although more than half of participants reported using the app for at least an hour daily, only 41.23% used it to obtain health information. The most frequently searched topics included medication side effects, patient experiences, and general health awareness. Most participants rated TikTok's health information as moderately credible and trusted content more from friends, influencers, and peers than from healthcare professionals. TikTok influenced healthcare provider choices and daily routines but was not widely used before consulting professionals. Female gender was significantly associated with seeking medical information on TikTok (P = 0.004).
CONCLUSIONS: TikTok provides accessible health information but is viewed cautiously due to credibility concerns. Increased healthcare professional involvement and digital literacy initiatives may enhance reliability and safe usage.

Keywords:  Credibility; Saudi Arabia; TikTok; cross-sectional study; health information; social media

DOI:  https://doi.org/10.4103/aam.aam_570_25