bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-12-14
forty-one papers selected by
Thomas Krichel, Open Library Society



  1. Med Ref Serv Q. 2025 Dec 09. 1-10
      Anecdotal evidence suggests that hiring managers and hiring committees are seeing small numbers of applicants for vacancies at their health sciences libraries, making recruitment difficult. Several challenges are often cited, but little has been said about geographic considerations. Our objective was to analyze one year of early-career health sciences job postings in the United States and to identify any geographic disparities relevant to recruitment. We examined medical and health sciences librarian job postings compiled from January to December 2023 from MLA's website, ALA's JobLIST, medlib-l, and caucus listservs. Early-career postings were identified using predefined criteria. Of 216 total postings from 2023, 105 were early-career positions (requiring one year or less of experience), approximately 49% of all job postings during this period. A plurality of early-career postings (27%) were located in the Mid-Atlantic region, while the fewest (5%) were in the Mountain West. Analysis of the early-career postings found that instruction (67%) and reference (58%) duties were most prominent. Geography matters: a new LIS graduate living in a region with fewer opportunities may be forced to move in order to obtain a medical library position, and optimal approaches to recruitment will vary depending on the employer's location. As this highlights just one aspect of the challenge, further research directions may be taken from this analysis.
    Keywords:  LIS education; Recruitment; Geography
    DOI:  https://doi.org/10.1080/02763869.2025.2595570
  2. Nucleic Acids Res. 2025 Dec 08. pii: gkaf1172. [Epub ahead of print]
    CNCB-NGDC Members and Partners
      The National Genomics Data Center (NGDC), as part of the China National Center for Bioinformation (CNCB), provides a suite of database resources for worldwide researchers. As multi-omics big data and artificial intelligence reshape the paradigm of biology research, CNCB-NGDC continuously updates its database resources to enhance data usability, foster knowledge discovery, and support data-driven innovative research. Over the past year, notable progress has been achieved in expanding the scope of high-quality multi-omics datasets, building new database resources, and optimizing extant core resources. Notably, the launch of BIG Search enables cross-database search services for large-scale biological data platforms, including NGDC, National Center for Biotechnology Information (NCBI), and European Bioinformatics Institute (EBI). Additionally, several new resources have been developed, covering genome and variation (Hiland Resource, TOAnnoPriDB), expression (TEDD), single-cell omics (PreDigs, scMultiModalMap, TE-SCALE), radiomics (TonguExpert), health and disease (CAVDdb, IDP, MTB-KB, ResMicroDb), biodiversity and biosynthesis (SugarcaneOmics), as well as research tools (Dingent, miMatch, OmniExtract, RDBSB, xMarkerFinder). All these resources and services are freely accessible at https://ngdc.cncb.ac.cn.
    DOI:  https://doi.org/10.1093/nar/gkaf1172
  3. Nucleic Acids Res. 2025 Dec 12. pii: gkaf1060. [Epub ahead of print]
      The National Center for Biotechnology Information (NCBI) provides biomedical data resources including PubMed®, a repository of citations and abstracts published in life science journals, and ClinicalTrials.gov, a repository of clinical research summaries. NCBI also hosts the NIH Comparative Genomics Resource (CGR) that aims to maximize the impact of eukaryotic genome datasets. NCBI provides search and retrieval operations for most of these data from 40 distinct repositories, knowledgebases, and services. The E-utilities serve as the programming interface for most of these. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, CGR, ClinicalTrials.gov, ClinVar, dbSNP, GTR, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
    DOI:  https://doi.org/10.1093/nar/gkaf1060
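    The entry above notes that the E-utilities serve as the programming interface to most NCBI resources. As a minimal illustration of that interface, the sketch below queries PubMed through the public ESearch endpoint; the query string is an arbitrary example, not taken from the paper.

```python
# Minimal sketch: searching PubMed via the NCBI E-utilities ESearch endpoint.
# The endpoint and parameters are the standard public E-utilities interface;
# the query term is an illustrative placeholder.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_search(term: str, retmax: int = 5) -> list[str]:
    """Return PubMed IDs (PMIDs) matching a query term."""
    resp = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    print(pubmed_search("health literacy AND chatbot"))
```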
  4. J Hand Surg Glob Online. 2025 Nov;7(6): 100822
       Purpose: The purpose of this study was to assess the validity, reliability, and readability of responses from ChatGPT, Microsoft Copilot, and Google Gemini to common patient questions about postoperative care after distal radius fracture repair.
    Methods: Twenty-seven thoroughly vetted questions regarding distal radius fracture repair surgery were compiled and entered into ChatGPT-4, Gemini, and Copilot. The responses were analyzed for quality, accuracy, and readability using the DISCERN scale, the Journal of the American Medical Association benchmark criteria, the Flesch-Kincaid Reading Ease Score, and the Flesch-Kincaid Grade Level. Citations provided by Google Gemini and Microsoft Copilot were further categorized by source of reference. Five questions were resubmitted with a request for response simplification, and the responses were re-evaluated using the same metrics.
    Results: All three artificial intelligence platforms produced answers that were considered "good" quality (DISCERN scores >50). Copilot had the highest quality of information (68.3), followed by Gemini (62.9) and ChatGPT (52.9). The information provided by Copilot demonstrated the highest reliability, with a Journal of the American Medical Association benchmark criterion of 3 (of 4) compared with Gemini (1) and ChatGPT (0). All three platforms generated complex texts with Flesch-Kincaid Reading Ease Scores ranging between 35.8 and 41.4 and Flesch-Kincaid Grade Level scores between 10.5 and 12.1, indicating a minimum of high-school graduate reading level required. After simplification, Gemini's reading level remained unchanged, whereas ChatGPT improved to that of a seventh-grade reading level and Copilot improved to that of an eighth-grade reading level. Copilot had a higher number of references (74) compared with Gemini (36).
    Conclusions: All three platforms provided safe and reliable answers to postoperative questions about distal radius fractures. High reading levels provided by AI remain the biggest barrier to patient accessibility.
    Clinical relevance: In their current state, mainstream AI platforms are best suited as adjunct tools that support, rather than replace, clinical communication from health care workers.
    Keywords:  Artificial intelligence; ChatGPT; Copilot; Distal radius fractures; Readability
    DOI:  https://doi.org/10.1016/j.jhsg.2025.100822
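    Entry 4, like many studies in this issue, reports Flesch Reading Ease and Flesch-Kincaid Grade Level scores. For reference, both are simple functions of word, sentence, and syllable counts; a minimal sketch follows (syllable counting, whose heuristics vary, is left to the caller).

```python
# Standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas,
# computed from pre-tallied word, sentence, and syllable counts.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example: a 120-word passage in 6 sentences with 180 syllables.
print(round(flesch_reading_ease(120, 6, 180), 1))   # 59.6 ("fairly difficult")
print(round(flesch_kincaid_grade(120, 6, 180), 1))  # 9.9, i.e., about grade 10
```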
  5. Perspect Med Educ. 2025 ;14(1): 882-890
       Introduction: It is estimated that large language models (LLMs), including ChatGPT, are already widely used in academic paper writing. This study examined whether certain words and phrases reported as frequently used by LLMs have increased in medical literature, comparing their trends with common academic expressions.
    Methods: A structured literature review identified 135 potentially AI-influenced terms from 15 studies documenting LLM vocabulary patterns. For comparison, 84 common academic phrases in medical research served as controls. PubMed records from 2000 to 2024 were analyzed to track the frequency of these terms. Usage trends were normalized using a modified Z-score transformation.
    Results: Of the 135 potentially AI-influenced terms, 103 showed meaningful increases (modified Z-score ≥ 3.5) in 2024. Terms with the highest increases included "delve," "underscore," "primarily," "meticulous," and "boast." A linear mixed-effects model revealed significantly higher usage of potentially AI-influenced terms compared to controls (β = 0.655, p < 0.001). Notably, these terms began increasing in 2020, preceding ChatGPT's 2022 release, with marked acceleration in 2023-2024.
    Discussion: Certain words and phrases have become more common in medical literature since ChatGPT's introduction. However, the use of these terms tended to increase before 2022, indicating the possibility that the emergence of LLMs amplified existing trends rather than creating entirely new patterns. By understanding which terms are overused by AI, medical educators and researchers can promote better editing of AI-assisted drafts and maintain diverse vocabulary across scientific writing.
    DOI:  https://doi.org/10.5334/pme.1929
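    The study above normalizes term-usage trends with a modified Z-score. The paper does not spell out its variant; the sketch below uses the common Iglewicz-Hoaglin definition based on the median and the median absolute deviation (MAD), applied to synthetic yearly frequencies.

```python
import numpy as np

def modified_z_scores(x: np.ndarray) -> np.ndarray:
    """Iglewicz-Hoaglin modified Z-score: robust to outliers because it uses
    the median and MAD instead of the mean and standard deviation."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return 0.6745 * (x - med) / mad

# Example: synthetic yearly per-million frequencies of a term, with a 2024 spike.
freq = np.array([10.1, 10.4, 9.8, 10.2, 10.0, 10.3, 27.5])
print(modified_z_scores(freq).round(2))  # the last value far exceeds the 3.5 cutoff
```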
  6. Infect Dis Now. 2025 Dec 09. pii: S2666-9919(25)00207-6. [Epub ahead of print]56(1): 105228
       INTRODUCTION: More and more people are using large language models (LLMs) to seek out health information online. Although these tools have great potential to improve digital health literacy, not enough is known about their accuracy and consistency, especially in life-threatening conditions such as sepsis. The aim of this study was to test and compare the effectiveness of two popular LLMs, ChatGPT 4o and Gemini 2.5 Flash, in providing accurate and consistent answers to questions about sepsis.
    MATERIAL AND METHODS: A cross-sectional benchmarking study was conducted using a standardized set of sepsis-related questions, comprising two main categories: frequently asked questions (FAQs) and items drawn from the Surviving Sepsis Campaign (SSC) guidelines. The responses generated by the two models were independently assessed by two raters using the Global Quality Score (GQS), and reproducibility was evaluated by submitting each question twice.
    RESULTS: Gemini significantly outperformed ChatGPT in overall quality and reproducibility. More specifically, 94% of Gemini's responses received the highest GQS rating (GQS 5), compared to only 35.4% of the ChatGPT answers. Gemini also demonstrated higher reproducibility (97.5% vs. 76.5%). Both models underperformed in the "prevention" domain. Gemini showed greater potential than ChatGPT in delivering accurate and consistent sepsis-related health information, which is crucial for patients and caregivers alike.
    CONCLUSION: These findings underscore the importance of rigorous benchmarking before integrating LLMs into digital health platforms, and illustrate a need for refinement of LLMs to enhance their reliability in public-facing health communication.
    Keywords:  Artificial intelligence; ChatGPT; Gemini; Large language model; Sepsis
    DOI:  https://doi.org/10.1016/j.idnow.2025.105228
  7. Nurse Educ Pract. 2025 Dec 04. pii: S1471-5953(25)00425-1. [Epub ahead of print]90 104668
       AIM: To evaluate the accuracy, comprehensiveness and readability of responses generated by four widely used large language models (LLMs) - ChatGPT-4.0, DeepSeek, Google Gemini and Perplexity - when addressing common depression-related questions.
    BACKGROUND: As patients frequently turn to digital tools for health information, reliable LLMs could play a supportive role in primary care and mental health education. However, their performance in providing accurate and accessible responses to depression-related questions remains underexplored.
    DESIGN: Cross-sectional analysis.
    METHODS: Thirty-five depression-related questions (covering pathogenesis, risk factors, clinical presentation, diagnosis, prevention, treatment, prognosis and nursing) were collected from seven authoritative websites. Responses from each LLM were independently evaluated by three psychiatric nurses in a blinded manner, focusing on accuracy and comprehensiveness. R software was used for the readability analysis (Flesch-Kincaid Grade Level, Gunning Fog Index and Flesch Reading Ease Score).
    RESULTS: All four LLMs achieved high mean accuracy ratings (ChatGPT-4.0 = 4.67, DeepSeek = 4.62, Google Gemini = 4.65, Perplexity = 4.04). DeepSeek produced the highest proportion of very comprehensive responses (73.3%), followed by ChatGPT-4.0 (44.8%), Google Gemini (36.2%) and Perplexity (6.7%). Significant differences in readability scores were observed, with DeepSeek and Google Gemini performing less favorably compared with ChatGPT-4.0 (p < 0.05).
    CONCLUSION: LLMs, particularly DeepSeek, show potential as supplementary resources for depression-related health education in primary care and mental health contexts. Nevertheless, further research is needed to confirm their clinical utility, address readability challenges and evaluate their impact on real-world patient outcomes.
    Keywords:  Depression; Health education; Large language models; Mental Health
    DOI:  https://doi.org/10.1016/j.nepr.2025.104668
  8. Rev Assoc Med Bras (1992). 2025 ;pii: S0104-42302025001100609. [Epub ahead of print]71(11): e20250809
       OBJECTIVE: Artificial intelligence chatbots are increasingly used to disseminate health information. The aim of this study was to evaluate the accuracy, reliability, quality, and readability of responses generated by four artificial intelligence chatbots regarding human papillomavirus vaccination.
    METHODS: Frequently asked questions about human papillomavirus vaccination were identified using a Google search, and these questions were posed to the ChatGPT-3.5, Gemini, Copilot, and ChatGPT-4 models. Responses were assessed for accuracy (five-point Likert scale), reliability (modified DISCERN scale), quality (Global Quality Scale), and readability (Flesch Reading Ease Score). Interobserver agreement was evaluated with the intraclass correlation coefficient. Results were assessed at a significance level of p<0.05, and analyses were performed with the SPSS 23.0 software package.
    RESULTS: There were significant differences between chatbots in terms of accuracy (p=0.001), reliability (p<0.001), and quality (p<0.001), but no significant difference in readability (p=0.497). ChatGPT-4 demonstrated the highest accuracy and quality, while Copilot demonstrated superior reliability. All models produced responses that were moderately difficult to read. The intraclass correlation coefficient values for inter-rater reliability ranged from 0.034 to 0.512.
    CONCLUSION: Artificial intelligence chatbots show promising potential for use as patient information counsellors regarding human papillomavirus vaccination. However, improvements in readability and consistent evidence-based content generation are required before widespread clinical application.
    DOI:  https://doi.org/10.1590/1806-9282.20250809
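    Entry 8 reports intraclass correlation coefficients (ICC) for interobserver agreement. A minimal sketch of how an ICC can be computed in Python with the pingouin package; the long-format data below are hypothetical, not the study's ratings.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: each row is one rater's score for one question.
df = pd.DataFrame({
    "question": [1, 1, 2, 2, 3, 3, 4, 4],
    "rater":    ["A", "B"] * 4,
    "score":    [4, 5, 3, 3, 5, 4, 2, 3],
})

# Returns the standard ICC variants (ICC1, ICC2, ICC3 and their averages).
icc = pg.intraclass_corr(data=df, targets="question", raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```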
  9. JMIR Form Res. 2025 Dec 10. 9 e77707
       Background: The Archive of German-Language General Practice (ADAM) stores about 500 paper-based doctoral theses published from 1965 to today. Although these have been grouped into different categories, no deeper systematic information extraction (IE) has yet been performed. Recently developed large language models (LLMs) like ChatGPT have been credited with the potential to help in the IE of medical documents. However, there are concerns about LLM hallucinations, and their use on older, non-digitized doctoral theses has not yet been reported.
    Objective: The aim of this study is to analyze if LLMs can help to extract information from doctoral theses by using GPT-4o and Gemini-1.5-Flash for paper-based doctoral theses in ADAM.
    Methods: We randomly selected 10 doctoral theses published between 1965 and 2022. After preprocessing, we used two different LLM pipelines, based on models by OpenAI and Google, to extract dissertation characteristics and generate uniform abstracts. For comparison, one pooled human-generated abstract was written, and blinded raters evaluated the LLM-generated abstracts against the human-generated ones. BERTScore (bidirectional encoder representations from transformers) values were calculated as the evaluation metric.
    Results: Relevant dissertation characteristics and keywords could be extracted for all theses (n=10): institute name and location, thesis title, author name(s), and publication year. For all but one doctoral thesis, an abstract could be generated using GPT-4o, while Gemini-1.5-Flash provided abstracts in all cases (n=10). The modality of abstract generation showed no influence on raters' evaluations (nonparametric Kruskal-Wallis test for independent groups, P=.44). The creation of LLM-generated abstracts was estimated to be 24-36 times faster than creation by humans. Evaluation metrics showed moderate-to-high semantic similarity (mean BERTScore F1: GPT-4o 0.72; Gemini 0.71). Translation from German into English did not result in a loss of information (n=10).
    Conclusions: An accumulating body of unpublished doctoral theses makes it difficult to extract relevant evidence. Recent advances in LLMs like ChatGPT have raised expectations in text mining, but LLMs have not previously been used for IE from "historic" medical documents. This feasibility study suggests that both models (GPT-4o and Gemini-1.5-Flash) helped to accurately simplify and condense doctoral theses into relevant information; LLM-generated abstracts were perceived as similar to human-generated ones, were semantically similar, and took about 30 times less time to create. The study demonstrates the feasibility of a regular office-scanning workflow and the use of general-purpose LLMs to extract relevant information and produce accurate abstracts from ADAM doctoral theses. Taken together, this information could help researchers better search the family medicine literature of the last 60 years and develop current research questions.
    Keywords:  AI; ChatGPT; GPT-4o; Gemini; artificial intelligence; doctoral thesis; family medicine
    DOI:  https://doi.org/10.2196/77707
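    Entry 9 scores LLM-generated abstracts against human-written ones with BERTScore F1. A minimal sketch using the open-source bert-score package; the candidate and reference texts are invented placeholders, and German-language abstracts would use lang="de".

```python
# pip install bert-score
from bert_score import score

candidates = ["The thesis analyzes referral patterns in rural general practice."]
references = ["This dissertation examines referral behaviour in rural family medicine."]

# Returns precision, recall, and F1 tensors, one value per candidate/reference pair.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```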
  10. Psychiatr Q. 2025 Dec 09.
      With growing reliance on AI chatbots for parenting support, this study presents the first evaluation of large language models (LLMs) in addressing common autism-related questions. It compared ChatGPT, Google Gemini, and DeepSeek based on the accuracy, clarity, and usefulness of their responses. The findings aim to inform parents and clinicians about the strengths and limitations of using AI tools in early ASD care. Twenty common questions about Autism Spectrum Disorder (ASD) were identified through content analysis of social media, Google Trends, and ASD forums. These questions were refined by two educational psychologists, and standardized benchmark answers were created by a panel of pediatric neurodevelopment specialists. Two blinded pediatric autism experts then evaluated the AI-generated responses for quality, usefulness, and reliability. GPT-4 achieved the highest mean quality score (M = 4.85, SD = 0.36), followed by Gemini and DeepSeek (both M = 4.55, SD = 0.51; p > 0.05). For usefulness, GPT-4 scored M = 6.40 (SD = 0.75), Gemini M = 6.10 (SD = 0.85), and DeepSeek M = 6.05 (SD = 0.82; p > 0.05). In reliability ratings, Gemini led with M = 6.40 (SD = 0.82), followed by GPT-4 (M = 6.25, SD = 0.71) and DeepSeek (M = 5.95, SD = 0.94; p > 0.05). Findings indicated that AI-based chatbots, by providing rapid, comprehensible, and evidence-based guidance on early signs, interventions, and family support, demonstrate significant potential in bridging the information gap for parents, especially when access to specialists is limited.
    Keywords:  AI accuracy; Autism spectrum disorder; ChatGPT; DeepSeek; Google Gemini; Large language models; Parental guidance; Readability
    DOI:  https://doi.org/10.1007/s11126-025-10243-6
  11. Sci Rep. 2025 Dec 12.
      Gestational diabetes mellitus (GDM) is a prevalent condition requiring accurate patient education, yet the reliability and readability of large language models (LLMs) in this context remain uncertain. This study evaluated the performance of four LLMs (ChatGPT-4o, Gemini 2.5 Pro, Grok 3.0, and DeepSeek R-1) using 25 patient-oriented questions derived from clinical scenarios. Seven endocrinologists independently rated the responses with the modified DISCERN (mDISCERN) instrument and the Global Quality Score (GQS). Readability was analyzed using the Flesch Reading Ease (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG), while lexical diversity was assessed through type-token ratio (TTR). Grok and Gemini obtained the highest mDISCERN and GQS scores, whereas ChatGPT performed significantly lower (p < 0.05). DeepSeek generated the most readable outputs, while Grok provided the longest and most complex responses. All models scored below the FRES threshold of 60 recommended for lay audiences. Response length showed strong positive correlations with mDISCERN and GQS, while TTR was inversely related to quality but positively associated with readability. These findings highlight variability among LLMs in GDM education and emphasize the need for model-specific improvements to ensure reliable patient-facing health information.
    Keywords:  Artificial intelligence; Gestational diabetes mellitus; Large language models; Patient education; Readability
    DOI:  https://doi.org/10.1038/s41598-025-27235-y
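    The study above measures lexical diversity with the type-token ratio (TTR): unique tokens divided by total tokens. A minimal sketch:

```python
def type_token_ratio(text: str) -> float:
    """Unique tokens / total tokens; higher values mean more varied vocabulary.
    Note: raw TTR is sensitive to text length, so compare equal-length samples."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

print(type_token_ratio("the test of the test is a test"))  # 5 unique / 8 total = 0.625
```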
  12. Indian J Crit Care Med. 2025 Nov;29(11): 967-969
       Background and aims: Obtaining informed consent (IC) for tracheostomy is a frequent and essential process in the intensive care unit (ICU). With the increasing use of artificial intelligence (AI) in health care, chatbots such as ChatGPT and Google Gemini (GG) are being explored as potential tools to assist in drafting IC documents.
    Methods: In this cross-sectional study, IC drafts for tracheostomy were generated by ChatGPT and GG. Fifteen experienced intensivists independently evaluated these drafts for accuracy, completeness, readability, and sentiment. Readability was measured using the Flesch Reading Ease (FRE) score, while sentiment analysis assessed the emotional tone of the text.
    Results: No statistically significant differences were observed in accuracy or completeness between the two chatbots. Inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The ICCs for completeness and accuracy ratings were 0.85 (95% CI: 0.75-0.92) and 0.80 (95% CI: 0.68-0.89), respectively, indicating good-to-excellent inter-rater reliability for both chatbots. However, ChatGPT drafts had higher FRE scores (76.46 vs 60.04), indicating better readability. Sentiment analysis revealed that both drafts were predominantly neutral, with GG incorporating slightly more positive expressions.
    Conclusion: Both ChatGPT and GG can generate clinically appropriate IC content for tracheostomy. ChatGPT appears to have an advantage in producing more readable and patient-friendly material, highlighting its potential utility in clinical communication.
    Keywords:  Artificial intelligence; ChatGPT; Google Gemini; Informed consent; Intensive care unit
    DOI:  https://doi.org/10.5005/jp-journals-10071-25074
  13. Int Urogynecol J. 2025 Dec 09.
       INTRODUCTION AND HYPOTHESIS: The objective was to develop a retrieval-augmented ChatGPT model grounded in evidence-based patient education materials and compare its performance against the standard ChatGPT model in responding to common urogynecology patient questions in this pilot study.
    METHODS: We developed a retrieval-augmented ChatGPT-4.0 model that prioritized content from International Urogynecological Association patient information leaflets. Ten commonly asked patient questions were submitted to both the standard and retrieval-augmented models. Six board-certified urogynecologists evaluated responses using the validated Quality Analysis of Medical Artificial Intelligence (QAMAI) tool, which assesses accuracy, clarity, relevance, completeness, usefulness, and sources. Total and domain-specific QAMAI scores were compared using the Wilcoxon signed-rank test, and a sensitivity analysis was performed, excluding the unblinded Source domain.
    RESULTS: The retrieval-augmented model demonstrated significantly higher total QAMAI scores than the standard model (median 22 [interquartile range, IQR, 19-25] vs 16 [IQR 13-18], p < 0.01) and outperformed the standard model in all six domains. In the sensitivity analysis, the retrieval-augmented model maintained significantly higher performance (18 [IQR 16-20] vs 14.5 [IQR 11-17], p < 0.01). Clinician raters preferred the retrieval-augmented model in 81% of responses.
    CONCLUSIONS: Grounding AI tools in vetted patient education materials significantly improved the quality of ChatGPT-generated responses in urogynecology. Retrieval-augmented models offer a promising approach to enhance patient education and promote patient-centered care.
    Keywords:  Artificial intelligence; Patient education; Pelvic floor disorders; Retrieval-augmented; Urogynecology
    DOI:  https://doi.org/10.1007/s00192-025-06446-x
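    Entry 13 grounds ChatGPT in vetted patient leaflets via retrieval augmentation. The authors do not publish their pipeline; the sketch below only illustrates the generic pattern: embed the source passages, retrieve the most similar one for a question, and prepend it to the prompt. The leaflet snippets and model names are assumptions, not the study's configuration.

```python
# Minimal retrieval-augmented generation pattern (illustrative, not the
# authors' pipeline): embed leaflet passages, retrieve the best match for a
# question, and ground the model's answer in that passage.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

leaflets = [  # placeholder passages standing in for vetted patient leaflets
    "Pelvic organ prolapse occurs when pelvic organs descend into the vagina...",
    "Urinary incontinence is the involuntary leakage of urine...",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

leaflet_vecs = embed(leaflets)

def answer(question: str) -> str:
    q_vec = embed([question])[0]
    # Cosine similarity, then pick the closest leaflet passage.
    sims = leaflet_vecs @ q_vec / (
        np.linalg.norm(leaflet_vecs, axis=1) * np.linalg.norm(q_vec))
    context = leaflets[int(np.argmax(sims))]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided leaflet excerpt.\n\n"
                        f"Excerpt: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("What is pelvic organ prolapse?"))
```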
  14. Cureus. 2025 Nov;17(11): e96436
      Background: Conversational AI tools such as ChatGPT are increasingly used for health information seeking. While their popularity continues to grow, little is known about the readability and quality of their outputs in dental contexts, particularly for toothache, one of the most common oral health complaints.
    Objective: This study aimed to evaluate the quality and readability of ChatGPT's responses to frequently searched toothache-related queries and to conceptually compare them with information available from top-ranked traditional websites.
    Methods: In this cross-sectional study, the 20 most common toothache-related queries were identified using Google Trends (January 2014-January 2024). Each query was posed to ChatGPT (May 2024 version) in independent sessions to avoid contextual bias. Two endodontist raters assessed the quality of responses using the Ensuring Quality Information for Patients (EQIP) tool. Readability was measured using the Flesch Reading Ease, Flesch-Kincaid Grade Level, and Simple Measure of Gobbledygook (SMOG) Index. Interrater reliability was calculated using Cohen's kappa and the intraclass correlation coefficient (ICC). For comparison, the first 3-5 non-advertising websites retrieved via Google for each query were evaluated using the same instruments.
    Results: ChatGPT responses demonstrated high content quality (mean EQIP 85.3 ± 5.2) and moderate readability demands (Flesch Reading Ease 57.9 ± 3.2; Flesch-Kincaid 8.4 ± 0.6; SMOG 7.5 ± 0.4). Interrater reliability was excellent (κ = 0.86; ICC = 0.91). Compared with websites, ChatGPT yielded slightly higher EQIP scores (ΔEQIP +3.1 on average) but also higher reading grade levels (ΔFKGL +0.4).
    Conclusions: ChatGPT provides responses to toothache-related queries that are comparable, and in some cases superior, in quality to top-ranked, non-advertising traditional health websites retrieved via Google search. However, both ChatGPT and these conventional sources exhibit moderate readability demands, which may limit accessibility for individuals with low health literacy. Future efforts should focus on simplifying AI-generated and online content to enhance clarity, equity, and effectiveness in digital health communication.
    Keywords:  chatgpt; cross-sectional study; dental pain; ehealth literacy; eqip; google trends; health communication; readability; smog; toothache
    DOI:  https://doi.org/10.7759/cureus.96436
  15. J Stomatol Oral Maxillofac Surg. 2025 Dec 08. pii: S2468-7855(25)00460-4. [Epub ahead of print] 102675
       INTRODUCTION: This study aims to evaluate the quality, accuracy, readability, and understandability of patient information provided by various artificial intelligence (AI)-based chatbots regarding orthodontic tooth extractions.
    MATERIALS AND METHODS: Two researchers created a list of questions for patients to ask the chatbots. The questions were categorized into 'Pre-extraction' and 'Post-extraction', with 20 questions in each category. Four different criteria were used to evaluate the chatbot responses to the 40 questions: the Global Quality Scale (GQS), the Simple Measure of Gobbledygook (SMOG), and the Understandability and Accuracy Index. Jamovi (The Jamovi Project, 2022, version 2.3; Sydney, Australia) software was used for all statistical analyses.
    RESULTS: Claude 3.5 Sonnet showed the highest mean values for the GQS, readability, and the Accuracy Index. In terms of readability, as measured by the SMOG index, all three AI-based chatbots required a college-level education for comprehension. In both the 'Pre-extraction' and 'Post-extraction' sections, Claude 3.5 Sonnet demonstrated the highest mean values for the GQS, readability, and accuracy indices. For Understandability subcriteria 1 and 2, statistically significant differences were observed among the three chatbots, primarily due to the variation between Gemini and Claude 3.5 Sonnet.
    CONCLUSION: The AI-based chatbots generally provided answers of high quality and reliability, but with difficult readability. Although the medical information related to orthodontic tooth extraction supplied by chatbots is of generally high quality, individuals are still advised to consult their healthcare professionals on this issue.
    Keywords:  Artificial intelligence; Orthodontic tooth extraction; chatbot
    DOI:  https://doi.org/10.1016/j.jormas.2025.102675
  16. Front Digit Health. 2025 ;7 1710159
       Background: Artificial intelligence (AI) chatbots are increasingly consulted for dental aesthetics information. This study evaluated the performance of multiple large language models (LLMs) in answering patient questions about tooth whitening.
    Methods: 109 patient-derived questions, categorized into five clinical domains, were submitted to four LLMs: ChatGPT-4o, Google Gemini, DeepSeek R1, and DentalGPT. Two calibrated specialists evaluated responses for usefulness, quality (Global Quality Scale), reliability (CLEAR tool), and readability (Flesch-Kincaid Reading Ease, SMOG index).
    Results: The models generated consistently high-quality information. Most responses (68%) were "very useful" (mean score: 1.24 ± 0.3). Quality (mean GQS: 3.9 ± 2.0) and reliability (mean CLEAR: 22.5 ± 2.4) were high, with no significant differences between models or domains (p > 0.05). However, readability was a major limitation, with a mean FRE score of 36.3 ("difficult" level) and a SMOG index of 11.0, requiring a high school reading level.
    Conclusions: Contemporary LLMs provide useful and reliable information on tooth whitening but deliver it at a reading level incompatible with average patient health literacy. To be effective patient education adjuncts, future AI development must prioritize readability simplification alongside informational accuracy.
    Keywords:  AI; cosmetic dentistry; dental bleaching; large language models; patient education; tooth whitening
    DOI:  https://doi.org/10.3389/fdgth.2025.1710159
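    Entries 14-16 report SMOG grades. For reference, the standard SMOG formula (McLaughlin, 1969) estimates a US school grade from the count of polysyllabic words (three or more syllables), normalized to a 30-sentence sample; a minimal sketch:

```python
import math

def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """Standard SMOG formula: grade level from polysyllabic-word density."""
    return 1.0430 * math.sqrt(polysyllable_count * (30 / sentence_count)) + 3.1291

# Example: 45 polysyllabic words across 30 sentences.
print(round(smog_grade(45, 30), 1))  # 10.1, i.e., roughly a 10th-grade level
```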
  17. Clin Rehabil. 2025 Dec 08. 2692155251397620
      Objectives: The aims of this scoping review were to (i) map education from randomised controlled trials and public websites for Achilles tendinopathy to pre-defined categories and (ii) appraise the quality of the education available.
    Data sources: Sources were extracted via a search of multiple databases and from the first three pages of targeted Google searches (websites) in English, Chinese, and Spanish.
    Review methods: The frequency of sources that reported on each pre-defined category (n = 15) was recorded, and the content within each category was summarised descriptively. Quality and reliability were assessed with the DISCERN tool (1-5 points; higher scores mean higher quality and trustworthiness). Understandability and actionability of education were assessed using the Patient Education Materials Assessment Tool (0-100%; higher scores indicate more comprehensible information with clearer messages and more identifiable actions). Alignment with current international guidelines was reported.
    Results: 119 randomised controlled trials and 385 websites were included. Education coverage was better in websites than in trials, particularly for pathology and management. Conflicting advice was found on websites (e.g. when treatment should be sought). Quality (1.6 ± 0.5) and reliability (2.1 ± 0.7) of education were poor, with low scores for treatment risks and shared decision-making. Understandability was moderate (59%) and actionability was poor (28%). Alignment with clinical guidelines was low, with key information commonly omitted.
    Conclusion: Educational sources found in randomised controlled trials and on public websites on Achilles tendinopathy are poorly aligned with clinical guidelines. The information gaps in these sources mean that they are unhelpful to patients and may steer them towards inappropriate decisions. The review highlights the need for the development of accurate, meaningful, and evidence-based educational resources for individuals with Achilles tendinopathy.
    Keywords:  Achilles tendinopathy; education; scoping review
    DOI:  https://doi.org/10.1177/02692155251397620
  18. Support Care Cancer. 2025 Dec 12. 34(1): 27
       PURPOSE: Access to appropriate supportive care resources and services is essential to improve outcomes for cancer survivors. This study aimed to identify, evaluate and content-map Australian online supportive care resources and services for people living with pancreatobiliary cancers.
    METHODS: A structured online search was conducted of Australian cancer organisations to identify pancreatobiliary cancer resources and services. Resources and service sites were evaluated (cost, readability, active engagement, diversity (age, sex, gender and culture), consumer voice) and content mapped against published categories of informational needs.
    RESULTS: A total of 180 unique online resources and seven service webpages were identified from 19 Australian cancer organisations. On evaluation, 99% of resources and services were free to access, 44% of resources were deemed readable (year 8 reading level or below), 24% of resources demonstrated diversity, and 24% of all resources and service sites included a cancer survivor voice. Information gaps were identified in topic categories such as body image and sexuality, rehabilitation, prognosis, and interpersonal and social issues.
    CONCLUSIONS: There is room for improvement across existing online resources. Co-design of an online resource hub, a centralised collection of accessible and appropriate resources, is warranted to maximise support and improve the health outcomes of pancreatobiliary cancer survivors and caregivers. Australian pancreatobiliary cancer survivors and caregivers would benefit from better resources that adhere to best-practice standards of online support. Future research should explore ways to reduce the information-seeking burden and increase the quality of information.
    Keywords:  Biliary cancer; Online; Pancreatic cancer; Resources; Services; Supportive care
    DOI:  https://doi.org/10.1007/s00520-025-10216-2
  19. J Surg Orthop Adv. 2025 ;34(4): 163-167
      This study evaluates the quality and reliability of shoulder arthroplasty videos available on YouTube. Using the search terms "shoulder arthroplasty," "total shoulder arthroplasty," "partial shoulder arthroplasty," and "shoulder arthroplasty procedures," the authors found a total of 150 videos; 91 were assessed, and 82 met inclusion criteria. Two independent reviewers evaluated each video for educational content quality. Further analysis was undertaken using the following variables: upload date; total view count; duration; number of likes, dislikes, and comments; source; and modality. The included videos had an average Global Quality Scale (GQS) score of 2.95, indicating subpar educational content quality. Patient testimonials (10%) scored the lowest average GQS (1.8), while physician-led presentations (26%) scored the highest (3.5). There was no significant difference in average GQS between videos with higher versus lower view counts, nor between average GQS and days since upload. Most shoulder arthroplasty videos on YouTube provide low-quality information for patients.
  20. J Surg Res. 2025 Dec 11. pii: S0022-4804(25)00751-6. [Epub ahead of print]317 208-216
       INTRODUCTION: Oncoplastic breast surgery (OBS) has gained attention for improving breast cancer patients' satisfaction and quality of life. This study aims to assess the readability of online English- and Spanish-language patient education materials (PEMs) on oncoplastic breast surgery.
    METHODS: A de-identified online search using the terms "oncoplastic breast surgery" or "cirugía oncoplástica de seno" was performed. English and Spanish websites were selected and categorized by academic or private centers. Readability scores were generated using established tests: Simple Measure of Gobbledygook (SMOG), Fry Graph, Patient Education Materials Assessment Tool (PEMAT) for Understandability and Actionability, and Cultural Sensitivity and Assessment Tool (CSAT). Fisher's exact tests assessed group differences.
    RESULTS: The most common location of origin of online resources was the United States (53%), followed by Europe (23%). The specialties performing OBS included breast surgery (48%), plastic surgery (44%), and obstetric and gynecological surgery (8%). All PEMs failed to meet the recommended readability levels. The average SMOG and Spanish Orthographic Length (SOL) reading level corresponded to that of a university freshman for both academic and private materials, with English websites being more difficult to read than Spanish resources. The average understandability score was slightly higher for academic centers compared to private institutions (63% versus 61%; P = 0.661). The average actionability score was significantly higher for English websites compared to their Spanish counterparts (35% versus 21%; P < 0.001).
    CONCLUSIONS: Patient information found through an online search for OBS is too difficult for the average American adult to read. As patient interest in OBS grows, access to appropriately written educational material is crucial to support informed decision-making, enhance patient satisfaction, reduce decisional regret, and ultimately promote equity in health care.
    Keywords:  Breast neoplasms; Female; Health literacy; Health services accessibility; Mammaplasty; Reading; United States
    DOI:  https://doi.org/10.1016/j.jss.2025.11.023
  21. J Contemp Dent Pract. 2025 Nov 01. 26(11): 1060-1066
       AIM: Despite the widespread availability of online information about dental veneers, there is a lack of data on the quality and readability of these resources. The aim of this study was to assess the quality and readability of patient-oriented online information on dental veneers.
    MATERIALS AND METHODS: This study conducted a thorough web search utilizing Google, Yahoo, and Bing search engines to identify English-language websites offering information on dental veneers. The quality of the websites was assessed using DISCERN, Journal of the American Medical Association (JAMA) benchmarks, and Health on the Net Code (HONcode) tools. The readability of the websites was evaluated using the Flesch-Kincaid Grade Level (FKGL), Simplified Measure of Gobbledygook (SMOG), and Flesch Reading Ease (FRE) metrics.
    RESULTS: Of the 195 websites included in the study, only eight obtained a high overall DISCERN score, representing 2.4% of dental clinic (DC) and 14.8% of nonprofit organization (NPO) websites. The median [interquartile range (IQR)] overall DISCERN score was significantly higher for NPO websites [55.5 (13)] than for DC websites [42 (13)] and commercial entity (CE) websites [36.25 (5.5); p < 0.001]. Up to 15%, 63%, and 70% of NPO, DC, and CE websites, respectively, did not report any of the four JAMA criteria. Only seven websites (all NPOs) displayed an active HONcode certificate. All readability indicators confirmed easier texts on the NPO websites.
    CONCLUSIONS: The quality of the English web-based health information on dental veneers seems suboptimal. Nonprofit organization websites offer higher quality, more reliable, and readable content compared to DC and CE websites.
    CLINICAL SIGNIFICANCE: Online dental veneer information is generally of poor quality, with NPO websites offering the most reliable and readable content. Clinicians should direct patients to trustworthy NPO resources for accurate information.
    Keywords:  Dental veneers; Online patient-centered information; Quality; Readability
    DOI:  https://doi.org/10.5005/jp-journals-10024-3964
  22. JMIR Infodemiology. 2025 Dec 11. 5 e78007
       BACKGROUND: Patients with knee osteoarthritis have a considerable need for information about their condition, its progression, and available treatments. Decision-making is often complex and requires evidence-based health information material (HIM). When medical consultations do not sufficiently address patients' needs, many seek additional information independently.
    OBJECTIVE: This study aimed to examine the quality of German-language HIM on knee osteoarthritis treatment and its suitability for supporting informed choice. In particular, the study analyzed the content of the HIM and assessed the balance in the presentation of treatment options.
    METHODS: A descriptive cross-sectional study was conducted. HIM was identified through a combination of search strategies, including a systematic internet search using commonly used German terms related to the treatment of knee osteoarthritis. Identified HIMs were independently assessed by 2 raters using the validated Mapping the Quality of Health Information (MAPPinfo) checklist, which operationalizes the criteria of the Guideline Evidence-Based Health Information. Information quality was calculated on a scale from 0% to 100%, representing compliance with the quality standard. A descriptive content analysis was also carried out to examine the range and balance of treatment options presented, as well as the reporting of benefits and complications associated with total knee arthroplasty (TKA). The presence of certification was recorded.
    RESULTS: A total of 94 HIMs were included. On average, the material met 14.6% (SD 9.4%) of the quality criteria. HIM from public and nonprofit providers performed better (mean 40.1%, SD 3.6% and mean 37.2%, SD 23.1%, respectively) than those from other providers. Overall, 14 HIMs presented treatment options in a balanced manner. Among the 78 HIMs that covered TKA, 38.5% (n=30) did not report any benefits, and 35.9% (n=28) omitted potential complications. Certified HIMs showed only moderately higher information quality than uncertified material (mean 26.8%, SD 16% vs mean 12.7%, SD 5.9%).
    CONCLUSIONS: Our results highlight the urgent need to improve the quality of German-language HIM on knee osteoarthritis. The deficits identified are fundamental and affect all dimensions of information quality. Although HIM from public or nonprofit organizations has better information quality, this does not facilitate informed choice. The frequent omission of complications and benefits of TKA and the unbalanced presentation of treatment options can influence decisions. Until structural improvements are made, patients seeking quality information should favor material from public or nonprofit providers. Additionally, the MAPPinfo checklist could form the basis of a differentiated certification system to make information quality more transparent for patients.
    Keywords:  consumer health information; evidence-based health information; health literacy; informed choice; knee osteoarthritis; quality; total knee replacement
    DOI:  https://doi.org/10.2196/78007
  23. Dental Press J Orthod. 2025 ;pii: S2176-94512025000500300. [Epub ahead of print]30(5): e2524255
       INTRODUCTION: In this era of artificial intelligence, increasing competition has significantly enhanced the capabilities of AI chatbots, which have been integrated into various fields, including orthodontics. Because they are easily accessible to patients, they can be a good option for clarifying topics of curiosity before or during treatment. However, their reliability remains a concern. Miniscrews, or temporary anchorage devices (TADs), frequently used during orthodontic treatment, are among the subjects of interest to patients.
    OBJECTIVE: This study aimed to comparatively evaluate the responses provided by four Large Language Models (LLMs), namely ChatGPT-3.5, ChatGPT-4 (OpenAI), Google Bard (Google LLC), and Bing Chat (Microsoft Corp), to questions asked by patients in the field of orthodontic miniscrews.
    MATERIAL AND METHODS: The questions most frequently asked by patients about miniscrews used in orthodontics were searched on Google. The first 50 pages of results were reviewed, and 30 questions were selected and posed to each LLM. The responses were evaluated by three orthodontic residents using a five-point modified Likert scale and the modified DISCERN (mDISCERN) tool.
    RESULTS: The highest Likert score belonged to ChatGPT-4 (3.84), while the lowest belonged to Bing Chat (3.37). Statistically significant differences were found between the median scores given to the questions by all three researchers, depending on the LLM used (p = 0.016, p < 0.001, and p = 0.017, respectively). Significant differences were found between the scores Investigators 1 and 3 gave to ChatGPT-4 and Google Bard and those they gave to Bing Chat. Additionally, Investigator 2's scores for Bing Chat differed significantly from those for ChatGPT-3.5 (p = 0.018). Among the evaluated chatbots, ChatGPT-4 achieved the highest mDISCERN score (23.43 ± 2.89), followed by Google Bard (23.47 ± 3.2), ChatGPT-3.5 (22.62 ± 2.97), and Bing Chat (21.63 ± 2.43).
    CONCLUSIONS: In this study, it was found that LLMs can generally inform patients about miniscrews used in orthodontics and have promising potential. However, it is necessary that the information provided by these programs should always be supported by the information given by professionals.
    DOI:  https://doi.org/10.1590/2177-6709.30.5.e2524255.oar
  24. J Cancer Educ. 2025 Dec 08.
      This cross-sectional study evaluated the quality of information on radiotherapy-related oral mucositis (OM) available on the YouTube platform in Brazil. The first 200 videos retrieved using the keywords "boca ferida radioterapia" ("sore mouth radiotherapy") were analyzed, excluding those shorter than 1 min, longer than 15 min, or unrelated to the topic. Technical data were collected, and six variables were assessed: definitions of head and neck cancer, radiotherapy, and OM, along with OM frequency, symptoms, and prevention/treatment options. Videos were categorized as 'poor' (score 1-2 on ≥ 1 variable), 'average' (score 3 on all variables), or 'good' (score 4-5 on ≥ 1 variable with no scores of 1 or 2). Of the 104 videos analyzed, 51.0% were produced by doctors and 85.6% were aimed at the lay public. Most videos lasted 1-9 min (87.5%), with 10,001-100,000 views (32.0%), ≤ 1,000 likes (54.4%), and ≤ 100 comments (66.3%). Overall, 10.6% (n = 11) were classified as 'poor' due to the presence of misinformation. An additional 32.7% (n = 34) were rated 'average' for failing to address any of the assessed variables. The majority, 56.7% (n = 59), were rated 'good', as they provided accurate and comprehensive information on at least one variable and contained no misinformation. While few videos on radiotherapy-related OM in Brazil presented incorrect information, a significant number were incomplete. This highlights an opportunity to improve content so as to provide the public with more comprehensive information.
    DOI:  https://doi.org/10.1007/s13187-025-02804-x
  25. J Pediatr Nurs. 2025 Dec 05. pii: S0882-5963(25)00437-3. [Epub ahead of print]86 491-501
       OBJECTIVE: This study aimed to assess the quality and reliability of pediatric vaccination videos on YouTube from a nursing perspective and to identify hesitancy-related cues, indicators, and deterrents present in the content.
    METHODS: In this cross-sectional, descriptive content analysis study, 243 English-language YouTube videos were analyzed using four keywords. Videos were evaluated with the Global Quality Scale (GQS), the Modified DISCERN tool, and the Pediatric Vaccine Hesitancy Assessment Tool for Social Media Content (PVHAT). In addition, engagement measures such as number of views, likes, video duration, and video characteristics such as source type and narrator identity were analyzed.
    RESULTS: The overall quality and reliability of the videos were moderate (mean GQS: 2.52; DISCERN: 2.83). Videos presented by healthcare professionals were of higher quality but showed lower user engagement. Videos with curiosity-driven titles, such as "What's in Vaccines?", received more views and comments. Emotionally framed narratives were identified in 22.6% of the videos, and expressions of distrust toward health authorities appeared in 8.2%. Community immunity was emphasized in only 25.5% of videos. A strong positive correlation was observed between DISCERN and GQS scores (r = 0.760, p < .001).
    CONCLUSION: Pediatric vaccine content on YouTube often lacks high-quality, evidence-based information and frequently includes hesitancy-related signals. Public health communication should prioritize scientific accuracy while using engaging and accessible strategies, ideally through collaborations between healthcare professionals and digital content creators, to improve the reach and effectiveness of vaccination messages.
    Keywords:  Nursing; Pediatrics; Vaccination; Vaccine hesitancy; YouTube
    DOI:  https://doi.org/10.1016/j.pedn.2025.11.048
  26. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2025 Dec 12.
       BACKGROUND: Young people and adults increasingly obtain information about pregnancy termination (PT) through social media. Against this background, the aim of this study is to investigate, for the first time, the content and quality of German-language PT videos on YouTube, Instagram, and TikTok. Research questions on provider types (research question 1: RQ1), content (RQ2), and quality of PT videos (RQ3) as well as audience reactions (RQ4) will be answered.
    METHODS: A sample of N = 500 popular PT videos was drawn from YouTube (150), Instagram (150), and TikTok (200). For each video, up to 20 of the most liked topic-related audience comments were included in the sample (N = 4761). The videos and comments were analyzed using reliability-tested codebooks. Data analysis was conducted with R. The study is preregistered, and all data, materials, and analysis scripts are publicly available.
    RESULTS: The PT videos predominantly originated from media professionals (49%) and only rarely from healthcare professionals (6%; RQ1). In terms of content, the majority of the videos represented a pro-choice position (54%) and frequently addressed medical care as well as psychological and physical experience (RQ2). According to quality criteria for health information, clear deficits were identified, with YouTube videos performing best in comparison (RQ3). TikTok videos, by contrast, led in audience engagement, recording the highest numbers of views, likes, and comments. Viewers used comment sections to express political positions and share personal experiences (RQ4).
    DISCUSSION: Future research as well as practice interventions are needed to further assess the quality of social media videos on pregnancy termination and improve it in a more targeted way.
    Keywords:  Abortion; Health information; Pregnancy termination; Social media; mDISCERN index
    DOI:  https://doi.org/10.1007/s00103-025-04170-x
  27. Inquiry. 2025 Jan-Dec;62: 469580251401454
      Abdominal aortic aneurysm (AAA) is a severe vascular disease. Given the high dependency of Chinese users on domestic social media platforms, this study aimed to evaluate the content completeness and information quality of AAA-related videos on China's dominant social media platforms. We searched for the keyword "abdominal aortic aneurysm" on the Chinese social media platforms TikTok and Bilibili. The top 100 search results for each platform were collected based on the default sorting. Video content was evaluated for completeness and information quality using the abdominal aortic aneurysm specific score (AAASS), the modified DISCERN tool (mDISCERN), and the Global Quality Scale (GQS). The study included 140 social media videos. Among all videos, median scores (IQR) for AAASS, mDISCERN, and GQS were 3.00 (2.00-4.00), 2.00 (1.00-3.00), and 3.00 (2.00-3.00), respectively. Analysis of correlations between video duration and engagement metrics revealed a weak positive correlation between "shares" and duration (r = .221, P = .009). Among engagement metrics and quality scales, only "shares" correlated with GQS (r = .216, P = .011), while video duration showed positive correlations with AAASS (r = .211, P = .012) and GQS (r = .234, P = .005). Across uploader identities, radiologists had significantly lower GQS scores than vascular surgeons, medical institutions, and cardiothoracic surgeons (P < .01). This study shows that videos about abdominal aortic aneurysm on Chinese social media exhibited poor content completeness and information quality, highlighting an urgent need for platforms to improve quality-control algorithms and for medical professionals to be guided in producing more comprehensive, evidence-based content.
    Keywords:  China; abdominal aortic aneurysm; cross-sectional study; social media; videos
    DOI:  https://doi.org/10.1177/00469580251401454
  28. Sci Rep. 2025 Dec 12.
      Stroke is a leading cause of global mortality, making accurate health information vital. While TikTok is an influential source for health content, the quality of stroke-related information on the platform is unknown. This study evaluated the quality, reliability, and user engagement of the 100 most-liked stroke-related videos on TikTok. We conducted a cross-sectional analysis of videos from January 2025, assessing quality using validated instruments (e.g., GQS, mDISCERN) and categorizing creators. Healthcare professionals (HCPs) produced 34% of videos and achieved significantly higher quality scores (p < 0.001). Misinformation was present in 31% of videos, with significantly lower rates for HCPs (8.8%) compared to content creators (42.2%) and general users (42.9%). Only 23% of videos addressed stroke prevention. Importantly, we found no significant correlation between content quality and user engagement (r = -0.08, p = 0.43), revealing a concerning "engagement paradox". The quality of stroke information on TikTok is highly variable and disconnected from user engagement, posing a risk of misinformation spread. These findings underscore the urgent need for enhanced content moderation and greater engagement from healthcare professionals to disseminate reliable health information on social media.
    Keywords:  Digital health literacy; Health information quality; Misinformation; Social media; Stroke; TikTok
    DOI:  https://doi.org/10.1038/s41598-025-31464-6
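    The study above reports essentially no correlation between content quality and engagement (r = -0.08, p = 0.43). A minimal sketch of how such a Pearson correlation is computed with SciPy; the per-video scores below are invented.

```python
from scipy.stats import pearsonr

# Hypothetical per-video data: quality scores and log-scaled like counts.
quality = [4.5, 2.0, 3.5, 1.5, 5.0, 2.5]
likes   = [3.1, 4.8, 2.9, 4.5, 3.3, 4.0]

r, p = pearsonr(quality, likes)
print(f"r = {r:.2f}, p = {p:.3f}")
```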
  29. Rheumatol Adv Pract. 2025 ;9(4): rkaf126
       Objectives: This study aims to explore what types of gout content are presented on the social media platform TikTok and assess their association with user engagement.
    Methods: The top 200 TikTok videos captured using the search term 'gout' were collected. Two independent researchers coded the videos into eight main categories: account type, presenter, audio, video type, purpose, tone, overall connotation and gout content. Descriptive and inferential analyses were conducted to examine the distribution of variables and examine the association between gout content and engagement. Quotations were selected to reinforce some of the findings.
    Results: In total, 116 TikTok videos were included in the final analysis after excluding 84 non-relevant videos. The total number of views of the videos was ≈426.6 million, with the majority belonging to content creators from the USA. The most common presenters were patients with gout or close family members (27%). Approximately 38% of videos had negative connotations, with the most common purpose of videos being health advice (38%). The main content categories coded were management strategies (79%) and risk factors (45%), focusing overwhelmingly on diet. A significant difference in engagement was evident between gout medical sequelae and gout management (P < 0.05) only.
    Conclusion: This analysis found that there is a wide range of information being promoted on TikTok that may be misleading or inconsistent with rheumatology guidelines. Future public health strategies and health professionals have an opportunity to utilise TikTok as a platform to create content, counteract misinformation and improve public understanding of gout.
    Keywords:  TikTok; arthritis; gout; health education; social media
    DOI:  https://doi.org/10.1093/rap/rkaf126
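      Two independent researchers coded the videos above; a standard companion statistic for such dual coding (not reported in the abstract) is Cohen's kappa. A minimal sketch with invented labels:

        from sklearn.metrics import cohen_kappa_score

        # category assigned to the same five videos by each coder (toy labels)
        coder_a = ["management", "risk", "management", "sequelae", "management"]
        coder_b = ["management", "risk", "risk", "sequelae", "management"]
        print(f"kappa = {cohen_kappa_score(coder_a, coder_b):.2f}")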
  30. Sci Rep. 2025 Dec 11. 15(1): 43572
      With the growing availability of short video platforms such as TikTok and Bilibili, patients with diabetic kidney disease (DKD) are increasingly seeking health information through these channels. However, the quality and user engagement of DKD-related content on these platforms have not been systematically evaluated. This exploratory cross-sectional study assessed the quality and reliability of DKD-related short videos and examined predictors of user engagement. On April 4, 2025, the top 100 DKD-related videos were collected from each platform. Content quality and reliability were assessed using the Global Quality Score (GQS), the modified DISCERN (mDISCERN), and the Medical Quality Video Evaluation Tool (MQ-VET). An eXtreme Gradient Boosting (XGBoost) model was employed to predict the number of likes and identify associated predictors. Despite being shorter, TikTok videos received significantly more likes, saves, shares, and comments than those on Bilibili (all p < 0.001), and scored higher on GQS and MQ-VET, with no significant difference in mDISCERN scores. Videos uploaded by professionals generally showed higher quality. Follower count, video length, and days since upload were the strongest predictors of engagement. Overall, TikTok videos exhibited higher quality and engagement than those on Bilibili; however, given the algorithm-driven sampling, uploader-level clustering, and tool adaptation, these findings should be interpreted as descriptive and exploratory rather than causal.
    Keywords:  Bilibili; Diabetic kidney disease; Information quality; TikTok; XGBoost
    DOI:  https://doi.org/10.1038/s41598-025-27650-1
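      A hedged sketch of the XGBoost engagement model described above, assuming a plain regression on log-transformed likes (the abstract does not state the target transform); the feature names mirror the reported predictors, and the data are synthetic:

        import numpy as np
        import xgboost as xgb  # third-party package: pip install xgboost

        rng = np.random.default_rng(1)
        n = 200
        X = np.column_stack([
            rng.integers(100, 1_000_000, n),  # uploader follower count
            rng.integers(15, 600, n),         # video length in seconds
            rng.integers(1, 365, n),          # days since upload
        ])
        y = np.log1p(rng.poisson(500, n))     # log(1 + likes), synthetic

        model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
        model.fit(X, y)
        for name, imp in zip(["followers", "length", "days_since_upload"],
                             model.feature_importances_):
            print(f"{name}: {imp:.3f}")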
  31. J Cosmet Dermatol. 2025 Dec;24(12): e70578
       BACKGROUND: Melasma is a common chronic hyperpigmentation disorder that substantially impairs patients' quality of life. With the rapid growth of short-video platforms such as TikTok and Bilibili, an increasing number of patients are turning to these media for health-related information. This study aimed to evaluate the quality and reliability of melasma-related videos available on TikTok and Bilibili.
    METHODS: Between August 17 and 19, 2025, we searched Douyin (the Chinese version of TikTok) and Bilibili using the Chinese keyword "黄褐斑" ("melasma"), and included the top 150 videos under each platform's default comprehensive ranking. The search and analysis were conducted in Chinese, reflecting the linguistic and geographical context of mainland China. Video characteristics and engagement metrics were recorded. The quality and reliability of the videos were independently evaluated by two researchers using the Global Quality Score (GQS) and the modified DISCERN (mDISCERN) instrument.
    RESULTS: A total of 237 videos were included in this study. Content was dominated by clinical manifestations (46.8%), etiology (44.3%), and diagnosis (40.1%), whereas treatment-related content was markedly underrepresented (9.7%). The median video length was 127.00 s (70.75-270.50) on Bilibili and 47.00 s (35.00-96.00) on TikTok. TikTok videos achieved significantly higher engagement than Bilibili videos (p < 0.05). Overall video quality was moderate, with both GQS and mDISCERN showing a median score of 3.00 (IQR: 2.00-4.00). The mDISCERN score of Bilibili videos, 3.00 (3.00-4.00), was significantly higher than that of TikTok videos (p < 0.05). Videos uploaded by healthcare professionals scored 3.00 (3.00-4.00) on GQS and 3.00 (2.00-4.00) on mDISCERN, both significantly higher than the scores of videos uploaded by non-healthcare professionals (p < 0.05).
    CONCLUSIONS: This study found that melasma-related short videos presented an incomplete content structure, with treatment-related information being markedly underrepresented. The overall quality of the videos was moderate, whereas those produced by healthcare professionals demonstrated higher quality and reliability. Future efforts should encourage greater participation from healthcare professionals and the implementation of refined content strategies, with the aim of improving both the quality and educational value of dermatology-related short video resources.
    Keywords:  Bilibili; TikTok; health information quality; melasma; short‐video platforms
    DOI:  https://doi.org/10.1111/jocd.70578
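      The platform comparison above (Bilibili's higher mDISCERN) is the kind of result a Mann-Whitney U test yields for ordinal scores; the choice of test and the score vectors below are assumptions, not taken from the paper:

        import numpy as np
        from scipy.stats import mannwhitneyu

        rng = np.random.default_rng(2)
        bilibili = rng.integers(2, 6, 120)  # mDISCERN scores, 1-5 (synthetic)
        tiktok = rng.integers(1, 5, 117)
        u, p = mannwhitneyu(bilibili, tiktok, alternative="two-sided")
        print(f"U = {u:.0f}, p = {p:.4f}")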
  32. Cureus. 2025 Nov;17(11): e96297
      BACKGROUND: Patients with access to medical information have become active participants in their treatment process, which has created additional challenges for physicians and other healthcare providers. However, how providers cope with patients' requests for such information is not well understood, and the information-seeking behavior of physicians in catering to patients' information needs has not been thoroughly studied. The primary objective of this cross-sectional study was to understand healthcare providers' information-seeking behavior when responding to patients' specific questions about the benefits and risks of treatments.
    METHODOLOGY: This Institutional Review Board-approved study was conducted at an independent academic center in Allentown, PA, between 2017 and 2020. We collected pertinent data from structured one-on-one interviews using an interview guide; interviews were captured with an electronic audio recorder and saved as audio files. Interview transcripts were analyzed using hand coding. We investigated the relationship between categorical participant attributes using the chi-square or Fisher's exact test at a significance level of 0.05, with continuity correction. We used the Kruskal-Wallis and Mann-Whitney U tests to investigate differences in the distribution of continuous variables across participants' categorical attributes.
    RESULTS: A total of 124 providers from eight departments participated. The majority, 62% (77/124), reported that patients brought in information about a treatment, and 56% (69/124) about a diagnosis. We did not observe significant variation across medical specialties in the number of patients who brought in information related to their diagnosis (p = 0.08), prognosis (p = 0.35), or other topics such as birth control, food allergies, and vaccines (p = 0.13). Overall, 72% (89/124) of providers reported using DynaMed, UpToDate, and Lexicomp, and 54% (67/124) used PubMed. Further, 27% (33/124) referenced clinical practice guidelines, 17% (21/124) referred to textbooks, and 15% (19/124) discussed the topic with colleagues. Moreover, 35% (44/124) reported conducting traditional critical appraisals to determine the credibility of the information. We did not observe significant variation across medical specialties in the number of providers using the library website to access journals and the PubMed database (p = 0.29) or seeking information from colleagues (p = 0.58). Providers who had recently finished their training (median experience = 6 years; range = 3-15) reported being dissatisfied with their information-seeking process, whereas more experienced providers reported being satisfied (median = 15 years; range = 3-45) or somewhat satisfied (median = 14.5 years; range = 4-34) (p = 0.04).
    CONCLUSIONS: We found that most physicians across all medical specialties utilized and preferred point-of-care tools such as DynaMed and UpToDate. However, many providers still rely on the reputation of the information source, such as a journal's impact factor and the author's research credentials, to determine the credibility and reliability of the information.
    Keywords:  google; shared decision-making; side effects; treatment benefits; up to date
    DOI:  https://doi.org/10.7759/cureus.96297
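      The methodology above names chi-square/Fisher's exact tests for categorical attributes and Kruskal-Wallis for continuous ones; here is a minimal scipy sketch with invented counts and experience values (not the study's data):

        import numpy as np
        from scipy.stats import fisher_exact, kruskal

        # 2x2 table: specialty A vs B by "patients brought information" yes/no
        table = np.array([[30, 10],
                          [25, 15]])
        odds, p_fisher = fisher_exact(table)
        print(f"Fisher's exact: OR = {odds:.2f}, p = {p_fisher:.3f}")

        # years of experience by satisfaction group (synthetic)
        not_satisfied = [3, 5, 6, 8, 15]
        somewhat_satisfied = [4, 10, 14, 15, 34]
        satisfied = [3, 12, 15, 20, 45]
        h, p_kw = kruskal(not_satisfied, somewhat_satisfied, satisfied)
        print(f"Kruskal-Wallis: H = {h:.2f}, p = {p_kw:.3f}")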
  33. BMC Med Educ. 2025 Dec 11. 25(1): 1692
       BACKGROUND: The increasing complexity of disease spectra and the public's growing demand for health have led to a continuously expanding need to acquire health information. However, the rapid development of digital and intelligent technologies has made the credibility of online medical and health information harder to assess. The professional medical role of medical students requires that they be able to discern health information in the digital-intelligent age.
    OBJECTIVE: To explore the intrinsic mechanisms of health information credibility assessment among medical students in the digital-intelligent age and to provide a theoretical basis for enhancing their health information discernment capabilities.
    METHODS: We conducted a grounded theory study following Corbin & Strauss. Through purposive sampling, 23 medical students from diverse academic backgrounds were selected for in-depth interviews. Open coding, axial coding, and selective coding were employed to analyze the interview data, systematically constructing a theoretical model of health information credibility assessment among medical students.
    RESULTS: (1) Intelligent technologies in the digital-intelligent age have significantly transformed the ecology of health information dissemination, with tools such as GenAI, IAT, and Big Data increasing the difficulty of health information discernment for medical students. (2) Individual heterogeneity is a critical factor leading to variations in health information discernment among medical students, primarily manifested across three dimensions: health information acquisition, health information cognitive processing, and social connectedness. (3) In the process of health information acquisition, differences in platforms, channels, and presentation formats influence individual discernment. (4) In health information cognitive processing, factors such as information familiarity, degree of involvement, and depth of processing affect discernment. (5) Regarding social connectedness, professional identity, clinical practical experience, and medical social support shape individuals' health information discernment.
    CONCLUSION: This study reveals the potential for individuals in the digital-intelligent age to proactively generate information, transforming from passive information recipients into active participants in content creation, thereby highlighting the critical importance of individual heterogeneity. A theoretical framework for individual heterogeneity is constructed: health information acquisition → cognitive processing of health information → social connectedness → credibility assessment.
    Keywords:  Credibility assessment; Grounded theory; Medical students; Online health information
    DOI:  https://doi.org/10.1186/s12909-025-08290-5
  34. BMC Public Health. 2025 Dec 12. 25(1): 4243
       BACKGROUND: Older adults (> 55 years), in particular low-income older adults, have lower health literacy than the rest of the Canadian population. Lower health literacy is related to several negative health outcomes such as poor diabetes control and other physical and mental health problems. Canada's growing ageing population requires an age-friendly system that reduces dependency on the Canadian health care system. This study investigated the health information seeking behaviour of low-income seniors living in social housing across five Ontario regions to determine how to improve healthcare outcomes and the performance of the Ontario healthcare system.
    METHODS: This cross-sectional study included in-person interviews guided by the Health Awareness and Behaviour Tool (HABiT) survey. Interviews were conducted with older adults from 16 social housing buildings in five Ontario communities between May 2014 and January 2015. Questionnaire responses were analyzed using descriptive statistics and simple logistic regressions.
    RESULTS: 625 individuals completed the HABiT survey. The majority of participants sought out health information at the doctor's office; 515 participants received health information from a doctor or nurse about keeping their heart healthy and 471 about preventing diabetes. Females were more than twice as likely as males to receive health information about heart health from family members, media sources, and pharmacists. Those aged > 84 years were the least likely to use media sources and were almost three times as likely to contact a doctor or nurse for heart health information compared with middle-aged participants. Adults with post-secondary education were more likely than high school graduates to use the Internet as a source of health information.
    CONCLUSIONS: Family physicians with older adult patients could better supplement their health assessments by promoting and explaining educational brochures and by ensuring that these health topics are addressed, to better communicate chronic disease prevention.
    Keywords:  Diabetes; Health behaviour; Health knowledge; Heart health; Seniors
    DOI:  https://doi.org/10.1186/s12889-025-25469-z
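      A minimal sketch of one of the "simple logistic regressions" mentioned above, regressing Internet use for health information on post-secondary education using statsmodels; the effect size and data are synthetic assumptions:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        n = 625                                 # survey completers
        post_secondary = rng.integers(0, 2, n)  # 1 = post-secondary education
        logit_p = -1.0 + 0.8 * post_secondary   # assumed log-odds
        used_internet = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

        X = sm.add_constant(post_secondary)
        result = sm.Logit(used_internet, X).fit(disp=0)
        print(np.exp(result.params))            # odds ratios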
  35. Acta Psychol (Amst). 2025 Dec 07. pii: S0001-6918(25)01380-0. [Epub ahead of print] 262: 106066
      Perinatal depression (PND) poses a growing health risk for pregnant women in China, yet research on their information-seeking and preventive intention remains limited. This study investigates the primary determinants shaping Chinese pregnant women's engagement in seeking information about PND and its prevention. A structured 41-item questionnaire was administered in Guangzhou through a cross-sectional survey design, yielding 406 valid responses. The findings identified five significant predictors of health information seeking intention: risk perception, affective responses, information insufficiency, informational subjective norms, and attitude, with attitude being the strongest predictor. Relevant channel beliefs moderated the relationship between information insufficiency and health information seeking intention, whereas perceived information-gathering capacity showed no significant moderating effect. Additionally, health information seeking intention was found to positively influence preventive intention. This study is pioneering in its application of the RISP model to the field of perinatal depression, thereby expanding its theoretical scope. By extending the model's pathways, the study offers key empirical evidence demonstrating how information seeking intention can facilitate the formation of preventive intention. It is recommended that policymakers develop an intervention framework for perinatal depression that centers on informational empowerment.
    Keywords:  Health information seeking; Perinatal depression; Pregnant women; Preventive intention; RISP model
    DOI:  https://doi.org/10.1016/j.actpsy.2025.106066
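      The moderation finding above (channel beliefs moderating the path from information insufficiency to seeking intention) is conventionally tested as a regression interaction term; a sketch under that assumption, with synthetic data:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(4)
        n = 406                                 # valid responses
        insufficiency = rng.normal(0, 1, n)
        channel_beliefs = rng.normal(0, 1, n)
        intention = (0.3 * insufficiency + 0.2 * channel_beliefs
                     + 0.15 * insufficiency * channel_beliefs  # moderation
                     + rng.normal(0, 1, n))

        X = sm.add_constant(np.column_stack(
            [insufficiency, channel_beliefs, insufficiency * channel_beliefs]))
        fit = sm.OLS(intention, X).fit()
        print(fit.summary(xname=["const", "insuff", "beliefs", "interaction"]))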
  36. J Pharm Policy Pract. 2025 ;18(1): 2594827
       Background: The internet has become a critical resource for accessing health information worldwide. Online health information seeking (OHIS) is increasingly common among hospitalised patients, particularly those with chronic conditions. While prior studies have explored OHIS in individual countries, limited evidence exists comparing behaviours across different sociocultural and healthcare contexts. This study compared patterns of internet use for health information among hospitalised patients in Switzerland and Qatar.
    Methods: A comparative cross-sectional study was conducted between January and June 2016 in two tertiary hospitals, one in Switzerland and one in Qatar. Eligible patients (18-80 years) admitted to internal medicine, visceral surgery, or orthopaedics wards completed a 33-item structured questionnaire, available in French and Arabic, covering sociodemographic characteristics, internet access, health information-seeking behaviour, and internet use during hospitalisation. Both descriptive and inferential statistics were applied to identify predictors of OHIS.
    Results: A total of 820 patients participated (617 Swiss, 203 Qatari). Swiss patients were older (mean age 57 ± 15 years) than Qataris (44 ± 16 years; p ≤ 0.001). Qatari patients were more likely than Swiss patients to search for health information online (85% vs 65%, p ≤ 0.001). They searched more frequently for information on diseases (74% vs 56%), treatments (59% vs 41%), healthcare professionals (40% vs 20%), and hospitals (41% vs 18%). Online health information had a greater reported impact among Qataris, prompting them to ask their doctors further questions (75% vs 50%, p ≤ 0.001) and influencing decisions to consult physicians (33% vs 22%, p ≤ 0.05). Both cohorts expressed interest in reliable, tailored online resources, with Qataris showing a stronger preference for interactive and video-based platforms.
    Conclusion: This study highlights significant cross-country differences in OHIS behaviour among hospitalised patients, shaped by sociodemographic, cultural, and healthcare system contexts. Findings underscore the need for culturally relevant, trustworthy, and patient-centred digital health resources to enhance patient empowerment, improve clinician-patient communication, and reduce risks of misinformation.
    Keywords:  Internet; Qatar; Switzerland; health information; hospitalized patients; questionnaire survey
    DOI:  https://doi.org/10.1080/20523211.2025.2594827
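      One comparison above (85% of 203 Qatari vs 65% of 617 Swiss patients searching online) can be checked with a two-proportion z-test; the counts below are reconstructed from the reported percentages and are approximate:

        from statsmodels.stats.proportion import proportions_ztest

        counts = [round(0.85 * 203), round(0.65 * 617)]  # ~173 and ~401
        nobs = [203, 617]
        z, p = proportions_ztest(counts, nobs)
        print(f"z = {z:.2f}, p = {p:.2e}")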
  37. J Health Care Poor Underserved. 2025 ;36(4): 1255-1276
      Geographic barriers and long travel distances contribute significantly to urban/rural health disparities, making online technology use a vital tool for improving individual and community health in rural areas. However, factors related to technology use, particularly in the Deep South (a historically under-resourced U.S. region characterized by high poverty, limited access to healthcare and education, and a predominantly African American population), remain understudied. Guided by the notion of a digital divide, we explore social determinants of online technology use for seeking health information among rural residents through a cross-sectional survey (N = 157). Multiple linear regression analysis (R² = .52) revealed that lower social isolation was associated with reduced online technology use, whereas greater social media use, higher education, and better health literacy were linked to increased use. These findings underscore the need for coordinated efforts among researchers, practitioners, and policymakers to expand access to (and engagement with) health-related online technologies in rural communities.
    DOI:  https://doi.org/10.1353/hpu.2025.a975586
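      A sketch of the multiple linear regression above (reported R² = .52), relating online technology use to social isolation, social media use, education, and health literacy; the data are synthetic with coefficient signs chosen to match the reported directions, so the printed R² will not reproduce the study's value:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(5)
        n = 157                                 # survey respondents
        isolation = rng.normal(0, 1, n)
        social_media = rng.normal(0, 1, n)
        education = rng.normal(0, 1, n)
        literacy = rng.normal(0, 1, n)
        tech_use = (0.3 * isolation + 0.4 * social_media
                    + 0.3 * education + 0.3 * literacy
                    + rng.normal(0, 1, n))

        X = sm.add_constant(np.column_stack(
            [isolation, social_media, education, literacy]))
        fit = sm.OLS(tech_use, X).fit()
        print(f"R^2 = {fit.rsquared:.2f}")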