bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-04-05
47 papers selected by
Thomas Krichel, Open Library Society



  1. Bioscience. 2026 Mar;76(3): 269-283
      Taxonomic lists, usually of species, have many functions. However, there is currently no reliable and convenient way to determine whether a list contains the information that a user requires other than by reading the list in detail. We therefore developed 24 indicators to characterise list contents. The indicators aim to describe the extent to which each scored list covers its intended class of organisms, the quality of its taxonomic scholarship, and the additional information it provides for each taxon. We tested the indicators on 16 lists drawn from a wide range of vertebrates, invertebrates, and plants. A list content score was derived from individual indicator scores after they had been weighted to reflect the preferences that taxonomists had expressed in a global survey. The indicators are intended to help list creators provide the details taxonomists and list users consider important. We expect the indicators to be refined after public debate.
    Keywords:  Catalogue of Life; indicators; species list scores; standards; taxonomy
    DOI:  https://doi.org/10.1093/biosci/biaf191
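     The list content score described above is, in essence, a survey-weighted composite of indicator scores. A minimal sketch of such a computation, with hypothetical indicator names, scores, and weights (the paper's actual 24 indicators and weighting scheme are not reproduced here):

        # Hypothetical indicators and survey-derived weights.
        indicator_scores = {"coverage": 0.8, "synonymy_resolved": 0.6, "per_taxon_detail": 0.4}
        survey_weights = {"coverage": 3.0, "synonymy_resolved": 2.0, "per_taxon_detail": 1.0}

        def list_content_score(scores: dict, weights: dict) -> float:
            """Weighted mean of indicator scores; weights reflect surveyed preferences."""
            total_weight = sum(weights[k] for k in scores)
            return sum(scores[k] * weights[k] for k in scores) / total_weight

        print(round(list_content_score(indicator_scores, survey_weights), 3))  # 0.667
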
  2. PLoS One. 2026;21(3): e0345300
      Generative artificial intelligence (GenAI) blurs the boundaries between expert and non-expert sources, as it increasingly distributes and creates scientific content. This study examines how individuals adapt evaluation strategies, including content evaluation, source evaluation, and corroboration, when using GenAI versus a search engine. Drawing on performance tasks in which participants evaluated socio-scientific dilemmas, and on follow-up interviews with 30 adult participants from diverse educational backgrounds, the findings reveal that users employed these strategies on both platforms but adapted them in distinct ways. We identified two evaluation strategies that emerged as analytical constructs from the qualitative data. First, to corroborate output, participants frequently used a strategy we titled 'representation evaluation,' assessing whether GenAI accurately summarized its sources rather than verifying source agreement independently. Second, participants also applied 'meta source evaluation,' relying on their familiarity with sources provided by GenAI instead of directly evaluating the sources themselves. Although all participants engaged in dialogue with the chat, they did not leverage the bot's dialogue capabilities to assess credibility, and many relied on a "machine heuristic", assuming GenAI's inherent correctness, reflecting a well-documented over-trust in automated systems. This research underscores the importance of developing and assessing critical evaluation skills for navigating AI-generated scientific information. Specifically, it extends existing models of online information evaluation to contexts mediated by artificial intelligence.
    DOI:  https://doi.org/10.1371/journal.pone.0345300
  3. Nature. 2026 Apr;652(8108): 26-29
      
    Keywords:  Ethics; Machine learning; Publishing; Scientific community
    DOI:  https://doi.org/10.1038/d41586-026-00969-z
  4. IEEE Trans Vis Comput Graph. 2026 Apr 03. PP
      Searching for information, objects, or places in virtual reality (VR) is often a cumbersome process that breaks user immersion. Existing interaction techniques like pointing or voice commands are context-dependent, yet current systems fail to help users select the optimal strategy for a given situation. To address this challenge, we introduce an adaptive VR search framework that recommends search strategies based on situational factors. We first propose a taxonomy of five common VR search strategies and construct a factorial dataset linking task types, object properties, and environmental conditions to strategy choices. We then conduct a two-stage study. In Stage 1, we capture users' natural strategy selections in a free-search environment to create a ground-truth dataset. In Stage 2, we use this data to train and evaluate two adaptive systems, one driven by a machine learning (ML) model and another by a Large Language Model (LLM), against the free-search baseline. Our results show that both adaptive systems significantly reduce the number of attempts and perceived workload compared to free search. While the ML-based system achieved the fastest task completion times, both adaptive approaches were rated as significantly more usable. This work demonstrates that adaptive, context-aware systems can enhance search efficiency and user experience in VR, paving the way for more intelligent and immersive information-seeking interfaces.
    DOI:  https://doi.org/10.1109/TVCG.2026.3680637
  5. J Am Med Inform Assoc. 2026 Mar 30. pii: ocag038. [Epub ahead of print]
       OBJECTIVE: Intelligent agent-driven research co-pilots, leveraging advances in generative AI, are transforming how scientists access biomedical knowledge. This paper presents Med.ai ASK, an agentic question-answering system designed to address biomedical inquiries through dynamic retrieval augmentation and tool-driven reasoning. We aim to develop a system capable of parsing the nuance in biomedical scientists' research questions to provide reliable, grounded responses that are more accurate than other generative AI solutions.
    MATERIALS AND METHODS: We adopt the ReAct framework's tool-calling architecture and leverage atomic reasoning from Self-Discover to build Med.ai ASK. It selectively queries multiple biomedical knowledge bases and employs map-reduce tools for vector database retrieval, alongside external API and NER tool integration. We ingested 44 million biomedical documents from diverse sources. The agent is evaluated on a range of biomedical question-answering datasets.
    RESULTS: Human evaluation on an internal dataset shows strong performance and stability. Ratings from a large language model are aligned with human assessments, supporting its use in further experiments. Automatic evaluations indicate superior performance in long-form answers regarding accuracy, faithfulness, factuality, and reduced hallucinations. For short-form and multiple-choice answers, performance is competitive with state-of-the-art systems. The agent's detailed answers are more interpretable than those of other systems, which we attribute to its agentic design. The agent effectively selects tools based on question type and is deployed in a production-level chat platform with over 1,600 users and 25,000 answered questions.
    CONCLUSION: Med.ai ASK dynamically orchestrates biomedical information retrieval tools to deliver robust, interpretable, accurate, and factual answers, which is crucial in the biomedical domain.
    Keywords:  agentic AI; biomedical question answering; large language model; reasoning; retrieval-augmented generation
    DOI:  https://doi.org/10.1093/jamia/ocag038
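     The abstract describes, but does not publish, the agent's tool-calling loop. The sketch below illustrates only the general pattern of routing a question across multiple retrieval tools; the tool names and the trivial keyword router are placeholders, not Med.ai ASK internals:

        # Schematic tool routing in the spirit of ReAct; all names are illustrative.
        def search_knowledge_base(q): return f"[KB passages for: {q}]"
        def run_ner(q): return f"[entities in: {q}]"
        def query_external_api(q): return f"[API records for: {q}]"

        TOOLS = {"retrieve": search_knowledge_base, "ner": run_ner, "api": query_external_api}

        def answer(question: str) -> str:
            # A real agent lets the LLM pick tools iteratively (thought -> action ->
            # observation); a keyword rule stands in for that step here.
            chosen = ["ner", "retrieve"] if "gene" in question.lower() else ["retrieve"]
            observations = [TOOLS[t](question) for t in chosen]
            return "Answer grounded in: " + "; ".join(observations)

        print(answer("Which gene variants are linked to statin response?"))
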
  6. J Am Med Inform Assoc. 2026 Apr 01. pii: ocag037. [Epub ahead of print]
       OBJECTIVE: To evaluate the effectiveness of generative query expansion for biomedical literature retrieval.
    MATERIALS AND METHODS: We thoroughly examined eight generative query expansion methods using three large language models across five datasets for biomedical literature retrieval. We further performed a quantitative analysis, including performance comparisons, rank transition analysis, and article-type effect analysis. We also conducted a qualitative examination of representative cases, from which we derived an error taxonomy.
    RESULTS: On BioASQ-Y/N, GPT-4o-based query expansion shifts Recall@10 to 0.417-0.512 and nDCG@10 to 0.358-0.479, relative to a baseline of 0.491 and 0.456. For PubMedQA, Precision@1 ranges from 0.764 to 0.876 and nDCG@10 from 0.847 to 0.931, compared with baseline values of 0.893 and 0.935. For 2019-TREC-PM, query expansion yields Recall@100 of 0.217-0.256 and nDCG@100 of 0.272-0.312, versus a baseline of 0.227 and 0.274. Similarly, for 2018-TREC-PM, Recall@100 spans 0.169-0.227 and nDCG@100 spans 0.195-0.250, relative to baseline scores of 0.164 and 0.191. For 2017-TREC-PM, Recall@100 and nDCG@100 fall within 0.111-0.139 and 0.154-0.191 under query expansion, compared with baseline metrics of 0.102 and 0.147. Both general-purpose and domain-specific Llama-based models demonstrate similar performance to GPT-4o.
    DISCUSSION AND CONCLUSION: The impact of query expansion varies significantly by the expansion methods and type of evidence, but is relatively agnostic to backbone model choice. Notably, query expansion primarily affects article ranking but has a limited impact on the screening stage. Our findings underscore the unique challenges of biomedical literature retrieval and highlight the need to develop domain-specific information retrieval techniques.
    Keywords:  Biomedical Literature Retrieval; Large Language Models; Query Expansion
    DOI:  https://doi.org/10.1093/jamia/ocag037
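     For reference, the Recall@k and nDCG@k figures above can be computed as follows; this is a generic binary-relevance sketch, not the paper's evaluation code:

        import math

        def recall_at_k(ranked, relevant, k):
            """Fraction of all relevant documents that appear in the top k."""
            return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

        def ndcg_at_k(ranked, relevant, k):
            """DCG of the ranking divided by the DCG of an ideal ranking."""
            dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
            ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
            return dcg / ideal if ideal else 0.0

        ranked = ["d3", "d1", "d7", "d2", "d9"]
        relevant = {"d1", "d2", "d4"}
        print(round(recall_at_k(ranked, relevant, 5), 3))  # 0.667
        print(round(ndcg_at_k(ranked, relevant, 5), 3))    # 0.498
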
  7. Digit Health. 2026 Jan-Dec;12: 20552076261437915
       Background: Kidney transplantation offers substantial clinical and economic benefits for patients with end-stage kidney disease (ESKD), yet organ shortages persist. Enhancing public awareness and health literacy regarding kidney donation is essential for effective donor recruitment. While online patient education materials are primary drivers of public perception, their readability, digital engagement, and accessibility remain underexplored barriers.
    Methods: We analyzed the most prominent kidney donation websites identified through U.S.-based Google searches, including 11 primary organizations and 16 affiliated subdomains, for a total of 27 websites. Readability was benchmarked using Flesch-Kincaid, SMOG, and Gunning Fog indices. To assess qualitative metrics, we deployed a generative AI framework utilizing a Large Language Model (Claude) to conduct automated sentiment analysis and tone evaluation, validated by human review. We systematically mapped digital engagement features, including multimedia, interactive tools, and multilingual support, to determine content comprehensiveness.
    Results: Websites consistently provided accurate information with a generally positive or neutral tone. Average readability exceeded recommended levels, with a combined mean grade of 12.3; 34% of websites were written at a college-level reading standard. Consensus across content was high. Multimedia elements were widely used, but engagement features were limited; only 30% of sites included extensive testimonials, and interactive tools were absent. AI-based analysis enabled standardized and reproducible evaluation, highlighting opportunities to improve accessibility, tone, and inclusivity.
    Conclusion: Current U.S. kidney donation digital resources present a barrier to health equity due to excessive reading complexity and static engagement models. AI provides a scalable, reproducible framework to audit and optimize patient education materials. Future initiatives must leverage AI-guided content optimization to bridge the literacy gap, potentially increasing donor registration and access to transplantation.
    Keywords:  digital health equity; generative AI; health literacy; kidney transplantation; large language models (LLM); patient education; sentiment analysis
    DOI:  https://doi.org/10.1177/20552076261437915
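     The readability indices used in this and several following entries are closed-form formulas over word, sentence, and syllable counts. A minimal sketch (the syllable counter is a crude vowel-group heuristic; production tools use dictionaries):

        import math, re

        def syllables(word):
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def readability(text):
            s = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            w = len(words)
            syl = sum(syllables(x) for x in words)
            complex_w = sum(1 for x in words if syllables(x) >= 3)
            return {
                "Flesch-Kincaid grade": 0.39 * w / s + 11.8 * syl / w - 15.59,
                "SMOG": 1.0430 * math.sqrt(complex_w * 30 / s) + 3.1291,
                "Gunning Fog": 0.4 * (w / s + 100 * complex_w / w),
            }

        print(readability("Kidney donation saves lives. Living donors undergo careful evaluation."))
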
  8. Odontology. 2026 Mar 30.
      This study aimed to evaluate and compare the responses provided by ChatGPT-4o, Google Gemini (2.0 Flash) and Microsoft Copilot to frequently asked questions (FAQs) in endodontics. Fifty patient-oriented, open-ended endodontic FAQs were formulated by two experienced endodontists. Each question was posed to each chatbot three times, yielding a total of 450 responses. Two endodontists independently evaluated all responses using a modified Global Quality Score (GQS) on a five-point Likert scale. Validity was evaluated at two thresholds: low (all three responses scored ≥ 4) and high (all three responses scored 5). Fisher's exact test was used to compare validity among models. Cronbach's alpha was calculated to assess consistency. Readability was analyzed using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). All chatbots performed well under the low-validity threshold, but their performance declined under the stricter high-validity criteria. High-threshold validity of Google Gemini was significantly greater than that of ChatGPT-4o (p = 0.022). Unlike the other chatbots, ChatGPT-4o did not receive any scores of 2, yet its mean overall score was significantly lower than those of the other two models (p < 0.05). This suggests that models generating more detailed responses may carry a higher risk of misleading information, even when achieving higher scores. By contrast, ChatGPT-4o produced more readable outputs, which may lack sufficient depth. Overall, these findings indicate that no single chatbot can be considered optimal across all dimensions, as readability, accuracy, and completeness cannot be fully achieved by one model alone.
    Keywords:  Artificial intelligence; ChatGPT; Endodontics; Google Gemini; Large language models (LLMs); Microsoft Copilot
    DOI:  https://doi.org/10.1007/s10266-026-01373-9
  9. Arch Plast Surg. 2026 Mar;53(2): 191-198
       Background: Patients undergoing breast cancer surgery and reconstruction seek information using online patient education materials (OPEMs). The National Institutes of Health (NIH) and American Medical Association (AMA) recommend a sixth-grade reading level for OPEMs. In recent years, Chat Generative Pre-Trained Transformer (ChatGPT), a large language model (LLM), has shown potential utility in patient education. This study compares the readability and content quality of OPEMs on breast cancer surgery and reconstruction with ChatGPT-generated materials.
    Methods: Google searches were conducted in January 2025 to identify relevant OPEMs for breast cancer surgery and reconstruction. For each search term, ChatGPT 4.0 was prompted to generate patient education guides using two approaches: (1) standard prompting and (2) simplified prompting to align with NIH/AMA recommendations ("write the guide like I am in sixth grade"). Readability and content quality metrics were assessed.
    Results: Ninety-nine OPEMs and 60 ChatGPT responses (30 standard, 30 simplified) were analyzed. Median Flesch-Kincaid Grade Level (FKGL) was 10.8 for OPEMs, 10.0 for standard ChatGPT responses, and 5.8 for simplified ChatGPT responses. OPEMs and standard ChatGPT responses significantly exceeded NIH/AMA recommendations (p < 0.001). Simplified ChatGPT responses aligned with the sixth-grade level and were significantly easier to read than OPEMs and standard ChatGPT responses (p < 0.001). DISCERN scores did not significantly differ between OPEMs and standard/simplified ChatGPT responses.
    Conclusion: OPEMs on breast cancer surgery and reconstruction exceed recommended readability levels. ChatGPT, when prompted to simplify, produced materials consistent with NIH/AMA guidelines while maintaining content quality. Using ChatGPT for patient education may enhance accessibility and patient comprehension of health information.
    Keywords:  artificial intelligence; breast cancer surgery; breast reconstruction; breast reconstruction-oncoplastic surgery; breast/trunk; clinical practice and education; hospital management; patient education materials; readability
    DOI:  https://doi.org/10.1055/a-2794-9984
  10. Glob Cardiol Sci Pract. 2025 May 15. 2025(2): e202526
       Introduction: The use of artificial intelligence (AI) has advanced rapidly in the field of cardiology owing to its ability to process complex data and analyze electrocardiograms, echocardiography, and cardiac testing. AI tools, such as ChatGPT and Google Gemini, can provide evidence-based treatment recommendations using concise language, which can help in the early diagnosis of disease.
     Methodology: In this cross-sectional study, patient information brochures for three cardiological procedures (ECG, 2D echocardiography, and exercise stress testing) were generated using ChatGPT and Google Gemini. The total word count, sentence count, average words per sentence, and syllables per word were assessed using the Flesch-Kincaid Calculator. The similarity of the text was determined using the QuillBot plagiarism tool. The reliability of the generated responses was analyzed and graded using the Modified DISCERN Score, a 5-point rating system that uses a set of uniform standards to assess the accuracy and dependability of consumer health-related data. Statistical analysis was performed using RStudio v4.3.2. Additionally, the simplicity and reliability scores were compared using Pearson's correlation coefficient. The unpaired t-test was used to compare the responses.
     Results: Responses generated by ChatGPT and Google Gemini showed no significant difference in word count (P = 0.59), sentence count (P = 0.74), average words per sentence (P = 0.79), grade level (P = 0.06), similarity (P = 0.45), or reliability scores (P = 0.38). However, the ease score was significantly better for Google Gemini-generated responses than for ChatGPT (P = 0.0044), indicating that the responses generated by Google Gemini are more easily readable and understandable.
     Conclusions: The study found a statistically significant difference in average syllables per word and ease score. No significant differences were observed in the number of words, sentences, average words per sentence, grade level, similarity, or reliability scores. More AI technologies need to be evaluated in future studies, which should cover a wider range of illnesses.
    DOI:  https://doi.org/10.21542/gcsp.2025.26
  11. Ann Agric Environ Med. 2026 Mar 25. pii: 204249. [Epub ahead of print]33(1): 46-52
       INTRODUCTION AND OBJECTIVE: ChatGPT can generate reliable medical information in gynaecology and obstetrics, but the content is often difficult to understand for patients with lower educational levels. The aim of the study is to evaluate the impact of Audience Persona Prompting on the simplification and readability of ChatGPT-generated medical information on cervical cancer screening (MICC_GPT) in Polish.
    MATERIAL AND METHODS: 392 MICC_GPT were analyzed, with 196 generated using Zero-Shot Prompting (STANDARD) and 196 generated using Audience Persona Prompting (EASY). The Audience Persona prompts included instructions to simplify the content: 'Explain as if to an average Polish woman with only primary education' (8 years of formal schooling). Readability was assessed using 24 objective linguistic indicators available at Jasnopis.pl. Statistical analysis was performed in Statistica 13 (StatSoft, Poland) using the Brunner-Munzel test, with p < 0.05 considered significant.
    RESULTS: The average difficulty level of STANDARD output was 5.32 (at least 15 years of formal education), while EASY output averaged 4.15 (12 years of formal education). Of the 24 indicators, 21 showed statistically significant improvements in the simplification of EASY output (p < 0.05). While ChatGPT significantly simplified MICC_GPT, the readability levels remained too high for patients with only primary education.
    CONCLUSIONS: ChatGPT shows promise in tailoring medical information on cervical cancer (CC) screening for the needs of Polish patients with varying educational backgrounds, with the use of advanced prompt engineering techniques. However, further research is required to refine prompt engineering methods and develop effective strategies for generating information on cervical cancer screening that is accessible to individuals with only primary education.
    Keywords:  Audience Persona Prompting; ChatGPT; cervical cancer information; readability; simplification of medical information
    DOI:  https://doi.org/10.26444/aaem/204249
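     The Brunner-Munzel test used above is available in SciPy; a minimal sketch with invented difficulty scores for the two prompting conditions (the study's actual Jasnopis indicator values are not reproduced here):

        from scipy.stats import brunnermunzel

        # Hypothetical per-text difficulty levels for the two conditions.
        standard = [5.4, 5.1, 4.3, 5.2, 4.6, 5.5, 5.0, 4.1]
        easy = [4.2, 4.0, 4.3, 4.8, 4.4, 3.9, 5.1, 4.1]

        stat, p = brunnermunzel(standard, easy)
        print(f"Brunner-Munzel statistic = {stat:.2f}, p = {p:.4f}")
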
  12. Jt Dis Relat Surg. 2026 May 01. pii: jdrs.2026.2645. [Epub ahead of print]37(2): 470-476
       OBJECTIVES: This study aims to directly compare ChatGPT and DeepSeek, both equipped with DeepResearch/DeepThink capabilities, based on their responses to frequently asked questions (FAQs) on total knee arthroplasty (TKA).
    MATERIALS AND METHODS: Thirty frequently asked questions related to TKA were compiled from validated patient education sources, including American Academy of Orthopaedic Surgeons (AAOS) OrthoInfo, National Institute for Health and Care Excellence (NICE) guidelines, and popular patient discussion forums, and verified for clinical relevance by two independent arthroplasty surgeons. Two orthopedic surgeons, blinded to model identity, evaluated each response using a five-point Likert scale across five domains: accuracy, comprehensiveness, readability, relevance, and ethical and safety considerations. The maximum total score per response was 25. Readability was also assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES). Inter-rater and intra-rater reliability were calculated using intraclass correlation coefficients (ICCs).
    RESULTS: ChatGPT-4o scored significantly higher in comprehensiveness and clinical detail, whereas DeepSeek R1 produced responses with superior readability, indicated by a lower FKGL (7.5 vs. 10.2) and higher FRES (62.3 vs. 45.6) (p < 0.05). Both models demonstrated high accuracy and safety, with no factual errors identified. Intra-rater reliability was excellent (ICC > 0.81), and inter-rater agreement ranged from fair to substantial (ICC 0.31 to 0.63).
    CONCLUSION: Both ChatGPT-4o and DeepSeek R1 are capable of generating accurate, ethically sound, and clinically relevant educational content for patients undergoing TKA. While ChatGPT-4o offers more comprehensive information, DeepSeek R1 provides content that is more accessible to patients with lower health literacy. Model selection should be tailored to the target population to optimize educational effectiveness in clinical practice. The ability of real-time data retrieval to incorporate the most current clinical evidence and guideline updates may further enhance the educational quality, reliability, and clinical relevance of AI-generated patient information.
    DOI:  https://doi.org/10.52312/jdrs.2026.2645
  13. NPJ Artif Intell. 2026;2(1): 39
      Biomedical named entity recognition (NER) is a high-utility natural language processing task, and large language models (LLMs) show promise in few-shot settings. In this article, we address performance challenges for few-shot biomedical NER by investigating innovative prompting strategies involving retrieval-augmented generation. Using five biomedical NER datasets, we implemented and evaluated a systematically structured multi-component static prompt and a dynamic prompt engineering technique, in which the prompt is dynamically updated via retrieval of the most relevant in-context examples based on the input texts. Static prompting with structured components increased average F1-scores by 12% for GPT-4, and 11% for GPT-3.5 and LLaMA 3-70B, relative to basic static prompting. Dynamic prompting further boosted performance and was evaluated on GPT-4, LLaMA 3-70B, and the recently released open-weight GPT-OSS-120B model, with TF-IDF based retrieval yielding the best results, improving average F1-scores by 8.8% and 6.3% in 5-shot and 10-shot settings, respectively. An ablation study on retrieval pool size demonstrated that strong performance can be achieved with a relatively small number of annotated samples, reinforcing the annotation efficiency and scalability of our framework in real-world settings.
    Keywords:  Computational biology and bioinformatics; Health care; Mathematics and computing
    DOI:  https://doi.org/10.1038/s44387-025-00062-2
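     The best-performing dynamic prompting variant retrieves in-context examples by TF-IDF similarity to the input text. A minimal sketch of that retrieval step (the annotated pool and query are invented):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        pool = [
            ("EGFR mutations predict erlotinib response.", "EGFR=Gene"),
            ("Aspirin reduced headache severity.", "Aspirin=Drug"),
            ("BRCA1 carriers face elevated cancer risk.", "BRCA1=Gene"),
        ]

        def retrieve_examples(query, k=2):
            texts = [t for t, _ in pool]
            vec = TfidfVectorizer().fit(texts)
            sims = cosine_similarity(vec.transform([query]), vec.transform(texts))[0]
            return [pool[i] for i in sims.argsort()[::-1][:k]]

        # The retrieved (text, annotation) pairs are formatted into the few-shot prompt.
        print(retrieve_examples("KRAS mutations alter drug response."))
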
  14. Int Ophthalmol. 2026 Apr 02. pii: 185. [Epub ahead of print]46(1):
       PURPOSE: This study compares the readability and thoroughness of patient education materials for LASIK and cataract surgery.
    METHODS: Online patient education materials for LASIK and cataract surgery were collected from the top 30 U.S. ophthalmology programs, the American Academy of Ophthalmology (AAO), and generated using Doximity GPT. Materials were assessed using eight readability metrics that produced an average reading level score. Materials were also assessed for inclusion of information pertaining to description of the procedure, patient eligibility and contraindications, procedural risks, and post-operative care. Average readability and thoroughness were then compared between materials for each procedure.
    RESULTS: Of 30 ophthalmology programs, 27 had online patient education materials for LASIK and 19 for cataract surgery. There was no significant difference between the readability of LASIK and cataract surgery materials (grade level of 10.11 vs. 10.77, p = 0.100). A greater proportion of LASIK materials discussed patient eligibility (81.5% vs. 10.5%, p < 0.001), though there was no difference in the frequency of information pertaining to procedural risks and post-operative care. The AAO's materials were more readable than those from top ophthalmology programs (LASIK: 8.06 vs. 10.10, p < 0.001; cataract surgery: 8.73 vs. 10.77, p < 0.001), as were the Doximity GPT-generated materials (LASIK: 5.82 vs. 10.10, p < 0.001; cataract surgery: 7.32 vs. 10.77, p < 0.001).
    CONCLUSIONS: Online patient education materials for both LASIK and cataract surgery exceed a tenth-grade level, making them inaccessible to many patients. Ophthalmologists may consider using Doximity GPT to generate more readable materials.
    Keywords:  Cataract surgery; LASIK; Patient education; Readability
    DOI:  https://doi.org/10.1007/s10792-026-04066-y
  15. J Laparoendosc Adv Surg Tech A. 2026 Mar 29. 10926429261438353
       OBJECTIVE: The widespread use of online video platforms has transformed surgical education, particularly for minimally invasive procedures. This study aimed to evaluate the educational quality of laparoscopic adrenalectomy videos available on YouTube and to investigate whether digital popularity metrics reflect educational value.
    METHODS: A YouTube search was conducted on January 1, 2026, using the keyword "laparoscopic adrenalectomy." The first 250 videos were screened, and 135 videos meeting the inclusion criteria were analyzed. Educational quality was assessed independently by 2 experienced surgeons using LAP-VEGaS, JAMA benchmark criteria, Modified DISCERN, and Global Quality Score (GQS). Videos were categorized as institutional or individual according to the uploader profile. Digital engagement metrics, including view ratio, like ratio, and Video Power Index (VPI), were calculated. Independent predictors of educational quality were analyzed using multiple linear regression.
    RESULTS: Of the included videos, 36 (26.7%) were uploaded by institutional sources and 99 (73.3%) by individual users. Institutional videos demonstrated significantly higher educational quality scores across all structured assessment tools, including LAP-VEGaS, GQS, JAMA, and Modified DISCERN (P < .001). In multiple regression models, channel type was identified as the only independent predictor of educational quality scores (P < .05). In contrast, digital popularity indicators such as view ratio, like ratio, and VPI were not significantly associated with educational quality. No meaningful correlation was observed between video popularity and LAP-VEGaS score (R2 = 0.013).
    CONCLUSION: Laparoscopic adrenalectomy videos on YouTube show substantial variability in educational quality. Institutional videos provide higher educational value, while popularity metrics do not reliably indicate educational quality. Surgical trainees should preferentially use academically produced content for educational purposes.
    Keywords:  LAP-VEGaS; YouTube; educational quality; laparoscopic adrenalectomy; online surgical videos; surgical education
    DOI:  https://doi.org/10.1177/10926429261438353
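     Definitions of the engagement metrics vary across studies; the sketch below follows one common convention for the Video Power Index and should be read as an assumption, not this paper's exact computation:

        def video_power_index(likes, dislikes, views, days_online):
            # Common convention: like ratio x view ratio / 100 (an assumption;
            # the paper's exact definitions may differ).
            like_ratio = likes / (likes + dislikes) * 100 if likes + dislikes else 0.0
            view_ratio = views / days_online  # views per day
            return like_ratio * view_ratio / 100

        print(round(video_power_index(likes=240, dislikes=10, views=50_000, days_online=365), 1))
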
  16. J Cancer Educ. 2026 Mar 28.
      
    Keywords:  Chondrosarcoma; Enchondroma; Ewing sarcoma; Osteochondroma; Osteoid osteoma; Osteosarcoma; YouTube
    DOI:  https://doi.org/10.1007/s13187-026-02854-9
  17. Rheumatol Int. 2026 Mar 28. pii: 68. [Epub ahead of print]46(4):
      
    Keywords:  Information science; Pregnancy; Rheumatic diseases; Rheumatology; Social media
    DOI:  https://doi.org/10.1007/s00296-026-06102-7
  18. Acta Neurol Belg. 2026 Apr 03.
      
    Keywords:  Exercise; Parkinson disease; Video content; Video quality; YouTube videos
    DOI:  https://doi.org/10.1007/s13760-026-03055-3
  19. Medeni Med J. 2026 Mar 27. 41(1): 51-58
       Objective: YouTube has become an increasingly popular platform for surgical education, yet the quality and reliability of its medical content remain uncertain. With the rise of artificial intelligence (AI), models such as ChatGPT offer new possibilities for automated educational content evaluation. This study aimed to compare human and AI-based assessments of the educational quality and reliability of YouTube videos on benign prostate surgery.
    Methods: A total of 100 videos of holmium laser enucleation of the prostate, transurethral resection of the prostate (TURP), transvesical prostatectomy, and thulium fiber laser enucleation of the prostate were analyzed. Two urology specialists and ChatGPT-5 (in two independent runs) evaluated each video using the Global Quality Score (GQS) and the modified DISCERN tool. Popularity metrics (views, likes, subscribers, duration) were also recorded. Non-parametric statistical tests and Spearman correlation analyses were applied.
    Results: Human raters assigned significantly higher DISCERN and GQS scores than both AI runs (p<0.01). TURP videos consistently received lower scores across all evaluators. No significant quality differences were found among video sources. Both AI runs showed strong internal consistency (ρ=0.62-0.75) and reproduced human rating patterns, though with lower mean values. Engagement metrics showed weak or no correlation with quality.
    Conclusions: AI models can provide consistent, scalable quality assessments but still underestimate educational value compared with human experts. Hybrid AI-expert evaluation may enhance the reliability of the appraisal of online surgical videos.
    Keywords:  Benign prostate obstruction; ChatGPT; DISCERN; Global Quality Score (GQS); YouTube; urology
    DOI:  https://doi.org/10.4274/MMJ.galenos.2026.38586
  20. Ir J Med Sci. 2026 Mar 31.
      
    Keywords:  Cochlear implant; DISCERN; GQS; JAMA; YouTube
    DOI:  https://doi.org/10.1007/s11845-026-04338-7
  21. J Cancer Educ. 2026 Apr 02.
      
    Keywords:  Colon-cancer screening; Ethnicity; Patient education as topic; Social media; YouTube; eHealth
    DOI:  https://doi.org/10.1007/s13187-026-02867-4
  22. Front Digit Health. 2026;8: 1726517
       Background: ChatGPT-5, the latest multimodal large language model (LLM), has gained remarkable public attention for its ability to provide real-time and context-aware health information. However, its effectiveness in addressing sensitive urological topics such as vasectomy has not been systematically evaluated.
    Objective: This study aimed to evaluate the accuracy, completeness and public suitability of ChatGPT-5's responses to frequently asked questions about vasectomy, derived from Google Trends data reflecting real-world public interest.
    Methods: A total of eight experts (four urologists, two public health specialists, one obstetrician-gynecologist, and one fertility nurse) independently assessed ChatGPT-5's responses to ten high-frequency vasectomy-related questions. Each response was rated using six 5-point Likert-scale criteria: medical accuracy, completeness, clarity, tone, public usefulness and recommendability. Descriptive statistics, Kruskal-Wallis tests and two-way random-effects intraclass correlation coefficients (ICC, 95% CI) were applied for statistical analysis.
    Results: The mean ratings across evaluation domains ranged from 3.75 to 4.04. Clarity of language and tone appropriateness received the highest scores, whereas medical accuracy and comprehensiveness demonstrated greater dispersion. No statistically significant differences were observed among expert subgroups (p > 0.05). Inter-rater reliability was very low (ICC = -0.01), indicating substantial variability across expert evaluations.
    Conclusions: In this exploratory assessment, ChatGPT-5 responses to vasectomy-related public questions were frequently perceived as clear and appropriately framed for informational use. However, variability across expert ratings and the absence of layperson validation underscore the need for cautious interpretation. Large language model outputs may serve as supportive educational resources when accompanied by expert oversight and audience-specific adaptation.
    Keywords:  ChatGPT-5; artificial intelligence; health communication; patient education; public health; vasectomy
    DOI:  https://doi.org/10.3389/fdgth.2026.1726517
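     The two-way random-effects, single-rater ICC reported above corresponds to Shrout and Fleiss's ICC(2,1), which can be computed from the two-way ANOVA mean squares. A minimal numpy sketch with an invented ratings matrix:

        import numpy as np

        def icc2_1(ratings):
            """ICC(2,1): two-way random effects, absolute agreement, single rater."""
            n, k = ratings.shape  # subjects x raters
            grand = ratings.mean()
            msr = k * ((ratings.mean(1) - grand) ** 2).sum() / (n - 1)
            msc = n * ((ratings.mean(0) - grand) ** 2).sum() / (k - 1)
            resid = ratings - ratings.mean(1, keepdims=True) - ratings.mean(0, keepdims=True) + grand
            mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
            return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

        rng = np.random.default_rng(0)
        ratings = rng.integers(3, 6, size=(10, 8)).astype(float)  # invented Likert ratings
        print(round(icc2_1(ratings), 3))
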
  23. Ear Nose Throat J. 2026 Mar 30. 1455613261438070
       OBJECTIVE: YouTube is a video-sharing platform that patients frequently utilize. However, there are no objective assessments of the quality of information about otosclerosis on YouTube. Therefore, we aimed to assess the quality of YouTube videos for patient education via a cross-sectional study. We utilized 4 search phrases and analyzed them with 3 different scoring metrics, followed by statistical analysis.
    RESULTS: Fifty videos were analyzed for the search terms "stapedectomy," "stapedotomy," "laser stapedotomy," and "otosclerosis." Most videos for "stapedectomy" (42%) and "otosclerosis" (41.2%) were intended for patients, while those for "stapedotomy" (48%) and "laser stapedotomy" (96%) were created for healthcare professionals or students. Higher modified DISCERN scores were associated with healthcare organization-produced videos for "otosclerosis" (P = .01964) and "stapedotomy" (P = .02842). Higher global quality scores (P = .02964) and Journal of the American Medical Association (JAMA) benchmark scores (P = .01488) were significantly associated with videos made by verified users for "otosclerosis."
    CONCLUSION: The quality of YouTube videos may not be sufficient for patient education on stapedectomy and stapedotomy for otosclerosis. Only 2 search terms returned videos predominantly geared toward patient education, while the other 2 terms yielded more videos aimed at healthcare professionals. Lower transparency and reliability scores may raise concerns about bias.
    Keywords:  YouTube; otolaryngology; otosclerosis; patient education; stapedectomy; stapedotomy
    DOI:  https://doi.org/10.1177/01455613261438070
  24. Front Public Health. 2026;14: 1764270
       Objective: This study evaluated and compared the quality and educational value of kidney transplantation-related videos on TikTok and YouTube.
    Methods: A structured search identified 151 eligible videos. Each video was assessed using DISCERN, PEMAT-A/V (understandability and actionability), and the Global Quality Scale (GQS). Content completeness was examined across six key educational domains. Correlation analyses were conducted to determine associations between video characteristics and quality metrics.
    Results: Physicians were the primary uploaders on both platforms (YouTube 55.7%, TikTok 46.0%). Overall quality was low: 69.4% of TikTok and 62.8% of YouTube videos were rated "poor" or "very poor." Only 2.3% of YouTube videos achieved an "excellent" rating, and none on TikTok. TikTok showed much higher engagement (mean 13,639 likes and 2,664 comments per video) than YouTube (1,480 likes and 51 comments), yet engagement and duration were not correlated with quality on TikTok (p > 0.5). In contrast, YouTube video duration was positively associated with DISCERN and GQS scores (r = 0.478-0.584, p < 0.001). Content completeness was limited on both platforms, particularly for evaluation and long-term outcomes, and TikTok demonstrated significantly lower scores across all domains. Significant between-platform differences were observed in DISCERN (p < 0.001), GQS (p = 0.001), PEMAT understandability (p = 0.002), and content completeness (p < 0.01).
    Conclusion: TikTok and YouTube provide suboptimal educational content on kidney transplantation, with TikTok performing notably worse. Greater involvement of medical institutions and improved platform mechanisms to elevate high-quality, evidence-based content are urgently needed to enhance online patient education.
    Keywords:  TikTok; YouTube; internet; kidney transplantation; patient education; quality
    DOI:  https://doi.org/10.3389/fpubh.2026.1764270
  25. Eur J Dent Educ. 2026 Mar 29.
       INTRODUCTION: Large Language Models such as ChatGPT are increasingly used in dental education; however, their credibility in clinical contexts remains uncertain. This study aimed to analyse the credibility and efficacy of responses given by ChatGPT about temporomandibular disorders (TMDs) among dentists and dental students.
    MATERIALS AND METHODS: Nine questions related to TMDs were posed to ChatGPT 3.5, and its responses were used to create an online survey. A total of 115 participants (60 dental students and 55 dentists) rated each response on a five-point Likert scale. Additionally, a Delphi panel of 14 TMD specialists assessed the same responses for accuracy, completeness, and guideline adherence based on DC/TMD and AAOP criteria. Consensus was defined as ≥ 70% agreement among panellists. The Mann-Whitney U test was used, with p < 0.05 considered statistically significant.
    RESULTS: The study indicates positive perceptions, with median values above 4 for all questions. Descriptive statistics revealed mean scores ranging from 4.28 to 4.48. Scores for individual responses did not differ significantly between groups, although dental students demonstrated significantly higher total scores across all nine responses than the dentist group (p = 0.029). Delphi findings indicated strong expert consensus, with median scores ≥ 4 and interquartile ranges mostly equal to 1.
    CONCLUSION: ChatGPT 3.5 produced accurate and coherent responses about TMDs but should be used as a supplementary educational tool under professional supervision. Incorporating Delphi-based expert validation strengthened objectivity and demonstrated the value of combining user feedback with expert consensus when assessing AI-generated medical information.
    Keywords:  artificial intelligence; deep learning/machine learning; dental education; dental health survey(s); dental public health
    DOI:  https://doi.org/10.1111/eje.70152
  26. Medicine (Baltimore). 2026 Apr 03. 105(14): e48222
      Gout is a common inflammatory arthritis that imposes an increasing burden on global public health. With the rise of social media platforms, particularly TikTok and Bilibili, more individuals are seeking health-related information online. However, concerns remain regarding the quality and reliability of such information. This study aimed to evaluate the quality and reliability of gout-related videos on TikTok and Bilibili. A cross-sectional study was conducted by collecting the top 100 gout-related videos from each platform. Video characteristics, creator type, and user engagement metrics were extracted. Video quality was assessed using the modified DISCERN checklist and the global quality scale. A total of 163 videos were included. The overall median global quality scale score was 2 (IQR: 2-3), and the median modified DISCERN score was 2 (IQR: 1-2). Video content was generally incomplete, with information related to diagnosis (7.4%) and epidemiology (11.7%) markedly underrepresented. Videos on Bilibili were significantly longer (median 368.00, IQR: 201.00-782.00; P < .05), whereas TikTok videos received significantly more likes (median 1102.00, IQR: 162.50-11105.75; P < .05). Compared with personal users, specialists uploaded videos of significantly higher quality (P < .05). The overall quality and reliability of gout-related short videos on both platforms were low, and the content structure was incomplete. Videos uploaded by specialists demonstrated higher quality. This study suggests that more accurate and expert-driven content should be promoted on these platforms to enhance public health awareness and reduce the spread of misinformation.
    Keywords:  BiliBili; TikTok; cross-sectional study; gout; video quality
    DOI:  https://doi.org/10.1097/MD.0000000000048222
  27. J Biomech. 2026 Mar 26. pii: S0021-9290(26)00127-2. [Epub ahead of print]201: 113272
      Although video analysis is widely used to assess footstrike pattern (FSP) and footstrike angle (FSA), the reliability of these measures from medial and lateral views remains poorly understood, particularly across treadmill and overground running. This study evaluated the intra-rater, inter-rater and inter-side reliability of FSP classification and FSA measurement between treadmill and overground running. This study involved seven novice runners on an indoor treadmill and nine experienced runners during an outdoor marathon-distance run. Synchronized medial- and lateral-view videos were recorded and analyzed twice by three trained raters using Kinovea software. FSP was categorized as rearfoot, midfoot or forefoot strike, and FSA was measured as the angle between the shoe sole and running surface at initial contact. A total of 1,680 treadmill and 134 overground footfalls were analyzed. Reliability was assessed using Cohen's or Fleiss' kappa (κ) for FSP classification and the intraclass correlation coefficient (ICC) for FSA measurement. Excellent intra-rater (Cohen's κ ≥ 0.89, ICC(1,1) ≥ 0.96) and inter-rater reliability (Fleiss' κ ≥ 0.85, ICC(2,1) ≥ 0.95) were indicated during treadmill running. For overground running, reliability remained good but comparatively lower (Cohen's κ ≥ 0.74, ICC(1,1) ≥ 0.86; Fleiss' κ ≥ 0.64, ICC(2,1) ≥ 0.84). Inter-side reliability between medial and lateral views was excellent on the treadmill (κ ≥ 0.88, ICC ≥ 0.94) but only fair to moderate overground (κ ≥ 0.48, ICC ≥ 0.68). While video-based footstrike assessment is highly reliable in controlled laboratory settings, its application in field-based conditions requires caution due to diminished inter-side agreement.
    Keywords:  Biomechanics; Footstrike pattern; Gait; Marathon; Psychometric properties
    DOI:  https://doi.org/10.1016/j.jbiomech.2026.113272
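     Cohen's kappa for two raters' footstrike classifications can be computed directly from label frequencies; a minimal sketch with invented labels:

        from collections import Counter

        def cohens_kappa(rater_a, rater_b):
            """Chance-corrected agreement between two raters over the same items."""
            n = len(rater_a)
            observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
            fa, fb = Counter(rater_a), Counter(rater_b)
            expected = sum(fa[c] * fb[c] for c in set(rater_a) | set(rater_b)) / n ** 2
            return (observed - expected) / (1 - expected)

        a = ["rear", "rear", "mid", "fore", "rear", "mid", "rear", "fore"]
        b = ["rear", "rear", "mid", "mid", "rear", "mid", "rear", "fore"]
        print(cohens_kappa(a, b))  # 0.8
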
  28. Front Public Health. 2026;14: 1751884
       Introduction: Myocardial infarction (MI) is one of the major diseases affecting human health and life, characterized by its acute onset and high mortality rate. Social media platforms like TikTok are playing an increasingly important role in disease education and prevention. This study evaluated the quality of MI-related science education videos on TikTok and examined the relationship between video content quality and user engagement.
    Methods: Video quality was assessed using the Global Quality Scale (GQS), Journal of the American Medical Association (JAMA) benchmark standards, and the modified DISCERN (mDISCERN) tool. Video duration and engagement metrics (likes, comments, shares, and collections) were recorded. Spearman's correlation and linear regression analyses were employed to examine the relationship between video quality and engagement.
    Results: After screening, a total of 270 videos were analyzed. Collectively, these videos garnered 35,913,507 likes, 12,745,256 shares, 8,072,837 collections, and 1,306,276 comments. Most videos (84.1%, n=227) were uploaded by healthcare professionals, predominantly Western medicine practitioners (83.7%, 190/227). Scores clustered at 3 on the GQS (47.0%, 127/270), at 2 on the JAMA benchmark (81.5%, 220/270), and at 3 on the mDISCERN (45.9%, 124/270). Video duration showed a significant positive correlation with both GQS and mDISCERN scores. The number of bookmarks correlated positively with both JAMA and mDISCERN scores.
    Discussion: As a widely used video dissemination platform, TikTok provides users with a positive experience, with the overall quality of MI-related short videos remaining at a moderate level. Analysis indicates that extending video duration and appropriately citing references and sources can further enhance video quality and reliability, offering users more comprehensive knowledge and improved disease prevention outcomes.
    Keywords:  TikTok; myocardial infarction; reliability; social media; video quality
    DOI:  https://doi.org/10.3389/fpubh.2026.1751884
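     The duration-quality associations these video studies report are rank correlations; a minimal SciPy sketch with invented (duration, GQS) pairs:

        from scipy.stats import spearmanr

        durations = [45, 120, 300, 60, 240, 500, 90, 180]  # seconds (invented)
        gqs = [2, 3, 4, 2, 3, 5, 3, 3]                     # quality scores (invented)

        rho, p = spearmanr(durations, gqs)
        print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
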
  29. Inquiry. 2026 Jan-Dec;63: 469580261436338
      Ankylosing spondylitis (AS) is a complex autoimmune disease for which early diagnosis and treatment are critical for prognosis. The quality of health information shared on social media significantly impacts patient education. This study aims to evaluate the quality and completeness of AS-related health information on 2 major Chinese video platforms, TikTok and Bilibili. On August 27, 2025, a cross-sectional study was conducted, retrieving and analyzing the top 100 AS-related videos from each platform (N = 200). Video reliability was assessed using the DISCERN tool and the JAMA benchmark criteria, and video quality was assessed using the GQS and the 6-Dimensional Content Integrity Scale. Bilibili was significantly superior to TikTok only in academic rigor (defined as information reliability and source transparency, measured by DISCERN-Dimension 1 and JAMA; P < .001), while both platforms performed poorly in practical utility (defined as content completeness and instructional value for patients, measured by GQS and DISCERN-Dimension 2; P > .05). Video "quality" was not correlated with "popularity" (eg, likes, favorites) (P > .05); video duration was the only significantly correlated external factor (rho = .41, P < .001). Content completeness exhibited a "cancelation effect," referring to the complementary nature of content strengths across platforms, where one excels in areas the other lacks: Bilibili excelled in "diagnosis," while TikTok excelled in "treatment." Notably, orthopedists and other healthcare professionals (HCPs; 59.5%) were the main content creators, not rheumatologists (27%). On Bilibili, the "patient" group scored significantly higher in "treatment information" (Dim2) than the "rheumatologist" group (P = .029). Bilibili provides more academically rigorous AS information, but both platforms severely lack depth in core practical content, and platform algorithms fail to effectively screen for high-quality content (quality-popularity disconnect). The video ecosystem (eg, "orthopedist-dominated" content and "professionalism inversion," a phenomenon in which patient-led narratives outperformed specialists on treatment-related metrics because comprehensive personal storytelling aligns structurally with traditional assessment criteria) profoundly reflects real-world diagnostic dilemmas and the limitations of traditional assessment tools in the modern video era.
    Keywords:  Bilibili; DISCERN; TikTok; ankylosing spondylitis; health information; professionalism inversion; quality-popularity disconnect; social media
    DOI:  https://doi.org/10.1177/00469580261436338
  30. Sci Rep. 2026 Mar 30. pii: 10652. [Epub ahead of print]16(1):
      
    Keywords:  DISCERN tool; FNF-SCCS; Femoral neck fractures; Global quality score; Health information quality; PEMAT-A/V; Short-video platforms
    DOI:  https://doi.org/10.1038/s41598-026-46431-y
  31. Int Breastfeed J. 2026 Apr 03.
      
    Keywords:  Breast engorgement; Breastfeeding; DISCERN; GQS; JAMA; Postpartum; YouTube
    DOI:  https://doi.org/10.1186/s13006-026-00837-6
  32. Comput Inform Nurs. 2026 Mar 30.
      Heatwaves, intensified by climate change, have increased the risk of heat-related illnesses, particularly among vulnerable populations. As YouTube becomes a key source of health information, concerns have emerged regarding the credibility of its content. Hence, this study evaluated the content characteristics, educational quality, and reliability of Korean-language YouTube videos on heat-related illnesses. A total of 177 videos were systematically extracted and analyzed. Videos were assessed for uploader type, content, and speaker. Educational quality and reliability were evaluated using the Global Quality Scale and the modified DISCERN scale, respectively. The results indicated that broadcast media were the most common uploaders (67.8%). Most videos addressed symptoms (96.0%), causes (93.2%), and prevention (91.5%), with seasonal peaks in the summer. Videos from non-broadcast sources had higher educational quality compared with those from broadcast media, but the latter displayed slightly higher reliability. Videos featuring non-health care speakers had higher educational quality than did those featuring health care speakers. Overall, YouTube videos on heat-related illnesses vary in quality and reliability. For nurses involved in patient and community education, these findings highlight the need for critical appraisal of online health information and active participation in creating accessible and accurate digital health content to improve digital health education.
    Keywords:  Health education; Heat stress disorders; Information dissemination; Social media
    DOI:  https://doi.org/10.1097/CIN.0000000000001536
  33. Medicine (Baltimore). 2026 Apr 03. 105(14): e48231
      Colonoscopy is the primary method for diagnosing colonic diseases and detecting precancerous lesions, thereby playing a crucial role in the prevention of colorectal cancer. However, the scientific quality and reliability of colonoscopy-related videos on YouTube require systematic assessment, and the quality of Turkish-language colonoscopy videos in particular remains unexplored. This study used internationally recognized scoring systems to assess the scientific quality of Turkish colonoscopy-related videos on YouTube. This cross-sectional study analyzed 156 Turkish-language YouTube videos on colonoscopy, with the inclusion criteria requiring a video duration between 30 seconds and 60 minutes. The main outcomes were video quality and information accuracy, which were measured using the Journal of the American Medical Association (JAMA), modified DISCERN (mDISCERN) score, colonoscopy data quality (C-DQS), and Global Quality Score (GQS) tools. According to the mDISCERN score, 24.4%, 59%, and 16.7% of the videos were of poor, fair, and good quality, respectively. Only 12.8% of the videos met the JAMA quality criteria. The overall average score for each video source was low, with an average C-DQS score of 7.1/40. The JAMA, C-DQS, GQS, and mDISCERN scores for the videos designed to educate healthcare professionals were higher than those for videos designed for patient information or general culture/patient experience (P < .001 for each). Significant correlations were found between the JAMA score, mDISCERN score, GQS, and C-DQS (P < .001). Four independent quality assessment scales were used to evaluate Turkish-language YouTube videos on colonoscopy, with the results indicating suboptimal to moderate information quality. Online platforms, including YouTube, should implement strict quality control measures to ensure the accuracy of health-related videos.
    Keywords:  Turkish language; YouTube; colonoscopy; quality
    DOI:  https://doi.org/10.1097/MD.0000000000048231
  34. Sci Rep. 2026 Apr 02.
      
    Keywords:  Atrophic gastritis; Bilibili; Online video; Public health; Quality; TikTok
    DOI:  https://doi.org/10.1038/s41598-026-46260-z
  35. Urogynecology (Phila). 2026 Mar 30.
       IMPORTANCE: TikTok is increasingly used as a source of health information, yet the quality of pessary-related content has not been evaluated.
    OBJECTIVES: The purpose of this study was to assess the reliability, accuracy, and educational quality of pessary-related TikTok videos using validated tools.
    STUDY DESIGN: This was a cross-sectional quality assessment of TikTok videos that were identified using the search term "#pessary." Key engagement metrics, including likes, views, and comments, were extracted, and content creators were categorized as health care professionals (HCP) or non-HCP creators. To minimize algorithmic bias, a new TikTok account was used for data collection. Two independent reviewers assessed the videos using 4 validated tools: (1) modified DISCERN (mDISCERN), (2) Global Quality Scale (GQS), (3) Video Information and Quality Index (VIQI), and (4) the Patient Education Materials Assessment Tool for Audio-Visual Content (PEMAT A/V).
    RESULTS: Of 467 videos screened, 59 met inclusion criteria: 43 (72.9%) produced by HCPs and 16 (27.1%) by non-HCPs. HCP videos scored higher across most quality measures, significantly so for mDISCERN (median 3 vs 2, P=0.01) but not for PEMAT A/V (median 10 vs 9, P=0.36) or VIQI (median 13.5 vs 11.5, P=0.33). Only 25.4% of videos achieved a GQS score ≥3. No significant differences were observed in views, likes, or shares between HCP and non-HCP videos, though HCP videos received more comments.
    CONCLUSIONS: Pessary-related TikTok content is generally of low quality. Although videos created by HCPs perform better, substantial gaps remain, highlighting the need for more accurate, accessible, and patient-centered information on social media.
    DOI:  https://doi.org/10.1097/SPV.0000000000001843
  36. Digit Health. 2026 Jan-Dec;12: 20552076261438953
       Background: With the rise of short video platforms, TikTok and Bilibili have become major sources of health information for the public. This study aimed to evaluate the content, quality, and reliability of amblyopia-related videos.
    Methods: Using "amblyopia" as the keyword, we collected the video content, engagement metrics, video duration, and uploader identity of the top 150 default-ranked videos on both platforms. The Global Quality Score (GQS) and the modified DISCERN (mDISCERN) tool were used to assess video quality and reliability. Mann-Whitney U and Kruskal-Wallis tests were then used for group comparisons, and Spearman correlation was used for correlation analysis.
    Results: A total of 199 videos were included. TikTok videos were significantly shorter than Bilibili videos. TikTok demonstrated considerably higher user engagement across likes, comments, collections and shares. Video content mainly focused on treatment (88.44%), while etiology (39.70%) and prevention (40.70%) were less discussed. The median GQS score was 3.00 (IQR: 2.00-3.00), and the median mDISCERN score was 3.00 (IQR: 2.00-3.00). No differences were found between platforms in GQS and mDISCERN (p > 0.05). Videos uploaded by specialists with a median GQS score of 4.00 (IQR: 3.00-4.00) and a median mDISCERN score of 3.00 (IQR: 3.00-4.00) outperformed those uploaded by non-specialists and individual users on both GQS and mDISCERN (p < 0.05). Video duration showed a weak positive correlation with quality (p < 0.05). Engagement metrics were not correlated with GQS or mDISCERN (p > 0.05).
    Conclusion: The quality and reliability of amblyopia-related videos were suboptimal, with diagnosis and prevention receiving insufficient attention. Videos uploaded by specialists had the highest quality and reliability. Strengthening content review and oversight and encouraging greater participation of specialists in amblyopia science communication are needed to improve the quality of health information on short video platforms.
    Keywords:  Bilibili; TikTok; amblyopia; health information quality; ophthalmology; short videos; visual health
    DOI:  https://doi.org/10.1177/20552076261438953
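     The specialist versus non-specialist comparison above is a two-sample rank test; a minimal SciPy sketch with invented GQS ratings:

        from scipy.stats import mannwhitneyu

        specialist = [4, 4, 3, 4, 5, 4, 3, 4]      # invented ratings
        non_specialist = [3, 2, 3, 2, 3, 3, 2, 3]  # invented ratings

        u, p = mannwhitneyu(specialist, non_specialist, alternative="two-sided")
        print(f"U = {u}, p = {p:.4f}")
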
  37. PLoS One. 2026;21(4): e0345570
       BACKGROUND: Biologic agents are an important novel therapeutic option for moderate-to-severe psoriasis. In recent years, TikTok and Bilibili have gradually become important channels for Chinese patients to obtain health information. This study aims to evaluate the content, quality, reliability, and transparency of videos related to biologic therapy for psoriasis on these platforms.
    METHODS: We searched both platforms using the dual keywords "psoriasis" and "biological agents," confirmed compliance with relevant criteria, and collected the top 150 videos based on their composite rankings. Fundamental characteristics, uploader categories, and content types were documented. Two independent reviewers evaluated video quality using the mDISCERN, GQS, and the JAMA criteria. Nonparametric tests were performed for group comparisons, and Spearman correlation analysis was applied.
    RESULTS: Bilibili videos more frequently addressed medical expenses, types of biologic agents, and recurrence, whereas TikTok videos focused on etiology and clinical manifestations. The video quality was barely acceptable. On TikTok, the median GQS, mDISCERN, and JAMA scores were 3.00 (IQR: 2.25-4.00), 3.00 (IQR: 3.00-4.00), and 2.00 (IQR: 2.00-3.00), respectively. On Bilibili, the median scores were 3.00 (IQR: 2.00-4.00), 3.00 (IQR: 2.00-4.00), and 1.00 (IQR: 1.00-3.00). Videos uploaded by professional organizations achieved the highest GQS (median 4.00, IQR: 4.00-4.00) but had the lowest engagement. Engagement metrics showed a moderate correlation with quality scores (P < 0.05).
    CONCLUSIONS: This study found that videos related to biologic therapy for psoriasis lack content completeness, with overall quality, reliability, and transparency remaining at a suboptimal level. Greater participation by professional organizations and increased visibility of their videos should be encouraged to promote the dissemination of high-quality content. This study provides preliminary insights for health communication strategies and highlights the necessity of strengthening content regulation.
    DOI:  https://doi.org/10.1371/journal.pone.0345570
  38. Clin Breast Cancer. 2026 Mar 10. pii: S1526-8209(26)00038-8. [Epub ahead of print] 26(5): 36-45
      BACKGROUND: Artificial intelligence (AI) is rapidly being integrated into breast cancer screening, improving cancer detection and workflow efficiency. As patients become increasingly exposed to information about AI in mammography through news outlets, hospitals, and commercial entities, they are likely to seek information online. However, the readability and understandability of online patient education materials (OPEM) for AI in mammography have not been examined.
    METHODS: We collected the top 20 nonsponsored results for each of 5 AI-related internet search terms in mammography. After removing duplicates, each webpage (n = 56) was categorized by source type and evaluated for readability using 6 readability algorithms, as well as for understandability using the Patient Education Materials Assessment Tool for Printable Materials (a computation sketch follows this entry).
    RESULTS: The average grade-level readability across all webpages was 14.2, exceeding both the American Medical Association (sixth grade) and Centers for Disease Control and Prevention (eighth grade) recommendations. Reading ease scores placed most content in the "difficult" range, corresponding to college-level readers. Understandability averaged 72.4%, with variation by source type: commercial, government, and patient advocacy pages scored highest, while academic and medical media sources scored lowest.
    CONCLUSION: Online information about AI in mammography is generally written at a level too advanced for most patients and meets only the minimum standards of understandability. Because patients are likely to encounter both OPEM and non-OPEM sources when searching online, clinicians should be prepared to guide them toward accessible, reliable resources. Developing standardized, patient-focused education materials by professional medical societies could ensure access to comprehensible information.
    Keywords:  Artificial intelligence; Breast cancer; Grade-level readability; PEMAT; Patient health literacy
    DOI:  https://doi.org/10.1016/j.clbc.2026.03.007
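    The readability scoring in entry 38 can be reproduced in outline with off-the-shelf readability formulas. A minimal Python sketch using the textstat package follows; the abstract does not name the 6 algorithms used, so the set below is an assumed selection of common choices, and the input file is hypothetical.

      # pip install textstat
      import textstat

      text = open("webpage_extract.txt").read()  # hypothetical saved page text

      # Five grade-level formulas plus Flesch Reading Ease; an assumed set,
      # since the abstract does not specify which 6 algorithms were used.
      scores = {
          "Flesch-Kincaid grade": textstat.flesch_kincaid_grade(text),
          "Gunning Fog": textstat.gunning_fog(text),
          "SMOG": textstat.smog_index(text),
          "Coleman-Liau": textstat.coleman_liau_index(text),
          "Automated Readability Index": textstat.automated_readability_index(text),
          "Flesch Reading Ease": textstat.flesch_reading_ease(text),
      }
      grade_levels = [v for k, v in scores.items() if k != "Flesch Reading Ease"]
      print(f"mean grade level: {sum(grade_levels) / len(grade_levels):.1f}")

    Averaging the grade-level formulas per page, and then across pages, yields the kind of overall grade-level figure (14.2 in the study) reported above.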
  39. J Exp Child Psychol. 2026 Mar 31. pii: S0022-0965(26)00066-4. [Epub ahead of print] 268: 106514
      Food rejection is most likely to occur during early childhood, with most rejected foods being vegetables and fruits. Consequently, many young children do not meet the recommended intake levels of these foods. Confirmation bias may be a cognitive factor contributing to low intake of novel vegetables and fruits. This study aimed to examine confirmation bias at two stages, information seeking and information evaluation, regarding novel foods in young children, and to investigate differences in the strength of these confirmation biases based on levels of everyday food rejection. Experiment 1 identified novel foods perceived as appetizing or unappetizing. Using these stimuli, Experiment 2 (N = 34) examined confirmation bias during the information-seeking stage, whereas Experiment 3 (N = 34) examined it during the information-evaluation stage. The results showed that, for appetizing novel foods, children sought more positive than negative information (Experiment 2) and evaluated positive information as more credible (Experiment 3). However, for unappetizing novel foods, children sought more negative than positive information (Experiment 2) and evaluated negative information as more credible (Experiment 3). Thus, children demonstrated confirmation bias at both the information-seeking and information-evaluation stages concerning novel foods. Furthermore, in Experiment 3, children with higher levels of food rejection showed lower trust in positive information relative to negative information when evaluating unappetizing foods. These findings deepen our understanding of the cognitive mechanisms underlying food rejection and may inform the development of more effective, child-centered interventions to promote healthy eating in early childhood.
    Keywords:  Confirmation bias; Food neophobia; Novel foods; Picky/fussy eating; Preschool children
    DOI:  https://doi.org/10.1016/j.jecp.2026.106514
  40. JMIR Pediatr Parent. 2026 Mar 30. 9: e80637
      Background: Singapore is a multicultural society characterized by a diverse array of ethnic groups, including Chinese, Malay, and Indian communities, among others. A considerable percentage of Singaporeans are active internet users. The internet has become a significant resource for health education, particularly for women who wish to learn about a healthy lifestyle during pregnancy. However, it is still unclear how pregnant women search for information online, particularly within specific demographic groups.
    Objective: This study aimed to explore the relationship between healthy lifestyle practices, online health information-seeking behaviors, and internet usage (IU) among 1905 pregnant women.
    Methods: Structural equation modeling (SEM) was used to evaluate the relationships between the appropriate intake of food groups, healthy diet practices (HD), internet for dietary advice (ID), internet for physical activity advice (IP), and IU, based on 5 hypotheses rooted in theoretical concepts. We used a multigroup SEM approach to examine these hypotheses across age groups, ethnicities, BMI categories, and number of pregnancies (a path-model sketch follows this entry).
    Results: Our results confirmed 5 hypotheses, indicating significant relationships among the variables: appropriate intake of food groups was positively linked to HD (β=0.262; P<.001); HD was positively linked to ID (β=0.168; P<.001); ID was positively linked to IP (β=0.185; P<.001); IP was positively linked to IU (β=0.190; P<.001); and HD was negatively linked to IU (β=-0.208; P<.001). The multigroup SEM analyses yielded significant differences in Hypotheses 2 and 3 when comparing different age groups (P=.009), BMI categories (P=.03), and number of pregnancies (P=.003).
    Conclusions: Our findings offer valuable insights into developing customized online interventions aimed at encouraging a healthy lifestyle during pregnancy.
    Keywords:  healthy lifestyle practices; internet usage; multigroup structural equation model; online health information-seeking behaviors; pregnant women
    DOI:  https://doi.org/10.2196/80637
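    The five hypothesized paths in entry 40 form a simple recursive path model. A minimal Python sketch using the semopy package follows; the variable names are hypothetical stand-ins for the study's constructs, the per-group loop is a crude analogue of formal multigroup SEM rather than the authors' procedure, and the data file and grouping column are assumed.

      # pip install semopy
      import pandas as pd
      from semopy import Model

      # H1: food-group intake -> HD; H2: HD -> ID; H3: ID -> IP;
      # H4: IP -> IU; H5: HD -> IU (estimated negative in the study).
      desc = "HD ~ FoodIntake\nID ~ HD\nIP ~ ID\nIU ~ IP + HD"

      df = pd.read_csv("pregnancy_survey.csv")  # hypothetical survey extract

      # Crude multigroup analogue: fit the same model within each subgroup
      # and compare path estimates across groups.
      for level, sub in df.groupby("age_group"):  # hypothetical grouping column
          model = Model(desc)
          model.fit(sub)
          print(level)
          print(model.inspect())  # path coefficients, SEs, p-values

    A formal multigroup SEM would instead constrain parameters across groups and compare model fit, but the fit-per-group loop conveys the basic idea of testing whether path strengths differ by subgroup.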
  41. PLoS One. 2026 ;21(4): e0346262
      This study examined how cancer-related beliefs, information-seeking behaviors, and discussions about health with family or friends relate to depressive symptoms (PHQ-2 ≥ 3) among U.S. adults, using data from the 2024 Health Information National Trends Survey (HINTS 7; unweighted N = 6,826). Associations were estimated using survey-weighted logistic regression with jackknife replicate weights (see the sketch after this entry), adjusting for sociodemographic factors and personal or family cancer history; results are reported as adjusted odds ratios (ORs) with 95% confidence intervals (CIs). Weighted estimates indicate that approximately 15.5% of respondents screened positive for depression. Fatalistic beliefs, particularly the views that everything causes cancer (OR = 1.86; 95% CI: 1.39-2.48), prevention is not possible (OR = 1.69; 95% CI: 1.25-2.28), and cancer automatically means death (OR = 1.75; 95% CI: 1.31-2.34), were significantly associated with higher odds of screening positive for depression. In contrast, neither cancer information seeking (OR = 1.12; 95% CI: 0.83-1.51) nor discussions about health with family or friends (OR = 0.90; 95% CI: 0.62-1.30) showed a significant independent association with depression. In moderation analyses, discussions about health with family or friends weakened the positive association between each fatalistic belief and depression, but these interaction effects were not statistically significant. Sensitivity analyses using multiple imputation for missing data and restricting the analyses to respondents without a personal cancer history produced consistent results. Theoretical and practical implications of these findings are discussed.
    DOI:  https://doi.org/10.1371/journal.pone.0346262
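    The variance estimation in entry 41 relies on survey weights with jackknife replicates, a common pattern for HINTS analyses. A minimal Python sketch follows, using statsmodels' freq_weights as a rough stand-in for survey weights and a JK1-style variance formula; the file and column names (full_wt, rep_wt_*, phq2, fatalism_everything) are hypothetical, and this is an illustrative analogue, not the authors' code.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm

      df = pd.read_csv("hints7_extract.csv")  # hypothetical data extract

      y = (df["phq2"] >= 3).astype(int)                 # depression screen
      X = sm.add_constant(df[["fatalism_everything"]])  # one illustrative predictor

      def log_or(weights):
          """Weighted logistic fit; returns the log odds ratio for the predictor."""
          fit = sm.GLM(y, X, family=sm.families.Binomial(),
                       freq_weights=weights).fit()
          return fit.params["fatalism_everything"]

      # Point estimate from the full-sample weight, then a JK1-style variance
      # from refits under each replicate weight column.
      full = log_or(df["full_wt"])
      reps = np.array([log_or(df[c]) for c in df.columns if c.startswith("rep_wt_")])
      var = (len(reps) - 1) / len(reps) * np.sum((reps - full) ** 2)
      ci = np.exp(full + np.array([-1.96, 1.96]) * np.sqrt(var))
      print(f"OR={np.exp(full):.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")

    Survey weights are typically non-integer, so freq_weights is only an approximation here; dedicated survey-analysis software handles the design more carefully, but the refit-per-replicate logic is the same.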
  42. Front Public Health. 2026 ;14 1773518
      Objective: This study aimed to examine the association between health literacy and Health Information-Seeking Behavior (HISB) among patients with pulmonary nodules (PNs). It further assessed whether illness perception and self-efficacy helped account for this relationship, using a theoretically specified serial mediation model informed by the Information-Motivation-Behavioral Skills (IMB) framework.
    Methods: This cross-sectional study was conducted from February to June 2024. Patients with PNs were recruited from two tertiary hospitals in Suzhou, China, using convenience sampling. Structural equation modeling (SEM) was applied to test hypothesized associations among variables. Bias-corrected bootstrapping was used to estimate direct and indirect effects, including serial indirect effects consistent with the hypothesized ordering (a simplified bootstrap sketch follows this entry). Multi-group analysis examined whether model estimates differed by educational level. Reporting followed the STROBE guidelines.
    Results: Overall, 321 patients completed the survey. The mean score of HISB was 131.85 (SD = 34.96). HISB showed modest positive correlations with health literacy (r = 0.464, p < 0.01) and self-efficacy (r = 0.497, p < 0.01), and a negative correlation with illness perception (r = -0.429, p < 0.01). The SEM showed excellent fit (χ²/df = 1.46, RMSEA = 0.038, CFI = 0.982). Health literacy showed a direct association with HISB (β = 0.477, p < 0.001). Indirect associations were observed via self-efficacy [β = 0.110, 95% CI (0.062, 0.173)] and illness perception [β = 0.065, 95% CI (0.035, 0.108)]. A statistically significant but modest serial indirect effect was observed [β = 0.021, 95% CI (0.009, 0.040)], consistent with the hypothesized model. Multi-group analysis supported configural invariance across education levels, although the strength of some associations varied.
    Conclusion: This study found both direct and indirect associations between health literacy and HISB among patients with PNs. The findings suggest that interventions that provide literacy-sensitive support, address maladaptive illness perception, and strengthen self-efficacy may help foster adaptive information-seeking and improve long-term surveillance adherence and psychological outcomes.
    Keywords:  health literacy; illness perception; information-seeking behavior; pulmonary nodules; self-efficacy; structural equation modeling
    DOI:  https://doi.org/10.3389/fpubh.2026.1773518
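    The serial indirect effect in entry 42 is, in essence, the product of three path coefficients with a bootstrap interval. A minimal Python sketch of a regression-based analogue follows; the column names and data file are hypothetical, the mediator ordering (illness perception before self-efficacy) is one plausible reading of the abstract, and a simple percentile interval stands in for the bias-corrected bootstrap the authors used.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      df = pd.read_csv("pn_survey.csv")  # hypothetical: hl, illness, se, hisb

      def serial_indirect(d):
          """Product of the paths hl -> illness -> se -> hisb."""
          a1 = smf.ols("illness ~ hl", data=d).fit().params["hl"]
          d21 = smf.ols("se ~ hl + illness", data=d).fit().params["illness"]
          b2 = smf.ols("hisb ~ hl + illness + se", data=d).fit().params["se"]
          return a1 * d21 * b2

      # Percentile bootstrap of the serial indirect effect.
      rng = np.random.default_rng(0)
      boot = np.array([
          serial_indirect(df.iloc[rng.integers(0, len(df), len(df))])
          for _ in range(2000)
      ])
      lo, hi = np.percentile(boot, [2.5, 97.5])
      print(f"serial indirect={serial_indirect(df):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")

    A full SEM estimates all paths simultaneously with latent variables, but the chained-regression product above captures the logic of the reported serial indirect effect.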
  43. Dermatol Pract Concept. 2026 Jan 30. 16(1):
       INTRODUCTION: TikTok, a social media platform, is a tool for disseminating dermatological public health education. On TikTok, like other social media platforms, both board-certified dermatologists and non-medical providers ("influencers") provide dermatological advice, but how the quality of the advice compares between the two creator groups is unclear.
    OBJECTIVES: This study sought to assess similarities and differences in the language used within content, user receptivity and engagement, and the utility of online education in understanding hair and scalp disorders.
    METHODS: A cross-sectional analysis of 97 TikTok videos from 2023 was performed to evaluate the content quality of videos made by dermatologists and influencers regarding three common hair and scalp disorders: seborrheic dermatitis, telogen effluvium, and traction alopecia.
    RESULTS: Dermatologists and influencers had similar user engagement, but dermatologists were more likely to recommend standard treatments (49% vs 27% of influencers), whereas influencers were more likely to recommend alternative treatment options (46% vs 25% of dermatologists). An analysis of user comments to assess audience understanding indicated that each of the three disorders was often confused with at least 2-3 other similar hair and scalp conditions.
    CONCLUSIONS: This study highlights a need to clarify for patients the standard of care for common hair and scalp disorders. This study also identified a universal lack of messaging encouraging users to seek in-person medical attention for their dermatological concerns.
    DOI:  https://doi.org/10.5826/dpc.1601a5755