bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-02-08
39 papers selected by
Thomas Krichel, Open Library Society



  1. Res Synth Methods. 2025 Nov;16(6): 953-960
      Our objective was to evaluate the recall and number needed to read (NNR) for the Cochrane RCT Classifier compared to and in combination with established search filters developed for Ovid MEDLINE and Embase.com. A gold standard set of 1,103 randomized controlled trials (RCTs) was created to calculate recall for the Cochrane RCT Classifier in Covidence, the Cochrane sensitivity-maximizing RCT filter in Ovid MEDLINE and the Cochrane Embase RCT filter for Embase.com. In addition, the classifier and the filters were validated in three case studies using reports from the Swedish Agency for Health Technology Assessment and Assessment of Social Services to assess impact on search results and NNR. The Cochrane RCT Classifier had the highest recall with 99.64% followed by the Cochrane sensitivity-maximizing RCT filter in Ovid MEDLINE with 98.73% and the Cochrane Embase RCT filter with 98.46%. However, the Cochrane RCT Classifier had a higher NNR than the RCT filters in all case studies. Combining the RCT filters with the Cochrane RCT Classifier reduced NNR compared to using the RCT filters alone while achieving a recall of 98.46% for the Ovid MEDLINE/RCT Classifier combination and 98.28% for the Embase/RCT Classifier combination. In conclusion, we found that the Cochrane RCT Classifier in Covidence has a higher recall than established search filters but also a higher NNR. Thus, using the Cochrane RCT Classifier instead of current state-of-the-art RCT filters would lead to an increased workload in the screening process. A viable option with a lower NNR than RCT filters, at the cost of a slight decrease in recall, is to combine the Cochrane RCT Classifier with RCT filters in database searches.
    Keywords:  literature searching; machine learning; randomized controlled trials; search filters; study classifiers; systematic review software
    DOI:  https://doi.org/10.1017/rsm.2025.10023
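    A note on the metrics above: recall is the share of a gold-standard set that a filter retrieves, and NNR is the inverse of precision, i.e., the average number of records screened per relevant record found. A minimal Python sketch (illustrative only, not the authors' code; the 25,000-record result-set size is a hypothetical value):
```python
# Recall and number needed to read (NNR) for a search filter,
# computed against a gold-standard set of known relevant records.
def recall(retrieved_relevant: int, gold_standard_total: int) -> float:
    """Share of gold-standard records the filter retrieves."""
    return retrieved_relevant / gold_standard_total

def nnr(total_retrieved: int, retrieved_relevant: int) -> float:
    """Average records screened per relevant record found (1/precision)."""
    return total_retrieved / retrieved_relevant

# Retrieving 1,099 of 1,103 gold-standard RCTs gives ~99.64% recall,
# matching the classifier figure reported above.
print(f"recall = {recall(1099, 1103):.2%}")
print(f"NNR = {nnr(25000, 1099):.1f}")  # hypothetical result-set size
```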
  2. Res Synth Methods. 2025 Mar;16(2): 228-250
      Search filters are single-concept systematic search strategies created by experts. Filters are a valuable resource for systematic searchers. Typically, filters are designed for a single database in a single interface. If researchers do not have access to that specific interface, the existing filter will be unusable without translation. Filter translation is a complex process that requires an understanding of information retrieval concepts, as well as the unique indexing and search functionality of databases and interfaces. The authors undertook a project to translate an APA PsycInfo search filter for Randomized Controlled Trials/Clinical Controlled Trials (RCT/CCT), developed by Canada's Drug Agency, from the Wolters Kluwer Health Ovid interface to the EBSCO Information Services EBSCOhost interface. We present here a guide for translation, from the first principles of systematic searching to fine details of the relevant database and interfaces, based on our experience and illustrated by a worked example. We discuss each element of a systematic search in a stepwise process, addressing both the underlying information retrieval concepts and the technical strategies for effective translation between the two interfaces. We end with a discussion on translation challenges, with some guidance on how to mitigate potential impacts on sensitivity. While we have endeavored to explain the workings of this process accessibly for researchers who are not experts in systematic searching, anyone undertaking a search translation project should work with a trained information specialist if they lack information retrieval expertise or are unfamiliar with the inner workings of the database, the original interface, and the destination interface.
    Keywords:  APA PsycInfo; EBSCO; Ovid; Search filters; search hedges; search translation
    DOI:  https://doi.org/10.1017/rsm.2024.18
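    To make the translation mechanics concrete, here is a toy sketch mapping two common Ovid syntax patterns (the adjN adjacency operator and the .ti,ab. field suffix) to EBSCOhost equivalents. This is a simplified illustration, not the filter or process published by Canada's Drug Agency; a real translation must also handle subject headings, explosion, and interface-specific defaults:
```python
import re

# Toy Ovid -> EBSCOhost translation for two syntax patterns only.
def ovid_to_ebsco(line: str) -> str:
    # Ovid adjacency operator 'adjN' -> EBSCOhost unordered proximity 'Nn'
    line = re.sub(r"\badj(\d+)\b", r"N\1", line, flags=re.IGNORECASE)
    # Ovid '.ti,ab.' field suffix -> explicit EBSCOhost TI/AB field searches
    m = re.match(r"^(.*)\.ti,ab\.$", line.strip())
    if m:
        term = m.group(1).strip()
        line = f"TI ({term}) OR AB ({term})"
    return line

print(ovid_to_ebsco("random* adj3 (trial* or study).ti,ab."))
# -> TI (random* N3 (trial* or study)) OR AB (random* N3 (trial* or study))
```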
  3. Res Synth Methods. 2025 Jul;16(4): 688-700
    We developed a geographic search filter for retrieving studies about Germany from PubMed. In this study, we aimed to translate and validate it for use in Embase and MEDLINE® ALL via Ovid. Adjustments included aligning PubMed field tags with Ovid's syntax, adding a keyword heading field for both databases, and incorporating a correspondence address field for Embase. To validate the filters, we used systematic reviews (SRs) that included studies about Germany without imposing geographic restrictions on their search strategies. Subsequently, we conducted (i) case studies (CSs), applying the filters to the search strategies of the 17 eligible SRs; and (ii) aggregation studies, combining the SRs' search strategies with the 'OR' operator and applying the filters. In the CSs, the filters demonstrated a median sensitivity of 100% in both databases, with interquartile ranges (IQRs) of 100%-100% in Embase and 93.75%-100% in MEDLINE® ALL. Median precision improved from 0.11% (IQR: 0.05%-0.30%) to 1.65% (IQR: 0.78%-3.06%) and from 0.19% (IQR: 0.11%-0.60%) to 5.13% (IQR: 1.77%-6.85%), while the number needed to read (NNR) decreased from 893.40 (IQR: 354.81-2,219.58) to 60.44 (IQR: 33.94-128.97) and from 513.29 (IQR: 167.35-930.99) to 19.50 (IQR: 14.66-59.35) for Embase and MEDLINE® ALL, respectively. In the aggregation studies, the overall sensitivities were 98.19% and 97.14%, with NNRs of 83.29 and 33.34 in Embase and MEDLINE® ALL, respectively. The new Embase and MEDLINE® ALL filters for Ovid reliably retrieve studies about Germany, enhancing search precision. The approach described in our study can support search filter developers in translating filters for various topics and contexts.
    Keywords:  Embase; MEDLINE; Ovid; bibliographic databases; geographic search filters
    DOI:  https://doi.org/10.1017/rsm.2025.10016
  4. Res Synth Methods. 2025 Jan;16(1): 1-14
    Systematic searches of published literature are a vital component of systematic reviews. When search strings are not "sensitive," they may miss many relevant studies, limiting, or even biasing, the range of evidence available for synthesis. Concerningly, conducting and reporting evaluations (validations) of the sensitivity of the search strings used is rare, according to our survey of published systematic reviews and protocols. Potential reasons may involve a lack of familiarity with, or the inaccessibility of, complex sensitivity evaluation approaches. We first clarify the main concepts and principles of search string evaluation. We then present a simple procedure for estimating the relative recall of a search string. It is based on a pre-defined set of "benchmark" publications. The relative recall, that is, the sensitivity of the search string, is the retrieval overlap between the evaluated search string and a search string that captures only the benchmark publications. If there is little overlap (i.e., low recall or sensitivity), the evaluated search string should be improved to ensure that most of the relevant literature can be captured. The presented benchmarking approach can be applied to one or more online databases or search platforms. It is illustrated by five accessible, hands-on tutorials for commonly used online literature sources. Overall, our work provides an assessment of the current state of search string evaluations in published systematic reviews and protocols. It also paves the way to improve evaluation and reporting practices to make evidence synthesis more transparent and robust.
    Keywords:  bibliographic databases; evidence synthesis; information retrieval; information storage; searching; validity
    DOI:  https://doi.org/10.1017/rsm.2024.6
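    The relative recall calculation described above reduces to a simple set overlap; a minimal sketch with hypothetical record IDs (e.g., DOIs or accession numbers):
```python
# Relative recall: fraction of a pre-defined benchmark set that the
# evaluated search string retrieves.
def relative_recall(retrieved_ids: set[str], benchmark_ids: set[str]) -> float:
    return len(retrieved_ids & benchmark_ids) / len(benchmark_ids)

benchmark = {"rec1", "rec2", "rec3", "rec4", "rec5"}   # invented IDs
retrieved = {"rec1", "rec2", "rec4", "rec9", "rec12"}  # invented IDs
print(f"relative recall = {relative_recall(retrieved, benchmark):.0%}")  # 60%
```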
  5. Res Synth Methods. 2025 Jan;16(1): 211-227
    Bibliographic aggregators like OpenAlex and Semantic Scholar offer scope for automated citation searching within systematic review production, promising increased efficiency. This study aimed to evaluate the performance of automated citation searching compared to standard search strategies and examine factors that influence performance. Automated citation searching was simulated on 27 systematic reviews across the OpenAlex and Semantic Scholar databases, across three study areas (health, environmental management and social policy). Performance, measured by recall (proportion of relevant articles identified), precision (proportion of relevant articles identified from all articles identified), and F1-F3 scores (weighted average of recall and precision), was compared to the performance of search strategies originally employed by each systematic review. The associations between systematic review study area, number of included articles, number of seed articles, seed article type, study type inclusion criteria, API choice, and performance were analyzed. Automated citation searching outperformed the reference standard in terms of precision (p < 0.05) and F1 score (p < 0.05) but performed worse in terms of recall (p < 0.05) and F3 score (p < 0.05). Study area influenced the performance of automated citation searching, with performance being higher within the field of environmental management compared to social policy. Automated citation searching is best used as a supplementary search strategy in systematic review production where recall is more important than precision, due to inferior recall and F3 score. However, observed outperformance in terms of F1 score and precision suggests that automated citation searching could be helpful in contexts where precision is as important as recall.
    Keywords:  automation; evidence synthesis; guideline development; learning health systems; scoping review; systematic reviews
    DOI:  https://doi.org/10.1017/rsm.2024.15
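    For readers unfamiliar with the F-scores reported above: F-beta is a weighted harmonic mean in which recall is weighted beta times as heavily as precision, so F3 suits recall-critical systematic searching while F1 balances the two. A worked sketch with made-up precision/recall values:
```python
# F-beta score: recall is weighted `beta` times as heavily as precision.
def f_beta(precision: float, recall: float, beta: float) -> float:
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.40, 0.70  # illustrative precision/recall for a citation search
print(f"F1 = {f_beta(p, r, 1):.3f}")  # ~0.509, balances precision and recall
print(f"F3 = {f_beta(p, r, 3):.3f}")  # ~0.651, dominated by recall
```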
  6. Res Synth Methods. 2025 Jan;16(1): 30-41
    While the Institute of Education Sciences' ERIC is often recommended for comprehensive literature searching in the field of education, there are several other specialized databases for discovering education literature. This study investigates journal coverage overlaps between four specialized education databases: Education Source (EBSCO), Education Database (ProQuest), ERIC (Institute of Education Sciences), and Educator's Reference Complete (Gale). Out of a total of 4,695 unique journals analyzed, 2,831 journals are covered by only one database, and many others are covered by only two or three databases. Findings show that evidence synthesis projects and literature reviews benefit from the careful selection of multiple specialized education databases and that ERIC is insufficient as the primary education database for comprehensive searching in the field.
    Keywords:  database coverage; database selection; education; education databases; evidence synthesis; systematic reviews
    DOI:  https://doi.org/10.1017/rsm.2024.11
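    The underlying coverage-overlap analysis reduces to set operations over journal title lists; a small sketch with placeholder journal names (stand-ins, not the study's data):
```python
# Count journals covered by exactly one database vs. shared across several.
coverage = {
    "ERIC": {"J1", "J2", "J3"},
    "Education Source": {"J2", "J3", "J4"},
    "Education Database": {"J3", "J5"},
}
all_journals = set().union(*coverage.values())
unique = {j for j in all_journals
          if sum(j in titles for titles in coverage.values()) == 1}
print(f"{len(all_journals)} unique journals; "
      f"{len(unique)} covered by only one database")  # 5 unique; 3 single-database
```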
  7. Res Synth Methods. 2025 Mar;16(2): 308-322
    When conducting a systematic review, screening the vast body of literature to identify the small set of relevant studies is a labour-intensive and error-prone process. Although there is an increasing number of fully automated tools for screening, their performance is suboptimal and varies substantially across review topic areas. Many of these tools are only trained on small datasets, and most are not tested on a wide range of review topic areas. This study presents two systematic review datasets compiled from more than 8,600 systematic reviews and more than 540,000 abstracts covering 51 research topic areas in health and medical research. These datasets are the largest of their kind to date. We demonstrate their utility in training and evaluating language models for title and abstract screening. Our dataset includes detailed metadata for each review, including title, background, objectives and selection criteria. We demonstrate that a small language model trained on this dataset with additional metadata achieves excellent performance, with an average recall above 95% and specificity over 70% across a wide range of review topic areas. Future research can build on our dataset to further improve the performance of fully automated tools for systematic review title and abstract screening.
    DOI:  https://doi.org/10.1017/rsm.2025.1
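    The recall and specificity figures quoted above come from a standard confusion-matrix calculation over screening decisions; a minimal sketch (toy labels, not the authors' evaluation code):
```python
# Screening metrics: recall (sensitivity) on included records and
# specificity on excluded records, from true vs. predicted labels.
def screening_metrics(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)  # recall, specificity

rec, spec = screening_metrics([1, 1, 0, 0, 0, 1], [1, 1, 1, 0, 0, 1])
print(f"recall = {rec:.0%}, specificity = {spec:.0%}")  # 100%, 67%
```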
  8. Bioinformatics. 2026 Feb 02. pii: btag053. [Epub ahead of print]
     MOTIVATION: Millions of high-throughput, molecular datasets have been shared in public repositories. Researchers can reuse such data to validate their own findings and explore novel questions. A frequent goal is to find multiple datasets that address similar research topics and to either combine them directly or integrate inferences from them. However, a major challenge is finding relevant datasets due to the vast number of candidates, inconsistencies in their descriptions, and a lack of semantic annotations. This challenge maps to findability, the first of the FAIR principles for scientific data. Here we focus on dataset discovery within Gene Expression Omnibus (GEO), a repository containing hundreds of thousands of data series. GEO supports queries based on keywords, ontology terms, and other annotations. However, reviewing these results is time-consuming and tedious, and it often misses relevant datasets.
    RESULTS: We hypothesized that language models could address this problem by summarizing dataset descriptions as numeric representations (embeddings). Assuming a researcher has previously found some relevant datasets, we evaluated the potential to find additional relevant datasets. For six human medical conditions, we used 30 models to generate embeddings for datasets that human curators had previously associated with the conditions and identified other datasets with the most similar descriptions. This approach was often, but not always, more effective than GEO's search engine. The top-performing models were trained on general corpora, used contrastive-learning strategies, and used relatively large embeddings. Our findings suggest that language models have the potential to improve dataset discovery, likely in combination with existing search tools.
    AVAILABILITY: Our analysis code and a Web-based tool that enables others to use our methodology are available from https://github.com/srp33/GEO_NLP and https://github.com/srp33/GEOfinder3.0, respectively.
    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    DOI:  https://doi.org/10.1093/bioinformatics/btag053
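    A minimal sketch of the embedding-similarity idea: embed dataset descriptions, then rank candidates by cosine similarity to descriptions already known to be relevant. The model named below is one plausible open-source stand-in (the study compared 30 models), and the descriptions are invented; this is not the authors' pipeline:
```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in model choice
known = ["RNA-seq of melanoma tumors before and after immunotherapy"]
candidates = [
    "Gene expression profiling of metastatic melanoma biopsies",
    "Soil microbiome sequencing from agricultural fields",
]
k, c = model.encode(known), model.encode(candidates)
# Cosine similarity of each candidate description to the known-relevant one
sims = (c @ k.T).ravel() / (np.linalg.norm(c, axis=1) * np.linalg.norm(k))
for text, s in sorted(zip(candidates, sims), key=lambda x: -x[1]):
    print(f"{s:.3f}  {text}")
```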
  9. Data Brief. 2026 Apr;65: 112460
    The Indonesian Pharmaceutical Dataset for Self-medication consists of two structured datasets containing essential public health information: a drug dataset and a disease dataset. Both were extracted from the websites of Indonesian-registered and regulated telemedicine providers. The drug dataset contains general data on drugs, indications, dosages, side effects, contraindications, and warnings, whereas the disease dataset contains definitions, descriptions, symptoms, and causes of diseases. Both datasets are provided in CSV file format and are available exclusively in Bahasa Indonesia to maintain consistency with the source content and cater to local users' needs. The datasets are intended to facilitate research, application development, and Indonesian health information systems by providing locally contextualized, accessible health data for the Indonesian population. Some potential applications include powering health chatbots, supporting medical search tools, guiding health literacy programs, and facilitating the integration of standardized local information into HealthTech platforms.
    Keywords:  Drug and disease; Indonesian dataset; Self-medication; Web scraping
    DOI:  https://doi.org/10.1016/j.dib.2026.112460
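    A minimal loading sketch for such CSV datasets; the file names below are assumptions based on the description above, and column names should be checked against the actual headers rather than taken from this example:
```python
import pandas as pd

# Hypothetical file names; adjust to the published archive's actual names.
drugs = pd.read_csv("drug_dataset.csv")
diseases = pd.read_csv("disease_dataset.csv")

# Inspect the real column names before building anything on top of them.
print(drugs.columns.tolist())
print(diseases.columns.tolist())
```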
  10. Br J Oral Maxillofac Surg. 2026 Jan 07. pii: S0266-4356(26)00003-3. [Epub ahead of print]
      Large language models (LLMs) are increasingly used in healthcare, but their role in aesthetic surgical procedures remains unexplored. These interventions present unique challenges, marked by high patient expectations, emotionally charged decision-making, and subtle yet impactful outcomes on self-perception and psychosocial health. This cross-sectional in silico study evaluated the performance of ChatGPT-4 (OpenAI, 2025), DeepSeek V3 (DeepSeek AI/High-Flyer, 2025), and Gemini 2.5 Pro Experimental (Google, 2025) in preoperative and postoperative counselling for aesthetic facial surgery. Twenty-six standardised patient-oriented questions were submitted, and the anonymised responses of the chatbots were independently assessed by two calibrated oral and maxillofacial surgeons across four domains: accuracy, empathy, readability (Flesch-Kincaid Reading Ease (FKRE) and Grade Level (FKGL)), and referencing reliability (including the identification of fabricated or non-verifiable citations, a phenomenon referred to as "hallucination" in LLM outputs). Statistical tests included Kruskal-Wallis, Mann-Whitney U with Bonferroni correction, Spearman correlation, and chi-squared. DeepSeek achieved the highest accuracy (4.77 (0.51), p = 0.0078) and readability (FKRE 2.92 (0.27), p < 0.00001), while Gemini outperformed in empathy (4.08 (0.89), p < 0.001). GPT-4 produced the most hallucinated citations (36%) compared with Gemini (14%) and DeepSeek (8.8%) (p < 0.00001). A negative correlation between empathy and readability (r = -0.34, p = 0.002) suggested a trade-off between affective tone and accessibility. Overall, LLMs generated satisfactory counselling responses with distinct performance profiles, supporting their potential in patient-centred communication while reinforcing the need for human oversight.
    Keywords:  Aesthetic medicine; Artificial intelligence in healthcare; Chatbots; Large language models
    DOI:  https://doi.org/10.1016/j.bjoms.2026.01.002
  11. Medicine (Baltimore). 2026 Feb 06. 105(6): e47127
      This cross-sectional evaluation aimed to evaluate the quality of patient education materials provided by ChatGPT regarding otologic balance disorders. A total of 126 patient-oriented questions covering 9 common vestibular conditions - including benign paroxysmal positional vertigo, vestibular neuritis, labyrinthitis, Meniere disease, superior semicircular canal dehiscence, persistent postural perceptual dizziness, perilymph fistula, presbyvestibulopathy, and acoustic neuroma - were submitted to ChatGPT version 4o. The responses were independently evaluated by 2 otolaryngologists using the DISCERN tool to assess information quality and the PEMAT-P tool to evaluate understandability and actionability. The mean DISCERN score was 48.06 (range: 44.0-53.0), indicating moderate quality. PEMAT scores averaged 80% for understandability (range: 75%-88%) and 43% for actionability (range: 40%-60%). While the outputs were generally easy to understand, many lacked actionable guidance. In terms of information quality, the responses were generally acceptable for patient education purposes, though they occasionally included inaccuracies or omissions. ChatGPT may serve as a supportive tool for patient education on vestibular disorders but should be used with professional oversight to ensure safe and accurate communication.
    Keywords:  ChatGPT; artificial intelligence; balance disorders; health communication; large language models; patient information; vestibular disorders
    DOI:  https://doi.org/10.1097/MD.0000000000047127
  12. Drug Alcohol Depend. 2026 Jan 30. pii: S0376-8716(26)00055-4. [Epub ahead of print] 280: 113074
       BACKGROUND: Artificial intelligence (AI)-powered large language models like ChatGPT are increasingly used by the public to access health information. These platforms may be particularly appealing for high-risk conditions such as substance use disorder (SUD), where anonymity and nonjudgmental responses are valued. Despite growing interest in AI-assisted health education, limited research has assessed the quality of ChatGPT's content when it comes to accuracy and completeness on complex behavioral health topics. This study evaluated the accuracy and clinical consistency of ChatGPT's responses to SUD-related questions compared to national health guidelines.
    METHODS: This descriptive study, using a content analysis approach, analyzed ChatGPT 3.5's and 5's responses to 14 clinically relevant SUD-related questions, drawn from over 200 FAQs sourced from six leading U.S. health organizations in comparison to the top SUD questions asked by US adults using ChatGPT. Each response was independently assessed by a multidisciplinary team for accuracy, clarity, and appropriateness using an evidence-informed rating system. Responses were categorized as excellent, satisfactory requiring minimal clarification, satisfactory requiring moderate clarification, or unsatisfactory. Discrepancies were resolved through consensus.
    RESULTS: Among the 14 responses, 3 were rated excellent, 9 were satisfactory requiring minimal clarification, and 2 were satisfactory requiring moderate clarification. None were rated unsatisfactory. ChatGPT responses were generally accurate for straightforward questions but lacked clinical nuance and specificity in more complex scenarios, particularly regarding individualized care recommendations, withdrawal management, and treatment planning.
    CONCLUSION: As AI becomes more integrated into health information-seeking behaviors, continued evaluation of its role and potential impact in addiction medicine is essential.
    Keywords:  Artificial intelligence; ChatGPT; National health guidelines; Substance use disorder; Treatment
    DOI:  https://doi.org/10.1016/j.drugalcdep.2026.113074
  13. Eur J Ophthalmol. 2026 Feb 06. 11206721261419675
      Background/Objectives: Patient information can influence decision-making and engagement with healthcare. This study compares the quality of cataract surgery patient information leaflets (PILs) generated by ChatGPT (an AI model) and two reputable hospitals, assessing AI's potential in producing high-quality patient information.
    Subjects/Methods: 15 ophthalmologists and 32 patients evaluated three anonymised cataract PILs: one generated by ChatGPT, one from Mount Sinai Hospital (USA), and one from Manchester Royal Eye Hospital (UK). Doctors used the DISCERN tool (16 questions) for quality assessment. Patients used a shortened version (5 questions). Additional preference and readability questions were added, alongside a readability assessment. PIL ratings and differences between doctor and patient scores were compared.
    Results: The ChatGPT PIL scored lowest amongst doctors (mean 42.75 (SD 9.06)/75), followed by Manchester (47.04 (8.56)/75), with Mount Sinai's PIL highest (54.65 (7.09)/75) (p < 0.01). Patients similarly rated ChatGPT lowest (mean total score 4.50 (0.21)/5), with Manchester highest (4.84 (0.06)/5) (p = 0.04). Despite this, doctors were evenly divided on their preferred PIL, while more patients preferred ChatGPT over Mount Sinai. Mount Sinai's PIL had the highest inter-rater reliability (k = 0.38, 95% CI 0.10-0.60), and ChatGPT the lowest (k = 0.13, 95% CI 0.10-0.15). ChatGPT had the lowest Flesch Reading Ease score, but doctors rated it most readable.
    Conclusions: This study is the first to assess AI-generated cataract PILs using doctor and patient feedback. While ChatGPT received the lowest ratings, some favoured it, particularly for its clarity and readability. Doctors' highest-rated PIL was the patients' least favoured. This study highlights AI's potential in PIL development and the importance of doctor and patient feedback in this process.
    Keywords:  LENS / CATARACT; Phacoemulsification < LENS / CATARACT; SOCIOECONOMICS AND EDUCATION IN MEDICINE/OPHTHALMOLOGY; lens changes < LENS / CATARACT; practice management < SOCIOECONOMICS AND EDUCATION IN MEDICINE/OPHTHALMOLOGY
    DOI:  https://doi.org/10.1177/11206721261419675
  14. Front Bioeng Biotechnol. 2026;14: 1750225
       Background: Organoids have become central platforms in precision oncology and translational research, increasing the need for communication that is accurate, transparent, and clinically responsible. Large language models (LLMs) are now widely consulted for organoid-related explanations, but their ability to balance readability, scientific rigor, and educational suitability has not been systematically established.
    Methods: Five mainstream LLMs (GPT-5, DeepSeek, Doubao, Tongyi Qianwen, and Wenxin Yiyan) were systematically evaluated using a curated set of thirty representative organoid-related questions. For each model, twenty outputs were independently scored using the C-PEMAT-P scale, the Global Quality Score (GQS), and seven validated readability indices. Between-model differences were analyzed using one-way ANOVA or Kruskal-Wallis tests, and correlation analyses were performed to examine associations between readability and quality measures.
    Results: Model performance differed markedly, with GPT-5 achieving the highest C-PEMAT and GQS scores (16.05 ± 1.10; 4.70 ± 0.47; both P < 0.001), followed by intermediate performance from DeepSeek and Doubao (C-PEMAT 11.75 ± 2.07 and 12.05 ± 1.82; GQS 3.65 ± 0.49 and 3.35 ± 0.49). Tongyi Qianwen and Wenxin Yiyan comprised the lowest-performing tier (C-PEMAT 7.85 ± 1.09 and 9.00 ± 2.05; GQS 1.55 ± 0.51 and 2.10 ± 0.55). Score-distribution patterns further highlighted reliability gaps, with GPT-5 showing tightly clustered values and domestic models displaying broader dispersion and unstable performance. Readability differed significantly across models and question categories, with safety-related, diagnostic, and technical questions showing the highest linguistic and conceptual complexity. Correlation analyses showed strong internal coherence among readability indices but only weak-to-moderate associations with C-PEMAT, GQS, and reliability metrics, indicating that linguistic simplicity is not a dependable surrogate for scientific quality.
    Conclusion: LLMs exhibited substantial variability in communicating organoid-related information, forming distinct performance tiers with direct implications for patient education and translational decision-making. Because readability, scientific quality, and reliability diverged across models, linguistic simplification alone is insufficient to guarantee accurate or dependable interpretation. These findings underscore the need for organoid-adapted AI systems that integrate domain-specific knowledge, convey uncertainty transparently, ensure output reliability, and safeguard safety-critical information.
    Keywords:  artificial intelligence; large language models; online medical information; organoids; readability
    DOI:  https://doi.org/10.3389/fbioe.2026.1750225
  15. Cureus. 2025 Dec;17(12): e100367
       BACKGROUND: Correct condom use is essential for preventing sexually transmitted infections (STIs), human immunodeficiency virus (HIV), and unintended pregnancies, yet significant knowledge gaps persist, especially among young people in India. YouTube is widely used for sexual-health information, but its engagement-driven algorithm often promotes videos with high views rather than high accuracy, increasing the risk of misinformation. Limited evidence exists on the accuracy and completeness of condom-use instructions available to Indian audiences.
    OBJECTIVE: To analyze condom-use steps presented in popular YouTube videos in English, Hindi, and Tamil, and compare them with the National Health Mission (NHM) condom-use guidelines.
    METHODS: A cross-sectional content analysis was conducted on 30 YouTube videos (15 in English, eight in Hindi, seven in Tamil) identified through predefined multilingual keywords in incognito search. Videos were evaluated using a 16-point Total Score (T Score) and an 8-point Vital Score (V Score), both developed by the authors based on the NHM condom-use guidelines. Descriptive statistics were used to assess completeness, accuracy, and presentation patterns.
    RESULTS: Major instructional gaps were observed across all four steps of condom use. On average, videos mentioned 2.6 of five instructions (52%) for Step 1 (opening), 2.5 of three instructions (83.3%) for Step 2 (application), 0.5 of three instructions (16.7%) for Step 3 (during sex), and 1.7 of five instructions (34%) for Step 4 (disposal). English-language videos performed best (mean V Score: 4.7/8), followed by Hindi (4.4/8) and Tamil (3.0/8). Videos with >10 million views, those published by channels with 10,000-100,000 subscribers, and videos with >500 comments were significantly associated with higher instructional quality. No video demonstrated all eight essential steps.
    CONCLUSION: The analyzed YouTube videos did not consistently follow the step-by-step NHM Condom Use Guidelines, with lower frequencies of instruction during the "during sex" and disposal stages and incomplete coverage even within more frequently mentioned steps. These instructional gaps mirror documented user errors, suggesting that suboptimal online content may indirectly contribute to improper condom-use practices. Improving the completeness and stage-wise consistency of videos - preferably through professional review and alignment with freely available NHM guidelines - may enhance the reliability of YouTube as a sexual-health education resource.
    Keywords:  condom use; content analysis; national health mission (nhm) condom use guidelines; sexual health education; youtube health information
    DOI:  https://doi.org/10.7759/cureus.100367
  16. Bull Hosp Jt Dis (2013). 2025 Dec 01. 83(1): 172-178
       ABSTRACT: Given the progressive spread of medical misinformation, access to understandable educational content from trusted sources has become increasingly more crucial for patients. Online patient education materials, particularly from specialty organizations, have been criticized for being too complex for the average reader. It is advised that this information be at or below the 6th-grade reading level. This study is a 9-year follow-up to an analysis conducted in 2013 which evaluated the overall readability of educational articles from the American Academy of Orthopaedic Surgeons (AAOS) and American Society for Surgery of the Hand (ASSH) websites related to shoulder and elbow conditions. In the current investigation, 74 shoulder and elbow articles were assessed using the same methodology, which included analyzing the number of years since their last update, word count, percentage of passive sentences, Flesch Reading Ease score, Flesch-Kincaid grade level, Simple Measure of Gobbledygook (SMOG) grade, and New Dale-Chall grade level. No articles from either site were at or below the recommended 6th-grade reading level. Those from the AAOS were longer than those from the ASSH (P < .001). The articles had a mean Flesch Reading Ease score of 53.8 vs. 58 (P = .01), Flesch-Kincaid grade level of 9.6 vs. 9.4, SMOG grade of 8.9 vs. 8.6, and New Dale-Chall grade of 10.5 vs. 10.1 for the AAOS and ASSH sites, respectively. Although no significant differences in the readability measures were noted between the 2013 and current AAOS articles, the current ASSH content had a significantly higher Flesch Reading Ease score (P = .01) and significantly lower Flesch-Kincaid (P = .04), SMOG (P = .03), and New Dale-Chall (P = .03) grade levels, than their 2013 counterparts. Although improvements have been made in the shoulder and elbow articles from the ASSH, there remains a need to further improve the readability of AAOS and ASSH online materials to better ensure adequate patient education.
    Keywords:  Readability; elbow; patient education; shoulder
    DOI:  https://doi.org/10.1097/bh9.0000000000000030
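    The readability indices used in studies like this one are fixed functions of word, sentence, and syllable counts. A sketch of two of the standard published formulas (the constants are the canonical ones; the counts below are illustrative only):
```python
# Flesch Reading Ease: higher = easier (roughly 0-100).
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Flesch-Kincaid Grade Level: approximate US school grade required.
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Illustrative counts for a ~300-word patient education passage
print(f"FRE  = {flesch_reading_ease(300, 20, 480):.1f}")   # ~56.3
print(f"FKGL = {flesch_kincaid_grade(300, 20, 480):.1f}")  # ~9.1
```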
  17. Eur Arch Otorhinolaryngol. 2026 Feb 01.
       PURPOSE: Adenoid hypertrophy (AH) is a common condition in children, often leading to nasal obstruction, mouth breathing, and sleep disturbances. With the increasing use of YouTube as a source of medical information, concerns have arisen regarding the accuracy and reliability of video content on this platform. This study aimed to evaluate the quality, reliability, and usefulness of English-language YouTube videos related to AH.
    METHODS: A total of 300 videos were screened using specific keywords, and 93 met the inclusion criteria. Videos were assessed for their general characteristics and classified as either useful or misleading based on scientific accuracy. The Global Quality Scale (GQS), modified DISCERN (mDISCERN), and JAMA benchmarks were used to evaluate video quality and reliability.
    RESULTS: Among the included videos, 78.5% were deemed useful, while 21.5% were misleading. Videos uploaded by academic institutions and physicians demonstrated significantly higher mDISCERN, GQS, and JAMA scores (p < 0.001). In contrast, the majority of misleading videos were uploaded by independent users. A strong correlation was found between viewer engagement metrics (likes, comments) and daily view counts (p < 0.001), though higher popularity did not consistently align with higher quality or reliability.
    CONCLUSION: Although YouTube offers a substantial number of informative videos on AH, the presence of misleading content remains a concern, particularly from non-professional sources. Healthcare professionals and institutions are encouraged to produce high-quality, reliable video content to enhance public health literacy and counter misinformation.
    Keywords:  Adenoid hypertrophy; GQS; Health information; JAMA; Video quality; YouTube; mDISCERN
    DOI:  https://doi.org/10.1007/s00405-025-09834-7
  18. Hand Surg Rehabil. 2026 Feb 04. pii: S2468-1229(26)00027-7. [Epub ahead of print] 102592
       INTRODUCTION: Patients have become increasingly reliant on the internet to seek health-related information (HRI). The newfound popularity of artificial intelligence (AI) search engines has created interest in their ability to provide HRI. This study aimed to quantify and compare the readability of carpal tunnel syndrome (CTS) HRI from the American Academy of Orthopaedic Surgeons OrthoInfo and AI search engines.
    METHODS: Six prompts were developed using the OrthoInfo page on CTS. These prompts were entered to ChatGPT-4 and Google Gemini 2.0 Flash to generate AI responses. The readability of this information was calculated using the Flesch-Kincaid Reading Ease Index, Coleman-Liau Index, Flesch-Kincaid Grade Level, FORCAST Readability Formula, Gunning Fog index, and Simple Measure of Gobbledygook Index. Statistical testing was performed using the Kruskal-Wallis nonparametric One-Way Analysis of Variance test.
    RESULTS: The mean grade level readability score across all platforms, questions, and testing metrics was 12.6. No significant differences were observed between the overall mean grade level readability scores of OrthoInfo, ChatGPT, and Gemini, nor were they observed for any specific prompt. The only significant differences were found using the Flesch-Kincaid Grade Level test, for which ChatGPT had the lowest scores.
    CONCLUSION: The readability of carpal tunnel syndrome health-related information from OrthoInfo, ChatGPT, and Gemini is similar. Physicians should advise patients to continue using OrthoInfo as a primary source of carpal tunnel syndrome information, although artificial intelligence search engines are useful to supplement when patient concerns require more tailored responses. Notably, no text included in this study was at recommended reading level thresholds.
    Keywords:  Carpal tunnel syndrome; OrthoInfo; artificial intelligence; health literacy; readability
    DOI:  https://doi.org/10.1016/j.hansur.2026.102592
  19. Digit Health. 2026 Jan-Dec;12: 20552076261416800
       Objective: This study aimed to evaluate the quality, reliability, and readability of online patient-centered information related to the management of gummy smile.
    Methods: A systematic search was conducted using Google, Yahoo, and Bing to identify websites providing patient-oriented information on gummy smile treatments. A total of 257 websites met the inclusion criteria and were analyzed. Content quality was assessed using the DISCERN instrument, Journal of the American Medical Association (JAMA) benchmarks, and the Health on the Net (HON) code certification. Readability was evaluated using the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Simplified Measure of Gobbledygook (SMOG) index, and Coleman-Liau index.
    Results: The overall quality of online information was low to moderate, with a mean DISCERN score of 40 ± 9.9. Only 11 websites were certified by the HON code, indicating limited adherence to established standards for trustworthy health information. According to JAMA benchmarks, only two websites fulfilled all four criteria. Readability analysis demonstrated that the content was relatively complex, with a mean FRES of 60.1 ± 9.2 and a mean FKGL of 8.9 ± 1.8, exceeding the recommended reading level for the general public.
    Conclusions: Online patient-centered information regarding gummy smile is generally of suboptimal quality and readability. The limited number of reliable and easily understandable resources underscores the need for improved quality control, standardization, and patient-focused content development. Enhancing the accessibility and reliability of online information may support better patient understanding and informed decision-making in dental aesthetics.
    Keywords:  DISCERN; Web-based knowledge; gummy smile; online information; readability
    DOI:  https://doi.org/10.1177/20552076261416800
  20. PeerJ. 2026;14: e20543
       Introduction: Helicobacter pylori (H. pylori) has drawn considerable attention because of its high infection rate. Although WeChat Official accounts (WOAs) have become a prevalent source of public health information, the reliability and scientific validity of H. pylori-related content on the platform remain uncertain. Therefore, this study aimed to systematically evaluate the reliability and quality of health information on H. pylori disseminated through WOAs and propose evidence-based strategies for enhancing the standard of online health information.
    Methods: Articles containing the Chinese-language keywords for H. pylori were retrieved from the WeChat platform. After selection, a total of 115 articles were included in this study. Subsequently, raters collectively evaluated the articles using the Journal of the American Medical Association (JAMA) benchmark criteria, the modified DISCERN (mDISCERN) tool, and the Global Quality Scale (GQS). Statistical analyses were then conducted. All continuous data were described as median (interquartile range).
    Results: The median scores for JAMA, mDISCERN, and GQS across all articles were 2.00 (1.00), 3.00 (2.00), and 3.00 (2.00), respectively. Spearman correlation analysis revealed significant positive correlations between each pair of assessment tools (JAMA, mDISCERN, and GQS; P < 0.001). The Kruskal-Wallis test indicated that JAMA, mDISCERN, and GQS scores were all significantly associated with article sources (p < 0.001). Enterprise accounts contributed to the majority of articles (58.51%). Articles sourced from non-profit organizations demonstrated higher reliability and quality, whereas those from individual sources exhibited lower scores. The issues identified in the articles primarily concerned the treatment of H. pylori.
    Conclusion: Generally, the reliability and quality of H. pylori information found on WOAs was unsatisfactory. Users face a significant risk of exposure to misinformation. Content originating from non-profit organizations or large tertiary hospitals demonstrated strong correlations with higher reliability and quality scores. To address these challenges and enhance the credibility of online health information, concerted efforts are required.
    Keywords:  Health information; Helicobacter pylori; Popular science article; Quality; Reliability; WeChat official account
    DOI:  https://doi.org/10.7717/peerj.20543
  21. Surg Innov. 2026 Feb 07. 15533506261424687
    Introduction: Social media is a significant platform for health information. However, the quality and reliability of patient-facing surgical content is uncertain. We evaluated the quality and reliability of TikTok and Instagram videos about three common general surgical procedures (laparoscopic appendicectomy, laparoscopic cholecystectomy, and inguinal hernia repair) and compared performance by platform, procedure, and creator type.
    Methods: We conducted a cross-sectional study of the top fifty results per procedure per platform. Videos were classified as useful, misleading, personal experience, or irrelevant, and quality and reliability were assessed with the Global Quality Score (GQS) and modified DISCERN (mDISCERN) score, respectively.
    Results: 300 videos, accruing 592,975 likes and 11,489 comments, were analysed. Videos were low in both quality and reliability across both platforms, although higher on Instagram (GQS 1.95; mDISCERN 1.65) than TikTok (GQS 1.27; mDISCERN 0.33; both P < .0001). 53/300 (17.7%) videos were judged to be misleading. Useful content was less frequent on TikTok than Instagram (14/150, 9.3% vs 82/150, 54.7%; P < .0001). Professional content was deemed more useful than that of non-professionals (54/117, 46.2% vs 42/183, 23.0%; P < .0001), with higher quality and reliability scores (GQS 1.80 vs 1.49; mDISCERN 1.36 vs 0.76; both P < .0001).
    Conclusions: Surgical educational videos across popular social media platforms are low in quality and reliability. Patients should be wary of the risk of possible health misinformation. Clinicians and professional bodies should be aware of the growing popularity of social media and consider the production of evidence-based content on these platforms to disseminate credible information and counter misinformation.
    Keywords:  General Surgery; evidence based medicine/surgery; surgical education
    DOI:  https://doi.org/10.1177/15533506261424687
  22. Phlebology. 2026 Feb 04. 2683555261424065
    Objective: To evaluate the educational quality, reliability, and transparency of YouTube™ videos on lipoedema, and to examine associations with uploader type and engagement metrics.
    Methods: On 15 May 2025 we searched YouTube™ for "lipoedema," screened the first 200 relevance-ranked items, and included videos ≥60 s with intelligible audio. Advertisements, duplicates and soundless videos were excluded. Two independent physicians in Physical Medicine and Rehabilitation (PM&R) rated eligible videos using DISCERN, the Global Quality Score (GQS), and the Journal of the American Medical Association (JAMA) benchmark criteria; disagreements were discussed and original ratings retained for agreement analyses. We recorded upload date, duration, views, likes, comments, channel subscribers, uploader category, and content domain.
    Results: We analyzed 92 YouTube™ lipoedema videos uploaded between 25 February 2015 and 8 January 2025. Vascular surgeons were the largest uploader group (39.1%) and PM&R physicians the smallest (4.3%); the most common topic was definition + symptoms + management (26.1%). Mean DISCERN totals from the two raters were 33.47 ± 9.88 and 33.42 ± 8.68 (both poor); mean GQS were 2.18 ± 0.82 and 2.43 ± 0.81; only 6.6% were high quality and none scored 5/5. Views correlated strongly with likes and comments (both p < .001), moderately with duration (p < .01), and weakly with subscribers (p < .05). Inter-rater agreement was strong (r = 0.859/0.663/1.000; all p < .001).
    Conclusion: The overall quality and transparency of YouTube™ lipoedema videos are suboptimal despite substantial engagement. Increasing expert-authored, evidence-based content, particularly from PM&R physicians, and co-produced patient-clinician videos may better align reliability with reach.
    Keywords:  YouTube™; information source; lipoedema; quality; reliability
    DOI:  https://doi.org/10.1177/02683555261424065
  23. Int J Occup Saf Ergon. 2026 Feb 02. 1-9
       OBJECTIVES: This study aimed to evaluate the quality, reliability and content of YouTube videos related to office ergonomics.
    METHODS: The descriptive study analyzed 196 English-language YouTube videos, selected according to the inclusion and exclusion criteria from 752 published videos retrieved using the keywords 'workplace ergonomics' or 'office ergonomics'. The reliability, quality and content of the videos were assessed using the modified DISCERN (mDISCERN), the global quality score (GQS) and the office ergonomics content evaluation checklist (OE-CEC).
    RESULTS: Researcher 1 rated 71.5% as low quality, 68.4% as low reliability and 69.4% as insufficient content, while Researcher 2 rated 62.3%, 59.8% and 70.9%, respectively. mDISCERN and OE-CEC scores were significantly associated with uploader source, subtitle presence and video duration. Content scores were also associated with view ratio and number of comments. The GQS mean showed significant associations with uploader source, duration, view ratio, video power index and number of comments.
    CONCLUSIONS: YouTube videos on office ergonomics were of low quality, low reliability and insufficient content. Public institutions produced more reliable and higher quality videos, and longer videos contained more comprehensive information. Government agencies, universities and occupational health teams should be encouraged to produce accurate and reliable videos on office ergonomics.
    Keywords:  YouTube; occupational health; office ergonomics; quality; reliability
    DOI:  https://doi.org/10.1080/10803548.2026.2616999
  24. Int J Impot Res. 2026 Feb 02.
    We aimed to determine the content, reliability, and quality of YouTube videos related to intracavernosal injection (ICI). A search for the keyword "intracavernosal injection" was conducted on YouTube in February 2025, and the first 100 videos were watched. Video features were recorded. Each video was evaluated by two independent urologists using a comprehensiveness scale designed specifically for this study, the modified DISCERN, and the Global Quality Scale (GQS). The study included 60 videos after exclusion criteria were applied. Videos were classified as belonging to one of two categories: useful or misleading. Useful videos contained scientific suggestions, while misleading videos contained insufficient or unproven information. Analysis revealed that 52 videos provided useful information, whereas 8 videos disseminated misleading content. Useful videos demonstrated significantly higher scores on the comprehensiveness scale, modified DISCERN, and GQS compared to misleading videos (p < 0.001). Most narrators were urologists, and their videos were more useful (94.4% vs. 75.0%, p = 0.03). Videos narrated by urologists also scored significantly higher on the GQS compared to others (p = 0.018). We also found that USA-based videos were more useful than those from other countries (93.8% vs 58.3%, p = 0.001). Additionally, videos from the USA had higher comprehensiveness and GQS scores (p = 0.023 and p = 0.038, respectively). Our analysis revealed that YouTube videos about ICI exhibited high quality, reliability, and rich informational content. The current findings highlight the significance of actively encouraging the utilization of urologist-narrated training video materials.
    DOI:  https://doi.org/10.1038/s41443-026-01229-4
  25. PLoS One. 2026;21(2): e0341799
      The Pericapsular Nerve Group (PENG) block is a novel regional anesthesia technique that provides adequate analgesia while preserving motor function. This cross-sectional study evaluated the quality, reliability, and educational value of YouTube videos on the PENG block. Thirty-six videos were analyzed using validated scoring systems (GQS, JAMA, DISCERN, and modified DISCERN). Overall video quality was moderate, with higher scores observed in procedural and institutional videos. The findings highlight both the educational potential and the need for quality control in online medical content.
    DOI:  https://doi.org/10.1371/journal.pone.0341799
  26. Sci Rep. 2026 Feb 05.
    Weight management has become a major focus of global health, and platforms like TikTok and Bilibili are now popular sources of health information. However, the quality and reliability of weight management content on these platforms remain uncertain. This research systematically evaluated such videos to provide evidence-based guidance for public health communication. We analyzed the top 100 weight management videos from TikTok and Bilibili, recording their sources, content, and characteristics. The DISCERN instrument, JAMA benchmark criteria, and Global Quality Score (GQS) were used to evaluate video quality and reliability. Further analysis was conducted to explore the relationship between video quality and video characteristics. While TikTok videos attracted more likes, saves, comments, and shares, Bilibili videos were longer and exhibited higher quality and reliability (all P < 0.001). Videos from doctors and non-profit organizations had the highest DISCERN and GQS scores, while those from fitness bloggers and individual users were more popular but of lower quality. Video duration was positively associated with quality, whereas engagement indicators (likes, comments, shares, saves) were negatively associated with both GQS and DISCERN. Overall, the quality of weight management videos on TikTok and Bilibili was poor, although Bilibili performed better than TikTok. Doctors and non-profit organizations produced higher-quality content, highlighting the need for stronger platform review and greater professional contributions to improve the dissemination of reliable health information.
    Keywords:  Bilibili; Social media; TikTok; Video quality; Video reliability; Weight management
    DOI:  https://doi.org/10.1038/s41598-026-38404-y
  27. Digit Health. 2026 Jan-Dec;12: 20552076261418919
     Background: Despite being a prevalent peripheral vestibular disorder in China, Meniere's disease (MD) suffers from low awareness, frequent misdiagnosis, and unsatisfactory treatment rates. Although TikTok has become a prominent source of health information, no study has systematically evaluated the quality of its MD-related content. We therefore assessed the accuracy and reliability of MD videos on Chinese TikTok.
    Methods: The top 100 videos for "Meniere's disease/syndrome" (TikTok, 1 May 2025) were analyzed. Quality was assessed using the Video Information and Quality Index (VIQI), Global Quality Score (GQS), modified DISCERN (mDISCERN), and Patient Education Materials Assessment Tool for Audio-Visual Content (PEMAT-A/V). Descriptive statistics, correlation analyses, and predictive modeling were applied to 83 valid videos.
    Results: Among 83 videos, 91.6% (n = 76) were physician-uploaded (primarily otolaryngologists/neurologists). Monologue, Q&A, and medical scenario formats showed superior quality. Symptoms dominated content (47%). Neurologists generated significantly higher normalized engagement per second than otolaryngologists (all adj. p < 0.05, r > 0.35). Physicians outperformed news agencies in GQS scores (adj. p < 0.05, r = 0.291). Otolaryngologists scored higher than both neurologists and Traditional Chinese Medicine practitioners in PEMAT-A/V Understandability (all adj. p < 0.05, r > 0.37). Attending physicians exceeded chief physicians on all quality metrics (all adj. p < 0.05, r > 0.35), an advantage potentially linked to their younger age, greater digital literacy, and more frequent social media use. Engagement metrics (likes, comments, favorites, shares) correlated strongly (r > 0.8). Predictive models for PEMAT-U/A were significant (p < 0.001), with no evidence of multicollinearity or autocorrelation.
    Conclusion: Physician-created MD content ensures credibility but requires quality improvement. PEMAT-U/A models guide enhancements, though broader application needs validation. Key health informatics priorities include certified creator engagement, algorithm optimization, and innovative content design.
    Keywords:  GQS; Meniere's disease; PEMAT-A/V; TikTok; VIQI; health information quality; mDISCERN
    DOI:  https://doi.org/10.1177/20552076261418919
  28. Medicine (Baltimore). 2026 Feb 06. 105(6): e47523
      Hypothyroidism is a common endocrine disorder that significantly impacts patients' quality of life. In recent years, short-video platforms have become an important source of health information for the public. This study aimed to evaluate the content, quality, and reliability of hypothyroidism-related videos on TikTok and Bilibili. We searched for videos related to "hypothyroidism" on TikTok and Bilibili and included the top 150 videos based on comprehensive ranking, excluding irrelevant, duplicate, advertisement, and course-related videos. Extracted variables included video duration, number of likes, collections, comments, shares, uploader type, and content themes. The Global Quality Score and modified DISCERN (mDISCERN) tools were used to assess each video. Mann-Whitney U tests and Kruskal-Wallis tests were used to compare differences, and Spearman correlation analysis was performed to examine associations between engagement metrics and video quality. A total of 270 videos were included. Video content primarily focused on treatment (62.2%) and symptoms (60.0%), whereas prevention (7.0%) and epidemiology (4.1%) were notably underrepresented. Videos on Bilibili were longer but had lower engagement (P < .05), while TikTok videos had higher mDISCERN scores. Videos uploaded by specialists received the highest Global Quality Score and mDISCERN scores (P < .05). Engagement metrics were strongly intercorrelated (P < .05), but showed no significant association with video quality (P > .05). Video length demonstrated a weak correlation with quality (P < .05). This study revealed that hypothyroidism-related short videos generally have incomplete content structures, particularly with insufficient coverage of prevention and epidemiology. The overall quality and reliability were suboptimal, with videos by specialists demonstrating higher quality. Future efforts should encourage greater participation of specialists in short-video health education content creation, and platforms should strengthen content oversight and optimize algorithms to enhance the visibility of high-quality video content.
    Keywords:  Bilibili; Global Quality Score; TikTok; hypothyroidism; modified DISCERN; social media
    DOI:  https://doi.org/10.1097/MD.0000000000047523
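    The statistical toolkit in this and similar video-quality studies is standard nonparametric testing; a minimal SciPy sketch with invented scores (two platforms for Mann-Whitney U, three uploader groups for Kruskal-Wallis, and an engagement-quality Spearman correlation):
```python
from scipy.stats import mannwhitneyu, kruskal, spearmanr

# Invented GQS scores for illustration only.
tiktok_gqs = [3, 2, 4, 3, 2, 3]
bilibili_gqs = [2, 2, 3, 2, 1, 2]
print(mannwhitneyu(tiktok_gqs, bilibili_gqs))  # two-platform comparison

specialist, nonspecialist, institution = [4, 3, 4], [2, 2, 3], [3, 3, 2]
print(kruskal(specialist, nonspecialist, institution))  # >2 uploader types

likes = [1200, 40, 870, 15, 300, 95]
print(spearmanr(likes, tiktok_gqs))  # engagement vs. quality correlation
```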
  29. Medicine (Baltimore). 2026 Feb 06. 105(6): e47565
    Social media platforms have become an increasingly important source of health information. Short-form videos related to reflux esophagitis (RE) are widely disseminated on Chinese video platforms, yet the quality and reliability of such content remain unclear. This cross-sectional study analyzed the top 150 RE-related videos from TikTok and Bilibili. Video characteristics, uploader background, and engagement metrics were extracted. Information quality was evaluated using the Global Quality Score (GQS), modified DISCERN and Journal of the American Medical Association benchmark criteria assessment tools. The Mann-Whitney U test or Kruskal-Wallis test was applied for subgroup comparisons, and Spearman rank correlation coefficient was used for correlation analysis. A total of 214 videos met the inclusion criteria, and the overall quality was moderate. The median GQS score was 3 (IQR 2-3), the median modified DISCERN score was 2 (IQR 2-3), and the median Journal of the American Medical Association score was 1 (IQR 1-2). TikTok videos had significantly higher GQS scores than those on Bilibili (P < .05). Videos uploaded by gastroenterologists received the highest GQS scores (P < .05). Clinical manifestations were the most frequently discussed topic (75.70%), whereas epidemiology was least represented (13.55%). No significant correlations were found between engagement metrics and quality scores (P > .05). This study provides a comprehensive evaluation of RE-related short videos across 2 major social media platforms. Uploader professional background, particularly gastroenterology specialization, was a stronger determinant of information quality than engagement metrics. The findings highlight the limitations of popularity-based indicators for identifying credible medical information and underscore the need to promote specialist participation and evidence-informed governance to improve the quality of digital health communication.
    Keywords:  Bilibili; TikTok; health communication; reflux esophagitis; video quality
    DOI:  https://doi.org/10.1097/MD.0000000000047565
  30. BMC Womens Health. 2026 Feb 06.
     BACKGROUND: With the increasing public awareness of health, TikTok and Bilibili have become dominant short-video sources of health-related information. This study aims to investigate the quality and reliability of short videos about endometrial cancer on the two platforms.
    METHODS: This cross-sectional study was conducted on TikTok and Bilibili to evaluate short videos related to endometrial cancer. The two platforms were searched using the term 'endometrial cancer' from 15:00 to 22:00 on 29 September 2025. The quality of related information was assessed by the Global Quality Score (GQS) and the modified DISCERN score (mDISCERN), and analyzed by descriptive statistics, inter-group comparison, and correlation analysis.
    RESULTS: 200 initial endometrial cancer-related videos were searched, and finally, 174 videos were included from TikTok and Bilibili. The GQS and mDISCERN scores of both platforms were low (median 2/5), and the median(Q1-Q3) duration of these videos was 80 s (46.50-144.50), and the duration of Bilibili videos was longer (111 s to 63 s, P < 0.001). The engagement with TikTok videos was relatively high (median likes: 1342 to 6, P < 0.001). Expert-uploaded videos were longer than those from other origins (median 109 s) and of better quality (median mDISCERN 3/5, P < 0.001). The quality component was positively associated with the duration of short videos(r = 0.42-0.46). The five major dimensions of health demonstrated "fragmented content and insufficient depth": the highest coverage rate of symptoms was 36.2%, while the lowest coverage rate of prevention was 3.5%; the coverage rates of etiology, diagnosis, and therapy were all below 20%.
    CONCLUSIONS: Significant quality gaps exist in endometrial cancer videos. To enhance the structure and completeness of health information, professional participation should be strengthened in the future.
    Keywords:  Endometrial cancer; Health information; Quality; Reliability; Short videos
    DOI:  https://doi.org/10.1186/s12905-026-04330-4
  31. BMC Public Health. 2026 Feb 02.
       BACKGROUND: Knee osteoarthritis is a highly disabling chronic disease that imposes a substantial societal burden. Short-video platforms have become a primary source of health information in China, yet the quality of such content is highly variable. Systematic, multi-platform assessments of the quality of knee osteoarthritis-related information on these platforms are currently lacking.
    METHOD: This study retrieved and included 300 videos related to knee osteoarthritis from three platforms (TikTok, Bilibili, and Rednote) as the analytical sample. Basic video characteristics (likes, comments, duration, etc.), publisher identities (orthopedic surgeons, other medical personnel, institutions, and ordinary users), and content types were collected and analyzed with descriptive statistics. Three standardized tools, the Global Quality Scale (GQS), the modified DISCERN (mDISCERN), and the Journal of the American Medical Association (JAMA) benchmark criteria, were used to independently assess video quality (a brief mDISCERN tabulation sketch follows this entry). We also examined the correlation between these quality scores and video features.
    RESULT: There were significant differences among the three platforms. TikTok videos had the highest user engagement but the shortest duration (median 108 s); Bilibili videos were the longest (median 262 s) and achieved a significantly higher median GQS score of 3 (IQR 2-4) than Rednote's 2 (IQR 2-3) (p < 0.001); Rednote performed best and most consistently on the JAMA criteria (p < 0.01). Across all assessment tools, videos released by medical professionals and institutions scored significantly higher than those of ordinary users (p < 0.01). Correlation analysis showed that video quality (GQS, mDISCERN, JAMA) was only weakly or non-significantly correlated with popularity indicators (likes, followers, etc.) but showed a positive correlation trend with video duration.
    CONCLUSION: The generally suboptimal quality of knee osteoarthritis information on Chinese short-video platforms and its disconnect from popularity metrics highlight a growing public health concern. As short videos increasingly serve as a key source of health information, it is imperative to strengthen content quality oversight. Collaborative efforts among platforms, health authorities, and the public are essential to improve the reliability of online health content and support public access to accurate health information.
    Keywords:  Content completeness; Health information quality; Knee osteoarthritis; Reliability; Short videos
    DOI:  https://doi.org/10.1186/s12889-026-26455-9
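    The modified DISCERN (mDISCERN) instrument used throughout these video-quality studies is commonly applied as five yes/no items summed to a 0-5 total. The sketch below shows one way such per-video ratings might be tabulated; the item wording paraphrases criteria commonly attributed to mDISCERN rather than quoting these papers, and all identifiers are hypothetical.

        # Hypothetical tabulation of modified DISCERN (mDISCERN) ratings.
        # The five items paraphrase criteria commonly used in the literature
        # and are an assumption, not quoted from the studies above.
        from dataclasses import dataclass

        MDISCERN_ITEMS = (
            "Are the aims clear and achieved?",
            "Are reliable sources of information used?",
            "Is the information presented balanced and unbiased?",
            "Are additional sources of information listed?",
            "Are areas of uncertainty mentioned?",
        )

        @dataclass
        class VideoRating:
            video_id: str
            answers: tuple  # five booleans, one per mDISCERN item

            @property
            def mdiscern(self) -> int:
                # One point per "yes"; totals range from 0 to 5.
                return sum(self.answers)

        ratings = [
            VideoRating("vid-001", (True, True, True, False, False)),    # 3/5
            VideoRating("vid-002", (True, False, False, False, False)),  # 1/5
        ]
        for r in ratings:
            print(r.video_id, r.mdiscern)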
  32. Medicine (Baltimore). 2026 Feb 06. 105(6): e47543
      Uremia is the end stage of chronic kidney disease, characterized by irreversible loss of kidney function and systemic accumulation of metabolic waste products, which severely impairs patients' quality of life. In recent years, TikTok and Bilibili have gradually become important sources of health information for patients. This study aimed to evaluate the quality and reliability of uremia-related short videos on these 2 platforms. The top 150 uremia-related videos from the default rankings of each platform were collected, and their general characteristics and engagement metrics were recorded. The Global Quality Scale (GQS) and the modified DISCERN (mDISCERN) instrument were used for evaluation. The Mann-Whitney U test and Kruskal-Wallis H test were applied to compare differences across groups, and Spearman correlation analysis was conducted to examine associations between video duration, engagement metrics, and quality parameters. A total of 234 videos were included (TikTok: 124; Bilibili: 110). Video content most commonly focused on treatment (28.57%), while critical information such as epidemiology (3.57%) and diagnostic criteria (11.02%) was underrepresented. The overall median GQS score was 2.00 (interquartile range: 2.00-3.00), and the median mDISCERN score was 2.00 (interquartile range: 2.00-3.00). TikTok videos demonstrated significantly higher user engagement and higher GQS scores than Bilibili videos (median: 3 vs 2, P < .05). Videos uploaded by specialists (board-certified physicians, including nephrologists and other medical specialists) achieved the highest GQS and mDISCERN scores (P < .05). No significant correlations were found between user engagement metrics and content quality scores (P > .05). This study revealed that uremia-related short videos on TikTok and Bilibili were structurally incomplete, with generally suboptimal quality and reliability, though videos uploaded by specialists demonstrated the highest quality. In the future, platforms should strengthen content supervision, improve the structural integrity of video content, and encourage the active participation of specialists to enhance public access to high-quality medical information.
    Keywords:  health communication; information quality; short-video platforms; social media; uremia
    DOI:  https://doi.org/10.1097/MD.0000000000047543
  33. Gynecol Oncol Rep. 2026 Feb;63: 102018
       Objectives: Many reproductive-aged individuals use social media platforms to gather medical advice or find community. TikTok is one of the fastest-growing social media platforms and is used by many reproductive-aged individuals.
    Methods: We evaluated the top 100 English-language videos on cervical cancer screening and dysplasia treatment for relevance to the hashtag, content quality, and accuracy.
    Results: Among the included videos, most content highlighted patients' personal experiences and provided little medical educational value. Notably, videos created by medical professionals were of higher quality and more often contained accurate health information.
    Conclusions: This study highlights the need to improve content quality on TikTok to raise awareness and uptake of cervical cancer screening and treatment among reproductive-aged individuals.
    Keywords:  Cervical cancer screening; Social media
    DOI:  https://doi.org/10.1016/j.gore.2025.102018
  34. Health Commun. 2026 Feb 01. 1-16
      Generative artificial intelligence (AI) tools, such as ChatGPT, have become a convenient source of information. This study proposes and tests a model predicting intentions to use ChatGPT for health information and examines whether significant predictors differ by condition severity. The model included the original predictors of the unified theory of acceptance and use of technology (UTAUT). Guided by channel complementarity theory, which highlights source characteristics in a multisource information-seeking environment, the model also incorporated dissatisfaction with human healthcare services and perceived credibility of ChatGPT. Performance expectancy, social influence, and perceived credibility predicted attitudes toward using ChatGPT, which in turn predicted usage intentions, while effort expectancy was not significant. Condition severity moderated the effect of dissatisfaction with healthcare services: dissatisfaction predicted greater intentions to use ChatGPT for mild conditions but not for severe ones. This study extends UTAUT to health information seeking and discusses theoretical and practical implications for generative AI use in healthcare. (A sketch of how such a moderation effect is typically tested follows this entry.)
    DOI:  https://doi.org/10.1080/10410236.2026.2620497
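    In regression terms, the moderation reported above (condition severity moderating the effect of dissatisfaction) is usually tested with an interaction term. The sketch below illustrates that with statsmodels on simulated data; all variable names are hypothetical, and the paper may have used structural equation modeling rather than the OLS shown here.

        # Illustrative moderation test on simulated data; not the paper's model.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        n = 300
        df = pd.DataFrame({
            "dissatisfaction": rng.normal(0, 1, n),  # with healthcare services
            "severity": rng.integers(0, 2, n),       # 0 = mild, 1 = severe
            "perf_expectancy": rng.normal(0, 1, n),
            "credibility": rng.normal(0, 1, n),
        })
        # Simulate intentions so that dissatisfaction matters mainly for mild
        # conditions, mirroring the moderation pattern the abstract reports.
        df["intention"] = (0.4 * df["perf_expectancy"] + 0.3 * df["credibility"]
                           + 0.5 * df["dissatisfaction"] * (1 - df["severity"])
                           + rng.normal(0, 1, n))

        # The dissatisfaction:severity interaction carries the moderation test;
        # "*" in the formula expands to both main effects plus the interaction.
        model = smf.ols(
            "intention ~ perf_expectancy + credibility + dissatisfaction * severity",
            data=df,
        ).fit()
        print(model.summary().tables[1])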
  35. BMC Oral Health. 2026 Feb 04.
      
    Keywords:  Adolescent; Decision-making; Health information; Information seeking; Online Q&A service; Oral health literacy
    DOI:  https://doi.org/10.1186/s12903-026-07848-z
  36. Cureus. 2026 Jan;18(1): e100606
      Background: Few studies have assessed the usage of online resources by vitiligo patients. Our goal was to determine what information and sources individuals with vitiligo seek out on the internet and how this information affects them. Methods: A 15-question cross-sectional survey on online resources and their impact was developed and distributed to users of the MyVitiligoTeam forum from May to June 2022. The questionnaire gathered information on participants' demographics, their engagement with online resources, and the impact of this information. Descriptive statistics were completed with Qualtrics (Provo, Utah, United States). Results: There were 95 responses. The majority of respondents (n = 72, 79.12%) reported currently using online resources to talk about or learn about vitiligo, with the most frequently utilized resources being medical websites (n = 35, 55.56%), vitiligo-specific organizations (n = 30, 47.62%), and social media (n = 21, 33.33%). When asked about the most useful resources for managing vitiligo, participants prioritized vitiligo-specific websites (37.25%) and social media (17.65%). The most important reasons for seeking information online were to obtain medical advice from healthcare professionals or organizations, to learn about the cause of vitiligo, and to explore treatment experiences. The most commonly reported impact of online resources was learning to cope with and manage psychosocial aspects of the condition (n = 20, 37.74%). Conclusions: This study reveals the depth of influence that online resources have on vitiligo patients as they cope with and manage their disease, particularly the significant psychosocial benefits of online engagement.
    Keywords:  internet; online health resources; online resources; quality of life; vitiligo
    DOI:  https://doi.org/10.7759/cureus.100606
  37. Nutr Health. 2026 Feb 06. 2601060261418143
      Background: With the rise in availability of herbal supplements, there has been a similarly expanding landscape of online information about these supplements. Aims/Objectives: This study identifies commonly used herbal supplements, their ingredients, oxalate content, and the reliability of their online information. Methods/Methodology: A survey on herbal supplement use was administered to members of a nephrolithiasis Facebook group. The top 10 bestselling herbal supplements on Amazon and their common ingredients were identified. Consumer interest and online engagement with these ingredients were analyzed using Google Trends and BuzzSumo. The reliability of the top 10 articles for each ingredient was rated using the DISCERN questionnaire (a banding sketch follows this entry). Oxalate content was quantified by ion chromatography coupled with mass spectrometry. Results/Findings: The most common ingredients in supplements were black pepper, ginger, apple cider vinegar, and turmeric. Google Trends identified apple cider vinegar, ginger, and turmeric as search terms of high interest. BuzzSumo revealed the highest article engagement and video views for apple cider vinegar. For all ingredients, average DISCERN scores for the most popular articles fell in the "poor reliability" category. Turmeric-containing and standalone turmeric formulations had the highest oxalate levels, with ranges of 2.69-54.8 mg/g and 15-19.5 mg/g, respectively. Conclusions: High consumer interest in herbal supplements combined with unreliable online information highlights the need for high-quality, evidence-based information. Because popular herbal supplements contain varying amounts of oxalate, those providing care for kidney stone formers may find it useful to familiarize themselves with popular herbal products and their lithogenic potential.
    Keywords:  DISCERN; herbal; kidney stones; misinformation; nephrolithiasis; social media
    DOI:  https://doi.org/10.1177/02601060261418143
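    DISCERN totals are often mapped to reliability bands such as the "poor reliability" category mentioned above. Below is a minimal sketch assuming the full 16-item instrument and one banding convention seen in the literature; the paper's exact scoring and cutoffs are not given in the abstract, so treat the thresholds as an assumption.

        # Hypothetical mapping from a DISCERN total (16 items, each scored 1-5,
        # so totals range from 16 to 80) to a reliability band. The cutoffs
        # follow one convention seen in the literature; the study above may
        # have banded scores differently, so treat these as an assumption.
        def discern_band(total: int) -> str:
            if not 16 <= total <= 80:
                raise ValueError("DISCERN totals range from 16 to 80")
            if total >= 63:
                return "excellent"
            if total >= 51:
                return "good"
            if total >= 39:
                return "fair"
            if total >= 27:
                return "poor"
            return "very poor"

        print(discern_band(34))  # -> "poor", the band reported for most articles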