bims-librar Biomed News
on Biomedical librarianship
Issue of 2024‒09‒29
forty-six papers selected by
Thomas Krichel, Open Library Society



  1. PNAS Nexus. 2024 Sep;3(9): pgae400
      Large language models (LLMs) are a potential substitute for human-generated data and knowledge resources. This substitution, however, can present a significant problem for the training data needed to develop future models if it leads to a reduction of human-generated content. In this work, we document a reduction in activity on Stack Overflow coinciding with the release of ChatGPT, a popular LLM. To test whether this reduction in activity is specific to the introduction of this LLM, we use counterfactuals involving similar human-generated knowledge resources that should not be affected by the introduction of ChatGPT to the same extent. Within 6 months of ChatGPT's release, activity on Stack Overflow decreased by 25% relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable. We interpret this estimate as a lower bound of the true impact of ChatGPT on Stack Overflow. The decline is larger for posts related to the most widely used programming languages. We find no significant change in post quality, measured by peer feedback, and observe similar decreases in content creation by more and less experienced users alike. Thus, LLMs are not only displacing duplicate, low-quality, or beginner-level content. Our findings suggest that the rapid adoption of LLMs reduces the production of public data needed to train them, with significant consequences.
    Keywords:  AI; ChatGPT; online public goods; web
    DOI:  https://doi.org/10.1093/pnasnexus/pgae400
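    The 25% figure above is a relative decline: Stack Overflow's change after ChatGPT's release compared with the change on counterfactual platforms. A minimal sketch of that kind of comparison, using entirely hypothetical weekly post counts rather than the study's data:

      import numpy as np

      def relative_change(before, after):
          """Percent change in mean weekly activity from the pre- to the post-period."""
          return (np.mean(after) - np.mean(before)) / np.mean(before) * 100

      # Hypothetical weekly post counts (illustrative only, not the study's data).
      so_before, so_after = [100_000, 98_000, 102_000], [80_000, 78_000, 76_000]
      cf_before, cf_after = [10_000, 10_200, 9_900], [9_900, 10_100, 9_800]

      so_change = relative_change(so_before, so_after)   # change on Stack Overflow
      cf_change = relative_change(cf_before, cf_after)   # change on counterfactual forums
      print(f"Decline relative to counterfactuals: {so_change - cf_change:.1f} percentage points")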
  2. J Med Libr Assoc. 2024 Jul 01. 112(3): 225-237
      Objective: In this paper we report how the United Kingdom's National Institute for Health and Care Excellence (NICE) search filters for treating and managing COVID-19 were validated for use in MEDLINE (Ovid) and Embase (Ovid). The objective was to achieve at least 98.9% for recall and 64% for precision.
    Methods: We did two tests of recall to finalize the draft search filters. We updated the data from an earlier peer-reviewed publication for the first recall test. For the second test, we collated a set of systematic reviews from Epistemonikos COVID-19 L.OVE and extracted their primary studies. We calculated precision by screening all the results retrieved by the draft search filters from a targeted sample covering 2020-23. We developed a gold-standard set to validate the search filter by using all articles available from the "Treatment and Management" subject filter in the Cochrane COVID-19 Study Register.
    Results: In the first recall test, both filters had 99.5% recall. In the second test, recall was 99.7% and 99.8% in MEDLINE and Embase respectively. Precision was 91.1% in a deduplicated sample of records. In validation, we found the MEDLINE filter had recall of 99.86% of the 14,625 records in the gold-standard set. The Embase filter had 99.88% recall of 19,371 records.
    Conclusion: We have validated search filters to identify records on treating and managing COVID-19. The filters may require subsequent updates, if new SARS-CoV-2 variants of concern or interest are discussed in future literature.
    Keywords:  COVID-19; Embase; MEDLINE; Search filters; Systematic literature review
    DOI:  https://doi.org/10.5195/jmla.2024.1806
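    A minimal sketch of the recall and precision measures used to validate such search filters, with hypothetical record identifiers standing in for the gold-standard and screened sets:

      def recall(retrieved: set, gold_standard: set) -> float:
          """Share of gold-standard records that the filter retrieves."""
          return len(retrieved & gold_standard) / len(gold_standard)

      def precision(retrieved: set, relevant: set) -> float:
          """Share of retrieved records judged relevant on screening."""
          return len(retrieved & relevant) / len(retrieved)

      # Hypothetical record identifiers (illustrative only).
      gold_standard = {"101", "102", "103", "104"}   # e.g. records from a curated study register
      retrieved     = {"101", "102", "103", "199"}   # records returned by the draft filter
      relevant      = {"101", "102", "103"}          # retrieved records judged relevant

      print(f"Recall: {recall(retrieved, gold_standard):.1%}, "
            f"Precision: {precision(retrieved, relevant):.1%}")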
  3. J Med Libr Assoc. 2024 Jul 01. 112(3): 214-224
      Objective: To understand the performance of EndNote 20 and Zotero 6's full text retrieval features.
    Methods: Using the University of York's subscriptions, we tested and compared EndNote and Zotero's full text retrieval. 1,000 records from four evidence synthesis projects were tested for the number of: full texts retrieved; available full texts retrieved; unique full texts (found by one program only); and differences in versions of full texts for the same record. We also tested the time taken and accuracy of retrieved full texts. One dataset was tested multiple times to confirm if the number of full texts retrieved was consistent. We also investigated the available full texts missed by EndNote or Zotero by: reference type; whether full texts were available open access or via subscription; and the content provider.
    Results: EndNote retrieved 47% of available full texts versus 52% by Zotero. Zotero was faster by 2 minutes 15 seconds. Each program found unique full texts. There were differences in full text versions retrieved between programs. For both programs, 99% of the retrieved full texts were accurate. Zotero was less consistent in the number of full texts it retrieved.
    Conclusion: EndNote and Zotero do not find all available full texts. Users should not assume full texts are correct; are the version of record; or that records without full texts cannot be retrieved manually. Repeating the full text retrieval process multiple times could yield additional full texts. Users with access to EndNote and Zotero could use both for full text retrieval.
    Keywords:  Full text retrieval; endnote; find available PDF; find full texts; zotero
    DOI:  https://doi.org/10.5195/jmla.2024.1880
  4. J Med Libr Assoc. 2024 Jul 01. 112(3): 238-249
      Objective: There is little research available regarding the instructional practices of librarians who support students completing knowledge synthesis projects. This study addresses this research gap by identifying the topics taught, approaches, and resources that academic health sciences librarians employ when teaching students how to conduct comprehensive searches for knowledge synthesis projects in group settings.
    Methods: This study applies an exploratory-descriptive design using online survey data collection. The final survey instrument included 31 open, closed, and frequency-style questions.
    Results: The survey received responses from 114 participants, 74 of whom fell within the target population. Key results include shared motivations for teaching in groups, such as student learning and curriculum requirements, as well as popular types of instruction, such as single-session seminars, and teaching techniques, such as lectures and live demos.
    Conclusion: This research demonstrates the scope and coverage of librarian-led training in the knowledge synthesis research landscape. Although searching-related topics such as Boolean logic were taught most frequently, librarians reported teaching throughout the review process, including methods and reporting. Live demos and lectures were the most reported approaches to teaching, whereas gamification and student-driven learning were rarely used. Our results suggest that librarians' application of formal pedagogical approaches while teaching knowledge synthesis may be under-utilized, as most respondents did not report using any formal instructional framework.
    Keywords:  Evidence Synthesis; Literature Searching; teaching strategies
    DOI:  https://doi.org/10.5195/jmla.2024.1870
  5. Am J Pharm Educ. 2024 Sep 20. pii: S0002-9459(24)11010-8. [Epub ahead of print] 101291
      OBJECTIVE: To determine whether gamifying librarian-led literature searching instruction improved student performance on an authentic literature searching assessment. Secondary objectives included determination of effect on email requests for assistance and student confidence in literature searching abilities.
    METHODS: Literature searching in PubMed is taught by a librarian to first-year pharmacy students in a drug information course over a two-week period. The librarian chose to implement two game-based learning activities in the live lecture sessions: a crossword puzzle and an escape room. To increase engagement, students were encouraged to work collaboratively as a team during class. To evaluate the impact of incorporating gamification into literature searching instruction, the authors evaluated student grades on a literature searching assignment, reviewed the number of emails received asking for assistance, and evaluated student confidence in literature searching.
    RESULTS: Students scored higher on their literature searching assignment after the implementation of game-based instruction. The average grade on this assignment in 2022 was 90.1% compared with 2021, when the average was 79.9% (P=.0016). The average of 90.6% in 2023 also showed statistically significant improvement in comparison with 2021 (p=.002). Email requests decreased and student confidence increased when comparing 2021 outcomes to those in 2022 and 2023.
    CONCLUSION: Overall, the gamification of literature searching instruction in this course appears to have increased student assignment scores and was well-received by students.
    Keywords:  Gamification; Pharmacy Students; drug information; literature searching
    DOI:  https://doi.org/10.1016/j.ajpe.2024.101291
  6. J Med Libr Assoc. 2024 Jul 01. 112(3): 275-280
      Background: Involving librarians as team members can lead to better quality in reviews. To improve its search results, an international diabetes project involved two medical librarians in the large-scale planning of a series of systematic reviews for clinical guidelines in diabetes precision medicine.
    Case Presentation: The precision diabetes project was divided into teams. For each of four diabetes mellitus types (type 1, type 2, gestational, and monogenic), teams focused on diagnostics, prevention, treatment, or prognostics. A search consultation plan was set up for the project to help organize the work. We performed searches in Embase and PubMed for 14 teams, building complex searches that involved non-traditional search strategies. Our search strategies generated very large numbers of records, which created challenges in balancing sensitivity with precision. We also performed overlap searches for the type 1 and type 2 diabetes search strategies and assisted in setting up reviews in the Covidence tool for screening.
    Conclusions: This project gave us opportunities to test methods we had not used before, such as overlap comparisons between whole search strategies. It also gave us insights into the complexity of performing a search that balances sensitivity and specificity, and it highlighted the need for a clearly defined communication plan for extensive evidence synthesis projects.
    Keywords:  Systematic review methodology; online collaboration; project management; role of information specialist; search strategy development; teamwork
    DOI:  https://doi.org/10.5195/jmla.2024.1863
  7. J Med Libr Assoc. 2024 Jul 01. 112(3): 261-274
      Objective: To determine if librarian collaboration was associated with improved database search quality, search reproducibility, and systematic review reporting in otolaryngology systematic reviews and meta-analyses.
    Methods: In this retrospective cross-sectional study, PubMed was queried for systematic reviews and meta-analyses published in otolaryngology journals in 2010, 2015, and 2021. Two researchers independently extracted data. Two librarians independently rated search strategy reproducibility and quality for each article. The main outcomes included the association of librarian involvement with study reporting quality, search quality, and publication metrics in otolaryngology systematic reviews and meta-analyses. Categorical data were compared with chi-squared tests or Fisher's exact tests. Continuous variables were compared via Mann-Whitney U tests for two groups and Kruskal-Wallis tests for three or more groups.
    Results: Of 559 articles retrieved, 505 were analyzed. More studies indicated librarian involvement in 2021 (n=72, 20.7%) compared to 2015 (n=14, 10.4%) and 2010 (n=2, 9.0%) (p=0.04). 2021 studies showed improvements in properly using a reporting tool (p<0.001), number of databases queried (p<0.001), describing date of database searches (p<0.001), and including a flow diagram (p<0.001). Librarian involvement was associated with using reporting tools (p<0.001), increased number of databases queried (p<0.001), describing date of database search (p=0.002), mentioning search peer reviewer (p=0.02), and reproducibility of search strategies (p<0.001). For search strategy quality, librarian involvement was associated with greater use of "Boolean & proximity operators" (p=0.004), "subject headings" (p<0.001), "text word searching" (p<0.001), and "spelling/syntax/line numbers" (p<0.001). Studies with librarian involvement were associated with publication in journals with higher impact factors for 2015 (p=0.003) and 2021 (p<0.001).
    Conclusion: Librarian involvement was associated with improved reporting quality and search strategy quality. Our study supports the inclusion of librarians in review teams, and journal editing and peer reviewing teams.
    Keywords:  Systematic reviews; librarians; meta-analyses; otolaryngology; reproducibility
    DOI:  https://doi.org/10.5195/jmla.2024.1774
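    A minimal sketch of the three kinds of tests named in the methods above, run on hypothetical data with SciPy (all counts and scores below are invented for illustration):

      from scipy.stats import chi2_contingency, mannwhitneyu, kruskal

      # Librarian involvement vs. reporting-tool use, hypothetical 2x2 counts.
      table = [[60, 12],     # librarian involved: used tool / did not
               [230, 203]]   # no librarian:       used tool / did not
      chi2, p_categorical, _, _ = chi2_contingency(table)

      # Number of databases queried per review, hypothetical values (two groups).
      _, p_two_groups = mannwhitneyu([4, 5, 3, 6, 4], [2, 3, 2, 4, 3])

      # A continuous quality rating across three publication years (three groups).
      _, p_three_groups = kruskal([2, 3, 2, 3], [3, 3, 4, 3], [4, 5, 4, 4])

      print(p_categorical, p_two_groups, p_three_groups)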
  8. JAMIA Open. 2024 Oct;7(3): ooae098
      Objectives: Development of search queries for systematic reviews (SRs) is time-consuming. In this work, we capitalize on recent advances in large language models (LLMs) and a relatively large dataset of natural language descriptions of reviews and corresponding Boolean searches to generate Boolean search queries from SR titles and key questions.
    Materials and Methods: We curated a training dataset of 10,346 SR search queries registered in PROSPERO. We used this dataset to fine-tune a set of models to generate search queries based on Mistral-Instruct-7b. We evaluated the models quantitatively using an evaluation dataset of 57 SRs and qualitatively through semi-structured interviews with 8 experienced medical librarians.
    Results: The model-generated search queries had median sensitivity of 85% (interquartile range [IQR] 40%-100%) and number needed to read of 1206 citations (IQR 205-5810). The interviews suggested that the models lack both the necessary sensitivity and precision to be used without scrutiny but could be useful for topic scoping or as initial queries to be refined.
    Discussion: Future research should focus on improving the dataset with more high-quality search queries, assessing whether fine-tuning the model on other fields, such as the population and intervention, improves performance, and exploring the addition of interactivity to the interface.
    Conclusions: The datasets developed for this project can be used to train and evaluate LLMs that map review descriptions to Boolean search queries. The models cannot replace thoughtful search query design but may be useful in providing suggestions for key words and the framework for the query.
    Keywords:  artificial intelligence; systematic reviews as topic/methods
    DOI:  https://doi.org/10.1093/jamiaopen/ooae098
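    One way to picture the training data described above is as instruction-output pairs that map a review's title and key question to its registered Boolean search. A minimal sketch follows; the field names, prompt template, and file name are illustrative assumptions, not the authors' actual format:

      import json

      # Hypothetical PROSPERO-style record (illustrative only).
      records = [
          {
              "title": "Exercise interventions for chronic low back pain",
              "key_question": "What is the effect of exercise on pain and function in adults?",
              "boolean_query": '("low back pain" OR backache) AND (exercise OR "exercise therapy")',
          },
      ]

      # Write instruction-tuning examples as JSON Lines (assumed format).
      with open("sr_query_train.jsonl", "w") as f:
          for r in records:
              example = {
                  "instruction": "Write a Boolean search query for a systematic review.\n"
                                 f"Title: {r['title']}\nKey question: {r['key_question']}",
                  "output": r["boolean_query"],
              }
              f.write(json.dumps(example) + "\n")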
  9. J Med Libr Assoc. 2024 Jul 01. 112(3): 281-285
      There is a 17-year gap between the publication of research proving an intervention is efficacious and effective and the implementation of that same intervention into practice [1]. In behavioral health, only 14% of successful interventions are integrated into actual practice [2]. Implementation science is envisioned to address this research-to-practice gap. The methodology is important because it investigates how to embed interventions in practice and how to de-implement unproven or disproven interventions that may be harmful and/or ineffective for patients. The aim of this commentary is to raise awareness among health sciences librarians/information specialists of this research arena and to encourage them to envision how they could be involved in implementation science projects and teams, or even use implementation science in their own practice.
    Keywords:  Implementation Science
    DOI:  https://doi.org/10.5195/jmla.2024.1919
  10. PLOS Digit Health. 2024 Sep;3(9): e0000299
      Given the suboptimal performance of Boolean searching to identify methodologically sound and clinically relevant studies in large bibliographic databases, exploring machine learning (ML) to efficiently classify studies is warranted. To boost the efficiency of a literature surveillance program, we used a large internationally recognized dataset of articles tagged for methodological rigor and applied an automated ML approach to train and test binary classification models to predict the probability of clinical research articles being of high methodologic quality. We trained over 12,000 models on a dataset of titles and abstracts of 97,805 articles indexed in PubMed from 2012-2018 which were manually appraised for rigor by highly trained research associates and rated for clinical relevancy by practicing clinicians. As the dataset is unbalanced, with more articles that do not meet the criteria for rigor, we used the unbalanced dataset and over- and under-sampled datasets. Models that maintained sensitivity for high rigor at 99% and maximized specificity were selected and tested in a retrospective set of 30,424 articles from 2020 and validated prospectively in a blinded study of 5253 articles. The final selected algorithm, combining a LightGBM (gradient boosting machine) model trained in each dataset, maintained high sensitivity and achieved 57% specificity in the retrospective validation test and 53% in the prospective study. The number of articles needed to read to find one that met appraisal criteria was 3.68 (95% CI 3.52 to 3.85) in the prospective study, compared with 4.63 (95% CI 4.50 to 4.77) when relying only on Boolean searching. Gradient-boosting ML models reduced the work required to classify high quality clinical research studies by 45%, improving the efficiency of literature surveillance and subsequent dissemination to clinicians and other evidence users.
    DOI:  https://doi.org/10.1371/journal.pdig.0000299
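    A minimal sketch of two quantities reported above, computed from hypothetical screening counts: the number needed to read (NNR) and the share of appraisal work a classifier saves:

      def number_needed_to_read(n_screened: int, n_meeting_criteria: int) -> float:
          """Average number of articles read to find one that meets appraisal criteria."""
          return n_screened / n_meeting_criteria

      def workload_saved(n_total: int, n_flagged_for_review: int) -> float:
          """Fraction of articles the classifier removes from manual appraisal."""
          return 1 - n_flagged_for_review / n_total

      # Hypothetical counts (illustrative only, not the study's data).
      print(number_needed_to_read(n_screened=3680, n_meeting_criteria=1000))      # 3.68
      print(f"{workload_saved(n_total=30000, n_flagged_for_review=16500):.0%}")   # 45%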
  11. Cancer Control. 2024 Jan-Dec;31:31 10732748241286688
      This study explored the application of meta-analysis and convolutional neural network-natural language processing (CNN-NLP) technologies in classifying literature concerning radiotherapy for head and neck cancer, aiming to enhance both the efficiency and accuracy of literature reviews. By integrating statistical analysis with deep learning, this research successfully identified key studies related to normal tissue complication probability (NTCP) from a vast corpus of literature. This demonstrates the advantages of these technologies in recognizing professional terminology and extracting relevant information. The findings not only improve the quality of literature reviews but also offer new insights for future research on optimizing medical studies through AI technologies. Despite the challenges related to data quality and model generalization, this work provides clear directions for future research.
    Keywords:  convolutional neural networks; medical literature classification; meta-analysis; natural language processing; normal tissue complication probability
    DOI:  https://doi.org/10.1177/10732748241286688
  12. PLoS One. 2024 ;19(9): e0303005
      Preprints provide an indispensable tool for rapid and open communication of early research findings. Preprints can also be revised and improved based on scientific commentary uncoupled from journal-organised peer review. The uptake of preprints in the life sciences has increased significantly in recent years, especially during the COVID-19 pandemic, when immediate access to research findings became crucial to address the global health emergency. With ongoing expansion of new preprint servers, improving discoverability of preprints is a necessary step to facilitate wider sharing of the science reported in preprints. To address the challenges of preprint visibility and reuse, Europe PMC, an open database of life science literature, began indexing preprint abstracts and metadata from several platforms in July 2018. Since then, Europe PMC has continued to increase coverage through addition of new servers, and expanded its preprint initiative to include the full text of preprints related to COVID-19 in July 2020 and then the full text of preprints supported by the Europe PMC funder consortium in April 2022. The preprint collection can be searched via the website and programmatically, with abstracts and the open access full text of COVID-19 and Europe PMC funder preprint subsets available for bulk download in a standard machine-readable JATS XML format. This enables automated information extraction for large-scale analyses of the preprint corpus, accelerating scientific research of the preprint literature itself. This publication describes steps taken to build trust, improve discoverability, and support reuse of life science preprints in Europe PMC. Here we discuss the benefits of indexing preprints alongside peer-reviewed publications, and challenges associated with this process.
    DOI:  https://doi.org/10.1371/journal.pone.0303005
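    The abstract above notes that the preprint collection can be searched programmatically. A minimal sketch against Europe PMC's public REST search endpoint; the SRC:PPR preprint filter and parameter names reflect the public documentation at the time of writing and should be checked against the current API reference:

      import requests

      BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
      params = {
          "query": 'SRC:PPR AND "covid-19"',  # restrict results to the preprint source
          "format": "json",
          "pageSize": 25,
      }
      resp = requests.get(BASE, params=params, timeout=30)
      resp.raise_for_status()
      for rec in resp.json()["resultList"]["result"]:
          print(rec.get("id"), rec.get("title"))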
  13. J Med Libr Assoc. 2024 Jul 01. 112(3): 250-260
      Objective: The objective of this study was to evaluate the discoverability of supporting research materials, including supporting documents, individual participant data (IPD), and associated publications, in US federally funded COVID-19 clinical study records in ClinicalTrials.gov (CTG).
    Methods: Study registration records were evaluated for (1) links to supporting documents, including protocols, informed consent forms, and statistical analysis plans; (2) information on how unaffiliated researchers may access IPD and, when applicable, the linking of the IPD record back to the CTG record; and (3) links to associated publications and, when applicable, the linking of the publication record back to the CTG record.
    Results: 206 CTG study records were included in the analysis. Few records shared supporting documents, with only 4% of records sharing all 3 document types. 27% of records indicated they intended to share IPD, with 45% of these providing sufficient information to request access to the IPD. Only 1 dataset record was located, which linked back to its corresponding CTG record. The majority of CTG records did not have links to publications (61%), and only 21% linked out to at least 1 results publication. All publication records linked back to their corresponding CTG records.
    Conclusion: With only 4% of records sharing all supporting document types, 12% providing sufficient information to access IPD, and 21% linking to results publications, improvements can be made to the discoverability of research materials in federally funded COVID-19 CTG records. Sharing these materials on CTG can increase their discoverability, thereby increasing the validity, transparency, and reusability of clinical research.
    Keywords:  COVID-19; Clinical studies; Data sharing; clinicaltrials.gov; discoverability; research transparency
    DOI:  https://doi.org/10.5195/jmla.2024.1799
  14. J Surg Res. 2024 Sep 19. pii: S0022-4804(24)00523-7. [Epub ahead of print]303 89-94
      INTRODUCTION: Online patient educational materials (OPEMs) help patients engage in their health care. The American Medical Association (AMA) recommends OPEM be written at or below the 6th grade reading level. This study assessed the readability of deep venous thrombosis OPEM in English and Spanish.
    METHODS: Google searches were conducted in English and Spanish using "deep venous thrombosis" and "trombosis venosa profunda," respectively. The top 25 patient-facing results were recorded for each, and categorized into source type (hospital, professional society, other). Readability of English OPEM was measured using several scales including the Flesch Reading Ease Readability Formula and Flesch-Kincaid Grade Level. Readability of Spanish OPEM was measured using the Fernández-Huerta Index and INFLESZ Scale. Readability was compared to the AMA recommendation, between languages, and across source types.
    RESULTS: Only one (4%) Spanish OPEM was written at an easy reading level, compared with 7 (28%) English OPEM, a significant difference in reading difficulty between languages (P = 0.04). The average readability scores for English and Spanish OPEM across all scales were significantly greater than the recommended level (P < 0.01). Only four articles in total (8%) met the AMA recommendation, with no significant difference between English and Spanish OPEM (P = 0.61).
    CONCLUSIONS: Nearly all English and Spanish deep venous thrombosis OPEM analyzed were above the recommended reading level. English resources had overall easier readability compared to Spanish, which may represent a barrier to care. To limit health disparities, information should be presented at accessible reading levels.
    Keywords:  DVT; Deep venous thrombosis; Educational materials; Online patient education; Readability
    DOI:  https://doi.org/10.1016/j.jss.2024.08.013
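    A minimal sketch of the two English-language readability formulas named above, computed from sentence, word, and syllable counts; the syllable counter here is a crude heuristic, whereas published assessments rely on validated tools:

      import re

      def count_syllables(word: str) -> int:
          # rough vowel-group heuristic; real readability tools use better rules
          return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

      def readability(text: str):
          sentences = max(1, len(re.findall(r"[.!?]+", text)))
          words = re.findall(r"[A-Za-z']+", text)
          syllables = sum(count_syllables(w) for w in words)
          asl = len(words) / sentences      # average sentence length
          asw = syllables / len(words)      # average syllables per word
          flesch_reading_ease = 206.835 - 1.015 * asl - 84.6 * asw
          flesch_kincaid_grade = 0.39 * asl + 11.8 * asw - 15.59
          return flesch_reading_ease, flesch_kincaid_grade

      print(readability("Deep venous thrombosis is a blood clot that forms in a deep vein."))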
  15. Monogr Oral Sci. 2024 ;32 295-312
      The Internet's increasing prevalence, along with the user-friendly nature of smartphones and the ease of access to virtual spaces, creates a vast and practical domain for digital communication. In this context, obtaining online information plays a crucial role in promoting health and preventing disease, facilitating individual and collaborative decision-making between patients and dental professionals. Digital information resources play a crucial role in providing guidance, support, and knowledge to the public and health care experts on molar incisor hypomineralisation (MIH). This chapter explores various dimensions related to MIH digital information, including a diverse array of digital platforms and the multifaceted landscape of health information-seeking behaviors. This chapter emphasizes the importance of accurate and reliable information dissemination in the digital era. It also sheds light on how understanding the dynamics of digital communication and health information-seeking behavior can improve accessibility and information quality for individuals facing the challenges of MIH.
    DOI:  https://doi.org/10.1159/000538891
  16. Ann Surg Open. 2024 Sep;5(3): e465
      Objective: To assess the accuracy, quality, and readability of patient-focused breast cancer websites using expert evaluation and validated tools.
    Background: Ensuring access to accurate, high-quality, and readable online health information supports informed decision-making and health equity but has not been recently evaluated.
    Methods: A qualitative analysis on 50 websites was conducted; the first 10 eligible websites for the following search terms were included: "breast cancer," "breast surgery," "breast reconstructive surgery," "breast chemotherapy," and "breast radiation therapy." Websites were required to be in English and not intended for healthcare professionals. Accuracy was evaluated by 5 breast cancer specialists. Quality was evaluated through the DISCERN questionnaire. Readability was measured using 9 standardized tests. Mean readability was compared with the American Medical Association and National Institutes of Health 6th grade recommendation.
    Results: Nonprofit hospital websites had the highest accuracy (mean = 4.06, SD = 0.42); however, no statistical differences were observed in accuracy by website affiliation (P = 0.08). The overall mean quality score was 50.8 ("fair"/"good" quality) with no significant differences among website affiliations (P = 0.10). Mean readability was at the 10th grade reading level, the lowest being for commercial websites with a mean 9th grade reading level (SD = 2.38). All websites exceeded the American Medical Association- and National Institutes of Health-recommended reading level by 4.4 levels (P < 0.001). Websites with higher accuracy tended to have lower readability levels, whereas those with lower accuracy had higher readability levels.
    Conclusion: As breast cancer treatment has become increasingly complex, improving online quality and readability while maintaining high accuracy is essential to promote health equity and empower patients to make informed decisions about their care.
    Keywords:  breast; breast cancer; breast surgery; breast surgical oncology; internet; online information; readability
    DOI:  https://doi.org/10.1097/AS9.0000000000000465
  17. Eur Urol Open Sci. 2024 Nov;69 80-88
      Background and objective: Artificial intelligence (AI)-powered conversational agents are increasingly finding application in health care, as these can provide patient education at any time. However, their effectiveness in medical settings remains largely unexplored. This study aimed to assess the impact of the chatbot "PROState cancer Conversational Agent" (PROSCA), which was trained to provide validated support from diagnostic tests to treatment options for men facing prostate cancer (PC) diagnosis.
    Methods: The chatbot PROSCA, developed by urologists at Heidelberg University Hospital and SAP SE, was evaluated through a randomized controlled trial (RCT). Patients were assigned to either the chatbot group, receiving additional access to PROSCA alongside standard information by urologists, or the control group (1:1), receiving standard information. A total of 112 men were included, of whom 103 gave feedback at study completion.
    Key findings and limitations: Over time, patients' information needs decreased significantly more in the chatbot group than in the control group (p = 0.035). In the chatbot group, 43/54 men (79.6%) used PROSCA, and all of them found it easy to use. Of the men, 71.4% agreed that the chatbot improved their informedness about PC and 90.7% would like to use PROSCA again. Limitations are study sample size, single-center design, and specific clinical application.
    Conclusions and clinical implications: With the introduction of the PROSCA chatbot, we created and evaluated an innovative, evidence-based AI health information tool as an additional source of information for PC. Our RCT results showed significant benefits of the chatbot in reducing patients' information needs and enhancing their understanding of PC. This easy-to-use AI tool provides accurate, timely, and accessible support, demonstrating its value in the PC diagnosis process. Future steps include further customization of the chatbot's responses and integration with the existing health care systems to maximize its impact on patient outcomes.
    Patient summary: This study evaluated an artificial intelligence-powered chatbot-PROSCA, a digital tool designed to support men facing prostate cancer diagnosis by providing validated information from diagnosis to treatment. Results showed that patients who used the chatbot as an additional tool felt better informed than those who received standard information from urologists. The majority of users appreciated the ease of use of the chatbot and expressed a desire to use it again; this suggests that PROSCA could be a valuable resource to improve patient understanding in prostate cancer diagnosis.
    Keywords:  Artificial intelligence; Chatbot; Early detection; Large language model; Natural language processing; Prostate cancer; Randomized controlled trial
    DOI:  https://doi.org/10.1016/j.euros.2024.08.022
  18. Nat Med. 2024 Sep 23.
      Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and our dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study, we find that our approach surfaces biases that may be missed by narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. While our approach is not sufficient to holistically assess whether the deployment of an artificial intelligence (AI) system promotes equitable health outcomes, we hope that it can be leveraged and built upon toward a shared goal of LLMs that promote accessible and equitable healthcare.
    DOI:  https://doi.org/10.1038/s41591-024-03258-2
  19. Sleep Health. 2024 Sep 21. pii: S2352-7218(24)00187-6. [Epub ahead of print]
      BACKGROUND: Many individuals use the Internet, including generative artificial intelligence like ChatGPT, for sleep-related information before consulting medical professionals. This study compared responses from sleep disorder specialists and ChatGPT to common sleep queries, with experts and laypersons evaluating the responses' accuracy and clarity.
    METHODS: We assessed responses from sleep medicine specialists and ChatGPT-4 to 140 sleep-related questions from the Korean Sleep Research Society's website. In a blinded study design, sleep disorder experts and laypersons rated the medical helpfulness, emotional supportiveness, and sentence comprehensibility of the responses on a 1-5 scale.
    RESULTS: Laypersons rated ChatGPT higher for medical helpfulness (3.79 ± 0.90 vs. 3.44 ± 0.99, p < .001), emotional supportiveness (3.48 ± 0.79 vs. 3.12 ± 0.98, p < .001), and sentence comprehensibility (4.24 ± 0.79 vs. 4.14 ± 0.96, p = .028). Experts also rated ChatGPT higher for emotional supportiveness (3.33 ± 0.62 vs. 3.01 ± 0.67, p < .001) but preferred specialists' responses for sentence comprehensibility (4.15 ± 0.74 vs. 3.94 ± 0.90, p < .001). When it comes to medical helpfulness, the experts rated the specialists' answers slightly higher than the laypersons did (3.70 ± 0.84 vs. 3.63 ± 0.87, p = .109). Experts slightly preferred specialist responses overall (56.0%), while laypersons favored ChatGPT (54.3%; p < .001). ChatGPT's responses were significantly longer (186.76 ± 39.04 vs. 113.16 ± 95.77 words, p < .001).
    DISCUSSION: Generative artificial intelligence like ChatGPT may help disseminate sleep-related medical information online. Laypersons appear to prefer ChatGPT's detailed, emotionally supportive responses over those from sleep disorder specialists.
    Keywords:  Artificial intelligence; Health information seeking behavior; Medical informatics; Patient education as topic; Sleep disorders
    DOI:  https://doi.org/10.1016/j.sleh.2024.08.011
  20. J Pediatr Surg. 2024 Sep 05. pii: S0022-3468(24)00796-6. [Epub ahead of print] 161894
      BACKGROUND: ChatGPT has demonstrated notable capabilities and has gained popularity in various medical tasks, including patient education. This study evaluates the content and readability of ChatGPT's responses to parents' questions about congenital anomalies.
    METHODS: Information on four congenital anomalies (congenital diaphragmatic hernia, esophageal atresia and tracheoesophageal fistula, anorectal malformation, and gastroschisis) was assessed. Seven questions frequently asked by parents were posed for each anomaly, and responses generated by GPT-4 were compared to online information sheets from three top pediatric medical centers. Two senior pediatric surgeons, blinded to the source, evaluated the answers based on accuracy, comprehensiveness, and conciseness. Reading time and readability of the answers were also assessed.
    RESULTS: ChatGPT answered all 28 questions, while online information sheets varied in completeness. ChatGPT's responses were rated significantly higher regarding full accuracy, comprehensiveness, and conciseness compared to the online information sheets (p < 0.00001, <0.00001, 0.0002, respectively). Despite having longer reading times and being more challenging to read, ChatGPT's responses were more precise and detailed.
    CONCLUSIONS: ChatGPT outperforms online information sheets in providing accurate, comprehensive, and concise answers about congenital anomalies. This positions ChatGPT as a beneficial supplementary resource in pediatric healthcare. Future research should explore real-world applications and usability among parents.
    LEVEL OF EVIDENCE: Level III.
    Keywords:  Artificial intelligence; ChatGPT; Congenital anomalies; Patient education
    DOI:  https://doi.org/10.1016/j.jpedsurg.2024.161894
  21. J Am Acad Orthop Surg. 2024 Sep 20.
      INTRODUCTION: Patients have long turned to the Internet for answers to common medical questions. As the ability to access information evolves beyond standard search engines, patients with adolescent idiopathic scoliosis (AIS) and their parents may use artificial intelligence chatbots such as ChatGPT as a new source of information.
    METHODS: Ten frequently asked questions regarding AIS were posed to ChatGPT. The accuracy and adequacy of the responses were graded as excellent not requiring clarification, satisfactory requiring minimal clarification, satisfactory requiring moderate clarification, and unsatisfactory requiring substantial clarification.
    RESULTS: ChatGPT gave one response that was excellent not requiring clarification, four responses that were satisfactory requiring minimal clarification, three responses that were satisfactory requiring moderate clarification, and two responses that were unsatisfactory requiring substantial clarification, with information about higher level, more complex areas of discussion such as surgical options being less accurate.
    CONCLUSION: ChatGPT provides answers to FAQs about AIS that were generally accurate, although correction was needed on specific surgical treatments. Patients may be at risk of developing a Dunning-Kruger effect by proxy from the superficial and sometimes inaccurate information provided by ChatGPT on more complex aspects of AIS.
    DOI:  https://doi.org/10.5435/JAAOS-D-24-00297
  22. Clin Ophthalmol. 2024 ;18 2647-2655
      Purpose: To compare the accuracy and readability of responses to oculoplastics patient questions provided by Google and ChatGPT. Additionally, to assess the ability of ChatGPT to create customized patient education materials.
    Methods: We executed a Google search to identify the 3 most frequently asked patient questions (FAQs) related to 10 oculoplastics conditions. FAQs were entered into both the Google search engine and the ChatGPT tool and responses were recorded. Responses were graded for readability using five validated readability indices and for accuracy by six oculoplastics surgeons. ChatGPT was instructed to create patient education materials at various reading levels for 8 oculoplastics procedures. The accuracy and readability of ChatGPT-generated procedural explanations were assessed.
    Results: ChatGPT responses to patient FAQs were written at a significantly higher average grade level than Google responses (grade 15.6 vs 10.0, p < 0.001). ChatGPT responses (93% accuracy) were significantly more accurate (p < 0.001) than Google responses (78% accuracy) and were preferred by expert panelists (79%). ChatGPT accurately explained oculoplastics procedures at an above average reading level. When instructed to rewrite patient education materials at a lower reading level, grade level was reduced by approximately 4 (15.7 vs 11.7, respectively, p < 0.001) without sacrificing accuracy.
    Conclusion: ChatGPT has the potential to provide patients with accurate information regarding their oculoplastics conditions. ChatGPT may also be utilized by oculoplastic surgeons as an accurate tool to provide customizable patient education for patients with varying health literacy. A better understanding of oculoplastics conditions and procedures amongst patients can lead to informed eye care decisions.
    Keywords:  ChatGPT; accuracy; google; oculoplastics; readability
    DOI:  https://doi.org/10.2147/OPTH.S480222
  23. Shoulder Elbow. 2024 Jul;16(4): 407-412
      Background: The rising prominence of artificial intelligence in healthcare has revolutionized patient access to medical information. This cross-sectional study sought to assess if ChatGPT could satisfactorily address common patient questions about total shoulder arthroplasty (TSA).
    Methods: Ten commonly encountered questions in TSA practice were selected and posed to ChatGPT. Each response was assessed for accuracy and clarity using the Mika et al. scoring system, which ranges from "excellent response not requiring clarification" to "unsatisfactory response requiring substantial clarification," and a modified DISCERN score. The readability was further evaluated using the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level.
    Results: The mean Mika et al. score was 2.93, corresponding to an overall subjective rating of "satisfactory but requiring moderate clarification." The mean DISCERN score was 46.60, which is considered "fair." The readability analysis suggested that the responses were at a college-graduate level, higher than the recommended level for patient educational materials.
    Discussion: Our results suggest that ChatGPT has the potential to supplement the collaborative decision-making process between patients and experienced orthopedic surgeons for TSA-related inquiries. Ultimately, while tools like ChatGPT can enhance traditional patient education methods, they should not replace direct consultations with medical professionals.
    Keywords:  ChatGPT; anatomic; artificial intelligence; frequently asked questions; reverse; total shoulder arthroplasty
    DOI:  https://doi.org/10.1177/17585732241246560
  24. Shoulder Elbow. 2024 Jul;16(4): 429-435
      Background: Artificial intelligence (AI) has progressed at a fast pace. ChatGPT, a rapidly expanding AI platform, has several growing applications in medicine and patient care. However, its ability to provide high-quality answers to patient questions about orthopedic procedures such as Tommy John surgery is unknown. Our objective was to evaluate the quality of information provided by ChatGPT 3.5 and 4.0 in response to patient questions regarding Tommy John surgery.
    Methods: Twenty-five patient questions regarding Tommy John surgery were posed to ChatGPT 3.5 and 4.0. Readability was assessed via the Flesch-Kincaid Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Score, Simple Measure of Gobbledygook, Coleman-Liau, and Automated Readability Index. The quality of each response was graded using a 5-point Likert scale.
    Results: ChatGPT generated information at an educational level that greatly exceeds the recommended level. ChatGPT 4.0 produced slightly better responses to common questions regarding Tommy John surgery with fewer inaccuracies than ChatGPT 3.5.
    Conclusion: Although ChatGPT can provide accurate information regarding Tommy John surgery, its responses may not be easily comprehended by the average patient. As AI platforms become more accessible to the public, patients must be aware of their limitations.
    Keywords:  Tommy John; artificial intelligence; patient education
    DOI:  https://doi.org/10.1177/17585732241259754
  25. J Thorac Cardiovasc Surg. 2024 Sep 24. pii: S0022-5223(24)00837-7. [Epub ahead of print]
      OBJECTIVE: Chat-based artificial intelligence (AI) programs like ChatGPT are re-imagining how patients seek information. This study aims to evaluate the quality and accuracy of ChatGPT-generated answers to common patient questions about lung cancer surgery.
    METHODS: A 30-question survey of patient questions about lung cancer surgery was posed to ChatGPT in July 2023. The ChatGPT-generated responses were presented to nine thoracic surgeons at four academic institutions who rated the quality of the answer on a 5-point Likert scale. They also evaluated if the response contained any inaccuracies and were prompted to submit free text comments. Responses were analyzed in aggregate.
    RESULTS: For ChatGPT-generated answers, the average quality ranged from 3.1-4.2 out of 5.0, indicating they were generally "good" or "very good". No answer received a unanimous 1-star (poor quality) or 5-star (excellent quality) score. Minor inaccuracies were found by at least one surgeon in 100% of the answers, and major inaccuracies were found in 36.6%. Regarding ChatGPT, 66.7% of surgeons felt it was an accurate source of information for patients. However, only 55.6% felt they were comparable to answers given by experienced thoracic surgeons, and only 44.4% would recommend it to their patients. Common criticisms of ChatGPT-generated answers included lengthiness, lack of specificity regarding surgical care, and lack of references.
    CONCLUSIONS: Chat-based AI programs have potential to become a useful information tool for lung cancer surgery patients. However, the quality and accuracy of ChatGPT-generated answers need improvement before thoracic surgeons could consider this method as a primary education source for patients.
    Keywords:  Education; artificial intelligence; lung cancer; perioperative care
    DOI:  https://doi.org/10.1016/j.jtcvs.2024.09.030
  26. Clin Orthop Relat Res. 2024 Sep 25.
      BACKGROUND: Patients and caregivers may experience immense distress when receiving the diagnosis of a primary musculoskeletal malignancy and subsequently turn to internet resources for more information. It is not clear whether these resources, including Google and ChatGPT, offer patients information that is readable, a measure of how easy text is to understand. Since many patients turn to Google and artificial intelligence resources for healthcare information, we thought it was important to ascertain whether the information they find is readable and easy to understand. The objective of this study was to compare readability of Google search results and ChatGPT answers to frequently asked questions and assess whether these sources meet NIH recommendations for readability.
    QUESTIONS/PURPOSES: (1) What is the readability of ChatGPT-3.5 as a source of patient information for the three most common primary bone malignancies compared with top online resources from Google search? (2) Do ChatGPT-3.5 responses and online resources meet NIH readability guidelines for patient education materials?
    METHODS: This was a cross-sectional analysis of the 12 most common online questions about osteosarcoma, chondrosarcoma, and Ewing sarcoma. To be consistent with other studies of similar design that utilized national society frequently asked questions lists, questions were selected from the American Cancer Society and categorized based on content, including diagnosis, treatment, and recovery and prognosis. Google was queried using all 36 questions, and top responses were recorded. Author types, such as hospital systems, national health organizations, or independent researchers, were recorded. ChatGPT-3.5 was provided each question in independent queries without further prompting. Responses were assessed with validated reading indices to determine readability by grade level. An independent t-test was performed with significance set at p < 0.05.
    RESULTS: Google (n = 36) and ChatGPT-3.5 (n = 36) answers were recorded, 12 for each of the three cancer types. Reading grade levels based on mean readability scores were 11.0 ± 2.9 and 16.1 ± 3.6, respectively. This corresponds to the eleventh grade reading level for Google and a fourth-year undergraduate student level for ChatGPT-3.5. Google answers were more readable across all individual indices, without differences in word count. No difference in readability was present across author type, question category, or cancer type. Of 72 total responses across both search modalities, none met NIH readability criteria at the sixth-grade level.
    CONCLUSION: Google material was presented at a high school reading level, whereas ChatGPT-3.5 was at an undergraduate reading level. The readability of both resources was inadequate based on NIH recommendations. Improving readability is crucial for better patient understanding during cancer treatment. Physicians should assess patients' needs, offer them tailored materials, and guide them to reliable resources to prevent reliance on online information that is hard to understand.
    LEVEL OF EVIDENCE: Level III, prognostic study.
    DOI:  https://doi.org/10.1097/CORR.0000000000003263
  27. J Allergy Clin Immunol Glob. 2024 Nov;3(4): 100330
      Background: This study assessed the reliability of ChatGPT as a source of information on asthma, given the increasing use of artificial intelligence-driven models for medical information. Prior concerns about misinformation on atopic diseases in various digital platforms underline the importance of this evaluation.
    Objective: We aimed to evaluate the scientific reliability of ChatGPT as a source of information on asthma.
    Methods: The study involved analyzing ChatGPT's responses to 26 asthma-related questions, each followed by a follow-up question. These encompassed definition/risk factors, diagnosis, treatment, lifestyle factors, and specific clinical inquiries. Medical professionals specialized in allergic and respiratory diseases independently assessed the responses using a 1-to-5 accuracy scale.
    Results: Approximately 81% of the responses scored 4 or higher, suggesting a generally high accuracy level. However, 5 responses scored 3 or lower, indicating minor but potentially harmful inaccuracies. The overall median score was 4. The Fleiss multirater kappa value showed moderate agreement among raters.
    Conclusion: ChatGPT generally provides reliable asthma-related information, but its limitations, such as lack of depth in certain responses and inability to cite sources or update in real time, were noted. It shows promise as an educational tool, but it should not be a substitute for professional medical advice. Future studies should explore its applicability for different user demographics and compare it with newer artificial intelligence models.
    Keywords:  AI; Asthma; ChatGPT; artificial intelligence; patient education
    DOI:  https://doi.org/10.1016/j.jacig.2024.100330
  28. J ISAKOS. 2024 Sep 20. pii: S2059-7754(24)00170-6. [Epub ahead of print] 100323
      INTRODUCTION: In recent years, Artificial Intelligence (AI) has seen substantial progress in its utilization, with Chat Generative Pre-trained Transformer (ChatGPT) emerging as a popular language model. The purpose of this study was to test the accuracy and reliability of ChatGPT's responses to frequently asked questions (FAQs) pertaining to reverse shoulder arthroplasty (RSA).
    METHODS: The ten most common FAQs were queried from institution patient education websites. These ten questions were then input into the chatbot during a single session without additional contextual information. The responses were then critically analyzed by two orthopedic surgeons for clarity, accuracy, and the quality of evidence-based information using The Journal of the American Medical Association (JAMA) Benchmark criteria and the DISCERN score. The readability of the responses was analyzed using the Flesch-Kincaid Grade Level.
    RESULTS: In response to the ten questions, the average DISCERN score was 44 (range 38-51). Seven responses were classified as fair and three were poor. The JAMA Benchmark criteria score was 0 for all responses. Furthermore, the average Flesch-Kincaid Grade Level was 14.35, which correlates to a college graduate reading level.
    CONCLUSION: Overall, ChatGPT was able to provide fair responses to common patient questions. However, the responses were all written at a college graduate reading level and lacked reliable citations. The readability greatly limits its utility. Thus, adequate patient education should be done by orthopedic surgeons. This study underscores the need for patient education resources that are reliable, accessible, and comprehensible.
    LEVEL OF EVIDENCE: IV.
    Keywords:  AI; Artificial Intelligence; Deep Learning; Machine Learning; Reverse Shoulder Arthroplasty; Total Shoulder Arthroplasty
    DOI:  https://doi.org/10.1016/j.jisako.2024.100323
  29. Int J Cardiol. 2024 Sep 19. pii: S0167-5273(24)01198-7. [Epub ahead of print]417 132576
      Chat Generative Pretrained Transformer (ChatGPT) is a natural language processing tool created by OpenAI. Much of the discussion regarding artificial intelligence (AI) in medicine centers on the ability of such language models to enhance medical practice, improve efficiency, and decrease errors. The objective of this study was to analyze the ability of ChatGPT to answer board-style cardiovascular medicine questions by using the Medical Knowledge Self-Assessment Program (MKSAP). The study evaluated the performance of ChatGPT (versions 3.5 and 4), alongside internal medicine residents and internal medicine and cardiology attendings, in answering 98 multiple-choice questions (MCQs) from the Cardiovascular Medicine chapter of MKSAP. ChatGPT-4 demonstrated an accuracy of 74.5%, comparable to an internal medicine (IM) intern (63.3%), a senior resident (63.3%), an internal medicine attending physician (62.2%), and ChatGPT-3.5 (64.3%), but significantly lower than a cardiology attending physician (85.7%). Subcategory analysis revealed no statistical difference between ChatGPT and physicians, except in valvular heart disease, where the cardiology attending outperformed ChatGPT-3.5 (p = 0.031), and in heart failure (p = 0.046), where ChatGPT-4 outperformed the senior resident. While ChatGPT shows promise in certain subcategories, to establish AI as a reliable educational tool for medical professionals, the performance of ChatGPT will likely need to surpass the accuracy of instructors, ideally achieving a near-perfect score on posed questions.
    Keywords:  Artificial intelligence; Cardiology; ChatGPT; MKSAP
    DOI:  https://doi.org/10.1016/j.ijcard.2024.132576
  30. Int J Cardiol. 2024 Sep 21. pii: S0167-5273(24)01213-0. [Epub ahead of print]417 132591
      OBJECTIVE: YouTube® attracts billions of monthly viewers, including those seeking health-related content. However, the quality of the information is highly variable. The study aimed to evaluate the educational merit of YouTube® concerning pacemakers, focusing on quality and reliability for educating both patients and physicians.
    METHODS: The term "pacemaker" was searched on YouTube®. Following the application of exclusion criteria based on video language, duration, and minimum view count, a total of 71 videos were analyzed. Quality was assessed using the Global Quality Score (GQS), while reliability was evaluated using the modified DISCERN (mDISCERN) score. Available data and metrics regarding each channel and video were obtained. The Kolmogorov-Smirnov test was employed to assess data normality, and the Mann-Whitney U test was utilized to determine statistical significance.
    RESULTS: YouTube videos on cardiac pacemakers proved to be of moderate quality, with an average GQS score of 3.10, and of moderate reliability, indicated by a mean mDISCERN score of 3.08. Higher scores were reported for videos longer than five minutes, videos targeted at physicians, and videos with higher view ratios. The presence of a board-certified MD yielded a statistically greater mean GQS score, but not a greater mDISCERN score. No statistical difference was observed based on the number of likes.
    CONCLUSION: In conclusion, while YouTube® offers significant education opportunities, there is a clear need for enhanced oversight and quality control. Healthcare providers should guide patients towards valid resources and consider collaborating with platforms to develop content standards.
    Keywords:  Pacemaker; Patient education; Video quality; Video reliability; YouTube; YouTube health
    DOI:  https://doi.org/10.1016/j.ijcard.2024.132591
  31. World Neurosurg. 2024 Sep 21. pii: S1878-8750(24)01629-2. [Epub ahead of print]
      
    Keywords:  chiari malformation surgery; medical content quality; online health information; patient education; video analysis; video reliability
    DOI:  https://doi.org/10.1016/j.wneu.2024.09.080
  32. Int Urogynecol J. 2024 Sep 24.
      INTRODUCTION AND HYPOTHESIS: The aim of this study is to examine the quality and content characteristics of educational videos on the use of vaginal cones published on YouTube.
    METHODS: Video searches were conducted on the YouTube website using the keyword "usage of vaginal cones". A total of 52 videos were included in the current study. Modified DISCERN (mDISCERN) and Journal of the American Medical Association (JAMA) scales were used to evaluate the reliability of the videos, and the Global Quality Scale (GQS) was used for quality and usefulness.
    RESULTS: Content analysis classified 29 videos as having "poor content" and 23 as having "rich content." When the sources of the videos (n = 52) were examined, the majority (58%, n = 30) were produced by non-health-care sources (medical companies and non-health professionals). Physiotherapists had the highest mean mDISCERN and GQS scores (4.11 ± 1.05 and 3.44 ± 0.73; p = 0.014 and p = 0.036), with doctors ranking second (3.09 ± 1.04 and 2.82 ± 0.98). On the JAMA scale, medical companies had the highest mean score (3.4 ± 0.74; p = 0.015), followed by doctors (3 ± 1) and physiotherapists (2.89 ± 0.78).
    CONCLUSION: There is a clear need for a higher-quality, more reliable body of vaginal cone content on YouTube. Patients should be guided by health care professionals and informed about quality content criteria so that they can access reliable and useful information.
    Keywords:  Reliability; Vaginal cone; Video-assisted education; YouTube
    DOI:  https://doi.org/10.1007/s00192-024-05932-y
  33. Medicine (Baltimore). 2024 Sep 20. 103(38): e39824
      YouTube (YT) is one of the world's most widely recognized video-sharing platforms; it appeals to large audiences and is used by individuals to educate themselves on disease diagnosis and treatment alternatives and to distribute health-related information. Videos were searched by typing the terms "migraine botox" and "botox treatment for migraine" into the YT search bar in English. A total of 50 videos were evaluated for each term. Two independent researchers viewed the videos and documented pertinent descriptive attributes of each video, such as the upload date and the numbers of comments, dislikes, likes, and views. The videos were analyzed, and the DISCERN, Global Quality Scale (GQS), and Journal of the American Medical Association (JAMA) quality and reliability scores were recorded. A total of 100 videos were assessed. The mean DISCERN score was 3.09, the mean JAMA score was 2.11, and the mean GQS score was 3.25. By source, 32% of the videos were uploaded by university/nonprofit physicians or professional organizations. When the DISCERN, GQS, and JAMA scores of the videos uploaded by health professionals were examined, statistically significant differences were observed (P = .002, P = .015, and P = .002, respectively). However, no statistically significant relationship was found for the Video Popularity Index score. The reliability and quality scores of the evaluated videos uploaded by healthcare professionals on migraine Botox treatment were high, but their viewership was low. In our analysis of migraine Botox treatment videos on YT, we observed that the information covered a wide spectrum and included high-quality content, but also videos that may mislead viewers. In conclusion, we believe that the platform is not sufficient on its own and that it should be supplemented with up-to-date, fact-checked videos that use easy-to-understand language and optimized durations.
    DOI:  https://doi.org/10.1097/MD.0000000000039824
  34. Z Rheumatol. 2024 Sep 25.
      OBJECTIVE: YouTube is often used by patients and healthcare professionals to obtain medical information. Reactive arthritis (ReA) is a type of inflammatory arthritis triggered by infection, usually in the genitourinary or gastrointestinal tract. However, the accuracy and quality of ReA-related information on YouTube are not fully known. This study aimed to assess the reliability and quality of YouTube videos pertaining to ReA. MATERIALS AND METHODS: A YouTube search was performed on August 1, 2023, using the keywords "reactive arthritis," "Reiter's disease," and "Reiter's syndrome." The number of days since upload; the numbers of views, likes, and comments; and the duration of the videos were recorded. The modified DISCERN tool (mDISCERN) and the Global Quality Scale (GQS) were used to evaluate the reliability and quality of the videos. Two physicians independently classified videos as low, moderate, or high quality and rated them on the five-point GQS (1 = poor quality, 5 = excellent quality). The source of each video was also noted.
    RESULTS: Of the 180 videos screened, 68 met the inclusion criteria. The most common topic (61, 89.7%) was "ReA overview." Among the 68 videos analyzed, the main source of uploads was physicians (45, 66.2%), and 66 (97%) were categorized as useful. Around half of the YouTube videos about ReA were of high quality (33, 48.5%) according to the GQS. Comparing videos uploaded by rheumatologists, non-rheumatology healthcare professionals, and independent users, significant differences were found in mDISCERN and GQS scores but not in the number of views, likes, and comments or in duration. Comparing high-, moderate-, and low-quality videos, significant differences were found in the number of views, likes, and comments; duration; and mDISCERN and GQS scores.
    CONCLUSION: YouTube is a source of information on ReA of variable quality, with wide viewership and the potential to influence patients' knowledge and behavior. Our results showed that most YouTube videos on ReA were of high quality, and videos presented by physicians were of higher quality. YouTube should consider screening out low-quality videos by using validity scales such as mDISCERN and GQS.
    Keywords:  Information sources; Internet; Physician-patient relations; Rheumatic diseases; Social media
    DOI:  https://doi.org/10.1007/s00393-024-01571-2
  35. Dent Traumatol. 2024 Sep 24.
      BACKGROUND: Sports dentistry aims to prevent and manage orofacial injuries, tooth fractures, tooth loss, and soft tissue trauma during sports activities. Mouthguards are appliances that protect athletes from dental trauma during contact sports. The video-sharing platform YouTube hosts a large number of informative videos about mouthguards. This study aimed to analyze the quality, accuracy, and reliability of YouTube videos about mouthguards, investigate the relationship between video features and video quality, and provide suggestions for future informative content about mouthguards and sports dentistry. MATERIALS AND METHODS: The first 100 videos for each keyword were collected from YouTube using the keywords "mouthguard," "sports mouthguard," and "mouthguard and dental trauma." Videos meeting the inclusion criteria were categorized by publisher (dental professionals and nonprofessionals) and type (animation/slideshow, interview, and product introduction). Video features were recorded. Video content quality, reliability, and accuracy were measured with the Video Information and Quality Index (VIQI), the Journal of the American Medical Association (JAMA) benchmarks, the DISCERN instrument, the Global Quality Scale (GQS), and a usefulness score. Data were analyzed using SPSS (IBM 29.0) with the significance level set at p = 0.05.
    RESULTS: Of the 300 videos, 80 were included. Most of the included videos were uploaded by dental professionals (n = 49). The average VIQI, JAMA, DISCERN, and GQS scores were 15.33 out of 20.0, 1.38 out of 4.00, 49.24 out of 80.0, and 2.99 out of 5.00, respectively. Videos uploaded by dental professionals had significantly higher VIQI, JAMA, DISCERN, GQS, and usefulness scores but fewer likes, comments, and views (p < 0.05). Of all included videos, 51% (n = 41) were categorized as "moderately useful" and 10% (n = 8) as "very useful."
    CONCLUSIONS: Mouthguard videos uploaded by dental professionals are more useful, accurate, and of higher quality. Therefore, patients should consider the information shared by dental professionals. Greater participation from dentists in sharing high-quality content would be beneficial.
    Keywords:  mouthguard; sports dentistry; sports‐related orofacial injuries; video‐audio media
    DOI:  https://doi.org/10.1111/edt.12989
  36. J Prim Health Care. 2024 Sep;16(3): 270-277
      Introduction: The volume and quality of online health information require consumers to be discerning. Aim: This study aimed to explore consumers' Internet use for health information, their preferred formats, and the factors that helped them to trust a source. Methods: A cross-sectional study was conducted in 2016-2017 with adults attending three cardiology outpatient clinic sites using a short paper-based survey. The survey included questions regarding online health information use and perceived trustworthiness, with opportunities for free-text responses. Survey data were summarised, with key questions adjusted by age group, gender, and ethnicity using logistic regression. Results: Of the 708 respondents (51% women, 66% aged 45-74 years, 16% Māori, 12% Pacific), 73% had sought health information online (64% in the previous 12 months), commonly for medication side effects, their health condition, and self-help. Most (65%) were successful, although Pacific respondents reported a lower likelihood of search success compared with Europeans. Younger age groups were more concerned about information quality. Fact sheets (80%) were the most popular format across all ethnic groups, followed by short videos (31%) and discussion groups (23%). Trusting online information required multiple strategies, with 72% wanting health professionals to recommend websites. Discussion: Online health information seeking is the norm for consumers, with simple fact sheets the preferred format for building knowledge and skills. With the rising tide of misinformation, health portal providers need to offer accurate and easy-to-read fact sheets in their suite of formats, and health professionals need to support consumers by guiding them to trusted websites.
    DOI:  https://doi.org/10.1071/HC23143
  37. Health Educ Behav. 2024 Sep 22. 10901981241278587
      To unpack the process of how health information seeking influences health behaviors, we examined the mediating roles of interpersonal discussion and online information sharing in the associations between health information seeking and healthy lifestyle behaviors, as well as the moderating role of health literacy in the associations among health information seeking, interpersonal discussion, online information sharing, and healthy lifestyle behaviors. Data from a large-scale, representative survey (N = 916) revealed that interpersonal discussion and online information sharing mediated the associations between health information seeking and healthy lifestyle behaviors. The associations between health information seeking and interpersonal discussion, and between health information seeking and online information sharing, were stronger for individuals with high health literacy than for those with low health literacy. The findings advance the understanding of the influence of health information seeking and provide practical guidance for promoting a healthy lifestyle.
    Keywords:  health behaviors; health information seeking; health literacy; information sharing; interpersonal discussion
    DOI:  https://doi.org/10.1177/10901981241278587
  38. Scand J Prim Health Care. 2024 Sep 27. 1-9
      Young adults experiencing unfamiliar symptoms commonly seek health information online. This study's aim was to explore how health information websites express and communicate health information about symptoms common among young adults and guide readers with regard to health, illness, and care. Symptoms commonly searched for by young adults were used as search terms. The resulting data comprised material from 24 web pages and were analyzed using content analysis. The foremost purpose of online health information is to try to narrow down the user's symptoms and then advise the user on what actions to take. This is done by first forming a foundation of knowledge through descriptions and explanations, then specifying the symptom's timing, duration, and location, and finally giving advice on whether to self-manage the symptoms or seek additional information about them. However, uncertainty about the diagnosis may rule out self-care. For readers inexperienced with health care, forming a decisive conclusion about diffuse symptoms on the sole basis of online health information could be challenging. The necessity of numeracy skills and the ability to deal with uncertainty are highlighted. There is a discrepancy between the health advice given online and readers' access to health care that needs to be addressed in future policy and research.
    Keywords:  Online health information; health information website; health literacy; healthcare guide service; numeracy; young adult
    DOI:  https://doi.org/10.1080/02813432.2024.2408610
  39. Psychol Aging. 2024 Sep 19.
      The present study examined age differences in the influence of informational value cues on curiosity and information seeking. In two experiments, younger and older adults (total N = 514) rated their curiosity about content before having the opportunity to seek out more information. Experiment 1 examined the impact of social value on curiosity and information seeking about trivia. Online popularity metrics served as social value cues. Metric visibility increased engagement with high-popularity information for older adults, whereas it decreased engagement with low-popularity information for younger adults. Experiment 2 examined the impact of practical value on curiosity and information seeking about science facts. Personal and collective practical value were highlighted by linking the information to the domains of medicine and the environment, respectively. Patterns of curiosity and information seeking revealed greater sensitivity to collective practical value in older than younger adults. In both experiments, the relationship between curiosity and information seeking was stronger in older adults than in younger adults. Overall, these findings suggest that age differences in motivational priorities may lead to age differences in curiosity and information seeking. In addition to highlighting strategies for fostering curiosity in older learners, these findings may also inform digital literacy interventions aimed at reducing engagement with clickbait and misinformation.
    DOI:  https://doi.org/10.1037/pag0000847
  40. J Med Libr Assoc. 2024 Jul 01. 112(3): 195-204
      Professional associations provide resources to support members' career development and facilitate ways for members to engage with and learn from one another. This article describes Medical Library Association (MLA) activities related to the revision of professional competencies and the restructuring of the organization's communities during the past twenty-five years. Grounded in MLA's Platform for Change, the MLA competency statement underwent two revisions, with core themes remaining consistent. Major efforts went into rethinking the structure of MLA communities, and this restructuring became a strategic goal of the association. Numerous groups spent considerable time guiding the changes in MLA's community structure. Sections and special interest groups were transformed into caucuses. Domain hubs were established to facilitate project coordination across caucuses and create more leadership opportunities for MLA members, but their implementation did not meet expectations. Member engagement and leadership remain ongoing challenges for MLA. The next twenty-five years will undoubtedly see additional revisions to the competencies and continued iterations of the community structure.
    Keywords:  Health Information Professionals; MLA competencies; Medical Library Association; Organizational Change
    DOI:  https://doi.org/10.5195/jmla.2024.1966
  41. J Med Libr Assoc. 2024 Jul 01. 112(3): 205-213
      On the occasion of the Medical Library Association's 125th Anniversary, four librarian leaders with a combined 105 years of engagement in MLA collaborated to reflect on the changes in our profession and our association. We draw on an examination of the last 25 years of the MLA Janet Doe Lectures, our own personal histories, and scholarship we produced for MLA publications and presentations. We offer this compilation as an invitation for readers to reflect on their experiences of change within the profession, as inspiration to engage with the issues around our place in society, and as a source for further exploration into researching and learning from our collective history.
    DOI:  https://doi.org/10.5195/jmla.2024.1948
  42. J Med Libr Assoc. 2024 Jul 01. 112(3): 180-185
      Over the past twenty-five years, the Medical Library Association (MLA) has pursued a range of diversity, equity, and inclusion (DEI) initiatives. This article, written by members of the Journal of the Medical Library Association (JMLA)'s Equity Advisory Group (EAG), outlines significant measures taken to raise awareness about specific concepts, opportunities, and challenges related to DEI among MLA members. Topics discussed include the impact of influential Black, Indigenous, and people of color (BIPOC) leaders, the establishment of DEI and social justice-focused membership communities, and specific initiatives led by various working groups and committees which have served to strengthen MLA's commitment to diversity, equity, and inclusion during the last three decades.
    Keywords:  Diversity; equity; history; inclusion; retrospective; social justice
    DOI:  https://doi.org/10.5195/jmla.2024.1967