bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-04-06
fifteen papers selected by
Thomas Krichel, Open Library Society



  1. CJC Open. 2025 Mar;7(3): 297-303
       Background: Percutaneous coronary intervention (PCI) is the most common treatment for coronary artery disease revascularization. Many patients undergoing PCI may seek educational information online, but the reliability of such resources remains uncertain. This study seeks to assess the readability and understandability of online patient resources for PCI from Canadian hospital sources.
    Methods: We performed a descriptive study evaluating online educational materials pertaining to PCI hosted by all Canadian hospitals that perform the procedure. The primary outcomes were readability, assessed using the Flesch-Kincaid Grade Level (FKGL) and the Scolarius score, and understandability and actionability, as assessed using the Patient Education Materials Assessment Tool (PEMAT). Educational clinical material is recommended to be written at an FKGL between 6 and 8. A score between 50 and 89 on the Scolarius tool suggests the text is readable by most adults, and a PEMAT score >70% corresponds to understandable and actionable educational material.
    Results: A total of 29 Canadian hospitals performing PCI and hosting unique educational content were identified. Only 71% of PCI-capable hospitals provide relevant online educational resources to patients. The average FKGL of the analyzed content was 10 (range 5-18) and the average Scolarius score was 127.8 (range 79-173). The average total PEMAT print score was 46.1%, whereas the average total PEMAT audiovisual score was 71.8%.
    Conclusions: Most of the educational material pertaining to PCI created by Canadian hospitals is in English and print format, and of poor readability, understandability, and actionability. Audiovisual materials perform better but are sparsely used.
    DOI:  https://doi.org/10.1016/j.cjco.2024.11.012
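    [Editor's note] Entry 1 (and several others in this issue) reports Flesch-Kincaid Grade Level and Flesch Reading Ease scores. For readers who want to reproduce such figures, the sketch below applies the standard published formulas; the syllable counter is a crude heuristic for illustration only, and the Scolarius and PEMAT instruments are not reproduced here.

        # Minimal readability sketch using the published Flesch-Kincaid formulas.
        # The syllable counter is a rough vowel-group heuristic, not a real lexicon.
        import re

        def count_syllables(word: str) -> int:
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def readability(text: str) -> dict:
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            syllables = sum(count_syllables(w) for w in words)
            w, s = len(words), sentences
            fkgl = 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59   # grade level
            fre = 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w) # reading ease
            return {"FKGL": round(fkgl, 1), "FRE": round(fre, 1),
                    "meets_grade_6_to_8_target": 6 <= fkgl <= 8}

        print(readability("The doctor opens a narrow artery in your heart. "
                          "A small tube called a stent keeps it open."))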
  2. Health Informatics J. 2025 Apr-Jun;31(2): 14604582251328930
      Objective: This study aims to conduct a multidimensional evaluation of non-small cell lung cancer (NSCLC)-related videos on social media platforms in China (TikTok, Bilibili, and Red).
    Methods: Validated tools were used to evaluate video quality (DISCERN instrument), reliability (Journal of the American Medical Association [JAMA] benchmarks), and understandability and actionability (Patient Education Materials Assessment Tool [PEMAT]).
    Results: This study included 96 videos, primarily created by medical professionals (n = 63). The median DISCERN score was 30.0 (IQR 28.5-34.4), indicating poor quality overall. Compared to videos rated as "good", the "poor" videos had significantly shorter durations (P = 0.040). The overall median understandability and actionability scores were 81.8% (IQR 75.0-90.9%) and 0% (IQR 0.0-66.7%), respectively, indicating good understandability but extremely poor actionability. Only one video met all four JAMA benchmarks. TikTok videos with the shortest durations garnered the highest numbers of "likes", "comments", and "bookmarks", while Bilibili videos exhibited a relatively high overall quality.
    Conclusions: To guide the public in making informed medical decisions, Chinese NSCLC videos need improvement in various aspects.
    Keywords:  health information; non-small cell lung cancer; online video; quality; social media
    DOI:  https://doi.org/10.1177/14604582251328930
  3. Ophthalmic Epidemiol. 2025 Mar 28. 1-6
       PURPOSE: This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus.
    METHODS: A cross-sectional, observational study was conducted using ChatGPT-4o to answer 30 potential questions that could be asked by patients with keratoconus. The accuracy of the responses was evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons between these categories were performed.
    RESULTS: The mean accuracy score for the responses was 4.48 ± 0.57 on a 5-point Likert scale. The interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. Readability scores revealed a SMOG score of 15.49 ± 1.74, an FKGL score of 14.95 ± 1.95, and an FRE score of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses. There was no significant difference in accuracy among the different question categories (p = 0.161), but readability varied significantly, with treatment-related questions being the easiest to understand.
    CONCLUSION: ChatGPT-4o provides highly accurate responses to patient-centered questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.
    Keywords:  ChatGPT-4o; healthcare; keratoconus; large language models; readability
    DOI:  https://doi.org/10.1080/09286586.2025.2484760
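    [Editor's note] The SMOG index used in entry 3 (and entry 13) is likewise a simple published formula, based on counting polysyllabic (3+ syllable) words over a sample of sentences. A minimal rendering, for reference; a SMOG grade around 15, as reported above, corresponds to university-level reading.

        # SMOG grade from counts of polysyllabic words and sentences
        # (standard published coefficients).
        def smog(polysyllable_count: int, sentence_count: int) -> float:
            return 1.0430 * (polysyllable_count * (30 / sentence_count)) ** 0.5 + 3.1291

        print(round(smog(polysyllable_count=90, sentence_count=30), 2))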
  4. Epilepsia Open. 2025 Apr 01.
       OBJECTIVE: Artificial intelligence chatbots have been a game changer in healthcare, providing immediate, round-the-clock assistance. However, their accuracy across specific medical domains remains under-evaluated. Dravet syndrome is one of the most challenging epileptic encephalopathies, with new data continuously emerging in the literature. This study aims to evaluate and compare the performance of ChatGPT 3.5 and Perplexity in responding to questions about Dravet syndrome.
    METHODS: We curated 96 questions about Dravet syndrome, 43 from healthcare professionals and 53 from caregivers. Two epileptologists independently graded the chatbots' responses, with a third senior epileptologist resolving any disagreements to reach a final consensus. Accuracy and completeness of correct answers were rated on predefined 3-point scales. Incorrect responses were prompted for self-correction and re-evaluated. Readability was assessed using Flesch reading ease and Flesch-Kincaid grade level.
    RESULTS: Both chatbots had the majority of their responses rated as "correct" (ChatGPT 3.5: 66.7%, Perplexity: 81.3%), with no significant difference in performance between the two (χ2 = 5.30, p = 0.071). ChatGPT 3.5 performed significantly better for caregivers than for healthcare professionals (χ2 = 7.27, p = 0.026). The topic with the poorest performance was the treatment of Dravet syndrome, particularly for questions from healthcare professionals. Both models exhibited exemplary completeness, with most responses rated as "complete" to "comprehensive" (ChatGPT 3.5: 73.4%, Perplexity: 75.7%). Substantial self-correction capabilities were observed: ChatGPT 3.5 improved 55.6% of responses and Perplexity 80%. The texts were generally very difficult to read, requiring an advanced reading level. However, Perplexity's responses were significantly more readable than ChatGPT 3.5's [Flesch reading ease: 29.0 (SD 13.9) vs. 24.1 (SD 15.0), p = 0.018].
    SIGNIFICANCE: Our findings underscore the potential of AI chatbots in delivering accurate and complete responses to Dravet syndrome queries. However, they have limitations, particularly in complex areas like treatment. Continuous efforts to update information and improve readability are essential.
    PLAIN LANGUAGE SUMMARY: Artificial intelligence chatbots have the potential to improve access to medical information, including on conditions like Dravet syndrome, but the quality of this information is still unclear. In this study, ChatGPT 3.5 and Perplexity correctly answered most questions from healthcare professionals and caregivers, with ChatGPT 3.5 performing better for caregivers. Treatment-related questions had the most incorrect answers, particularly those from healthcare professionals. Both chatbots demonstrated the ability to correct previous incorrect responses, particularly Perplexity. Both chatbots produced text requiring advanced reading skills. Further improvements are needed to make the text easier to understand and address difficult medical topics.
    Keywords:  Artificial intelligence; ChatGPT 3.5; Dravet syndrome; Large language model; Perplexity
    DOI:  https://doi.org/10.1002/epi4.70022
  5. Int J Obstet Anesth. 2025 Feb 15;62:104344. pii: S0959-289X(25)00016-0. [Epub ahead of print]
       BACKGROUND: Labour epidurals are considered the gold standard for labour analgesia in pregnant patients. Inequities in health literacy levels can negatively impact understanding of online patient education materials, potentially affecting uptake of labour epidural analgesia. Generative artificial intelligence (AI) tools such as ChatGPT may be able to improve the readability of patient information materials.
    OBJECTIVES: Firstly, to assess the readability of available online materials on labour epidurals in the United Kingdom (UK). Secondly, to evaluate the ability of generative AI to improve readability.
    METHODS: The websites of all UK public hospitals providing obstetric anaesthesia were searched for patient education materials relating to labour epidurals. A readability assessment was conducted using three readability scoring systems. ChatGPT was used to rewrite the content of online patient information material on labour epidural analgesia so that it would be understandable by an individual with the health literacy level of an 11-year-old (sixth grade).
    RESULTS: A total of 61.6% of UK hospitals provided some form of online patient education materials on labour analgesia and epidurals; 14.5% and 23.2% of the texts met the target readability on two commonly used readability scores, respectively. The mean grade level (8.4 ± 2.1) did not meet target readability levels. After AI modification, 24.6% and 27.5% of the texts met the targets on the same metrics, with the mean grade level (7.7 ± 1.2) decreasing significantly (P <0.001) but still not meeting the target level.
    CONCLUSION: Online patient information on labour epidural analgesia frequently exceeds the recommended sixth-grade reading level. ChatGPT can be used to enhance readability, but the rewritten material still fails to meet recommended health literacy standards.
    Keywords:  Artificial intelligence; ChatGPT; Epidural; Health literacy; Internet; Labour; Obstetric anaesthesia; Readability
    DOI:  https://doi.org/10.1016/j.ijoa.2025.104344
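    [Editor's note] The study in entry 5 used the ChatGPT interface to rewrite patient materials to a sixth-grade level. As an illustration only, not the authors' protocol, the same kind of request can be scripted against the OpenAI API; the model name and prompt wording below are assumptions.

        # Hypothetical sketch: asking an LLM to simplify patient information to a
        # sixth-grade reading level. Model name and prompt wording are assumptions.
        from openai import OpenAI

        client = OpenAI()  # expects OPENAI_API_KEY in the environment

        def simplify(text: str) -> str:
            prompt = ("Rewrite the following patient information about labour epidural "
                      "analgesia so that an 11-year-old (sixth-grade reading level) can "
                      "understand it. Keep all medical facts unchanged.\n\n" + text)
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content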
  6. JSES Int. 2025 Mar;9(2): 390-397
       Background: Rotator cuff tears are common upper-extremity injuries that significantly impair shoulder function, leading to pain, reduced range of motion, and a decrease in quality of life. With the increasing reliance on artificial intelligence large language models (AI LLMs) for health information, it is crucial to evaluate the quality and readability of the information provided by these models.
    Methods: A pool of 50 questions related to rotator cuff tears was generated by querying popular AI LLMs (ChatGPT 3.5, ChatGPT 4, Gemini, and Microsoft Copilot) and using Google search. The responses from the AI LLMs were then saved and evaluated. For information quality, the DISCERN tool and a Likert scale were used; for readability, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT) Understandability Score and the Flesch-Kincaid Reading Ease score were used. Two orthopedic surgeons assessed the responses, and discrepancies were resolved by a senior author.
    Results: Out of 198 answers, the median DISCERN score was 40, with 56.6% considered sufficient. The Likert Scale showed 96% sufficiency. The median PEMAT Understandability score was 83.33, with 77.3% sufficiency, while the Flesch-Kincaid Reading Ease score had a median of 42.05 with 88.9% sufficiency. Overall, 39.8% of the answers were sufficient in both information quality and readability. Differences were found among AI models in DISCERN, Likert, PEMAT Understandability, and Flesch-Kincaid scores.
    Conclusion: AI LLMs generally cannot offer sufficient information quality and readability. While they are not yet ready for use in the medical field, they show promise for the future. There is a necessity for continuous re-evaluation of these models due to their rapid evolution. Developing new, comprehensive tools for evaluating medical information quality and readability is crucial for ensuring these models can effectively support patient education. Future research should focus on enhancing readability and consistent information quality to better serve patients.
    Keywords:  AI Tools in Healthcare; Artificial intelligence; ChatGPT; Frequently asked questions; Large language models; Patient information; Rotator cuff tears
    DOI:  https://doi.org/10.1016/j.jseint.2024.11.012
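    [Editor's note] Several entries in this issue (1, 2, 6, 9) report PEMAT percentages. The score is simply the share of applicable items rated "Agree"; a minimal sketch of the arithmetic, with made-up ratings:

        # PEMAT percentage: each item is rated Agree (1) or Disagree (0); items
        # judged not applicable (None) are excluded from the denominator.
        def pemat_score(ratings):
            applicable = [r for r in ratings if r is not None]
            return 100.0 * sum(applicable) / len(applicable)

        # 10 of 12 applicable items rated Agree -> 83.3%, above the 70% threshold
        # cited in entry 1 for understandable and actionable material.
        print(round(pemat_score([1, 1, 1, 0, 1, 1, None, 1, 1, 0, 1, 1, 1, None]), 1))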
  7. J Med Syst. 2025 Apr 03. 49(1): 43
       BACKGROUND: Artificial intelligence (AI) chatbots are increasingly used for medical inquiries, including sensitive topics like sexually transmitted diseases (STDs). However, concerns remain regarding the reliability and readability of the information they provide. This study aimed to assess the reliability and readability of AI chatbots in providing information on STDs. The key objectives were to determine (1) the reliability of STD-related information provided by AI chatbots, and (2) whether the readability of this information meets the recommended standards for patient education materials.
    METHODS: Eleven relevant STD-related search queries were identified using Google Trends and entered into four AI chatbots: ChatGPT, Gemini, Perplexity, and Copilot. The reliability of the responses was evaluated using established tools, including DISCERN, EQIP, JAMA, and GQS. Readability was assessed using six widely recognized metrics, such as the Flesch-Kincaid Grade Level and the Gunning Fog Index. The performance of chatbots was statistically compared in terms of reliability and readability.
    RESULTS: The analysis revealed significant differences in reliability across the AI chatbots. Perplexity and Copilot consistently outperformed ChatGPT and Gemini in DISCERN and EQIP scores, suggesting that these two chatbots provided more reliable information. However, results showed that none of the chatbots achieved the 6th-grade readability standard. All the chatbots generated information that was too complex for the general public, especially for individuals with lower health literacy levels.
    CONCLUSION: While Perplexity and Copilot showed better reliability in providing STD-related information, none of the chatbots met the recommended readability benchmarks. These findings highlight the need for future improvements in both the accuracy and accessibility of AI-generated health information, ensuring it can be easily understood by a broader audience.
    Keywords:  AI-generated health information; Health Communication; Sexually Transmitted Diseases
    DOI:  https://doi.org/10.1007/s10916-025-02178-z
  8. J Rhinol. 2025 Mar;32(1): 36-39
       BACKGROUND AND OBJECTIVES: YouTube has become a widely used educational platform for medical trainees in endoscopic surgery. However, the quality of surgical videos on this platform remains unregulated. This study evaluates the educational quality of YouTube videos on endoscopic choanal atresia repair using a validated assessment tool.
    METHODS: In this descriptive cross-sectional study, 50 YouTube videos on endoscopic choanal atresia surgery were analyzed. Video quality was assessed using the LAParoscopic surgery Video Educational GuidelineS (LAP-VEGaS) checklist, which evaluates content structure, procedural clarity, and outcomes reporting.
    RESULTS: Among 108 initially identified videos, 50 met the inclusion criteria. Video quality scores ranged from 1 to 16, with a median score of 7. The most frequently included elements were step-by-step approach (96%), patient anonymity (96%), and descriptive title (76%). Procedural clarity received moderate scores overall, with only the "step-by-step approach" achieving consistent quality. Outcomes reporting was notably deficient, with 90% of videos failing to address postoperative morbidity or complications.
    CONCLUSION: Most YouTube videos on endoscopic choanal atresia surgery lack the quality required for effective surgical education. As digital platforms increasingly supplement traditional training, academic institutions and specialists should prioritize creating and sharing high-quality, standardized educational content on public platforms like YouTube.
    Keywords:  Choanal atresia; Choanal atresia repair; Endoscopy; Medical education; YouTube
    DOI:  https://doi.org/10.18787/jr.2024.00037
  9. J Fluency Disord. 2025 Mar 26;84:106116. pii: S0094-730X(25)00018-X. [Epub ahead of print]
      Adults who stutter (AWS) often turn to social media platforms to connect with others, exchange personal experiences, and access informational content. This study aimed to assess the reliability, quality, understandability, and actionability of videos about stuttering on these platforms, evaluating them based on both content and source. The most relevant YouTube keywords related to stuttering were identified using Google Trends, and popular Instagram hashtags were determined through the Later application. For YouTube, videos from the first three pages for each keyword were analyzed; for Instagram, the top 100 videos with the highest engagement for each hashtag were selected using Python. Speech and Language Therapists (SLTs) rated the videos using the Modified Quality Criteria for Consumer Health Information (M-DISCERN), the Global Quality Score (GQS), and the Patient Education Materials Assessment Tool (PEMAT). The analysis also included the number of ratings and likes on comments. Videos created by SLTs on YouTube and Instagram were more reliable and of higher quality than videos created by AWS and non-expert sources (p < .001). On YouTube, videos created by SLTs were superior in quality, reliability, and comprehensibility to videos produced by other healthcare professionals (p < .001). Additionally, videos created by AWS received a greater number of positive comments than videos from SLTs and other healthcare professionals (p < .001). AWS should carefully consider the content and source of the videos they watch. There is a need for greater social awareness, and SLTs should be encouraged to produce high-quality content on social media platforms to ensure the dissemination of accurate and helpful information.
    Keywords:  Instagram; Social media; Stuttering; Video quality assessment; YouTube
    DOI:  https://doi.org/10.1016/j.jfludis.2025.106116
  10. Medicine (Baltimore). 2025 Jan 03. 104(1): e41213
      Today, people frequently turn to the internet to seek information about temporomandibular joint disorders and their treatments, as in all other health areas. However, does the information presented online without professional evaluation truly reflect the facts, and if so, to what extent? Based on this question, our study aims to evaluate YouTube™ videos on the treatment of temporomandibular joint diseases. In this cross-sectional study, a YouTube search was conducted using the search term "TMJ (temporomandibular joint) treatment". One hundred sixty-three videos that met the study criteria were evaluated for content usefulness by 3 researchers. The videos were categorized as having low or high content according to usefulness score. All videos were classified according to the source and type of the video. Statistical analyses were conducted using the chi-square test and the Mann-Whitney U test. It was found that 130 videos had low content, while 33 videos had high content. The number of views, duration in minutes, number of comments, number of likes, number of days since upload, and view rate were higher in videos with high content (P < .05). However, no significant association was found between the usefulness score and either the uploading source or the video type (P > .05). The results of our study reveal that the vast majority of YouTube videos on the treatment of temporomandibular joint diseases contain insufficient information.
    DOI:  https://doi.org/10.1097/MD.0000000000041213
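    [Editor's note] Entry 10 compares low- and high-content videos using a chi-square test and a Mann-Whitney U test. For readers unfamiliar with how such comparisons are run, a sketch with invented counts (not the study's data):

        # Illustrative only: the two tests named in entry 10, applied to made-up data.
        from scipy.stats import chi2_contingency, mannwhitneyu

        # Chi-square: is uploader type associated with low/high content usefulness?
        table = [[70, 10],   # professional uploads: low, high content
                 [60, 23]]   # lay uploads:          low, high content
        chi2, p_chi, dof, expected = chi2_contingency(table)

        # Mann-Whitney U: do high-content videos attract more views?
        views_low = [1200, 300, 4500, 800, 950]
        views_high = [9800, 15000, 7200, 22000]
        u_stat, p_u = mannwhitneyu(views_low, views_high, alternative="two-sided")
        print(p_chi, p_u)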
  11. Ann Plast Surg. 2025 Apr 01. 94(4S Suppl 2): S395-S400
       BACKGROUND: With the rise of social media as a knowledge sharing resource, patients increasingly obtain information regarding plastic surgery online. Publicly shared posts may influence active users' desire to undergo procedures and misinform their expectations, which is of particular importance in patients with body dysmorphic disorder (BDD), who may already suffer from an altered mentality of their appearance. To date, no study has assessed the quality of information about BDD on TikTok. Thus, our study aims to evaluate the usefulness and quality of the most trending TikTok videos related to BDD and plastic surgery.
    METHODS: A search was conducted on TikTok using keywords "body dysmorphia," "body dysmorphia plastic surgery," and "dysmorphophobia," and the top 15 trending videos in each category were selected. Two expert reviewers assessed the videos using the DISCERN and Global Quality Score evaluation tools.
    RESULTS: The mean ± SD DISCERN score across all videos was 22.66 ± 8.62. Videos uploaded by certified healthcare accounts (11/45) had a mean ± SD score of 29.87 ± 8.85, indicating a "poor" grade. These scores were significantly higher than those of videos uploaded by nonhealthcare accounts, which received a mean ± SD score of 20.32 ± 7.17, reflecting a "very poor" grade (P < 0.0001). The overall Global Quality Score mean ± SD was 1.96 ± 1.11, with uploads from healthcare professionals scoring significantly higher than uploads from nonhealthcare professionals (3.00 ± 1.34 vs 1.62 ± 0.78, P < 0.0001).
    CONCLUSIONS: With the increasing prevalence of BDD and social media usage, vulnerable individuals may be prone to comparing themselves to others, potentially further impacting their self-image and driving them toward a more permanent, surgical solution. These findings highlight the suboptimal BDD-related content on TikTok, stressing the potential for further disease development in this at-risk population. As plastic surgeons ascertain whether a patient is a candidate for cosmetic surgery, directing them to reliable resources can ensure proper education and foster realistic expectations.
    DOI:  https://doi.org/10.1097/SAP.0000000000004314
  12. J Cancer Educ. 2025 Apr 01.
      The aim of the present work was to conduct a descriptive cross-sectional study assessing videos published on the social platform TikTok about HPV and its appearance in the head and neck. The first 100 Spanish-language videos suggested by TikTok were selected, as well as the first 100 videos in English. The videos were assessed with the Global Quality Score (GQS) for quality and the modified DISCERN tool for reliability. Statistically significant differences were found by language, which was significantly related to the items objective (p = 0.01), current information (p < 0.01), and balance and objectivity (p = 0.01). Video duration was significantly related to the objective (p < 0.01) and to clarity and understanding (p < 0.01), but not to other metrics such as the source of information (p = 0.87) or balance and objectivity (p = 0.92). HPV content must be verified by experts to avoid the propagation of incorrect information. There is little information available online about the relationship between oropharyngeal cancer and the human papillomavirus (HPV), which makes it difficult to access trustworthy and current resources on the subject.
    Keywords:  Human papillomavirus; Oropharyngeal cancer; TikTok
    DOI:  https://doi.org/10.1007/s13187-025-02614-1
  13. Int Orthod. 2025 Apr 01;23(3):101002. pii: S1761-7227(25)00037-3. [Epub ahead of print]
       OBJECTIVE: The aim of the study was to determine the reliability, quality and readability of content contained within informed consent forms concerning orthodontic retention and retainers provided by orthodontic treatment providers.
    METHODS: An online search strategy identified informed consent forms for evaluation. The DISCERN instrument was used to determine content reliability. Each form was assessed for the presence of pre-determined content regarding 11 domains. The quality of the domain content was analysed using a 4-point scoring scale. The Simple Measure of Gobbledygook (SMOG) and the Flesch-Kincaid Grade Level (FKGL) were employed to determine readability.
    RESULTS: Thirty-four forms satisfied selection criteria. The majority (n=20; 58.8%) were sourced from websites in the US, with most (n=22; 64.7%) from specialist orthodontist websites. The mean (SD) DISCERN score per form was 31.9 (4.5). The mean (SD) number of domains present within each form was 7.76 (1.65). The mean (SD) number of points scored per form was 14.82 (3.01) from a maximum of 33. Information regarding retainer review and relevant potential impacts on quality-of-life was lacking and scored poorly. The requirement for lifetime retention was stated in 25 (73.5%) forms. Forms sourced from specialist orthodontist websites scored higher (P=0.016) than those sourced from general dentist and multi-disciplinary clinic websites. The median (IQR) SMOG and FKGL scores were 10.11 (9.55) and 9.95 (9.18) respectively.
    CONCLUSIONS: The reliability and quality of the informed consent forms concerning orthodontic retention and retainers were generally poor. The readability of the forms failed to meet recommended guidelines, meaning that many patients are unlikely to comprehend the information provided.
    Keywords:  Dentistry; Ethics; Informed consent; Orthodontic retention; Orthodontics; Retainers; Valid consent
    DOI:  https://doi.org/10.1016/j.ortho.2025.101002
  14. J Med Internet Res. 2025 Mar 31. 27: e68560
       BACKGROUND: As large language model (LLM)-based chatbots such as ChatGPT (OpenAI) grow in popularity, it is essential to understand their role in delivering online health information compared to other resources. These chatbots often generate inaccurate content, posing potential safety risks. This motivates the need to examine how users perceive and act on health information provided by LLM-based chatbots.
    OBJECTIVE: This study investigates the patterns, perceptions, and actions of users seeking health information online, including LLM-based chatbots. The relationships between online health information-seeking behaviors and important sociodemographic characteristics are examined as well.
    METHODS: A web-based survey of crowd workers was conducted via Prolific. The questionnaire covered sociodemographic information, trust in health care providers, eHealth literacy, artificial intelligence (AI) attitudes, chronic health condition status, online health information source types, perceptions, and actions, such as cross-checking or adherence. Quantitative and qualitative analyses were applied.
    RESULTS: Most participants consulted search engines (291/297, 98%) and health-related websites (203/297, 68.4%) for their health information, while 21.2% (63/297) used LLM-based chatbots, with ChatGPT and Microsoft Copilot being the most popular. Most participants (268/297, 90.2%) sought information on health conditions, with fewer seeking advice on medication (179/297, 60.3%), treatments (137/297, 46.1%), and self-diagnosis (62/297, 23.2%). Perceived information quality and trust varied little across source types. The preferred source for validating information from the internet was consulting health care professionals (40/132, 30.3%), while only a very small percentage of participants (5/214, 2.3%) consulted AI tools to cross-check information from search engines and health-related websites. For information obtained from LLM-based chatbots, 19.4% (12/63) of participants cross-checked the information, while 48.4% (30/63) of participants followed the advice. Both of these rates were lower than the corresponding rates for information from search engines, health-related websites, forums, or social media. Furthermore, use of LLM-based chatbots for health information was negatively correlated with age (ρ=-0.16, P=.006). In contrast, attitudes surrounding AI for medicine had significant positive correlations with the number of source types consulted for health advice (ρ=0.14, P=.01), use of LLM-based chatbots for health information (ρ=0.31, P<.001), and number of health topics searched (ρ=0.19, P<.001).
    CONCLUSIONS: Although traditional online sources remain dominant, LLM-based chatbots are emerging as a resource for health information for some users, specifically those who are younger and have a higher trust in AI. The perceived quality and trustworthiness of health information varied little across source types. However, the adherence to health information from LLM-based chatbots seemed more cautious compared to search engines or health-related websites. As LLMs continue to evolve, enhancing their accuracy and transparency will be essential in mitigating any potential risks by supporting responsible information-seeking while maximizing the potential of AI in health contexts.
    Keywords:  consumer health information; eHealth; internet; large language models; online health information–seeking
    DOI:  https://doi.org/10.2196/68560
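    [Editor's note] The correlations in entry 14 are Spearman rank correlations (ρ). A minimal sketch with invented survey data, showing how such a coefficient is obtained:

        # Illustrative only: Spearman correlation between age and chatbot use.
        from scipy.stats import spearmanr

        age = [22, 35, 41, 58, 63, 29, 47, 71]
        used_llm_chatbot = [1, 1, 0, 0, 0, 1, 1, 0]   # 1 = used an LLM chatbot

        rho, p_value = spearmanr(age, used_llm_chatbot)
        print(f"rho = {rho:.2f}, p = {p_value:.3f}")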