bims-librar Biomed News
on Biomedical librarianship
Issue of 2023‒12‒17
seventeen papers selected by
Thomas Krichel, Open Library Society



  1. Nature. 2023 Dec;624(7991): 227
      
    Keywords:  Funding; Policy; Research data; Research management
    DOI:  https://doi.org/10.1038/d41586-023-03935-1
  2. J Clin Epidemiol. 2023 Dec 08. pii: S0895-4356(23)00327-X. [Epub ahead of print] 111237
      OBJECTIVE: Systematic reviews (SRs) are considered the gold standard of evidence, but many published SRs are of poor quality. This study identifies how librarian involvement in SRs is associated with the quality of reported methods and examines the reasons authors give for not involving a librarian.
    STUDY DESIGN AND SETTING: We searched databases for SRs published between 2015 and 2019 whose first or last author was affiliated with a Vancouver hospital or biomedical research site. Corresponding authors of included SRs were contacted through an email survey to determine whether a librarian was involved in the SR. If a librarian was involved, the survey asked at what level; if not, it asked why. The quality of reported search methods was scored independently by two reviewers. A linear regression model was used to estimate the association between quality-of-reported-search-methods scores and the level at which a librarian was involved in the study.
    RESULTS: 191 SRs were included in this study, and the authors of 118 (62%) indicated whether a librarian was involved. SRs that included a librarian as a co-author had a 15.4% higher quality assessment score than SRs that did not include a librarian. Most authors who did not include a librarian in their SR (27; 75%) reported that they did not believe it was necessary.
    CONCLUSION: A higher level of librarian involvement in SRs is correlated with higher scores for reported search methods. Greater advocacy or changes at the policy level are necessary to increase librarian involvement in SRs and, as a result, the quality of their search methods.
    Keywords:  librarian contribution; quality; reporting; search methods; systematic review
    DOI:  https://doi.org/10.1016/j.jclinepi.2023.111237
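    The linear model described above can be illustrated with a minimal Python sketch; the data frame, variable names, and involvement levels below are hypothetical stand-ins, not the study's data.

      # Illustrative only: made-up data standing in for the survey responses described above.
      import pandas as pd
      import statsmodels.formula.api as smf

      df = pd.DataFrame({
          # hypothetical levels of librarian involvement
          "librarian_level": ["none", "consulted", "co-author", "none", "co-author",
                              "consulted", "none", "co-author", "consulted", "none"],
          # hypothetical quality-of-reported-search-methods scores (0-100)
          "quality_score": [55, 62, 78, 50, 81, 60, 58, 75, 66, 52],
      })

      # ordinary least squares with involvement level as a categorical predictor
      model = smf.ols("quality_score ~ C(librarian_level)", data=df).fit()
      print(model.summary())
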
  3. J Orthop. 2024 Feb;48: 103-106
      Background: Machine-learning-assisted systematic reviewing may help to reduce the work burden in systematic reviews. The aim of this study is therefore to determine, from a non-developer's perspective, the performance of machine-learning-assisted systematic reviewing in retrieving relevant papers from previously published orthopaedic reviews.
    Methods: Active learning for Systematic Reviews (ASReview) was tested against the results of three previously published systematic reviews in the field of orthopaedics, with 20 iterations for each review. The reviews covered easy, intermediate, and advanced scenarios. The outcomes of interest were the percentage of work saved at 95% recall (WSS@95), the percentage of work saved at 100% recall (WSS@100), and the percentage of relevant references identified after screening the first 10% of the records (RRF@10). Means and corresponding [95% confidence intervals] were calculated.
    Results: The WSS@95 was 72 [71-74], 72 [72-73], and 50 [50-51] for the easy, intermediate, and advanced scenarios, respectively. The WSS@100 was 72 [71-73], 62 [61-63], and 37 [36-38], respectively. The RRF@10 was 79 [78-81], 70 [69-71], and 58 [56-60], respectively.
    Conclusions: Machine-learning-assisted systematic reviewing was efficient in retrieving relevant papers for systematic reviews in orthopaedics. The majority of relevant papers were identified after screening only 10% of the records. All relevant papers were identified after screening 30%-40% of the total, meaning that 60%-70% of the screening work can potentially be saved.
    Keywords:  ASReview; Artificial intelligence; Machine learning; Screening; Systematic review
    DOI:  https://doi.org/10.1016/j.jor.2023.11.051
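    The screening metrics reported above can be made concrete with a short sketch; the ranking below is a toy example, and this is not the ASReview implementation, only an illustration of how WSS@95, WSS@100, and RRF@10 are computed from a ranked screening order.

      import math

      def wss_at_recall(ranking, n_relevant, recall=0.95):
          # Work saved over sampling at the given recall level.
          # ranking: 0/1 relevance labels in the order the model presents records.
          n_total = len(ranking)
          target = math.ceil(recall * n_relevant)
          found = 0
          for screened, label in enumerate(ranking, start=1):
              found += label
              if found >= target:
                  # proportion left unscreened, minus the random-sampling baseline
                  return (n_total - screened) / n_total - (1 - recall)
          return 0.0

      def rrf_at(ranking, n_relevant, fraction=0.10):
          # Proportion of relevant records found after screening the first `fraction` of records.
          cutoff = math.ceil(fraction * len(ranking))
          return sum(ranking[:cutoff]) / n_relevant

      # toy ranking: 20 records, 4 relevant, mostly ranked near the top
      ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      print(wss_at_recall(ranking, n_relevant=4, recall=0.95))
      print(wss_at_recall(ranking, n_relevant=4, recall=1.00))
      print(rrf_at(ranking, n_relevant=4, fraction=0.10))
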
  4. Cancer Epidemiol. 2023 Dec 09. pii: S1877-7821(23)00191-1. [Epub ahead of print] 88: 102511
      The aim of this study was to evaluate the performance accuracy and workload savings of artificial intelligence (AI)-based automation tools, in comparison with human reviewers, in medical literature screening for systematic reviews (SRs) of primary studies in cancer research, in order to gain insights on improving the efficiency of producing SRs. Medline, Embase, the Cochrane Library, and PROSPERO databases were searched from inception to November 30, 2022. Forward and backward literature searches were then completed, and experts in the field, including the authors of the included articles, were contacted for a thorough grey literature search. This SR was registered on PROSPERO (CRD42023384772). Among the 3947 studies obtained from the search, five met the preplanned study selection criteria. These five studies evaluated four AI tools: Abstrackr (four studies), RobotAnalyst (one), EPPI-Reviewer (one), and DistillerSR (one). Without missing any finally included citations, Abstrackr eliminated 20%-88% of titles and abstracts (a time saving of 7-86 hours) and 59% of the full texts (62 h) from human review across four different cancer-related SRs. In comparison, RobotAnalyst (1% of titles and abstracts, 1 h), EPPI-Reviewer (38% of titles and abstracts, 58 h; 59% of full texts, 62 h), and DistillerSR (42% of titles and abstracts, 22 h) provided similar or lower work savings for single cancer-related SRs. AI-based automation tools exhibited promising but varying levels of accuracy and efficiency during the screening of medical literature for SRs in the cancer field. Until further progress is made and thorough evaluations are conducted, AI tools should be used as supplementary aids rather than complete substitutes for human reviewers.
    Keywords:  Accuracy outcome; Artificial intelligence tools; Systematic survey; Time saving; Workload saving
    DOI:  https://doi.org/10.1016/j.canep.2023.102511
  5. Res Synth Methods. 2023 Dec 08.
      Data extraction is a time-consuming and resource-intensive task in the systematic review process. Natural language processing (NLP) artificial intelligence (AI) techniques have the potential to automate data extraction, saving time and resources, accelerating the review process, and enhancing the quality and reliability of extracted data. In this paper, we propose a method for using Bing AI and Microsoft Edge as a second reviewer to verify and enhance data items first extracted by a single human reviewer. We describe a worked example of the steps involved in instructing the Bing AI Chat tool to extract study characteristics as data items from a PDF document into a table so that they can be compared with data extracted manually. We show that this technique may provide an additional verification process for data extraction where there are limited resources available or for novice reviewers. However, it should not be seen as a replacement for already established and validated double independent data extraction methods without further evaluation and verification. Use of AI techniques for data extraction in systematic reviews should be transparently and accurately described in reports. Future research should focus on the accuracy, efficiency, completeness, and user experience of using Bing AI for data extraction compared with traditional methods using two or more reviewers independently.
    Keywords:  AI; Bing; artificial intelligence; data extraction; machine learning; systematic review
    DOI:  https://doi.org/10.1002/jrsm.1689
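    As a rough illustration of the "second reviewer" idea above, the sketch below compares a human extraction table with an AI-suggested one and flags disagreements for verification; the column names and values are hypothetical, and this is not the workflow from the paper.

      import pandas as pd

      human = pd.DataFrame({
          "study_id": ["Smith2020", "Lee2021"],
          "design": ["RCT", "cohort"],
          "sample_size": ["120", "85"],
      })
      ai = pd.DataFrame({
          "study_id": ["Smith2020", "Lee2021"],
          "design": ["RCT", "case-control"],
          "sample_size": ["120", "85"],
      })

      merged = human.merge(ai, on="study_id", suffixes=("_human", "_ai"))
      for col in ["design", "sample_size"]:
          mismatch = merged[merged[col + "_human"] != merged[col + "_ai"]]
          for _, row in mismatch.iterrows():
              print(f"Check {row['study_id']} / {col}: "
                    f"human={row[col + '_human']!r} ai={row[col + '_ai']!r}")
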
  6. Cureus. 2023 Dec;15(12): e50298
      Introduction: Avascular necrosis (AVN) of the femoral head is a type of osteonecrosis caused by the disruption of blood flow to the proximal femur, resulting in osteocyte death. Regression of the disease is rare, and most patients will ultimately progress to having a total hip arthroplasty performed. Early diagnosis of AVN allows treatment options beyond total hip arthroplasty; one such procedure is core decompression of the femoral head. Health literacy is defined as the ability to make health decisions in the context of everyday life. Lower levels of health literacy have been shown to be associated with higher complication rates, and it is recommended that patient information documents be written at a reading grade level (RGL) no higher than the sixth grade to help with health literacy.
    Methods: Twenty-nine websites containing information on core decompression were identified, and the online readability software WebFX (Pennsylvania, USA) was used to analyse their readability. This software generates a Flesch reading ease score (FRES) and an RGL for each website. The search was carried out in the Republic of Ireland.
    Results: The mean FRES was 48.8 (standard deviation (SD) ±15.3), which categorizes the content as "difficult to read." The mean RGL was 8.46 (SD ±2.34), which is higher than the recommended target.
    Conclusion: This study has shown that material on the Internet regarding core decompression is above the recommended readability levels for the majority of patients, in line with results from similar studies that have assessed the readability of online patient information. Given these outcomes, it is imperative for physicians to take an active role in curating and delivering comprehensible information to their patients. This approach aims to empower patients with a clearer understanding of core decompression, enabling them to make more informed decisions about their health.
    Keywords:  avascular necrosis (avn); core decompression; health literacy; orthopaedics; readability
    DOI:  https://doi.org/10.7759/cureus.50298
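    For reference, the two readability measures reported above follow standard published formulas; the sketch below applies those formulas to hypothetical counts (how WebFX counts words and syllables, and which grade-level formula it reports, may differ).

      def flesch_reading_ease(words, sentences, syllables):
          return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

      def flesch_kincaid_grade(words, sentences, syllables):
          # one common reading-grade-level formula; not necessarily the RGL WebFX reports
          return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

      # hypothetical counts for a patient-information page
      words, sentences, syllables = 850, 40, 1450
      print(round(flesch_reading_ease(words, sentences, syllables), 1))   # higher = easier to read
      print(round(flesch_kincaid_grade(words, sentences, syllables), 1))  # approximate US school grade
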
  7. Sleep Breath. 2023 Dec 07.
      STUDY OBJECTIVES: Maxillomandibular advancement (MMA) is an effective surgical option for patients suffering from obstructive sleep apnea (OSA). As a relatively new treatment option, patients may turn to the Internet to learn more about it. However, online patient education materials (OPEMs) on MMA may be written at a higher literacy level than recommended for patients. The aim of this study was to analyze the readability of OPEMs on MMA.
    METHODS: A Google search of "maxillomandibular advancement" was performed, and the first 100 results were screened. Websites that met the eligibility criteria were analyzed for readability using the Automated Readability Index (ARI), Coleman-Liau Index (CLI), Flesch-Kincaid Grade Level (FKGL), Gunning Fog (GF), and Simple Measure of Gobbledygook (SMOG), and compared to the recommended sixth-grade reading level using one-tailed t tests. Readability scores were compared across website types, including hospitals/universities and physician clinics, using ANOVA tests.
    RESULTS: The mean (SD) ARI, CLI, FKGL, GF, and SMOG were 11.91 (2.43), 13.42 (1.81), 11.91 (2.06), 14.32 (2.34), and 13.99 (1.56), respectively. All readability scores were significantly higher than a sixth-grade reading level (p < 0.001). No statistically significant difference was found between website types (university/hospital, clinic, and other).
    CONCLUSIONS: The available OPEMs on MMA surgery for OSA are above the recommended sixth-grade reading level. Identifying and reducing the gap between the reading level of OPEMs and the reading level of the patient is needed to encourage a more active role, informed decisions, and better patient satisfaction.
    Keywords:  Health literacy; Maxillomandibular advancement; Online patient education materials; Readability
    DOI:  https://doi.org/10.1007/s11325-023-02952-8
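    The indices named above also have standard published formulas; the sketch below covers ARI, CLI, GF, and SMOG (FKGL is sketched under item 6 above) using hypothetical counts of characters, letters, words, sentences, and complex (polysyllabic) words.

      import math

      def automated_readability_index(chars, words, sentences):
          return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43

      def coleman_liau_index(letters, words, sentences):
          L = letters / words * 100      # average letters per 100 words
          S = sentences / words * 100    # average sentences per 100 words
          return 0.0588 * L - 0.296 * S - 15.8

      def gunning_fog(words, sentences, complex_words):
          return 0.4 * ((words / sentences) + 100 * (complex_words / words))

      def smog(polysyllables, sentences):
          return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

      # hypothetical counts for a patient-information page
      chars, letters, words, sentences = 5200, 5000, 900, 45
      complex_words = polysyllables = 140
      print(round(automated_readability_index(chars, words, sentences), 1))
      print(round(coleman_liau_index(letters, words, sentences), 1))
      print(round(gunning_fog(words, sentences, complex_words), 1))
      print(round(smog(polysyllables, sentences), 1))
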
  8. Resuscitation. 2023 Dec 09. pii: S0300-9572(23)00813-4. [Epub ahead of print] 110077
      INTRODUCTION: Cardiac arrest leaves witnesses, survivors, and their relatives with a multitude of questions. When a young person or a public figure is affected, interest in cardiac arrest and cardiopulmonary resuscitation (CPR) increases. ChatGPT allows everyone to obtain human-like responses on any topic. Because of the risk of accessing incorrect information, we assessed the accuracy of ChatGPT in answering laypeople's questions about cardiac arrest and CPR.
    METHODS: We co-produced a list of 40 questions with members of Sudden Cardiac Arrest UK covering all aspects of cardiac arrest and CPR. The answers provided by ChatGPT to each question were evaluated by professionals for their accuracy, by professionals and laypeople for their relevance, clarity, comprehensiveness, and overall value on a scale from 1 (poor) to 5 (excellent), and for readability.
    RESULTS: ChatGPT answers received an overall positive evaluation (4.3±0.7) from 14 professionals and 16 laypeople. Clarity (4.4±0.6), relevance (4.3±0.6), accuracy (4.0±0.6), and comprehensiveness (4.2±0.7) of the answers were also rated highly. Professionals, however, rated overall value (4.0±0.5 vs 4.6±0.7; p=0.02) and comprehensiveness (3.9±0.6 vs 4.5±0.7; p=0.02) lower than laypeople did. CPR-related answers consistently received lower scores across all parameters from both professionals and laypeople. Readability was 'difficult' (median Flesch reading ease score of 34 [IQR 26-42]).
    CONCLUSIONS: ChatGPT provided largely accurate, relevant, and comprehensive answers to questions about cardiac arrest commonly asked by survivors, their relatives, and lay rescuers, except for CPR-related answers, which received the lowest scores. Large language models will play a significant role in the future, and the healthcare-related content they generate should be monitored.
    Keywords:  ChatGPT; artificial intelligence; cardiopulmonary resuscitation; large language model; out-of-hospital cardiac arrest
    DOI:  https://doi.org/10.1016/j.resuscitation.2023.110077
  9. Cureus. 2023 Nov;15(11): e48518
      Objectives: The aim of this study was to evaluate the accuracy and completeness of the answers given by Chat Generative Pre-trained Transformer (ChatGPT) (OpenAI OpCo, LLC, San Francisco, CA) to the most frequently asked questions on different topics in the field of periodontology.
    Methods: The 10 most frequently asked patient questions about seven different topics in periodontology (periodontal diseases, peri-implant diseases, tooth sensitivity, gingival recessions, halitosis, dental implants, and periodontal surgery) were created by ChatGPT. To obtain responses, the set of 70 questions was submitted to ChatGPT, with 10 questions per subject. The documented responses were assessed by professionals specializing in periodontology using two distinct Likert scales: accuracy was rated on a scale from one to six, and completeness on a scale from one to three.
    Results: The median accuracy score for all responses was six, and the median completeness score was two. The mean scores for accuracy and completeness were 5.50 ± 0.23 and 2.34 ± 0.24, respectively. ChatGPT's responses to the questions patients most frequently ask for information in periodontology were at least "nearly completely correct" in terms of accuracy and "adequate" in terms of completeness. There was a statistically significant difference between subjects in terms of accuracy and completeness (P<0.05). The highest and lowest accuracy scores were for peri-implant diseases and gingival recession, respectively, while the highest and lowest completeness scores were for gingival recession and dental implants, respectively.
    Conclusions: The utilization of large language models has become increasingly prevalent, extending to patients within the healthcare domain. While ChatGPT may not offer absolute precision and comprehensive results without expert supervision, those within the field of periodontology can use it as an informational resource, albeit acknowledging the potential for inaccuracies.
    Keywords:  artificial intelligence in dentistry; chat generative pre-trained transformer; chatgpt; dental care; large language models (llms); oral medicine and periodontology; patient information
    DOI:  https://doi.org/10.7759/cureus.48518
  10. J Med Internet Res. 2023 Dec 14. 25: e49771
      BACKGROUND: The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has necessitated reliable and authoritative information for public guidance. The World Health Organization (WHO) has been a primary source of such information, disseminating it through a question-and-answer format on its official website. Concurrently, ChatGPT (versions 3.5 and 4.0), a deep learning-based natural language generation system, has shown potential in generating diverse text types based on user input.
    OBJECTIVE: This study evaluates the accuracy of COVID-19 information generated by ChatGPT 3.5 and 4.0, assessing its potential as a supplementary public information source during the pandemic.
    METHODS: We extracted 487 COVID-19-related questions from the WHO's official website and used ChatGPT 3.5 and 4.0 to generate corresponding answers. These generated answers were then compared against the official WHO responses for evaluation. Two clinical experts scored the generated answers on a scale of 0-5 across 4 dimensions (accuracy, comprehensiveness, relevance, and clarity), with higher scores indicating better performance in each dimension. The WHO responses served as the reference for this assessment. Additionally, we used the BERT (Bidirectional Encoder Representations from Transformers) model to generate similarity scores (0-1) between the generated and official answers, providing a dual validation mechanism.
    RESULTS: The mean (SD) scores for ChatGPT 3.5-generated answers were 3.47 (0.725) for accuracy, 3.89 (0.719) for comprehensiveness, 4.09 (0.787) for relevance, and 3.49 (0.809) for clarity. For ChatGPT 4.0, the mean (SD) scores were 4.15 (0.780), 4.47 (0.641), 4.56 (0.600), and 4.09 (0.698), respectively. All differences were statistically significant (P<.001), with ChatGPT 4.0 outperforming ChatGPT 3.5. The BERT model verification showed mean (SD) similarity scores of 0.83 (0.07) for ChatGPT 3.5 and 0.85 (0.07) for ChatGPT 4.0 compared with the official WHO answers.
    CONCLUSIONS: ChatGPT 3.5 and 4.0 can generate accurate and relevant COVID-19 information to a certain extent. However, compared with official WHO responses, gaps and deficiencies exist. Thus, users of ChatGPT 3.5 and 4.0 should also reference other reliable information sources to mitigate potential misinformation risks. Notably, ChatGPT 4.0 outperformed ChatGPT 3.5 across all evaluated dimensions, a finding corroborated by BERT model validation.
    Keywords:  AI; COVID-19; ChatGPT 3.5; ChatGPT 4.0; artificial intelligence; information retrieval; pandemic; public health
    DOI:  https://doi.org/10.2196/49771
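    The BERT-based similarity check described above can be sketched with sentence embeddings and cosine similarity; the model name below is only a convenient stand-in, since the study's exact BERT variant is not restated here, and the example answers are invented.

      from sentence_transformers import SentenceTransformer, util

      model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

      official = "Wash your hands regularly with soap and water for at least 20 seconds."
      generated = "Washing hands often with soap and water for 20 seconds or more helps prevent infection."

      emb = model.encode([official, generated], convert_to_tensor=True)
      score = util.cos_sim(emb[0], emb[1]).item()  # cosine similarity; values near 1 mean very similar
      print(round(score, 2))
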
  11. World Neurosurg X. 2024 Jan;21: 100249
      • Most YouTube videos on awake craniotomy are of poor educational value.
    • Intraoperative musical performances by patients are the strongest driver of video popularity.
    • User engagement with awake craniotomy videos is not linked to their educational quality.
    • Patients must be aware of the high prevalence of misleading content on YouTube.
    • Patients may require guidance in choosing the best resources online.
    Keywords:  Awake brain surgery; Awake craniotomy; Internet; Neurosurgery; Patient education; Video; YouTube
    DOI:  https://doi.org/10.1016/j.wnsx.2023.100249
  12. Front Public Health. 2023;11: 1266415
      Summary of background: Dementia is among the leading causes of death and disability worldwide, having a major impact not only on the affected person but also on society as a whole. The Internet is a popular and growing source of health-related information for patients, family members, carers, and physicians. TikTok, one of the most popular social media platforms, is an important source for knowledge access and adoption. However, the quality of health information on TikTok has not been sufficiently studied.
    Objective: To evaluate the quality of the information provided in the most popular videos on dementia shared on TikTok.
    Study design: A cross-sectional study.
    Methods: The top 100 most popular videos on TikTok obtained by searching the hashtag "dementia" were included in the study and grouped based on their source and content. The popularity of the videos was estimated based on the numbers of likes, comments, and shares. The quality of health-related information was evaluated using the DISCERN score and the Global Quality Score (GQS).
    Results: Videos had a median duration of 33.29 s; the median number of likes was 635,100, with a total of 93,698,200 likes, 903,859 comments, and 5,310,912 shares. The source (uploader) of 65% of the videos was family members, while only 4% were uploaded by doctors. The content was lifestyle-related in 62% of the videos, while 12% of the videos were for fun. Videos had a median DISCERN score of 22.5 (IQR 20-27) and a median GQS of 2 (IQR 1-3). The videos uploaded by doctors had the highest quality scores and the lowest popularity.
    Conclusion: The most popular dementia videos on TikTok are mostly shared by family members and are of poor quality. Given the major public health issues associated with dementia, experts must provide appropriate and active assistance to patients in interpreting the information identified.
    Keywords:  TikTok; credibility; dementia; online health information; reliability; social media; video quality
    DOI:  https://doi.org/10.3389/fpubh.2023.1266415
  13. BMC Health Serv Res. 2023 Dec 11. 23(1): 1389
      BACKGROUND: Previous studies have indicated that users' health information-seeking behavior can reflect current health issues within a community. This study aimed to investigate the online information-seeking behavior of Iranian web users on Google about Henoch-Schönlein purpura (HSP).
    METHODS: Google Trends (GTr) was utilized to collect big data on the internet searches conducted by Iranian web users. A focus group discussion was employed to identify the keywords users select when searching for HSP. Additionally, keywords related to the disease's symptoms were selected based on recent clinical studies. All keywords were queried in GTr from January 1, 2012 to October 30, 2022. The outputs were saved in Excel format and analyzed using SPSS.
    RESULTS: The highest and lowest search rates of HSP were recorded in winter and summer, respectively. There was a significant positive correlation between HSP search rates and the terms "joint pain" (P = 0.007), "vomiting" (P = 0.032), "hands and feet swelling" (P = 0.041) and "seizure" (P < 0.001).
    CONCLUSION: The findings were in accordance with clinical facts about HSP, such as its seasonal pattern and accompanying symptoms. It appears that the information-seeking behavior of Iranian users regarding HSP can provide valuable insights into the outbreak of this disease in Iran.
    Keywords:  Google; Health information; Henoch–Schönlein purpura; Infodemiology; Information-seeking behavior; Web users
    DOI:  https://doi.org/10.1186/s12913-023-10357-2
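    The correlation step described above amounts to correlating the HSP interest series with each symptom series; a minimal sketch, assuming the Google Trends data have already been exported to a CSV with hypothetical file and column names, and using a rank correlation as one possible choice:

      import pandas as pd
      from scipy.stats import spearmanr

      # hypothetical export: columns month, hsp, joint_pain, vomiting, swelling, seizure
      df = pd.read_csv("gtrends_export.csv")
      for symptom in ["joint_pain", "vomiting", "swelling", "seizure"]:
          rho, p = spearmanr(df["hsp"], df[symptom])
          print(f"{symptom}: rho={rho:.2f}, p={p:.3f}")
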
  14. Front Psychol. 2023;14: 1255604
      Objective: The rise of online platforms like Douyin, Baidu, and other Chinese search engines has changed how gynecologic oncology patients seek information about their diagnosis or condition. This study aimed to investigate the factors associated with information seeking among these patients and to evaluate their predictive performance.
    Methods: A cross-sectional study was conducted among 199 gynecologic oncology patients at a single hospital in China. The patients' demographic characteristics and scores on the State-Trait Anxiety Inventory (STAI-S and STAI-T) and the Hospital Anxiety and Depression Scale (HADS-A and HADS-D) were compared between those who sought information online and those who did not. Logistic regression analyses and receiver operating characteristic (ROC) curve analyses were performed.
    Results: The patients' age, marital status, STAI-S scores, and HADS-A scores were significantly associated with online information seeking. The combined model that included these factors showed good predictive performance with an area under the ROC curve of 0.841.
    Conclusion: The combination of demographic and psychological factors can be used to predict the likelihood of gynecologic oncology patients seeking information online. These findings can help healthcare providers understand their patients' information-seeking behaviors and tailor their communication strategies accordingly.
    Keywords:  Baidu; Douyin; anxiety; depression; gynecologic oncology; information seeking; online platforms; prediction model
    DOI:  https://doi.org/10.3389/fpsyg.2023.1255604
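    A minimal sketch of the modelling step described above (logistic regression followed by an ROC/AUC check); the predictors and labels below are randomly generated placeholders rather than the study's data, so the resulting AUC only demonstrates the calls.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 200
      X = np.column_stack([
          rng.integers(25, 75, n),   # age (hypothetical)
          rng.integers(20, 80, n),   # STAI-S score (hypothetical)
          rng.integers(0, 22, n),    # HADS-A score (hypothetical)
      ])
      y = rng.integers(0, 2, n)      # 1 = sought information online (hypothetical labels)

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.3, random_state=0, stratify=y)
      clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
      auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
      print(f"AUC: {auc:.2f}")  # roughly 0.5 on random labels; the study reports 0.841 on real data
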
  15. PeerJ Comput Sci. 2023;9: e1710
      Topic-based search systems retrieve items by contextualizing the information-seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest in such a way that precision, recall, and diversity are achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with such an approach are loss of diversity and low global recall when combining results from multiple queries. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result set similarity and on the information theoretic notion of entropy. Extensive experiments allow us to conclude that while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term-selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies demonstrate significantly higher levels of precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.
    Keywords:  Automatic query formulation; Diversity maximization; Diversity preservation; Global recall; Information retrieval; Information-theoretic fitness functions; Learning complex queries; Multi-objective genetic programming; Topic-based search
    DOI:  https://doi.org/10.7717/peerj-cs.1710
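    The paper's exact objective functions are not reproduced here, but the general shape of a multi-objective fitness over a set of generated queries can be sketched as follows: maximize precision and recall of the combined results while rewarding diversity, here via a Shannon-entropy term over how often each document recurs across the result sets (an illustrative choice, not the authors' definition).

      import math
      from collections import Counter

      def entropy_diversity(result_sets):
          # Shannon entropy of the document-occurrence distribution across result sets;
          # if the queries keep returning the same few documents, entropy drops.
          counts = Counter(doc for rs in result_sets for doc in rs)
          total = sum(counts.values())
          return -sum((c / total) * math.log2(c / total) for c in counts.values())

      def fitness(result_sets, relevant):
          retrieved = set().union(*result_sets)
          precision = len(retrieved & relevant) / len(retrieved) if retrieved else 0.0
          recall = len(retrieved & relevant) / len(relevant) if relevant else 0.0
          return precision, recall, entropy_diversity(result_sets)

      # toy example: documents retrieved by three generated queries, plus the relevant set
      queries_results = [{1, 2, 3}, {3, 4}, {5, 6, 7}]
      relevant_docs = {2, 3, 5, 6, 9}
      print(fitness(queries_results, relevant_docs))
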
  16. Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul;2023: 1-4
      The Medical Subject Headings (MeSH) is a comprehensive indexing vocabulary used to label millions of books and articles on PubMed. The MeSH annotation of a document consists of one or more descriptors (the main headings) and qualifiers (subheadings specific to a descriptor). Currently, there are more than 34 million documents on PubMed, which are manually tagged with MeSH terms. In this paper, we describe a machine-learning procedure that, given a document and its MeSH descriptors, predicts the respective qualifiers. In our experiment, we restricted the dataset to documents with the Heart Transplantation descriptor and used only the PubMed abstracts. We trained binary classifiers to predict the qualifiers of this descriptor using logistic regression with a tf-idf vectorizer and a fine-tuned DistilBERT model. We carried out a small-scale evaluation of our models with the Mortality qualifier on a test set consisting of 30 articles (15 positives and 15 negatives). This test set was then manually re-annotated by a cardiac surgeon, an expert in thoracic transplantation. On this re-annotated test set, we obtained macro-averaged F1 scores of 0.81 for the logistic regression model and 0.85 for the DistilBERT model. Both scores are higher than the macro-averaged F1 score of 0.76 from the initial PubMed manual annotation. Our procedure would be easily extensible to all MeSH descriptors with sufficient training data and, we believe, would enable human annotators to complete the indexing work more easily.
    Clinical Relevance: Selecting relevant articles is important for clinicians and researchers, but it is also often a challenge, especially in complex subspecialties such as heart transplantation. In this study, a machine-learning model outperformed PubMed's manual annotation, which is promising for improved quality in information retrieval.
    DOI:  https://doi.org/10.1109/EMBC40787.2023.10340998
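    A minimal sketch of the first model described above (a tf-idf vectorizer feeding a logistic regression, trained as a binary classifier for one qualifier and evaluated with a macro-averaged F1 score); the abstracts and labels are invented placeholders, not the authors' data or code.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import f1_score
      from sklearn.pipeline import make_pipeline

      abstracts = [
          "One-year mortality after heart transplantation was 12% in this cohort.",
          "Survival and causes of death were analysed in transplant recipients.",
          "Immunosuppressive regimens after heart transplantation were compared.",
          "Donor selection criteria for cardiac transplantation are reviewed.",
      ]
      labels = [1, 1, 0, 0]  # 1 = indexed with the Mortality qualifier (hypothetical)

      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
      clf.fit(abstracts, labels)

      preds = clf.predict(abstracts)  # predicting on the training texts, just to show the call
      print(f1_score(labels, preds, average="macro"))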