J Cancer Surviv. 2025 Nov 19.
Turkish Urooncology Association, Bladder Cancer Working Group
INTRODUCTION: Artificial intelligence (AI) is rapidly transforming healthcare by improving patients' and clinicians' access to, and understanding of, medical information. Generative AI models answer healthcare queries with quick, tailored responses. This study evaluates the readability and quality of bladder cancer (BC) patient information generated by 10 popular AI-enabled chatbots.
MATERIALS AND METHODS: We used the latest versions of ten popular chatbots: OpenAI's GPT-4o, Microsoft's Copilot Pro, Claude-3.5 Haiku, Sonar Large, Grok 2, Gemini Advanced 1.5 Pro, Mistral Large, Google Palm 2 (Google Bard), Meta's Llama 3.3, and Meta AI v2. Prompts were developed to elicit texts about BC, non-muscle-invasive BC, muscle-invasive BC, and metastatic BC. Quality was assessed with the modified Ensuring Quality Information for Patients (mEQIP) tool, the Quality Evaluation Scoring Tool (QUEST), and DISCERN. Readability was evaluated with the Average Reading Level Consensus (ARLC), Flesch-Kincaid Reading Ease (FKRE), and Flesch-Kincaid Grade Level (FKGL).
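As an illustration of the two Flesch-based readability scores named above (FKRE and FKGL), the following is a minimal sketch using the standard published formulas. It is not the tool the authors used: the helper names (`count_syllables`, `text_counts`) are hypothetical, and the syllable counter is a crude vowel-group heuristic, so its scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels.

    Assumption: a rough heuristic for illustration only, not a
    validated syllable counter.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "Bladder cancer starts in the bladder. Most cases are found early."
    n_sent, n_words, n_syll = text_counts(sample)
    print(round(flesch_reading_ease(n_sent, n_words, n_syll), 1))
    print(round(flesch_kincaid_grade(n_sent, n_words, n_syll), 1))
```

The two scores move in opposite directions: a more readable text has a higher reading ease score and a lower grade level, which is worth keeping in mind when comparing chatbots across both metrics.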
RESULTS: The ten chatbots differed significantly in mean mEQIP, DISCERN, and QUEST scores (p = 0.048, p = 0.025, and p = 0.021, respectively). Meta AI had the lowest mean mEQIP, DISCERN, and QUEST scores, while Llama had the highest. Statistically significant differences were also seen in the chatbots' mean ARLC, FKGL, and FKRE scores (p = 0.002, p = 0.001, and p = 0.002, respectively): Google Palm produced the texts that were easiest to read, and Llama produced the most difficult.
CONCLUSION: AI chatbots can produce information on BC of moderate quality and readability, although there is significant variability among platforms. The results should be interpreted with caution because of the single-query approach and the continuous evolution of AI models. Clinicians can support safe implementation by providing structured feedback and incorporating content review stages into patient education processes. Continuous collaboration between healthcare practitioners and AI developers is crucial to maintain the accuracy, currency, and clarity of AI-generated content.
Keywords: Artificial intelligence; Bladder cancer; Chatbot; Claude; Copilot; GPT-4o; Gemini; Google Palm; Grok; Llama; Meta AI; Mistral; Sonar