Eur Thyroid J. 2026 Mar 20. pii: ETJ-25-0385. [Epub ahead of print]
Introduction: Artificial Intelligence (AI) chatbots are increasingly used in medicine, but their reliability in scenarios with multiple management options is unclear. Indeterminate thyroid nodules and low- to low-intermediate-risk papillary thyroid carcinoma (PTC) represent such cases.
Methods: In a nationwide web-based survey, 201 members of the Hellenic Endocrine Society evaluated 12 clinical vignettes on indeterminate thyroid nodules and low- to low-intermediate-risk PTC. Their responses were compared with those generated by four conversational AI models (ChatGPT, Gemini, Copilot, DeepSeek) at two time points 11 months apart; DeepSeek was assessed only at the second time point. Chatbot outputs were assessed for agreement with endocrinologists' predominant answers, concordance with the most guideline-consistent options (American and European Thyroid Association recommendations), temporal stability, and inter-model agreement.
Results: Alignment between chatbots and endocrinologists' predominant responses was limited, reaching at most 25% across scenarios. In contrast, concordance with the most guideline-consistent options was higher, up to 83% (10/12 scenarios) depending on the model and time point. Across 12 scenarios, ChatGPT, Gemini, and Copilot changed their responses in 4, 7, and 5 scenarios, respectively, with some updates moving closer to, and others further from, guideline-based answers. Inter-model agreement ranged from 33% to 67%, indicating substantial variability among chatbots.
Conclusion: AI chatbots show evolving but inconsistent performance in complex thyroid management scenarios. While guideline concordance can be relatively high, substantial variability across models, limited temporal reproducibility, and poor alignment with clinical practice highlight the need for ongoing longitudinal evaluation before safe integration into clinical decision-making.
Keywords: Artificial intelligence; Chatbots; Clinical decision-making; Papillary thyroid cancer; Survey; Thyroid nodules
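
The agreement measures named in the Methods (concordance with a reference answer, temporal stability, and inter-model agreement) reduce to simple proportions over the 12 scenarios. The following is a minimal sketch of that arithmetic, not the authors' analysis code: the function names, the model labels, and the single-letter answer lists are all hypothetical toy values chosen for illustration and are not data from the study.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Percentage of scenarios in which two answer lists give the same option."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def changed_scenarios(first, second):
    """Number of scenarios whose answer differs between two time points."""
    return sum(x != y for x, y in zip(first, second))


def pairwise_agreement(answers_by_model):
    """Percent agreement for every unordered pair of models."""
    return {
        (m1, m2): percent_agreement(answers_by_model[m1], answers_by_model[m2])
        for m1, m2 in combinations(sorted(answers_by_model), 2)
    }


# Hypothetical toy answers for 12 scenarios (NOT taken from the study).
reference = list("ABCDABCDABCD")   # assumed guideline-consistent options
model_x_t1 = list("ABCDABCDABAA")  # assumed model output, first time point
model_x_t2 = list("ABCDABCDBBAA")  # assumed model output, second time point
model_y = list("ABCDAAAAAAAA")     # assumed output of a second model

concordance = percent_agreement(reference, model_x_t1)
n_changed = changed_scenarios(model_x_t1, model_x_t2)
inter_model = pairwise_agreement({"x": model_x_t1, "y": model_y})
```

With 12 scenarios, each scenario contributes about 8.3 percentage points, so a reported value such as 83% corresponds to 10 of 12 matching answers.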