J Voice. 2025 Nov 21. pii: S0892-1997(25)00466-7. [Epub ahead of print]
OBJECTIVES/HYPOTHESIS: The objective of this study was to evaluate ChatGPT's responses to common inquiries about voice disorders at two time points.
METHODS: In this exploratory study, 30 frequently asked questions about voice disorders were gathered from a licensed clinical speech-language pathologist specializing in voice disorders and from reputable online patient education sources. These questions were entered into GPT-4o mini at two time points (ie, November 2024 and April 2025), using a customized prompt that directed the model to act as a specialized voice-assistance chatbot, referred to as "VoiceHelp." The authors independently evaluated ChatGPT's responses for accuracy, potential harm, extent, alignment with medical consensus, empathy, and overall quality. Readability was assessed with the Flesch Reading Ease Score (FRES), Gunning Fog Scale Level (GFSL), and Dale-Chall Score (D-CS), as well as word count, sentence count, words per sentence, and characters per word.
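For context, the FRES is derived from average sentence length and average syllables per word (206.835 − 1.015 × words/sentences − 84.6 × syllables/words). The sketch below is an illustrative reimplementation, not the study's tooling; its vowel-group syllable counter is a naive heuristic assumption.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups (assumption,
    # not a true linguistic syllabifier); every word gets at least 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Split sentences on terminal punctuation and words on letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard FRES formula: higher scores indicate easier reading.
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Scores near 30-45, as reported below, correspond to college-level difficulty on the conventional FRES interpretation scale.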
RESULTS: Most generated responses (91.7%) were free from inaccurate or inappropriate content; 92.5% were rated as harmless, and 80% were consistent with medical consensus. Although 38.3% of responses lacked empathy, the majority (92.5%) were scored between acceptable and very good in overall quality. The average scores for FRES, GFSL, D-CS, word count, sentence count, words per sentence, and characters per word at time point 1 were 37.08, 15.76, 10.05, 117.53, 6.90, 17.48, and 5.47, respectively, indicating a high level of reading complexity for a general audience. The corresponding scores at time point 2 were 45.09, 15.04, 9.26, 266.20, 13.60, 20.23, and 5.14, respectively.
CONCLUSIONS: ChatGPT consistently provided accurate and informative responses to common questions on voice disorders; however, the readability of its responses was relatively low for the general public. This limitation appeared to improve in the more recent version of the model. Further research is warranted before recommending ChatGPT as a reliable source of medical information for patients with voice disorders.
Keywords: ChatGPT; Dysphonia; Patient education; Voice; Voice disorders