Evaluation of ChatGPT-5.2 responses to frequently asked questions about benign bone tumors.

Batuhan Ayhan, Samet Batuhan Yoğurt, Zeliha Deniz Ayhan

OBJECTIVE: Patients increasingly seek health-related information through artificial intelligence (AI)-based chatbots. However, the reliability and clinical quality of chatbot-generated patient information remain uncertain. This study aimed to evaluate the quality and reliability of chatbot-generated responses to frequently asked patient questions regarding benign bone tumors using a structured assessment model.

METHODS: This descriptive and methodological study used twenty patient-centered, frequently asked questions formulated by three fellowship-trained orthopedic oncology specialists. The questions covered diagnosis, treatment, complications, follow-up, and lifestyle-related issues relating to common benign bone tumors. The responses produced by ChatGPT-5.2 were assessed independently by three other orthopedic oncology specialists who had no role in formulating the questions. Response quality was evaluated with the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool, which covers six domains: accuracy, clarity, relevance, completeness, provision of sources and references, and usefulness. Each domain was rated on a five-point Likert scale. The intraclass correlation coefficient (ICC) was used to assess interobserver reliability.

RESULTS: The highest scores were observed for accuracy (mean score 4.27), whereas completeness (3.19) and the provision of sources and references (3.03) scored lower. The overall QAMAI score was 21.39 out of 30, which falls within the validated 18-23 point range that denotes good response quality. Interobserver agreement was good for total QAMAI scores (ICC = 0.84; 95% CI: 0.74-0.91). Subdomain ICC values indicated moderate to good agreement.

CONCLUSION: Chatbot-generated responses provided accurate and useful preliminary information on benign bone tumors. However, shortcomings in completeness and in the citation of evidence-based sources indicate that chatbot outputs should be used under physician oversight. AI chatbots can support patient education but cannot replace clinical decision-making.
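
The scoring and agreement statistics described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not state which ICC model was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1). The function names `qamai_total` and `icc_2_1` and the demonstration ratings are hypothetical and are not study data.

```python
from statistics import mean


def qamai_total(domain_scores):
    """Sum six 1-5 Likert domain scores into a total out of 30."""
    if len(domain_scores) != 6:
        raise ValueError("QAMAI as described here has six domains")
    return sum(domain_scores)


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per question, one column per rater.

    Assumed model: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)        # number of rated items (questions)
    k = len(ratings[0])     # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical total QAMAI scores: 5 questions rated by 3 raters
    demo = [[22, 21, 23], [18, 19, 18], [25, 24, 26], [20, 22, 21], [17, 18, 16]]
    print(round(icc_2_1(demo), 3))
```

In this sketch, each row would hold the three raters' total QAMAI scores for one question. A confidence interval such as the one reported in the abstract would require an additional F-distribution step that is omitted here.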
