A recent comparative study conducted by the National Library of Medicine examined how well ChatGPT™ (OpenAI®) and Bing AI™ (Microsoft®) answer questions about kidney stone treatment, specifically evaluating their alignment with the American Urological Association (AUA) guidelines. The analysis assessed the appropriateness of responses, whether they emphasized consulting healthcare providers, whether they included references, and how closely they adhered to established guidelines.
Methodology
Researchers developed 20 questions related to kidney stone evaluation and treatment, guided by the AUA Surgical Management of Stones guidelines. These questions were posed to both ChatGPT and Bing AI. Each response was scored with the brief DISCERN tool, which measures the quality and reliability of health information, and the appropriateness of the two chatbots’ answers was compared.
Key Findings
Response Clarity and Relevance
ChatGPT excelled in providing clear, relevant, and purpose-driven answers, significantly outperforming Bing AI on questions evaluating clarity, achievement of aims, and relevance (scores of 12.77 ± 1.71 vs. 10.17 ± 3.27, respectively; p < 0.01).
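As a rough plausibility check on the reported difference, the summary statistics can be run through Welch's unequal-variance t-test. This is only a sketch under stated assumptions: the article does not say which statistical test the researchers used, and the sample size of 20 responses per chatbot is inferred from the 20 questions rather than stated for this comparison. The function name `welch_t` is illustrative.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    # Squared standard error of each group's mean.
    se2_a = sd_a ** 2 / n_a
    se2_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# ASSUMPTION: 20 scored responses per chatbot (one per question).
N_PER_GROUP = 20

# Means and standard deviations as reported: ChatGPT vs. Bing AI.
t_stat, dof = welch_t(12.77, 1.71, N_PER_GROUP, 10.17, 3.27, N_PER_GROUP)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")  # prints "t = 3.15, df = 28.7"
```

With roughly 29 degrees of freedom, the two-sided 1% critical value of t is about 2.76, so a statistic near 3.15 is consistent with the reported p < 0.01 under these assumptions.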
References and Source Quality
While ChatGPT consistently provided well-structured and relevant responses, it fell short in offering references. In contrast, Bing AI always included references, making its responses more verifiable. This led to Bing AI outperforming ChatGPT in questions assessing the quality of sources (scores of 10.8 vs. 4.28; p < 0.01).
Adherence to Guidelines
Both chatbots generally adhered to AUA guidelines, particularly for pre-operative testing. However, deviations were noted in specific treatment recommendations:
- Ureteral Stones in Adults: 30.5% of responses deviated from guidelines.
- Renal Stones in Adults: 52.5% of responses deviated.
- General Treatment Recommendations: 20.5% of responses deviated.
Implications
ChatGPT produced clearer and more relevant responses, while Bing AI’s consistent inclusion of references adds credibility and lets users verify the source of information.
Neither strength guarantees guideline-concordant advice: both chatbots deviated from AUA recommendations on specific treatment questions. Additional research is essential to explore how these AI tools can support clinicians and patients in urologic healthcare while ensuring guideline adherence.