Artificial intelligence (AI) is reshaping how healthcare professionals interact with patients, even in something as personal as responding to messages within electronic health records (EHRs). A recent study published in JAMA Network Open explored how AI might assist in responding to patient messages in EHRs.
Researchers analyzed a dataset of more than 3.7 million patient medical advice requests and selected 59 patient messages for this investigation. The goal? To compare how well AI and human clinicians could answer these messages in terms of quality, empathy, and patient satisfaction.
What Did the Study Look At?
Two cutting-edge AI models were tasked with generating responses to the selected messages:
- ChatGPT-4 (OpenAI, December 2023 version)
- Stanford GPT (a proprietary model developed by Stanford Health Care and the Stanford School of Medicine, January 2024 version)

To ensure fairness and consistency, the researchers also created structured guidelines to reduce the possibility of bias during the selection and evaluation process.
The AI responses were evaluated alongside the original clinician responses by six licensed healthcare professionals. These professionals rated each response for information quality and empathy using a 5-point scale, where 1 represented the poorest performance and 5 represented the best. In addition, satisfaction with the responses was assessed by 30 participants recruited through the Stanford Research Registry. These participants rated the AI and clinician responses independently, using another 5-point scale, with 1 being “extremely dissatisfied” and 5 being “extremely satisfied.”
To ensure the results were accurate and meaningful, the researchers adjusted for factors such as age, sex, race, and ethnicity. They also examined whether the length of responses influenced satisfaction. Statistical models built on the ratings data then allowed them to compare AI-generated and clinician-generated responses in terms of quality, empathy, and satisfaction.
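For readers who want a concrete sense of what a covariate-adjusted comparison like this looks like, here is a minimal Python sketch. It is purely illustrative and assumes a hypothetical table (satisfaction_ratings.csv) with one row per rating, containing a 1-to-5 satisfaction score, the responder type, and basic rater demographics; it is not the study's actual analysis code.

```python
# Illustrative sketch only -- not the study's analysis code.
# Assumes a hypothetical CSV with one row per rating:
#   satisfaction (1-5), responder ("ai" or "clinician"),
#   age, sex, race, ethnicity.
import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.read_csv("satisfaction_ratings.csv")  # hypothetical file

# Ordinary least squares: does responder type predict satisfaction
# after adjusting for rater age, sex, race, and ethnicity?
model = smf.ols(
    "satisfaction ~ C(responder) + age + C(sex) + C(race) + C(ethnicity)",
    data=ratings,
).fit()

# The coefficient on the responder term estimates the adjusted
# difference in mean satisfaction between AI and clinician responses.
print(model.summary())
```

In practice, a study like this would likely use models that also account for repeated ratings from the same participant, but the basic idea of adjusting for demographics is the same.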
Throughout the study, strict ethical guidelines were followed. The researchers used systematically de-identified patient messages to ensure no protected health information was included. Stanford University’s Institutional Review Board approved the study, ensuring it met rigorous ethical standards.
The results provide fascinating insights into the potential for AI to complement healthcare communication while highlighting the importance of balancing quality, empathy, and patient expectations.
Study Outcomes
The study included a comprehensive analysis of 2,118 evaluations of AI-generated responses for quality and 408 evaluations for satisfaction. One of the standout findings was that participants were generally more satisfied with AI responses than with those written by clinicians. On average, AI responses received a satisfaction score of 3.96 (out of 5), compared to 3.05 for clinician responses. This difference was statistically significant, meaning it was unlikely to be due to chance.
Interestingly, the type of medical question influenced satisfaction levels. AI responses to cardiology-related questions scored the highest in satisfaction, with an average score of 4.09. However, when it came to information quality and empathy—important measures for patient communication—AI responses to endocrinology questions stood out as the best.
A notable difference between AI and clinician responses was their length. On average, clinician responses contained about 254 characters, while AI responses averaged 1,470 characters. The study found that the length of a clinician's response was linked to satisfaction: longer responses generally led to higher satisfaction scores, particularly for cardiology questions. However, this relationship between response length and satisfaction did not hold for AI responses.
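As a rough illustration of how such a length-satisfaction relationship can be checked, the sketch below computes the correlation between response length and satisfaction separately for clinician and AI responses. It again assumes the same hypothetical satisfaction_ratings.csv file (with an added response_length column) rather than the study's real data.

```python
# Illustrative sketch only -- not the study's analysis code.
# Assumes the hypothetical satisfaction_ratings.csv also has a
# response_length column (number of characters in each response).
import pandas as pd

ratings = pd.read_csv("satisfaction_ratings.csv")  # hypothetical file

# Correlate response length with satisfaction separately for each
# responder type; the study reported a length-satisfaction link for
# clinician responses but not for AI responses.
for responder, group in ratings.groupby("responder"):
    corr = group["response_length"].corr(group["satisfaction"])
    print(f"{responder}: length-satisfaction correlation = {corr:.2f}")
```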
These findings suggest that while AI-generated responses are well received, their success isn't tied to response length alone; rather, it may stem from their ability to deliver detailed, comprehensive answers. For clinicians, the study highlights an opportunity: slightly more detailed responses might improve patient satisfaction, a potential area where human and AI communication styles can complement each other.
This study is one of the first to explore how satisfied people are with AI-generated responses to patient questions in EHRs. The results showed that, on average, participants rated AI responses higher than those written by clinicians, both overall and within individual medical specialties. However, this satisfaction did not always align with how clinicians rated the quality and empathy of the responses. For instance, satisfaction was highest with AI responses to cardiology questions, while clinician evaluators found the highest quality and empathy in AI responses to endocrinology questions.
An interesting discovery was that the length of a clinician’s response mattered when it came to satisfaction—longer responses generally made participants happier. On the other hand, the length of AI responses didn’t seem to impact satisfaction, suggesting that AI’s ability to provide detailed, comprehensive answers might play a role in how well they are received. These findings hint that extremely short responses from clinicians may leave patients feeling less satisfied, which could be an important area for improvement in patient-clinician communication.
It’s important to note that the study does have limitations. Satisfaction was measured among survey participants, not the original patients who submitted the medical questions, so their opinions may not fully reflect the experiences of actual patients. Still, the survey provides valuable insights and serves as a reasonable proxy for patient perspectives on AI-generated responses.
Future research will need to explore how satisfaction with AI responses varies in different settings, such as hospitals in other regions, and among more diverse groups of patients with different languages and cultural backgrounds. Larger studies across a broader range of medical specialties could also help refine these findings.
Ultimately, this study highlights the importance of involving patients in the design and implementation of AI tools in healthcare. By doing so, healthcare systems can ensure that these tools meet patients’ expectations and provide meaningful support in improving patient-clinician communication.
Why These Findings Matter
The study highlights several implications for how AI could enhance patient-clinician communication in the future:
- Augmenting Communication: Patients appreciate detailed and thoughtful responses, which AI is well suited to deliver efficiently. Clinicians could leverage AI to provide more comprehensive answers while focusing on complex cases that require their expertise.
- Tailoring Responses for Satisfaction: Clinicians' shorter responses were less satisfying to participants, suggesting that brevity might hinder patient perceptions. AI tools, with their ability to generate longer, thorough responses, may bridge this gap.
- Involving Patients in AI Design: The study underscores the importance of including patients as key stakeholders in designing AI tools. Their feedback ensures these tools align with patient expectations and improve satisfaction without sacrificing quality or empathy.
Future Directions for Research
The study raises new questions and opportunities for exploring AI in healthcare:
- Direct Patient Feedback: Future research should gather satisfaction ratings directly from patients who receive AI-generated responses, offering more accurate insights.
- Broader Demographics: Including diverse populations, languages, and healthcare settings would provide a more comprehensive understanding of AI's role in communication.
- Best Practices for Integration: Developing guidelines for integrating AI into patient communication workflows will be essential to balance efficiency, empathy, and quality.
The Takeaway
The study marks an important milestone in understanding how AI could support clinicians in patient communication. AI responses demonstrated higher satisfaction levels, but the findings also emphasized the value of longer, more personalized clinician responses.
As healthcare evolves, the key challenge will be to combine the precision and efficiency of AI with the empathy and human touch that patients value in their interactions with clinicians. This balance could pave the way for more satisfying and effective patient communication in the years to come. The ultimate goal? To provide patients with the best of both worlds: the precision of technology and the compassion of a caregiver.
Are you interested in learning more about AI healthcare? Subscribe to our newsletter, “PulsePoint,” for updates, insights, and trends on AI innovations in healthcare.