Chatbots Make Pathology Reports Easier for Patients to Read, Study Shows

— Bard and GPT-4 simplified reports and interpreted findings correctly, but hallucinations occurred

by Kristina Fiore, Director of Enterprise & Investigative Reporting, MedPage Today May 22, 2024

Artificial intelligence (AI) chatbots may be able to help interpret pathology reports for patients, but at this point they need clinician review before being delivered, researchers said.

Among 1,134 pathology reports, Google's Bard and OpenAI's GPT-4 respectively interpreted 87.57% and 97.44% of reports correctly, according to Eric Steimetz, MD, MPH, of SUNY Downstate Medical Center in Brooklyn, New York, and colleagues.

The chatbots also lowered the reports' Flesch-Kincaid reading level, from an overall mean of 13.19 to 8.17 for Bard and 7.45 for GPT-4 (P<0.001 for both), they reported in JAMA Network Open.
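
For readers unfamiliar with the metric, the Flesch-Kincaid grade level is computed from average sentence length and average syllables per word. The Python sketch below applies the standard formula; it is not the readable.com tool the researchers used, the syllable counter is a rough heuristic assumed here for illustration, and the two sample sentences are made up rather than taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade level formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical sentences, written for this example only.
original = "Histologic sections demonstrate invasive adenocarcinoma extending into the muscularis propria."
simplified = "The cancer has grown into the muscle layer of the wall."

print(round(flesch_kincaid_grade(original), 2))
print(round(flesch_kincaid_grade(simplified), 2))
```

Long, polysyllabic medical terms push the score up, which is why plain-language rewrites score several grade levels lower. Dedicated readability tools count syllables more carefully, so their scores will differ somewhat from this sketch.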

"Both chatbots did a pretty good job taking complex medical terms and making them significantly simpler," Steimetz told MedPage Today in an email. "GPT-4 used some creative analogies to explain certain concepts."

However, both chatbots had hallucinations and made some incorrect interpretations, leading Steimetz to caution that the technology "hasn't fully matured yet." He warned that he "would not be comfortable with a patient relying on any chatbot to explain their pathology report without having a medical professional glance over it first."

"That being said," he added, "it is possible that within the next few years we will have better chatbots that could reliably explain pathology reports to patients."

Steimetz said that providers are legally required to post patients' test results to their patient portals, but these results -- especially pathology reports -- may not be easy for patients to understand.

"Patients often see their results before their physician does, and not knowing what the pathology or radiology report means is distressing, to say the least," he said. He noted that other teams have evaluated using chatbots to simplify hospital discharge summaries and instructions for patients, and that pathology reports are another area where chatbots could make material easier for patients to understand.

Steimetz and colleagues used 1,134 pathology reports from January 2018 through May 2023 from a single hospital in Brooklyn. Using sequential prompts, they asked the two chatbots to explain the reports in simple terms and identify key information.

Three pathologists rated the chatbot interpretations as medically correct, partially medically correct (having at least one erroneous statement, but not one that would drastically alter the medical management of the condition), or incorrect.

The researchers used an online tool, readable.com, to assess the readability metrics of both the original and simplified reports. "Of note, the grade level of the simplified reports is markedly lower than that of most online patient educational material, which often is written for those with an 11th grade educational level or higher," Steimetz said.

Both chatbots interpreted the majority of the reports correctly. Bard interpreted 8.99% partially correctly and 3.44% incorrectly, while GPT-4 interpreted 2.12% partially correctly and 0.44% incorrectly.

The most common error made by both chatbots was assuming that a resection specimen without lymph nodes (pNx) implied that lymph node status was negative (pN0), the researchers said.

Bard had 32 hallucinations (2.82%), while GPT-4 had three hallucinations (0.26%), they reported.
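
The percentages follow directly from the hallucination counts and the 1,134 reports in the study. A minimal Python check of that arithmetic, using only figures reported above (the variable and function names are illustrative):

```python
TOTAL_REPORTS = 1134  # pathology reports in the study
hallucinations = {"Bard": 32, "GPT-4": 3}  # counts reported by the researchers


def rate_percent(count: int, total: int) -> float:
    """Share of reports affected, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


rates = {name: rate_percent(n, TOTAL_REPORTS) for name, n in hallucinations.items()}
print(rates)  # {'Bard': 2.82, 'GPT-4': 0.26}
```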

"One of the most frustrating hallucinations was when the chatbots began making inferences about the patient," Steimetz said. "For example, it would say that the patient is doing well. It might not be a consequential error since it makes no difference medically, but it does show that we can't blindly trust what they say."

The chatbots also made up explanations for unfamiliar terms, the researchers said.

Reviewers believed that responses from GPT-4 were better and more comprehensive, while those given by Bard were wordy and less helpful, they noted.

The pathology reports all came from a single hospital, which may limit the generalizability of the findings. Nonetheless, the researchers concluded that using chatbots "can potentially revolutionize how pathology reports are perceived and understood and allow patients to make informed decisions."

"I do hope that this study brings more attention to the lack of patient-readable material and the potential AI chatbots have to fill in this gap," Steimetz said. "Patient education is also an important topic that physicians don't always have time to prioritize. ... I really hope incorporating these kinds of solutions becomes the standard of care -- once the technology matures."

Kristina Fiore leads MedPage's enterprise & investigative reporting team. She's been a medical journalist for more than a decade and her work has been recognized by Barlett & Steele, AHCJ, SABEW, and others. Send story tips to k.fiore@medpagetoday.com.

Disclosures

Steimetz reported no conflicts of interest.

A co-author reported consulting for Paige and advising for PathPresenter Corporation.

Primary Source

JAMA Network Open

Steimetz E, et al "Use of artificial intelligence chatbots in preparation of pathology reports" JAMA Netw Open 2024; DOI: 10.1001/jamanetworkopen.2024.12767.