ITHACA, N.Y. – Conversational agents (CAs) such as Alexa and Siri are designed to answer questions, offer suggestions – and even display empathy. However, new research finds they do poorly compared to humans when interpreting and exploring a user’s experience.
CAs are powered by large language models (LLMs) that ingest massive amounts of human-produced data, and thus can be prone to the same biases as the humans who produced that data.
Researchers from Cornell University, Olin College and Stanford University tested this theory by prompting CAs to display empathy while conversing with or about 65 distinct human identities.
The team found that CAs make value judgments about certain identities – such as gay and Muslim – and can even be encouraging of identities associated with harmful ideologies, including Nazism.
“I think automated empathy could have tremendous impact and huge potential for positive things – for example, in education or the health care sector,” said lead author Andrea Cuadra, now a postdoctoral researcher at Stanford.
“It’s extremely unlikely that it (automated empathy) won’t happen,” she said, “so it’s important that as it’s happening, we have critical perspectives so that we can be more intentional about mitigating the potential harms.”
Cuadra will present “The Illusion of Empathy? Notes on Displays of Emotion in Human-Computer Interaction” at CHI ’24, the Association for Computing Machinery conference on Human Factors in Computing Systems, May 11-18 in Honolulu. Research co-authors at Cornell University included Nicola Dell, associate professor; Deborah Estrin, professor of computer science; and Malte Jung, associate professor of information science.
Researchers found that, in general, LLMs received high marks for emotional reactions, but scored low for interpretations and explorations. In other words, LLMs are able to respond to a query based on their training but are unable to dig deeper.
Dell, Estrin and Jung said they were inspired to think about this work as Cuadra was studying the use of earlier-generation CAs by older adults.
“She witnessed intriguing uses of the technology for transactional purposes such as frailty health assessments, as well as for open-ended reminiscence experiences,” Estrin said. “Along the way, she observed clear instances of the tension between compelling and disturbing ‘empathy.’”
Funding for this research came from the National Science Foundation; a Cornell Tech Digital Life Initiative Doctoral Fellowship; a Stanford PRISM Baker Postdoctoral Fellowship; and the Stanford Institute for Human-Centered Artificial Intelligence.
For additional information, see this Cornell Chronicle story.
-30-