In a study that challenges conventional wisdom about the limits of artificial intelligence (AI), researchers from the University of Geneva (UNIGE) and the University of Bern (UniBE) have demonstrated that large language models (LLMs) are impressively competent not only at understanding emotionally intelligent behaviour but also at generating it. Their findings, published recently in Communications Psychology, reveal that these AI systems, including the widely known ChatGPT, outperform average human scores on emotional intelligence (EI) tests and can create entirely new test scenarios in a matter of minutes. The achievement could reshape the future of AI applications in fields traditionally dominated by human judgment.
The study centers on large language models, AI systems designed to process, interpret, and generate human language using vast datasets and complex algorithms. These models, capable of answering intricate questions and navigating nuanced textual tasks, have primarily been seen as tools for information retrieval, text synthesis, and problem-solving. The question posed by the UNIGE and UniBE team was whether these systems could extend their capabilities to the realm of emotional intelligence: a deeply human attribute involving the perception, understanding, and management of emotions, both one's own and those of others.
To tackle this question, the research team employed a set of five widely recognized emotional intelligence assessments commonly used in both psychological research and corporate environments. These tests are structured around emotionally charged scenarios that require decision-making reflective of emotional understanding and regulation. For example, one scenario describes a situation where a character named Michael is confronted with the fact that a colleague has stolen his idea and is being praised for it—a context calling for a sophisticated emotional and social response. The most emotionally intelligent option, verified by prior human consensus, was to “talk to his superior about the situation” rather than resorting to conflict, silence, or retaliatory theft.
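To make the structure of these items concrete, the following is a minimal sketch, in Python, of how one such situational-judgment item could be represented for automated administration. The class name, field names, and exact option wordings are illustrative paraphrases of the example above, not material taken from the actual tests.

```python
from dataclasses import dataclass

@dataclass
class EIItem:
    """One situational-judgment item: a scenario, candidate responses, and the
    option that human consensus rated most emotionally intelligent."""
    scenario: str       # an emotionally charged vignette, e.g. the Michael example
    options: list[str]  # candidate behavioural responses
    correct_index: int  # index of the consensus-keyed best option

michael_item = EIItem(
    scenario=("Michael's colleague has stolen his idea and is being praised "
              "for it. What is Michael's most effective course of action?"),
    options=[
        "Argue with the colleague in front of the team",
        "Talk to his superior about the situation",
        "Say nothing and let it go",
        "Steal one of the colleague's ideas in return",
    ],
    correct_index=1,  # the consensus answer reported in the study
)
```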
The AI models tested were ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3, an array representing the cutting edge of generative AI technology. When the researchers administered the EI tests to these LLMs, the results were striking: the AIs achieved an average of 82% correct answers, considerably higher than the human participants' average of 56%. This gap strongly suggests that large language models not only comprehend emotional contexts but can also select appropriate emotional responses with considerable accuracy.
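The paper's exact prompting setup is not reproduced here, but a benchmarking loop of this general shape is easy to sketch. The code below reuses the hypothetical EIItem class from the previous example and assumes the OpenAI Python client; the model name, prompt wording, and naive answer parsing are all assumptions rather than the researchers' protocol.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LETTERS = "abcd"

def ask_model(item: EIItem, model: str = "gpt-4") -> int:
    """Present one item as multiple choice and return the chosen option index."""
    option_text = "\n".join(f"({LETTERS[i]}) {o}" for i, o in enumerate(item.options))
    prompt = (f"{item.scenario}\n\n{option_text}\n\n"
              "Answer with the single letter of the most effective option.")
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic choice, suited to scoring
    )
    answer = reply.choices[0].message.content.strip().lower().lstrip("(")
    return LETTERS.index(answer[0])  # naive parse: first character is the letter

def accuracy(items: list[EIItem], model: str = "gpt-4") -> float:
    """Fraction of items where the model picks the consensus-keyed option."""
    hits = sum(ask_model(item, model) == item.correct_index for item in items)
    return hits / len(items)

print(f"EI accuracy: {accuracy([michael_item]):.0%}")
```

Applied to the full item pools of the five tests, a loop of this shape would produce per-model accuracies of the kind the study averages and compares against human norms.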
Beyond solving existing tests, the study explored whether these AI systems could themselves create emotionally nuanced assessments. In a subsequent phase, the researchers tasked ChatGPT-4 with generating new emotional intelligence test scenarios from scratch. Remarkably, the scenarios it produced matched the clarity, reliability, and realism of the original human-developed tests, which had taken years of iterative refinement to perfect. More than 400 human participants later took these AI-generated tests, affirming their validity and practical utility. Such rapid generation of credible emotional intelligence materials underscores the potential for these systems to assist in educational, psychological, and organizational settings.
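A generation phase of this kind can be approximated with a single instruction to the model. The prompt below is purely illustrative; the researchers' actual instructions, and the psychometric vetting the generated items then underwent, are not shown.

```python
from openai import OpenAI

client = OpenAI()

GENERATION_PROMPT = (
    "Write a short workplace scenario in which a person faces an emotionally "
    "charged situation. Then list four possible responses, exactly one of "
    "which reflects the most emotionally intelligent course of action, and "
    "state which one it is."
)

draft = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": GENERATION_PROMPT}],
    temperature=1.0,  # encourage variety across generated scenarios
).choices[0].message.content

print(draft)  # a candidate item, to be vetted by human raters before any use
```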
This experiment sheds light on several important facets of emotional intelligence as operationalized in AI systems. The ability of LLMs to grasp subtle emotional cues and recommend behaviours aligned with emotional competence suggests that these models possess a form of "emotional reasoning": one that extends beyond simple pattern matching to an apparent grasp of social norms, context, and the consequences of different emotional responses. The findings challenge the long-held notion that emotional intelligence is an exclusively human domain, mediated by empathy and lived experience.
While the implications are profound, the researchers caution against unregulated reliance on AI for emotionally sensitive roles. The study highlights the necessity of expert oversight and contextual awareness when deploying AI tools in coaching, education, or conflict resolution settings. AI can enhance human capacities, but nuanced judgment, ethical considerations, and cultural sensitivities still require human intervention to ensure appropriate and ethical use.
Technically, the models' success can be attributed to extensive training on diverse textual data, which lets them extract patterns from the myriad social and emotional contexts embedded in language. By internalizing these patterns, LLMs form probabilistic representations of emotionally intelligent behaviour, allowing them to generalize effectively to novel scenarios, as the study demonstrates. Their capacity to generate new evaluative instruments quickly stems from their generative design, which facilitates creative text production grounded in coherent emotional logic.
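One way to make this probabilistic framing tangible is to inspect the probability a model assigns to each answer option. The sketch below uses the logprobs option of the OpenAI chat API and a condensed version of the Michael vignette; it is a conceptual illustration, not the study's method.

```python
import math
from openai import OpenAI

client = OpenAI()

prompt = (
    "Michael's colleague has stolen his idea and is being praised for it. "
    "What should Michael do?\n"
    "(a) argue with the colleague  (b) talk to his superior  "
    "(c) say nothing  (d) steal an idea back\n"
    "Answer with a single letter."
)

reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=1,    # the answer letter only
    logprobs=True,
    top_logprobs=4,  # log-probabilities of the four likeliest first tokens
)

# The model's "choice" is the peak of a probability distribution over options.
for cand in reply.choices[0].logprobs.content[0].top_logprobs:
    print(f"{cand.token!r}: p = {math.exp(cand.logprob):.2f}")
```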
The research carried out by Katja Schlegel of UniBE and Marcello Mortillaro of UNIGE exemplifies interdisciplinary collaboration across psychology, affective science, and AI technology. Their methodology, combining rigorous psychological testing frameworks with cutting-edge AI benchmarking, provides a blueprint for future studies on the integration of emotional intelligence and artificial intelligence. This approach could accelerate AI development tailored not just to linguistic proficiency but also socio-emotional competence.
In conclusion, this study marks a pivotal moment in AI research, illustrating that large language models can be both proficient solvers and creators in the emotionally charged dimensions of human behaviour. As AI becomes increasingly embedded in personal and professional spheres, this expanded emotional toolkit within LLMs offers promising avenues for enhanced communication, empathy-driven interactions, and conflict management mediated by technology. The scientific community and industry alike are now tasked with responsibly harnessing these capabilities to amplify human potential without compromising ethical standards.
Subject of Research: Not applicable
Article Title: Large language models are proficient in solving and creating emotional intelligence tests
News Publication Date: 22-May-2025
Web References:
https://doi.org/10.1038/s44271-025-00258-x
Keywords: Artificial intelligence, Emotional intelligence, Large language models, ChatGPT, Emotional reasoning, Emotional intelligence tests, Generative AI, Human-AI collaboration, Affective computing, AI in education, Conflict management, Psychological assessment