In a study presented at the European Emergency Medicine Congress, researchers report compelling evidence that human clinicians outperform artificial intelligence (AI) systems when triaging patients in emergency departments (EDs). The investigation, led by Dr. Renata Jukneviciene of Vilnius University, Lithuania, offers critical insights into the evolving role of AI tools like ChatGPT in acute medical settings, underscoring the indispensable nature of trained medical staff in patient prioritization. While AI demonstrated some promise in specific scenarios, the findings indicate that its integration into clinical workflows should be cautious and complementary, rather than substitutive.
The impetus for this study was the mounting pressure on emergency departments worldwide, where overcrowding and workforce shortages increasingly jeopardize patient outcomes. Dr. Jukneviciene and her team sought to rigorously evaluate whether AI could ameliorate the triage bottleneck, streamline decision-making, and alleviate nurse workloads. By leveraging a set of real clinical cases curated from PubMed—a globally recognized biomedical literature database—the research aimed to compare human triage accuracy against machine analysis within the well-established Manchester Triage System framework. This five-level categorization stratifies patients from the most urgent cases demanding immediate intervention to those requiring the least urgent care.
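For readers unfamiliar with the Manchester Triage System, the short sketch below shows one way its five urgency levels might be represented in software. The colour codes and target times shown are the commonly cited MTS defaults, not figures taken from the study itself, and the code is purely illustrative.

```python
# Illustrative only: the five Manchester Triage System (MTS) levels with the
# commonly cited colour codes and maximum target times to first clinician
# contact. These defaults are not drawn from the study described above.
MTS_LEVELS = {
    1: {"name": "Immediate",   "colour": "red",    "target_minutes": 0},
    2: {"name": "Very urgent", "colour": "orange", "target_minutes": 10},
    3: {"name": "Urgent",      "colour": "yellow", "target_minutes": 60},
    4: {"name": "Standard",    "colour": "green",  "target_minutes": 120},
    5: {"name": "Non-urgent",  "colour": "blue",   "target_minutes": 240},
}

def describe(level: int) -> str:
    """Return a human-readable summary of an MTS level."""
    entry = MTS_LEVELS[level]
    return (f"Level {level} ({entry['colour']}): {entry['name']}, "
            f"to be seen within {entry['target_minutes']} min")

if __name__ == "__main__":
    for lvl in MTS_LEVELS:
        print(describe(lvl))
```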
The study’s methodology involved distributing a questionnaire, in both digital and printed form, featuring 110 randomly selected clinical scenarios to emergency medicine professionals at Vilnius University Hospital Santaros Klinikos. The cohort comprised six attending physicians and 51 nurses, and response rates were high (100% among doctors and 86.3% among nurses), providing a robust dataset for analysis. In parallel with the human evaluation, the same cases were subjected to AI triage using ChatGPT version 3.5, a general-purpose language model not specifically engineered for clinical decision support.
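The congress abstract does not describe the researchers' exact prompting setup, so the following is a hypothetical sketch of how a clinical vignette could be submitted to a general-purpose chat model for Manchester Triage System classification. The prompt wording, model identifier, and output parsing are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical sketch: asking a general-purpose chat model to assign an MTS
# level to a clinical vignette. The prompt text, model name, and parsing are
# assumptions for illustration; they are not taken from the study abstract.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT_TEMPLATE = (
    "You are assisting with emergency department triage. Using the "
    "Manchester Triage System, assign the following case a single urgency "
    "level from 1 (immediate) to 5 (non-urgent). Reply with the number only.\n\n"
    "Case: {vignette}"
)

def triage_with_llm(vignette: str) -> int:
    """Ask the model for an MTS level (1-5) and parse the reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(vignette=vignette)}],
        temperature=0,  # keep output as repeatable as possible for scoring
    )
    reply = response.choices[0].message.content.strip()
    level = int(reply[0])  # naive parsing: first character should be the digit
    if not 1 <= level <= 5:
        raise ValueError(f"Unexpected triage level: {reply!r}")
    return level
```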
Results revealed a pronounced disparity favoring human clinicians across most performance metrics. Doctors achieved the highest accuracy, correctly classifying 70.6% of cases according to urgency, while nurses attained 65.5%. In contrast, AI managed an accuracy of 50.4%, indicating significant limitations in its ability to replicate nuanced human judgement. Sensitivity values, reflecting the capability to identify genuinely urgent cases, were also markedly higher among clinicians—83.0% for doctors and 73.8% for nurses—compared to AI’s modest 58.3%.
Intriguingly, AI outperformed nursing staff specifically in the most critical triage category, where immediate life-threatening conditions require prompt action. In this domain, AI’s accuracy was 27.3%, substantially exceeding the nurses’ 9.3%, while its specificity (correctly recognizing cases that did not belong in this most urgent category) also surpassed that of the nurses (27.8% versus 8.3%). This suggests that AI errs on the side of caution by flagging potential emergencies and could serve as a useful safety net. However, this tendency towards over-triage, evident across other categories, raises concerns about resource allocation and emergency department efficiency.
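To make the quoted metrics concrete, here is a minimal sketch, not the authors' analysis code, of how accuracy and per-category sensitivity and specificity could be computed from reference MTS levels and the levels a rater (human or AI) assigned to the same cases. The abstract does not state the study's exact metric definitions, so this per-level formulation and the toy labels are assumptions for illustration.

```python
# Minimal sketch (not the study's analysis code) of accuracy, sensitivity,
# and specificity for triage levels, computed from paired label lists.
from typing import Sequence

def accuracy(reference: Sequence[int], assigned: Sequence[int]) -> float:
    """Fraction of cases assigned exactly the reference urgency level."""
    return sum(r == a for r, a in zip(reference, assigned)) / len(reference)

def sensitivity(reference: Sequence[int], assigned: Sequence[int], level: int) -> float:
    """Of cases truly at `level`, the fraction the rater also placed at `level`."""
    relevant = [(r, a) for r, a in zip(reference, assigned) if r == level]
    return sum(a == level for _, a in relevant) / len(relevant) if relevant else float("nan")

def specificity(reference: Sequence[int], assigned: Sequence[int], level: int) -> float:
    """Of cases not truly at `level`, the fraction the rater also kept out of `level`."""
    others = [(r, a) for r, a in zip(reference, assigned) if r != level]
    return sum(a != level for _, a in others) / len(others) if others else float("nan")

# Toy example with made-up labels, purely to show the calculations.
ref = [1, 2, 3, 3, 4, 5, 2, 1]
ai  = [1, 1, 3, 2, 4, 3, 2, 2]
print(f"accuracy={accuracy(ref, ai):.2f}")
print(f"sensitivity(level 1)={sensitivity(ref, ai, 1):.2f}")
print(f"specificity(level 1)={specificity(ref, ai, 1):.2f}")
```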
The distribution of cases triaged further illuminates AI’s differential approach. Whereas human clinicians allocated cases more evenly across urgency levels—with doctors assigning roughly 9% of cases to the highest urgency and 18% to the least urgent—AI disproportionately categorized nearly 29% as most urgent and underrepresented the lowest urgency group at 1%. Such skewed stratifications might burden emergency services with unnecessarily expedited interventions, counteracting the intended benefits of rapid triage.
Furthermore, when examining clinical scenarios involving therapeutic pathways, AI again displayed mixed competence. In surgical cases necessitating operative management, doctors demonstrated a reliability score of 68.4% and nurses 63%, while AI lagged behind at 39.5%. Conversely, for therapeutic interventions such as pharmacologic or supportive treatments, AI outperformed nurses (51.9% reliability versus 44.5%), suggesting potential as an adjunct support tool in treatment prioritization.
While these findings do not support replacing skilled clinicians with AI in high-stakes triage, they highlight a nuanced role for AI technologies. Dr. Jukneviciene emphasizes that AI’s strengths in identifying the most critical cases suggest it might be harnessed as a decision-support tool, particularly in overwhelmed emergency departments or to bolster less experienced personnel during peak demand. Nonetheless, unchecked reliance on AI risks excessive over-triage, necessitating stringent oversight and integration protocols that preserve human clinical judgement.
The study’s context merits consideration. Conducted at a single academic center with a modest sample size, and absent real-time clinical interaction such as vital signs evaluation or longitudinal follow-up, the research acknowledges inherent limitations. Additionally, ChatGPT 3.5’s general training corpus lacks specificity for medical applications, perhaps constraining its performance. The investigators plan to extend this work by testing newer, fine-tuned AI models in larger, multicenter cohorts and integrating physiological data streams such as electrocardiograms to elevate clinical relevance.
Several strengths bolster the study’s impact. Its use of authentic, peer-reviewed clinical cases and participation from a multidisciplinary clinical workforce enhances ecological validity. The mixed-mode questionnaire distribution increased accessibility, while the findings directly address contemporary challenges confronting emergency medicine—namely overcrowding and workforce shortages exacerbated by the global health crisis. Moreover, by demonstrating AI’s proclivity to over-triage, the research provides crucial evidence guiding safer AI deployment in acute care settings.
Commenting independently, Dr. Barbra Backus, chair of the EUSEM abstract selection committee and an emergency physician from Amsterdam, underscores AI’s potential while advocating prudence. “AI has already shown considerable utility in areas like radiographic interpretation,” she notes, “but its limitations in patient triage are now evident. It cannot supplant the clinical acumen of doctors and nurses, though it may expedite decision-making if judiciously integrated under expert supervision. Ongoing validation at each stage of AI advancement is essential.”
Complementing this investigation, at the same congress, assistant professor Rakesh Jalali from the University of Warmia and Mazury presented pioneering work on virtual reality (VR) for training clinical staff in managing polytrauma patients. Such technological innovations underscore the broader impetus to enhance emergency care through digital tools, balancing automation with human expertise.
In conclusion, this study reaffirms the primacy of human clinical judgement in emergency triage while illuminating a collaborative pathway for integrating AI as an auxiliary resource. As emergency departments worldwide grapple with resource constraints and heightened patient flows, leveraging AI’s strengths in identifying critically urgent cases alongside clinician oversight offers a pragmatic model. Future advancements in AI algorithms, focused training data, and health system workflow integration will be pivotal to realizing safer, more efficient emergency care.
Subject of Research: People
Article Title: Patient triaging in the ED: can artificial intelligence become the gold standard?
News Publication Date: 30-Sep-2025
Image Credits: Dr. Renata Jukneviciene
Keywords: Artificial intelligence, Health care, Emergency medicine, Health care delivery, Emergency rooms, Nursing, Nursing assessment, Patient monitoring, Vital signs, Cardiology, Cardiovascular disorders, Human heart, Surgery, Orthopedics