
Study Reveals AI Language Models Encounter Challenges with Basic Hospital Data Tasks

May 7, 2026
in Technology and Engineering

A recent investigation into the practical capabilities of large language models (LLMs) reveals significant limitations in their use for routine administrative tasks within hospital environments. Conducted by Eyal Klang and colleagues at the Icahn School of Medicine at Mount Sinai in New York, the study critically evaluates the performance of state-of-the-art LLMs on essential number-crunching operations that healthcare administrators depend on daily. Published in PLOS Digital Health, these findings provide essential technical insights into the challenges facing AI implementation in clinical administrative workflows.

Hospitals nowadays rely heavily on electronic health records (EHRs) – structured datasets that capture patient information, resource availability, and care events. Administrators utilize this data to monitor patient loads, allocate resources, and generate operational reports. Traditionally, these tasks are performed by specialized data analysts deploying programming languages and database queries, a process often fraught with delays when rapid answers are needed for decision-making. The promise of LLMs like GPT-4o and Llama has been to democratize data access by allowing non-technical staff to query these datasets directly using natural language prompts.

In the study, researchers subjected nine leading LLMs to a rigorous battery of tests designed to emulate two foundational administrative functions: counting how many patients meet a specific clinical condition and filtering records based on multiple inclusion criteria simultaneously. The data itself was sourced from a substantial real-world dataset of over 50,000 emergency department visits within the Mount Sinai Health System, grounding the evaluation in practical, messy clinical data rather than synthetic or simplified examples.
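To make the two task types concrete, here is a minimal sketch of what "counting" and "filtering" mean when done in ordinary code. The records, column names, and thresholds below are hypothetical and chosen only for illustration; they are not taken from the Mount Sinai dataset or from the study's actual queries.

```python
# Hypothetical toy records; column names are illustrative, not from the study.
visits = [
    {"visit_id": 1, "age": 72, "arrival": "ambulance", "admitted": True},
    {"visit_id": 2, "age": 34, "arrival": "walk-in", "admitted": False},
    {"visit_id": 3, "age": 81, "arrival": "ambulance", "admitted": False},
    {"visit_id": 4, "age": 67, "arrival": "walk-in", "admitted": True},
    {"visit_id": 5, "age": 45, "arrival": "ambulance", "admitted": True},
]


def count_admitted(rows):
    """Counting task: how many visits ended in admission."""
    return sum(1 for row in rows if row["admitted"])


def filter_visits(rows, min_age, arrival):
    """Filtering task: keep rows that meet several criteria at once."""
    return [
        row for row in rows
        if row["age"] >= min_age and row["arrival"] == arrival
    ]


admitted_count = count_admitted(visits)
older_ambulance = filter_visits(visits, min_age=65, arrival="ambulance")

print(admitted_count)                                # 3
print([row["visit_id"] for row in older_ambulance])  # [1, 3]
```

Both operations are trivial for conventional code at any table size, which is the baseline the prompted models are being compared against.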

The initial experiments employed straightforward prompting techniques, where models were simply asked direct questions such as “How many patients were admitted from this table?” Across the board, all tested LLMs demonstrated subpar accuracy, failing to provide reliable answers when handling these structured queries. This underlines a fundamental disconnect between how LLMs are trained and their practical applicability to real-world numerical and logical operations in healthcare datasets.

To enhance performance, the researchers explored a chain-of-thought prompting approach. This method instructs the model to reason through the problem explicitly, step by step, before arriving at the final answer, theoretically enabling more accurate and consistent outputs. However, the results were underwhelming; only modest improvements were observed on smaller tables, and as the size and complexity of the data increased, accuracy declined precipitously. For instance, even GPT-4o, the best-performing model under this regime, saw accuracy plummet from approximately 95% on small datasets to below 60% when confronted with larger tables.
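The difference between the two prompting styles can be sketched as a pair of prompt builders. This is an illustration under stated assumptions: the function names, the table serialization, and the instruction wording are hypothetical and do not reproduce the prompts used in the study.

```python
def table_to_text(rows):
    """Serialize a list of dict rows into a pipe-separated text table."""
    header = " | ".join(rows[0].keys())
    lines = [" | ".join(str(value) for value in row.values()) for row in rows]
    return "\n".join([header] + lines)


def direct_prompt(table_text, question):
    """Direct prompting: ask the question and request only the answer."""
    return f"{table_text}\n\nQuestion: {question}\nAnswer with a single number."


def chain_of_thought_prompt(table_text, question):
    """Chain-of-thought prompting: ask for step-by-step reasoning first."""
    return (
        f"{table_text}\n\nQuestion: {question}\n"
        "Go through the table row by row and state whether each row meets "
        "the condition. Then give the final count on a last line that "
        "starts with 'Answer:'."
    )


# Hypothetical two-row table, for illustration only.
sample_rows = [
    {"visit_id": 1, "admitted": "yes"},
    {"visit_id": 2, "admitted": "no"},
]
question = "How many patients were admitted from this table?"
table_text = table_to_text(sample_rows)

print(direct_prompt(table_text, question))
print(chain_of_thought_prompt(table_text, question))
```

Note that in both cases the whole table is pasted into the prompt, so the model still has to track every row itself; that is consistent with accuracy falling as tables grow.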

Recognizing that prompting alone may not suffice, the research shifted focus to a tool-based model execution approach. Here, LLMs were tasked with generating executable code, such as SQL or Python scripts, to process the data programmatically. This method leverages the LLM’s natural language understanding to translate queries into precise machine-readable commands, which are then run directly against the EHR data, so the counting and filtering are carried out by deterministic code rather than by the model itself. Impressively, this approach substantially improved results for the most advanced models. GPT-4o and Qwen-2.5-72B demonstrated near-perfect accuracy under these conditions, successfully navigating the intricacies of complex filters and large datasets.
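A minimal sketch of the execution half of this approach, using Python's built-in sqlite3 module, might look like the following. The table schema, the rows, and the `generated_sql` string are all hypothetical; the query is a hand-written stand-in for what a model might return, and no model is called here.

```python
import sqlite3


def run_generated_sql(rows, sql):
    """Load rows into an in-memory table and execute the supplied query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE visits "
            "(visit_id INTEGER, age INTEGER, arrival TEXT, admitted INTEGER)"
        )
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical rows: (visit_id, age, arrival, admitted as 0/1).
rows = [
    (1, 72, "ambulance", 1),
    (2, 34, "walk-in", 0),
    (3, 81, "ambulance", 0),
    (4, 67, "walk-in", 1),
    (5, 45, "ambulance", 1),
]

# Stand-in for model output answering:
# "How many admitted patients were 65 or older?"
generated_sql = "SELECT COUNT(*) FROM visits WHERE admitted = 1 AND age >= 65"

result = run_generated_sql(rows, generated_sql)
print(result[0][0])  # 2
```

The database engine does the counting, so table size no longer affects arithmetic reliability; the remaining risk is that the generated query itself is wrong. A real deployment would presumably also validate the query and restrict it to read-only access before running it.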

Despite these successes, not all models fared well. LLMs optimized for speed and efficiency, such as distilled variants of DeepSeek, struggled to produce usable outputs even when provided with the ability to generate and run code. Furthermore, the Llama-3.1-8B model encountered major difficulties, failing to produce functional results in the majority of assessments and being ultimately excluded from further analysis. These discrepancies highlight the diverse capabilities within the current LLM ecosystem and caution against broad assumptions regarding their utility in structured data environments.

The study’s findings carry critical implications for the future deployment of LLMs in healthcare administration. Benjamin Glicksberg, one of the authors, emphasized that without integrating tool-based strategies—combining LLM-generated code with actual execution—large language models remain fundamentally unsuitable for standalone use in clinical administrative settings. Clinical workflows frequently involve complex structured data requiring absolute reliability and precision, conditions under which straightforward natural language query processing by LLMs falls short.

Moreover, the requirement for “agentic” approaches is underscored by this work. Agentic AI involves systems that act semi-autonomously, leveraging external tools and code execution capabilities to ensure results remain consistent and verifiable. By integrating LLMs with backend code execution engines, hospitals could dramatically accelerate administrative processes while maintaining data integrity. Such hybrid solutions may bridge the gap between cutting-edge AI capabilities and the stringent accuracy demands of healthcare operations.

This study shines a spotlight on the often-overlooked challenges of applying AI in clinical data environments. While the hype around LLMs centers on their conversational fluency and general knowledge, the ability to perform precise numerical computations and filtered data retrieval within complex EHR systems requires a fundamentally different kind of model reliability. The researchers’ meticulous experimental design and real-world data usage offer a vital reality check for the healthcare sector’s ongoing AI ambitions.

Lastly, the authors note that their work did not receive any external funding, and no competing interests were declared. The open-access publication ensures that the full details, along with extensive methodological descriptions and results, remain available to researchers, clinicians, and AI developers aiming to advance safe and effective AI integration into hospital administration.

Overall, these findings caution healthcare providers and AI developers alike to calibrate expectations around LLMs’ current abilities in administrative contexts. They also highlight the powerful potential unlocked by hybrid human-AI systems that combine natural language understanding with robust programming and execution frameworks. As digital healthcare continues to evolve, researchers and practitioners will need to navigate these complex trade-offs to harness AI’s benefits without compromising accuracy and trustworthiness.

Web References:
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0001326


Subject of Research: Not applicable
Article Title: Large language models are poor clinical administrators: An evaluation of structured queries in real-world electronic health records
News Publication Date: 7-May-2026
References: Klang E, Sorin V, Korfiatis P, Sawant AS, Freeman R, Charney AW, et al. (2026) Large language models are poor clinical administrators: An evaluation of structured queries in real-world electronic health records. PLOS Digit Health 5(5): e0001326. DOI: 10.1371/journal.pdig.0001326

Keywords

Large Language Models, Electronic Health Records, Clinical Administration, Artificial Intelligence, GPT-4o, Tool-based AI, Chain-of-Thought Prompting, Healthcare Data Analytics, AI Reliability, Code Generation, Hospital Resource Management, Clinical Workflow Automation

Tags: administrative data tasks in hospitals, AI for patient load monitoring, AI language models in healthcare, AI performance on hospital resource allocation, challenges of AI in hospital administration, democratizing data access in healthcare, EHR data querying by non-technical staff, GPT-4o and Llama in medical data analysis, healthcare operational reporting with AI, large language models for electronic health records, limitations of LLMs in clinical workflows, natural language processing for healthcare data