
Study Reveals AI Language Models Encounter Challenges with Basic Hospital Data Tasks

May 7, 2026
in Technology and Engineering

A recent investigation into the practical capabilities of large language models (LLMs) reveals significant limitations in their use for routine administrative tasks within hospital environments. Conducted by Eyal Klang and colleagues at the Icahn School of Medicine at Mount Sinai in New York, the study critically evaluates the performance of state-of-the-art LLMs on essential number-crunching operations that healthcare administrators depend on daily. Published in PLOS Digital Health, these findings provide essential technical insights into the challenges facing AI implementation in clinical administrative workflows.

Hospitals today rely heavily on electronic health records (EHRs) – structured datasets that capture patient information, resource availability, and care events. Administrators use this data to monitor patient loads, allocate resources, and generate operational reports. Traditionally, these tasks are performed by specialized data analysts using programming languages and database queries, a process often fraught with delays when rapid answers are needed for decision-making. The promise of LLMs like GPT-4o and Llama has been to democratize data access by allowing non-technical staff to query these datasets directly using natural language prompts.

In the study, researchers subjected nine leading LLMs to a rigorous battery of tests designed to emulate two foundational administrative functions: counting how many patients meet a specific clinical condition and filtering records based on multiple inclusion criteria simultaneously. The data itself was sourced from a substantial real-world dataset of over 50,000 emergency department visits within the Mount Sinai Health System, grounding the evaluation in practical, messy clinical data rather than synthetic or simplified examples.

The initial experiments employed straightforward prompting techniques, where models were simply asked direct questions such as “How many patients were admitted from this table?” Across the board, all tested LLMs demonstrated subpar accuracy, failing to provide reliable answers to these structured queries. This underlines a fundamental disconnect between how LLMs are trained and their practical applicability to numerical and logical operations on real-world healthcare datasets.
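To make the task concrete, here is a minimal sketch of the kind of counting question the study poses. The rows and column names below are illustrative stand-ins, not the study's actual schema; the point is that the question has a single deterministic answer that code computes trivially, while a model answering directly from the table text must get it right token by token.

```python
# Illustrative stand-in for a small slice of an ED-visits table.
visits = [
    {"patient_id": 1, "disposition": "admitted",   "age": 67},
    {"patient_id": 2, "disposition": "discharged", "age": 34},
    {"patient_id": 3, "disposition": "admitted",   "age": 80},
    {"patient_id": 4, "disposition": "admitted",   "age": 45},
    {"patient_id": 5, "disposition": "discharged", "age": 29},
]

# Ground truth for "How many patients were admitted from this table?":
admitted_count = sum(1 for row in visits if row["disposition"] == "admitted")
print(admitted_count)  # 3
```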

To enhance performance, the researchers explored a chain-of-thought prompting approach. This method instructs the model to transparently reason through the problem step-by-step before arriving at the final answer, theoretically enabling more accurate and consistent outputs. However, the results were underwhelming; only modest improvements were observed on smaller tables, and as the size and complexity of the data increased, accuracy declined precipitously. For instance, even GPT-4o, the best performing model under this regime, saw accuracy plummet from approximately 95% on small datasets to below 60% when confronted with larger tables.
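A chain-of-thought prompt for such a task might look like the sketch below. The wording is illustrative, not the study's actual prompt; it simply shows the pattern of asking the model to enumerate matching rows before committing to a count.

```python
# Build an illustrative chain-of-thought prompt for the counting task.
table_text = (
    "patient_id,disposition\n"
    "1,admitted\n"
    "2,discharged\n"
    "3,admitted\n"
)

prompt = (
    "You are given a table of emergency department visits:\n"
    f"{table_text}\n"
    "Question: How many patients were admitted?\n"
    "Think step by step: list each row that matches, count the matches, "
    "then state the final answer on its own line."
)
print(prompt)
```

As the study found, this style of prompting still asks the model itself to do the counting, which is exactly where accuracy degrades as tables grow.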

Recognizing that prompting alone may not suffice, the research shifted focus to a tool-based model execution approach. Here, LLMs were tasked with generating executable code, such as SQL or Python scripts, to process the data programmatically. This method leverages the LLM’s natural language understanding to translate queries into precise machine-readable commands, which are then run directly against the EHR data, so the arithmetic itself is performed deterministically by the executed code rather than by the model. Impressively, this approach substantially improved results for the most advanced models. GPT-4o and Qwen-2.5-72B demonstrated near-perfect accuracy under these conditions, successfully navigating the intricacies of complex filters and large datasets.
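The tool-based pattern can be sketched as follows. The SQL string here is a hypothetical stand-in for model output, and the table schema is illustrative; what matters is that the database engine, not the model, performs the filtering and counting.

```python
import sqlite3

# Illustrative in-memory table standing in for an EHR extract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ed_visits (patient_id INTEGER, disposition TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO ed_visits VALUES (?, ?, ?)",
    [(1, "admitted", 67), (2, "discharged", 34), (3, "admitted", 80),
     (4, "admitted", 45), (5, "discharged", 29)],
)

# Hypothetical model output for "How many patients over 60 were admitted?"
llm_generated_sql = (
    "SELECT COUNT(*) FROM ed_visits "
    "WHERE disposition = 'admitted' AND age > 60"
)

# The engine executes the query deterministically.
(count,) = conn.execute(llm_generated_sql).fetchone()
print(count)  # 2
```

Because the model only has to translate the question into a query, table size no longer affects the correctness of the count itself.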

Despite these successes, not all models fared well. LLMs optimized for speed and efficiency, such as distilled variants of DeepSeek, struggled to produce usable outputs even when provided with the ability to generate and run code. Furthermore, the Llama-3.1-8B model encountered major difficulties, failing to produce functional results in the majority of assessments and being ultimately excluded from further analysis. These discrepancies highlight the diverse capabilities within the current LLM ecosystem and caution against broad assumptions regarding their utility in structured data environments.

The study’s findings carry critical implications for the future deployment of LLMs in healthcare administration. Benjamin Glicksberg, one of the authors, emphasized that without integrating tool-based strategies—combining LLM-generated code with actual execution—large language models remain fundamentally unsuitable for standalone use in clinical administrative settings. Clinical workflows frequently involve complex structured data requiring absolute reliability and precision, conditions under which straightforward natural language query processing by LLMs falls short.

Moreover, the requirement for “agentic” approaches is underscored by this work. Agentic AI involves systems that act semi-autonomously, leveraging external tools and code execution capabilities to ensure results remain consistent and verifiable. By integrating LLMs with backend code execution engines, hospitals could dramatically accelerate administrative processes while maintaining data integrity. Such hybrid solutions may bridge the gap between cutting-edge AI capabilities and the stringent accuracy demands of healthcare operations.
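A toy sketch of such an agentic loop, under the assumptions above: the `propose_code` function below is a hypothetical stand-in for an LLM call (its first attempt deliberately returns broken code to illustrate the retry path), and the executor runs whatever the model proposes, retrying on failure so that only verifiable, executed results are returned.

```python
def propose_code(question: str, attempt: int) -> str:
    """Hypothetical stand-in for an LLM call that emits Python code."""
    if attempt == 0:
        return "result = undefined_name"  # simulated model error
    return "result = sum(1 for v in visits if v['disposition'] == 'admitted')"

def run_with_retries(question, visits, max_attempts=3):
    """Execute model-proposed code, retrying when execution fails."""
    for attempt in range(max_attempts):
        code = propose_code(question, attempt)
        scope = {"visits": visits}
        try:
            exec(code, scope)
            return scope["result"]
        except Exception:
            continue  # in a real agent, the error would be fed back to the model
    raise RuntimeError("no runnable code produced")

visits = [
    {"patient_id": 1, "disposition": "admitted"},
    {"patient_id": 2, "disposition": "discharged"},
    {"patient_id": 3, "disposition": "admitted"},
]
answer = run_with_retries("How many patients were admitted?", visits)
print(answer)  # 2
```

A production system would sandbox the execution step and validate outputs, but the loop structure is the essence of the agentic approach the authors describe.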

This study shines a spotlight on the often-overlooked challenges of applying AI in clinical data environments. While the hype around LLMs centers on their conversational fluency and general knowledge, the ability to perform precise numerical computations and filtered data retrieval within complex EHR systems requires a fundamentally different kind of model reliability. The researchers’ meticulous experimental design and real-world data usage offer a vital reality check for the healthcare sector’s ongoing AI ambitions.

Lastly, the authors note that their work did not receive any external funding, and no competing interests were declared. The open-access publication ensures that the full details, along with extensive methodological descriptions and results, remain available to researchers, clinicians, and AI developers aiming to advance safe and effective AI integration into hospital administration.

Overall, these findings caution healthcare providers and AI developers alike to calibrate expectations around LLMs’ current abilities in administrative contexts. They also highlight the powerful potential unlocked by hybrid human-AI systems that combine natural language understanding with robust programming and execution frameworks. As digital healthcare continues to evolve, researchers and practitioners will need to navigate these complex trade-offs to harness AI’s benefits without compromising accuracy and trustworthiness.

Web References:
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0001326


Subject of Research: Not applicable
Article Title: Large language models are poor clinical administrators: An evaluation of structured queries in real-world electronic health records
News Publication Date: 7-May-2026
References: Klang E, Sorin V, Korfiatis P, Sawant AS, Freeman R, Charney AW, et al. (2026) Large language models are poor clinical administrators: An evaluation of structured queries in real-world electronic health records. PLOS Digit Health 5(5): e0001326. DOI: 10.1371/journal.pdig.0001326

Keywords

Large Language Models, Electronic Health Records, Clinical Administration, Artificial Intelligence, GPT-4o, Tool-based AI, Chain-of-Thought Prompting, Healthcare Data Analytics, AI Reliability, Code Generation, Hospital Resource Management, Clinical Workflow Automation

© 2025 Scienmag - Science Magazine
