Large Language Models and Clinical Errors: Humans and Machines

May 28, 2025
in Technology and Engineering

In recent years, the integration of artificial intelligence (AI) into healthcare has promised unprecedented advancements in patient care, diagnostics, and clinical decision-making. Among AI technologies, large language models (LLMs) have emerged as powerful tools capable of interpreting and generating human-like text, potentially revolutionizing the way clinicians access and process medical information. However, a groundbreaking new study published in Pediatric Research raises critical questions about the reliability of these models in performing clinical calculations, emphasizing that errors are not limited to human practitioners but extend to the machines designed to assist them.

Large language models such as GPT-4 and its successors have demonstrated remarkable capabilities in understanding complex medical queries, synthesizing evidence-based recommendations, and providing instant explanations. These traits have naturally led to enthusiasm around their deployment in clinical settings, from administrative tasks to direct patient interaction. Nevertheless, the study led by Kilpatrick, Greenberg, Boyce, and colleagues meticulously dissects instances where LLMs falter—particularly in executing clinical calculations that require numerical precision and contextual judgment.

The core of the study underscores a subtle but significant vulnerability: LLMs, while adept at language-processing tasks, are fundamentally pattern recognition systems rather than arithmetic engines. Clinical calculations, such as dosage adjustments based on patient weight, renal function, or lab values, demand an exactness that often eludes language models, which rely on probabilistic token predictions rather than deterministic computations. The presence of errors, even marginal in appearance, can have cascading consequences in pediatric care, where dosage windows are narrow and the margin for mistake remarkably small.
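
To make the contrast concrete, here is a minimal sketch, not drawn from the study, of the kind of deterministic weight-based dosing arithmetic the authors have in mind; the function name, parameters, and numbers are illustrative only:

```python
def weight_based_dose_mg(weight_kg: float, dose_mg_per_kg: float,
                         max_dose_mg: float | None = None) -> float:
    """Deterministic weight-based dose: weight (kg) times dose rate (mg/kg),
    capped at an absolute maximum when one is specified."""
    if weight_kg <= 0 or dose_mg_per_kg <= 0:
        raise ValueError("weight and dose rate must be positive")
    dose = weight_kg * dose_mg_per_kg
    return min(dose, max_dose_mg) if max_dose_mg is not None else dose

# Hypothetical example: a 12.4 kg child at 15 mg/kg, capped at 500 mg -> 186.0 mg.
print(weight_based_dose_mg(12.4, 15, max_dose_mg=500))
```

Given the same inputs, a routine like this returns the same number every time; a system built on probabilistic token prediction offers no such guarantee.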

In a series of rigorously designed experiments, the research team tested multiple prominent LLMs on a battery of pediatric clinical calculation tasks. These ranged from estimating body surface area and calculating medication dosages to interpreting laboratory indices critical for therapeutic decision-making. The results were eye-opening. Errors occurred not only in simple arithmetic but also in applying clinical formulas correctly, such as the Schwartz equation for estimating glomerular filtration rate, highlighting how inconsistently the models perform as clinical complexity increases.
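
The two calculations named here are standard, published clinical formulas. The sketch below implements the bedside Schwartz estimate (eGFR = 0.413 × height in cm ÷ serum creatinine in mg/dL) and the Mosteller body surface area formula; the patient values are hypothetical and chosen only to show the arithmetic:

```python
import math

def egfr_bedside_schwartz(height_cm: float, serum_creatinine_mg_dl: float) -> float:
    """Bedside Schwartz estimate of glomerular filtration rate, mL/min/1.73 m^2."""
    return 0.413 * height_cm / serum_creatinine_mg_dl

def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Mosteller body surface area, m^2."""
    return math.sqrt(height_cm * weight_kg / 3600)

# Illustrative (hypothetical) patient: 110 cm, 19 kg, serum creatinine 0.5 mg/dL.
print(round(egfr_bedside_schwartz(110, 0.5), 1))  # ~90.9 mL/min/1.73 m^2
print(round(bsa_mosteller(110, 19), 2))           # ~0.76 m^2
```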

Interestingly, the nature of these errors varied. Some stemmed from fundamental mathematical mistakes—adding or multiplying incorrectly—while others arose from misinterpretations of clinical context, such as confusing units or applying adult-centric formulas in pediatric scenarios. For practitioners trusting AI-based tools, these pitfalls are alarming. They underscore the fact that while AI can augment clinical workflows, it remains an imperfect assistant that requires vigilant oversight.

The researchers place their findings within the broader framework of human error in medicine, a well-documented source of adverse events in hospitals worldwide. Traditional clinical practice acknowledges that humans, despite experience and training, are prone to mistakes, especially under stress or fatigue. AI technologies were introduced partly to mitigate these risks. However, the study’s message is clear: machines are not exempt from error, and in some cases, their shortcomings can mirror or even exacerbate human fallibility.

One crucial implication is that reliance on LLMs without appropriate safeguards could be hazardous. For instance, clinicians using natural language interfaces for quick medication dosing recommendations might receive plausible but incorrect answers. The linguistic fluency of these models could inadvertently foster misplaced confidence, as coherent explanations may mask underlying inaccuracies in calculations. Hence, the study advocates for systematic validation and integration of AI outputs with human expertise rather than uncritical acceptance.
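
One practical form such a safeguard could take, purely as an illustration and not something the study specifies, is to recompute any model-supplied dose with a deterministic formula and flag discrepancies for human review; the function name, arguments, and tolerance below are assumptions:

```python
def flag_dose_discrepancy(model_dose_mg: float, weight_kg: float,
                          dose_mg_per_kg: float, tolerance: float = 0.05) -> bool:
    """Recompute the dose deterministically and flag the model's answer if it
    deviates from the expected value by more than the relative tolerance."""
    expected = weight_kg * dose_mg_per_kg
    return abs(model_dose_mg - expected) > tolerance * expected

# A fluent but wrong model answer of 210 mg for a 12.4 kg child at 15 mg/kg
# (expected 186 mg) would be flagged for human review.
print(flag_dose_discrepancy(210, 12.4, 15))  # True
```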

The technical architecture of LLMs contributes to this dilemma. These models are trained on vast datasets spanning a wide range of text, including some medical literature. Their architecture, however, is not specifically tuned for numerical reasoning, which leads to "hallucinations": fluent, plausible-sounding output that is factually or numerically wrong. While progress has been made in enhancing AI’s capabilities in math and logic, clinical calculations represent a particularly challenging category, combining precise numeracy with context-dependent decision rules.

Moreover, the study sheds light on the ethical and legal dimensions of incorporating AI in medicine. When an AI tool errs in clinical calculations that result in patient harm, determining accountability becomes complex. Is the fault with the software developers, the healthcare institution adopting the tool, or the clinician who deployed it? These questions are at the forefront of ongoing debates about AI governance in health systems and are exacerbated by studies like this one exposing real-world risks.

To address these challenges, Kilpatrick and colleagues suggest multiple pathways forward. First, embedding specialized numerical reasoning modules within LLM frameworks could improve accuracy in calculation-heavy tasks. Second, creating hybrid models that integrate deterministic algorithms for clinical formulas alongside generative language components may strike a better balance between linguistic sophistication and computational precision. Finally, rigorous external validation standards and transparent reporting of AI limitations must become mandatory prerequisites before clinical deployment.
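
The authors do not prescribe an implementation, but one familiar way to realize this kind of hybrid is a tool-calling pattern: the language model only selects a validated formula and supplies its arguments, while the arithmetic runs in ordinary, testable code. The sketch below is a hypothetical illustration of that division of labor, not the authors' system:

```python
import math

# Deterministic clinical formulas live in plain, unit-testable code.
def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    return math.sqrt(height_cm * weight_kg / 3600)

def egfr_bedside_schwartz(height_cm: float, creatinine_mg_dl: float) -> float:
    return 0.413 * height_cm / creatinine_mg_dl

CALCULATORS = {
    "bsa_mosteller": bsa_mosteller,
    "egfr_bedside_schwartz": egfr_bedside_schwartz,
}

def run_clinical_calculation(request: dict) -> float:
    """The language model proposes a tool name and arguments; the arithmetic
    is performed here, deterministically, never by token prediction."""
    tool = CALCULATORS[request["tool"]]
    return tool(**request["args"])

# e.g. the model emits this request after parsing the clinician's question,
# and the application, not the model, computes the number.
request = {"tool": "bsa_mosteller", "args": {"height_cm": 110, "weight_kg": 19}}
print(round(run_clinical_calculation(request), 2))  # 0.76 m^2
```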

Importantly, the study also emphasizes the continued necessity of human expertise. Rather than viewing AI as a replacement for clinicians, the authors argue for a model of synergy—using AI to augment human reasoning but reinforcing the clinician’s role as the ultimate arbiter of patient care decisions. This partnership can harness the speed and scalability of LLMs while hedging against their vulnerabilities through human judgment and experience.

The timing of this research is particularly relevant as health systems worldwide face increasing patient volumes and complex cases. AI offers alluring solutions for alleviating cognitive loads on healthcare workers, but this study serves as a timely reminder that technology is fallible. Careful integration and cautious skepticism should guide the ongoing adoption of AI tools in medicine to safeguard patient safety.

In conclusion, the findings presented by Kilpatrick, Greenberg, Boyce, and their team represent an important milestone in the evolving narrative of AI’s role in healthcare. Their meticulous assessment of large language models in pediatric clinical calculations reveals a nuanced picture: while AI can vastly enhance accessibility and efficiency, inherent limitations in numerical reasoning persist, necessitating caution and continuous improvement. As the landscape of medicine increasingly entwines with AI, balancing innovation with patient safety remains paramount.

As clinicians, researchers, and technologists collaborate to refine AI tools, the overarching lesson is clear—both humans and machines are fallible. Identifying where and why errors occur enables the design of safer systems that harness the best qualities of both. The future of medicine lies not in replacing human intellect but in complementing it with intelligent technologies that recognize and compensate for their own imperfections.


Subject of Research: The reliability and limitations of large language models in performing clinical calculations in pediatric medicine.

Article Title: Large language models and clinical calculations: to err is human and machines are not exempt.

Article References:

Kilpatrick, R., Greenberg, R.G., Boyce, D. et al. Large language models and clinical calculations: to err is human and machines are not exempt. Pediatr Res (2025). https://doi.org/10.1038/s41390-025-04166-y

Image Credits: AI Generated

Tags: AI and human error in healthcare, AI-assisted clinical calculations, challenges of AI in patient care, clinical decision-making with AI, clinical errors in artificial intelligence, GPT-4 in clinical settings, interpreting medical information with LLMs, large language models in healthcare, limitations of language models in medicine, numerical precision in clinical calculations, patient safety with AI technology, reliability of AI in medicine