Artificial intelligence (AI) is revolutionizing healthcare, promising major advances in medical diagnostics, personalized treatment strategies, and patient care. Yet beneath this technological optimism lies a complex challenge: ensuring that the interaction between humans and AI-enabled medical devices is safe, reliable, and effective. A recent study led by Professor Stephen Gilbert and his interdisciplinary team at the Else Kröner Fresenius Center (EKFZ) for Digital Health at TU Dresden confronts this challenge head-on. Published in NEJM AI, the research provides a critical, systematic analysis of the risks emerging from human-AI interaction in clinical contexts, highlighting a dimension often overshadowed by attention to algorithmic performance: the human factors that shape real-world outcomes.
AI-driven medical devices have rapidly proliferated across diverse clinical settings, offering substantial benefits. From advanced radiology systems that enhance the early detection of cancers to clinical decision-support platforms that tailor therapies to individual patient profiles, AI is poised to transform modern medicine. However, these benefits hinge not only on the precision of the underlying algorithms but also on how healthcare professionals interpret, trust, and integrate AI insights into their workflows. The research team underscores that human factors (cognitive, behavioral, and organizational) are central to understanding why even technically sophisticated AI tools may falter or cause unintended harm in practice.
One of the core issues addressed is the opacity inherent to many AI systems. Unlike conventional medical devices with deterministic outputs, AI models, particularly those based on complex neural networks, function as “black boxes.” This opacity can lead clinicians to misunderstand or misinterpret AI-generated outputs, with direct consequences for clinical decision-making. Miscalibrated trust presents dual hazards: overreliance on AI may lead physicians to accept flawed recommendations uncritically, while excessive skepticism may cause them to ignore beneficial AI guidance, ultimately compromising patient care.
The phenomenon of automation bias emerges prominently in this context. Automation bias refers to the human propensity to defer to automated recommendations by default, sidelining critical independent judgment. This behavioral pitfall can cause healthcare providers to miss errors that could otherwise be caught through rigorous scrutiny. Equally concerning is the risk of deskilling, where prolonged reliance on AI assistance gradually diminishes clinicians’ expertise, threatening long-term competency and clinical intuition.
Moreover, technostress, the psychological strain of adapting to complex AI systems, can induce user fatigue and reduce vigilance, indirectly increasing the chance of errors. The study also introduces the concept of “indication creep,” in which AI applications are employed beyond their originally intended clinical contexts without sufficient validation, raising ethical and safety concerns. System modifications, software updates, and shifts in operating modes introduce additional failure points if human users are not adequately trained or informed, compounding these layered risks in dynamic clinical environments.
Recognizing these multifaceted challenges, the Dresden research group has developed a pioneering, practical framework specifically designed to address human factors risks in AI-enabled medical devices. Their approach integrates insights from usability engineering, human-computer interaction, and regulatory science, validated through expert consultation spanning clinicians, regulators, and human factors specialists. The resulting guide is not a disjointed set of recommendations but a holistic blueprint to enhance AI safety and efficacy from design through post-market surveillance.
Central to the framework is the imperative to explicitly delineate roles and responsibilities between human users and AI systems. Defining who the users are, clarifying their clinical environments, and specifying task allocations can substantially mitigate confusion and promote seamless integration. The guide advocates for presenting AI outputs in formats that are comprehensible and contextually relevant, avoiding cryptic or excessive technical detail that could impede clinical interpretation. Equally important is embedding AI tools into existing clinical workflows to support, rather than disrupt, everyday practice.
Training mechanisms tailored to the needs and skill levels of diverse user groups form another cornerstone of the recommendations. The guide stresses ongoing education as essential—not only prior to device deployment but continuously, adapting to system updates and evolving clinical contexts. Importantly, fallback options and safeguards should be established to support clinicians in cases of system failure or anomalies. This multi-layered safety net enhances resilience, enabling clinicians to maintain control even when AI systems falter.
Post-market monitoring emerges as a critical and proactive strategy in the framework. Continuous observation of how AI systems are used in practice, including instances of unintended misuse and patterns of overreliance, is crucial. Such real-world data enable timely interventions, iterative improvements, and transparent communication about system modifications. This approach addresses a significant gap in current regulatory schemes, which often emphasize pre-market technical validation but provide little oversight of real-world human-AI dynamics.
The researchers deliberately framed their recommendations in broad yet regulation-aligned language, aiming for adaptability across differing AI-enabled medical devices and clinical settings. This design ensures that the guidance remains relevant as technology and use cases evolve, providing regulators and manufacturers with an adaptable toolkit for risk mitigation. In future work, the team plans to apply and refine these guidelines in pilot projects, benchmarking them in concrete clinical implementations to maximize their practical utility.
The implications of this work extend well beyond the immediate scope of medical AI. It signals a paradigm shift in the development and oversight of intelligent devices, placing human factors at the forefront of innovation. Embedding these considerations throughout the product lifecycle—from design and regulatory approval to clinical use and post-market evaluation—promises to reduce avoidable errors, safeguard patient safety, and foster sustainable innovation in digital health technologies.
This study reflects the collaborative strength of interdisciplinary science, uniting expertise from TU Dresden's EKFZ for Digital Health, the Chair of Industrial Design Engineering, and the Faculty of Business and Economics, alongside partners at the University of Oxford and Geneva University Hospitals. Their combined effort underscores the complexity of embedding AI safely within healthcare, an endeavor that requires continuous dialogue between technology creators, users, and regulators.
As AI continues to weave itself into the fabric of medicine, the critical insights distilled by Professor Gilbert’s team remind us that technology is never neutral. Its impact depends fundamentally on human interaction and systemic integration. By rigorously analyzing and addressing human factors-related risks, this research charts a pathway toward not only smarter but safer AI-enabled healthcare, ensuring that these powerful tools fulfill their promise of improved outcomes without compromising patient safety or clinical autonomy.
Subject of Research: Not applicable
Article Title: Evaluation of Human Factors-Related Risks in AI-Enabled Medical Devices: A Practical Guide
News Publication Date: 26-Mar-2026
Web References: DOI 10.1056/AIpc2501297 (https://doi.org/10.1056/AIpc2501297)
Image Credits: EKFZ – Anja Stübner

