In the rapidly evolving intersection of artificial intelligence and medical education, a recent study from the University of Arizona’s R. Ken Coit College of Pharmacy sheds critical light on the current capabilities and limitations of generative AI models in academic settings. The investigation specifically compared the performance of ChatGPT 3.5, a prominent large language model, against Doctor of Pharmacy (PharmD) students undertaking rigorous therapeutics examinations. This research presents a nuanced understanding of how such AI tools measure up to the high standards required for competent clinical decision-making and patient care.
Pharmacy education demands mastery not only of vast factual knowledge but also of its application to intricate clinical scenarios. The researchers scrutinized ChatGPT’s performance across a battery of 210 questions drawn from six exams in two core pharmacotherapeutics courses. These courses covered topics ranging from disorders commonly managed with nonprescription medications to advanced subjects such as cardiology, neurology, and critical care. By juxtaposing the AI’s responses with aggregate student outcomes, the study aimed to quantify empirically how well the model could navigate the complexities of pharmacy examinations.
The results showed that ChatGPT, while proficient at factual recall, underperformed markedly relative to the students, especially on application and case-based questions. The model correctly answered 80% of factual recall questions but only 44% of application-based questions. This disparity underscores the AI’s struggle with nuanced clinical reasoning that demands judgment beyond rote memorization. Case-based questions, which simulate real-world patient situations requiring integrative problem-solving, were answered correctly only 45% of the time, compared with a 74% success rate on non-case questions.
These findings expose important boundaries for AI in healthcare education. While generative models like ChatGPT excel at reproducing information, they falter when confronting clinical uncertainty and the complexities of patient-centered care. Brian Erstad, PharmD, interim dean at the Coit College of Pharmacy, emphasized that the AI’s weaknesses aligned with areas demanding clinical judgment, precisely the challenges clinicians face daily. ChatGPT’s inability to reliably handle such questions highlights the continued need for human reasoning in pharmacotherapeutic decision-making and for cautious integration of AI tools into evaluative processes.
The study also addressed an emerging concern within academic medicine: the potential misuse of AI on examinations. Christopher Edwards, PharmD, an associate clinical professor involved in the research, noted that one impetus for the study was to gauge how reliance on AI might affect student performance. The data suggest that students do not need AI assistance to succeed, an argument for rigorous traditional study habits over dependence on a still-evolving technology. This insight carries significant implications as educational institutions grapple with policy decisions surrounding AI use on assessments.
Methodologically, the research team calculated composite performance scores to allow direct comparisons. ChatGPT’s mean composite score across the six exams was 53, whereas the pharmacy students averaged 82. This quantitative gap reinforces the conclusion that current AI models are not substitutes for expert clinical acumen within pharmacy education. The systematic evaluation of AI across distinct question types establishes a foundation for future refinement of both AI capabilities and assessment strategies.
Beyond performance metrics, the investigation fuels ongoing academic debates over the role of AI in medical and health science training. Educators express concern that overreliance on AI-generated answers might hamper the cultivation of critical thinking skills, essential for navigating the ambiguity and variability inherent in clinical environments. The Coit College study contributes valuable empirical evidence to this discourse, emphasizing the complementary rather than substitutive nature of AI in professional education.
Furthermore, these results have broader implications for the design of pharmacy examinations and educational curricula. Understanding which types of questions challenge AI can inform question development, ensuring that assessments measure competencies beyond superficial knowledge recall. This strategic approach safeguards the integrity of professional assessments and aligns educational objectives with real-world clinical demands.
It is important to recognize that the evaluated model, ChatGPT 3.5, represents just one generation of a rapidly advancing technology. Both Erstad and Edwards acknowledge that ongoing advances in AI could shift these dynamics in future iterations. Continuous monitoring and assessment of AI performance in educational contexts therefore remain imperative. The trajectory of AI development calls for adaptive educational policies and teaching practices that integrate AI as a tool while preserving essential human expertise.
This pioneering study opens pathways to leveraging AI thoughtfully within pharmacy education, highlighting both its prospective utility and limitations. Rather than replacing human judgment, AI may evolve into an adjunct resource facilitating learning and supporting clinical decisions under professional supervision. The balance between harnessing technology and cultivating foundational reasoning skills will shape the future landscape of healthcare education.
Ultimately, the University of Arizona team’s work represents an important milestone in understanding the interplay between cutting-edge AI and complex professional training. Their data-driven approach and comprehensive analysis provide stakeholders with a clearer picture of where AI excels and where it falls short, offering a roadmap for responsible integration of artificial intelligence into the realm of pharmacy education and practice.
Subject of Research: People
Article Title: Comparison of a generative large language model to pharmacy student performance on therapeutics examinations
News Publication Date: 11-Aug-2025
Web References:
Currents in Pharmacy Teaching and Learning
References:
Edwards, C., Erstad, B., Cornelison, B. (2025). Comparison of a generative large language model to pharmacy student performance on therapeutics examinations. Currents in Pharmacy Teaching and Learning. DOI: 10.1016/j.cptl.2025.102394
Image Credits:
Photo by Kris Hanning, U of A Office of Research and Partnerships
Keywords:
Artificial intelligence, Education technology, Graduate students, Doctoral students