Generative artificial intelligence cannot yet reliably read and extract information from clinical notes in medical records

August 19, 2024 - It may someday be possible to use large language models (LLMs) to automatically read clinical notes in medical records and reliably and efficiently extract relevant information to support patient care or research. But recent research from Columbia University Mailman School of Public Health, which used ChatGPT-4 to read medical notes from emergency department admissions and determine whether injured scooter and bicycle riders were wearing a helmet, finds that LLMs can’t yet do this reliably. The findings are published in JAMA Network Open.

In a study of 54,569 emergency department visits among patients injured while riding a bicycle, scooter, or other micromobility conveyance from 2019 to 2022, the LLM had difficulty replicating the results of a text string search-based approach for extracting helmet status from clinical notes. The LLM performed well only when the prompt included all of the text used in the text string search-based approach. The LLM also had difficulty replicating its work across trials on each of five successive days; it did better at replicating its hallucinations than its accurate work. It particularly struggled when phrases were negated, such as reading “w/o helmet” or “unhelmeted” and reporting that the patient wore a helmet.

Large amounts of medically relevant data are included in electronic medical records in the form of written clinical notes, a type of unstructured data. Efficient ways to read and extract information from these notes would be extremely useful for research. Currently, information from these clinical notes can be extracted using simple string-matching text search approaches or through more sophisticated artificial intelligence (AI)-based approaches such as natural language processing. The hope was that new LLMs, such as ChatGPT-4, could extract information faster and more reliably.
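To make the string-matching idea concrete, here is a minimal sketch in Python of how a text search might classify helmet status while treating negated phrases separately. The patterns, the function name `helmet_status`, and the sample notes are hypothetical illustrations; they are not the search strings or code used in the study.

```python
import re

# Hypothetical patterns for illustration only; NOT the study's search strings.
# Negated mentions such as "w/o helmet" or "unhelmeted".
NEGATED = re.compile(
    r"\b(?:unhelmeted|w/o\s+(?:a\s+)?helmet|without\s+(?:a\s+)?helmet|"
    r"no\s+helmet|not\s+wearing\s+(?:a\s+)?helmet)\b",
    re.IGNORECASE,
)
# Any remaining mention of "helmet" or "helmeted".
HELMET = re.compile(r"\bhelmet(?:ed)?\b", re.IGNORECASE)


def helmet_status(note: str) -> str:
    """Classify a clinical note as 'no helmet', 'helmet', or 'unknown'."""
    # Check negated phrases first so "w/o helmet" is not read as helmet use.
    if NEGATED.search(note):
        return "no helmet"
    if HELMET.search(note):
        return "helmet"
    return "unknown"


if __name__ == "__main__":
    # Made-up example notes, not records from the study data.
    for note in (
        "25YOM fell off bicycle, w/o helmet, head laceration",
        "Pt was wearing a helmet when he fell from scooter",
        "Fell from e-bike, wrist fracture",
    ):
        print(helmet_status(note), "|", note)
```

Checking the negated patterns before the plain "helmet" pattern is the design choice that matters here: it is one simple way to avoid the kind of error described above, where a negated phrase is reported as helmet use.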

“While we see potential efficiency gains in using the generative AI LLM for information extraction tasks, issues of reliability and hallucinations currently limit its utility,” said Andrew Rundle, DrPH, professor of Epidemiology at Columbia Mailman School and senior author. “When we used highly detailed prompts that included all of the text strings related to helmets, on some days ChatGPT-4 could extract accurate data from the clinical notes. But the time required to define and test all of the text that had to be included in the prompt and ChatGPT-4’s inability to replicate its work, day after day, indicates to us that ChatGPT-4 was not yet up to this task.”

Using publicly available 2019 to 2022 data from the U.S. Consumer Product Safety Commission’s National Electronic Injury Surveillance System, which draws on a sample of 96 U.S. hospitals, Rundle and colleagues analyzed emergency department records of patients injured in e-bike, bicycle, hoverboard, and powered scooter accidents. They compared the results of ChatGPT-4’s analyses of the records to data generated using more traditional text string-based searches, and for 400 records, they compared ChatGPT-4’s analyses to their own reading of the clinical notes in the records.

This research builds on their work studying how to prevent injuries among micromobility users (i.e., bicyclists, e-bike riders, scooter riders). “Helmet use is a key factor in injury severity, yet in most emergency department medical records and incident reports information on helmet use is buried in the clinical notes written by the physician or EMS respondent. There is a significant research need to be able to reliably and efficiently access this information,” said Kathryn Burford, the lead author on the paper and a post-doctoral fellow in the Department of Epidemiology at the Mailman School.

“Our study examined the potential of an LLM for extracting information from clinical notes, a rich source of information for health professionals and researchers,” said Rundle. “But at the time we used ChatGPT-4 it could not reliably provide us with data.”

Co-authors are Nicole G. Itzkowitz, Columbia Mailman School of Public Health; Ashley G. Ortega, Columbia Population Research Center; and Julien O. Teitler, Columbia School of Social Work.

The study was supported by the National Institute of Child Health and Human Development, 5P2CHD058486; National Institute of Environmental Health Sciences, 5T32ES007322-21 and 5T32ES007322-22; and the Centers for Disease Control and Prevention-funded Columbia Center for Injury Science and Prevention, R49CE003094.

Columbia University Mailman School of Public Health

Founded in 1922, the Columbia University Mailman School of Public Health pursues an agenda of research, education, and service to address the critical and complex public health issues affecting New Yorkers, the nation and the world. The Columbia Mailman School is the fourth largest recipient of NIH grants among schools of public health. Its nearly 300 multi-disciplinary faculty members work in more than 100 countries around the world, addressing such issues as preventing infectious and chronic diseases, environmental health, maternal and child health, health policy, climate change and health, and public health preparedness. It is a leader in public health education with more than 1,300 graduate students from 55 nations pursuing a variety of master’s and doctoral degree programs. The Columbia Mailman School is also home to numerous world-renowned research centers, including ICAP and the Center for Infection and Immunity. For more information, please visit www.mailman.columbia.edu.

Journal

JAMA Network Open

DOI

10.1001/jamanetworkopen.2024.25981

Article Title

Use of Generative AI to Identify Helmet Status Among Patients With Micromobility-Related Injuries From Unstructured Clinical Notes
