Software project aims to keep patient data protected in COVID-19 research
Computer scientist receives NSF grant to help develop tool
Credit: The University of Texas at Dallas
The COVID-19 pandemic has created an urgent need for sharing patient data to help scientists learn more about the virus and how to stop it from spreading. One key ethical issue, however, is how much information health providers can disclose to researchers without violating patient privacy.
Dr. Murat Kantarcioglu, professor of computer science in the Erik Jonsson School of Engineering and Computer Science at The University of Texas at Dallas, jointly with Vanderbilt University Medical Center, has received a $200,000 grant through the National Science Foundation’s Rapid Response Research (RAPID) program to create an open-source software tool to help policymakers and health care providers make those decisions.
“The issue is: What kind of details can we give to researchers while protecting a patient’s privacy?” Kantarcioglu said. “It’s possible that disclosing certain features about a patient’s medical history may make it easier to identify a person.”
Epidemiologists use patient data to create statistical models to predict the potential spread of disease and to determine what factors might make specific populations more at risk. Much of the data used for research comes in the form of aggregate statistics, which show the number of cases without any identifying information about individual patients. For coronavirus, however, person-level data is critical to understanding how various health factors might affect the virus’s spread and impact individuals.
Concerns over patient privacy have emerged during the COVID-19 pandemic as public health officials track the spread of the highly contagious virus and consider tech-enabled contact-tracing systems. The UT Dallas project, which also involves Dr. Brad Malin, vice chair for research in biomedical informatics at Vanderbilt University Medical Center, focuses on the risks of an individual being identified when patient data is released for research purposes.
Kantarcioglu, who also studies privacy risks involved with genomic data in another project with Malin, said current tools to evaluate the risks of sharing patient data do not typically account for changes in a disease’s spread over time or location. That is, the usefulness of information about COVID-19 patients changes quickly and might differ from day to day. For example, the precise location of new cases in a hotspot might be most important for initial contact tracing, but such location data becomes less useful as the disease spreads.
The decision tool Kantarcioglu is developing could evaluate whether releasing data about patients’ locations or medical histories — such as smoking history or prescription drug use — increases the risk of identification.
In one possible outcome, certain data could not be shared publicly and could be shared only with researchers who have restricted access, he said.
“We would like to give researchers as much data as possible for this kind of analysis,” Kantarcioglu said. “But we want to make sure that the risk of a person being identified is low.”
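To make the idea of identification risk concrete, the sketch below counts how many records become unique once an extra attribute is released. It is only an illustration under stated assumptions, not the tool Kantarcioglu's team is building: the function name `reidentification_risk`, the field names and the sample records are all hypothetical, and the measure used (the share of records that are unique on the released attributes, plus the smallest group size) is a common textbook proxy rather than the project's method.

```python
from collections import Counter


def reidentification_risk(records, attributes):
    """Return (share of unique records, smallest group size).

    A record is "unique" when no other record shares its values
    on every released attribute. Hypothetical helper for illustration.
    """
    if not records:
        raise ValueError("records must not be empty")
    # Build one key per record from the attributes that would be released.
    keys = [tuple(record[name] for name in attributes) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records), min(counts.values())


# Entirely made-up records; the field names are assumptions.
records = [
    {"zip3": "752", "age_band": "30-39", "smoker": True},
    {"zip3": "752", "age_band": "30-39", "smoker": False},
    {"zip3": "752", "age_band": "30-39", "smoker": False},
    {"zip3": "750", "age_band": "60-69", "smoker": True},
    {"zip3": "750", "age_band": "60-69", "smoker": True},
]

# Risk when only coarse location and age band are released.
base = reidentification_risk(records, ["zip3", "age_band"])
# Risk when smoking history is released as well.
with_smoking = reidentification_risk(records, ["zip3", "age_band", "smoker"])

print(base)          # (0.0, 2)
print(with_smoking)  # (0.2, 1)
```

In this toy data, adding smoking history leaves one patient alone in a group, which is the kind of increase in risk that might lead to restricting that field to approved researchers.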
The NSF’s RAPID program supports nonmedical, nonclinical-care research related to modeling and understanding the spread of COVID-19, informing and educating about the science of virus transmission and prevention, and encouraging the development of processes and actions to address the global challenge.