NSF grant harnesses big data & AI to advance disease prevention
(Millbrook, NY) Through a new $2 million National Science Foundation grant, scientists at the Cary Institute of Ecosystem Studies, the University of Georgia, and North Carolina A&T State University are harnessing the power of machine learning to forecast outbreaks of zoonotic disease.
Each year, more than a billion people become sick from Ebola, Zika, SARS, and other pathogens acquired from wildlife, livestock, and other animals. Prevention relies on an ability to predict when and where pathogens are likely to make the leap from animals to people.
Barbara Han, a disease ecologist at the Cary Institute, is leading the five-year study. She explains, "We want to help shift society from a reactive to a proactive approach to managing zoonotic disease. Instead of responding to outbreaks, let's try to stop them from happening in the first place. Using big data as a potential surveillance tool is an exciting new step toward prevention."
Funding will enable the team to bring together information on pathogens, potential animal hosts, and environmental factors known to facilitate disease transmission, with the goal of developing innovative methods of mapping when and where the next major zoonotic disease outbreak might occur.
John Drake of the University of Georgia explains, "We are creating models which draw 'boundaries' around which species can host which pathogens, which pathogens can pass from animals to humans, and what combination of environmental factors facilitate spillover and human-to-human transmission. On the basis of these biological properties, we hope to pinpoint where new diseases will emerge in the future."
Phase one of the study involves building predictive statistical models that will help the researchers identify traits common among animals that carry disease, and pathogens and parasites that cross the species barrier. "We are looking at data that describe hosts, pathogens, and their environments, to determine which combinations of these features presage disease being realized on a global landscape," Han says.
Models are built using extensive data sets on the physical and life history traits of host species and known pathogens. Host-pathogen pairings are then linked to geographical locations with suitable environmental conditions. The team also examines the conditions surrounding documented disease outbreaks to determine which factors were at play when each outbreak occurred.
Suzanne O'Regan of North Carolina A&T State University explains, "By using data that is global in scale, we are seeking to reveal generalizable features of 'good' disease carriers. Over 50 life history features are being incorporated into models for most mammal groups." This includes data on animals' physical characteristics, metabolic and reproductive rates, range of diet, and timing of daily activity – whether the animal is primarily active during the day, at night, or at dawn and dusk.
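To make the idea of learning "generalizable features" from trait data concrete, here is a minimal sketch in Python. It is not the team's method: it swaps in a plain logistic regression as a stand-in for whatever machine learning models the project uses, and the trait records, feature choices, and function names are all invented for illustration.

```python
import math

# Invented trait records for illustration only; these are NOT real species data.
# Each row: ((body mass in kg, litters per year, nocturnal 0/1), known host 0/1)
TRAINING = [
    ((0.02, 6.0, 1.0), 1),
    ((0.03, 5.0, 1.0), 1),
    ((0.25, 4.0, 1.0), 1),
    ((0.05, 5.5, 0.0), 1),
    ((40.0, 1.0, 0.0), 0),
    ((120.0, 0.5, 0.0), 0),
    ((3.0, 1.0, 1.0), 0),
    ((15.0, 1.5, 0.0), 0),
]


def featurize(traits):
    """Turn raw traits into model inputs; body mass is put on a log scale."""
    mass_kg, litters_per_year, nocturnal = traits
    return (math.log10(mass_kg), litters_per_year, nocturnal)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, learning_rate=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent."""
    n_features = len(featurize(rows[0][0]))
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for traits, label in rows:
            x = featurize(traits)
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - label
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the gradient averaged over all rows.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(model, traits):
    """Estimated probability that a species with these traits is a host."""
    weights, bias = model
    x = featurize(traits)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

Calling `train_logistic(TRAINING)` and then `predict(model, (0.04, 5.0, 1.0))` scores a hypothetical unstudied species from its traits alone. With more than 50 real life history features, a more flexible model than this toy one would likely be needed.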
On the pathogen side, the team is interested in whether a pathogen is able to survive in a given host and environment, the mechanism by which it is transmitted between hosts, and whether it exhibits sustained transmission between people – as opposed to a single 'dead-end' transmission from animal host to human.
Environmental features include temperature, precipitation, seasonality, and biome. The study will also encompass country-specific socioecological factors such as GDP, public health infrastructure, and investment in research and healthcare – all of which affect how effectively a country can manage disease prevalence and respond to an outbreak.
The second phase of the study will investigate how diseases move dynamically within a system. Once the traits of hosts, pathogens, and their environments – and the relationships among them – are known, the team will incorporate these into mathematical models to reveal how disease dynamics might play out in animal populations over time. This approach accounts for traits such as lifespan and rate of reproduction, which directly affect how fast a pathogen can spread via a particular host.
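One way to see how host life history enters a model of disease dynamics is a textbook SIR (susceptible-infected-recovered) model with host births and deaths. The sketch below is an illustration of that general idea, not the team's model. The parameter names (`beta` for transmission rate, `gamma` for recovery rate, `mu` for the per-capita birth and death rate) and any values passed in are assumptions chosen for the example.

```python
def basic_reproduction_number(beta, gamma, mu):
    """R0 for an SIR model with host births and deaths at per-capita rate mu.

    An infected host stops transmitting by recovering (gamma) or dying (mu),
    so the average infectious period is 1 / (gamma + mu).
    """
    return beta / (gamma + mu)


def simulate_sir(beta, gamma, mu, i0=0.001, days=365, dt=0.1):
    """Euler integration of an SIR model with host turnover.

    s, i, r are fractions of a constant-size host population.
    Returns a list of (time_in_days, s, i, r) tuples.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    steps = int(round(days / dt))
    trajectory = [(0.0, s, i, r)]
    for step in range(1, steps + 1):
        new_infections = beta * s * i
        ds = mu - new_infections - mu * s   # births in, infections and deaths out
        di = new_infections - gamma * i - mu * i
        dr = gamma * i - mu * r
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
        trajectory.append((step * dt, s, i, r))
    return trajectory
```

For example, with hypothetical rates `beta=0.3` and `gamma=0.1` per day, comparing `mu=1/(2*365)` (a two-year lifespan) with `mu=1/(20*365)` (a twenty-year lifespan) shows how host turnover alone changes both R0 and the rate at which new susceptible animals enter the population.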
Han explains, "The novelty of this work is in bringing biological realism via machine learning into a classic body of theory, leveraging large sets of biological data available to us. These tools merge data mining and machine learning with established methods of studying disease dynamics to help us think carefully about what's distinguishing animal groups from each other in terms of zoonotic disease, and eventually, for risk of human spillover and epidemics."
The team also plans to use the models and techniques developed in this project to respond to zoonotic disease outbreaks that might occur during the course of the study.
Barbara Han – Disease Ecologist – Cary Institute of Ecosystem Studies (Principal Investigator)
John Drake – Distinguished Research Professor – University of Georgia (Co-Principal Investigator)
Suzanne O'Regan – Assistant Professor – North Carolina A&T State University (Co-Principal Investigator)
John Paul Schmidt – Assistant Research Scientist – University of Georgia (Senior Personnel)
The Cary Institute of Ecosystem Studies is one of the world's leading independent environmental research organizations. Areas of expertise include disease ecology, forest and freshwater health, climate change, urban ecology, and invasive species. Since 1983, Cary Institute scientists have produced the unbiased research needed to inform effective management and policy decisions.
Lori M Quillen