Machine learning lets scientists reverse-engineer cellular control networks
The flow of information between cells in our bodies is exceedingly complex: cells sense, signal, and influence one another in a constant stream of microscopic engagements. These interactions are critical for life, and when they go awry they can lead to illness and injury.
Scientists have isolated thousands of individual cellular interactions, but charting the network of reactions that leads cells to self-organize into organs or form melanomas has been an extreme challenge.
"We, as a community are drowning in quantitative data coming from functional experiments," says Michael Levin, professor of biology at Tufts University and director of the Allen Discovery Center there. "Extracting a deep understanding of what's going on in the system from the data in order to do something biomedically helpful is getting harder and harder."
Working with Maria Lobikin, a Ph.D. student in his lab, and Daniel Lobo, a former post-doc and now assistant professor of biology and computer science at the University of Maryland, Baltimore County (UMBC), Levin is using machine learning to uncover the cellular control networks that determine how organisms develop, and to design methods to disrupt them. The work paves the way for computationally designed cancer treatments and regenerative medicine.
"In the end, the value of machine learning platforms is in whether they can get us to new capabilities, whether for regenerative medicine or other therapeutic approaches," Levin says.
Writing in Scientific Reports in January 2016, the team reported the results of a study where they created a tadpole with a form of mixed pigmentation never before seen in nature. The partial conversion of normal pigment cells to a melanoma-like phenotype — accomplished through a combination of two drugs and a messenger RNA — was predicted by their machine learning code and then verified in the lab.
Their work was facilitated by the Stampede supercomputer at the Texas Advanced Computing Center, one of the most powerful in the world, which enabled the team to run billions of simulations in order to model the cellular network and the means of altering it.
Hacking the (cell) network
Tadpoles from the Xenopus genus of aquatic frogs possess a group of pigment cells that the Levin lab previously showed could be converted to a melanoma-like outcome by interrupting their electrical communication with other cell types.
Through years of experiments, they found that various treatments could induce conversions, but some treated animals would convert and some wouldn't.
"The outcome was probabilistic, like tossing a biased coin," Levin says. "But remarkably, all of the cells were tossing the same coin: a given animal would either convert or not, as a whole. Individual cells did not make independent decisions."
One of the most important tests of their artificial intelligence-derived model was to see if it could be used to discover a treatment that would break the normal concordance among cells, and induce a salt-and-pepper pattern in which individual cells within a single tadpole would choose to become melanoma-like or not.
They were able not only to produce this effect but also to predict the percentage of tadpoles in a population that would show the mixed pigmentation.
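The difference between the all-or-none outcome and the salt-and-pepper pattern can be illustrated with a small simulation. This is only a toy sketch, not the team's model: the function name, the conversion probability of 0.5, and the count of 50 pigment cells per animal are assumptions made for illustration.

```python
import random

def pigment_pattern(p_convert, n_cells, concordant, rng):
    """Classify one simulated tadpole as 'none', 'all', or 'mixed'."""
    if concordant:
        # One shared coin toss decides the fate of every cell in the animal.
        outcome = rng.random() < p_convert
        cells = [outcome] * n_cells
    else:
        # Each cell tosses its own coin, independently of the others.
        cells = [rng.random() < p_convert for _ in range(n_cells)]
    if all(cells):
        return "all"
    if not any(cells):
        return "none"
    return "mixed"

rng = random.Random(0)  # fixed seed so the sketch is repeatable
concordant_outcomes = {pigment_pattern(0.5, 50, True, rng) for _ in range(200)}
independent_outcomes = {pigment_pattern(0.5, 50, False, rng) for _ in range(200)}
```

When the coin is shared, a mixed animal can never appear; when each cell decides on its own, a mixed animal is by far the most likely result. Breaking concordance amounts to moving a tadpole from the first regime toward the second.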
"I was blown away by the fact that the machine learning platform got us to a capability to do something we couldn't do before, at the bench, in real living organisms," Levin says. "It was good enough to predict new outcomes to experiments that no one had done before."
Mapping the model
The results expanded on previous research by the team that used machine learning to derive the cellular control model for Xenopus. To identify the model, the team input the results of nearly a decade's worth of laboratory experiments into Stampede, as well as the facts they had learned from these experiments and those of other labs working on these pathways.
The existing experiments showed a variety of ways that a drug or protein might affect a given process or cellular receptor, but not the full picture of how the complex system interrelated or how the signaling dynamics gave rise to specific frequencies of melanoma-converted animals from a given treatment applied to a population of animals.
Lobo developed a code that treated the drug and cellular interactions as nodes on a network and characterized how each component behaved as a differential equation. The code then randomly combined the various equations at each node as a chain of interactions and scored how close this network of interactions came to reproducing the lab experiments.
It dismissed the results that did not approximate the experimental outcomes, kept those that were closer, and then recombined the components.
Repeating this cycle many times, the combination of processes got better and better in a manner akin to evolution, until it arrived at a system capable of predicting laboratory results. This method, called evolutionary computation, has been used for decades in high-performance computing, but never before for the problem of cellular control networks.
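The score, discard, keep, and recombine cycle described above can be sketched in a few lines. This is a minimal illustration of evolutionary computation in general, not Lobo's code: it evolves three numeric weights rather than networks of differential equations, and the data set, the names (evolve, mismatch, TRUE_WEIGHTS), and all parameter values are assumptions.

```python
import random

def evolve(score, n_params, pop_size=40, generations=200, seed=0):
    """Toy evolutionary search: keep the better half, recombine, mutate."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)              # lower score = closer to the data
        survivors = population[:pop_size // 2]  # discard the worse half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Recombination: take each parameter from one parent or the other.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: nudge one randomly chosen parameter.
            child[rng.randrange(n_params)] += rng.gauss(0, 0.1)
            children.append(child)
        population = survivors + children
    return min(population, key=score)

# Hypothetical "experiments": outputs of a hidden linear rule.
TRUE_WEIGHTS = [0.5, -0.3, 0.8]
INPUTS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
OBSERVED = [sum(w * x for w, x in zip(TRUE_WEIGHTS, xs)) for xs in INPUTS]

def mismatch(candidate):
    """Sum of squared differences between predictions and observations."""
    return sum((sum(w * x for w, x in zip(candidate, xs)) - y) ** 2
               for xs, y in zip(INPUTS, OBSERVED))

best = evolve(mismatch, n_params=3)
```

Because the surviving half is carried over unchanged, the best score in the population can never get worse from one generation to the next, which is what lets the search improve steadily over many cycles.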
"This approach uses a lot of computational power," Lobo says. "The model is not deterministic. So just as we apply a drug to 100 tadpoles, we have to simulate the model 100 times to get an accurate result. Even if the models are fast to compute, the machine learning algorithm needs to compute billions of simulations to precisely discover the correct equations explaining the data."
The team reported the results of this initial work in Science Signaling in October 2015.
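Lobo's point about repeated simulation can be made concrete with a small Monte Carlo sketch. It assumes, purely for illustration, that each simulated animal converts as a whole with a fixed probability; the function names and values are hypothetical, and the real model is far richer than a single coin toss.

```python
import math
import random

def simulate_animal(p_convert, rng):
    """One stochastic run: the whole animal converts (True) or not (False)."""
    return rng.random() < p_convert

def estimate_frequency(p_convert, n_animals=100, seed=0):
    """Estimate the converted fraction by simulating n_animals independent runs."""
    rng = random.Random(seed)
    converted = sum(simulate_animal(p_convert, rng) for _ in range(n_animals))
    return converted / n_animals

def standard_error(p_convert, n_animals):
    """Expected sampling error of the estimated fraction (binomial)."""
    return math.sqrt(p_convert * (1 - p_convert) / n_animals)
```

A single run says only whether one virtual animal converted. A frequency that can be compared with a tadpole population needs many runs, and the sampling error shrinks only with the square root of the number of runs. Multiplying that cost by every candidate network the search evaluates is one way to see how the total reaches billions of simulations.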
With this model in hand, they began reverse-engineering drug interventions that might create a specific result: speckled tadpoles.
The team ran 562 virtual versions of the experiments they would typically do in the lab on Stampede. The model predicted exactly one path to speckled pigmentation: a combination of three reagents (two drug inhibitors and one messenger RNA) that would break the all-or-none concordance.
Laboratory experiments confirmed this prediction, resulting in the partial conversion of pigment cells within individual tadpoles.
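A virtual screen of this kind can be thought of as enumerating candidate treatments and asking the model what each would produce. The sketch below is an assumption-laden illustration: the reagent names are placeholders, and predicted_phenotype is a stand-in rule invented for this example, not the published model or its actual reagents.

```python
from itertools import combinations

# Placeholder reagent names, hypothetical and for illustration only.
REAGENTS = ["drug_A", "drug_B", "drug_C", "mrna_X", "mrna_Y"]

def predicted_phenotype(treatment):
    """Stand-in for a full model simulation (made-up rule)."""
    if set(treatment) == {"drug_A", "drug_C", "mrna_X"}:
        return "mixed"
    return "all-or-none"

def screen(reagents, max_size=3):
    """Try every combination of up to max_size reagents; return the hits."""
    hits = []
    for size in range(1, max_size + 1):
        for treatment in combinations(reagents, size):
            if predicted_phenotype(treatment) == "mixed":
                hits.append(treatment)
    return hits

hits = screen(REAGENTS)
```

With five placeholder reagents there are only 25 candidate treatments, but the count grows quickly as reagents are added, which is why a screen like this is far cheaper to run in simulation than at the bench.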
The model they derived has only been tested in amphibia so far, although the specific pathways targeted are conserved in humans. Moreover, the methodology for model discovery and interrogation will be applicable to a wide range of phenomena.
"This is a great step forward for the aspirational goal of computationally predicting complex phenotypes, and using the modeling predictions for improving health, for treating disease, and engineering useful living organisms," said Tom Skalak, Executive Director of The Paul G. Allen Frontiers Group.
Levin's lab is interested in applying this method to regenerative medicine and the ways that cells make decisions about how to form and repair complex anatomical structures. (Previous results by the team described machine learning efforts to reverse-engineer the planarian worm's ability to regenerate its entire body from fragments of a worm.)
"Beyond the current tools of bioinformatics, which handle genomic and protein data, we want to develop AI platforms to help us understand and control large scale patterning, the algorithms that define anatomical shape, not just the mechanisms guiding individual cell behaviors," Levin says.
Lobo's lab is applying the method to cancer research to determine what type of interventions might stop metastasis in its tracks without damaging other cells.
"Traditional approaches like chemotherapy attack the cells that grow the most, but leaves cells that are signaling others to grow and that may be the most important," Lobo says. "We're using machine learning to find out the communication networks between these cells and hopefully to discover a treatment that can cause the tumor to collapse."
The results of their tadpole study show how machine learning can uncover hidden relationships in complex living systems and identify specific manipulations that can achieve a therapeutic outcome.
"The machine learning system contributed to the most creative thing that scientists do: it helped us find a model explaining what's going on in this complex system," Levin says. "In the future, as data continue to accumulate, computers are going to be an essential component of the scientific process, helping us make hypotheses and formulating predictive, quantitative models of how biological systems work."
The work was supported by an Allen Discovery Center award from the Paul G. Allen Frontiers Group (12171), NSF grant (EF-1124651), and The G. Harold and Leila Y. Mathers Charitable Foundation (TFU141). The computations used Stampede, which is supported by an NSF grant (1134872) and allocated through the Extreme Science and Engineering Discovery Environment (XSEDE), and a cluster computer awarded by Silicon Mechanics.