Artificial intelligence has revolutionized numerous scientific domains by offering rapid and sophisticated solutions to complex problems. At the forefront of this revolution are researchers who design and implement the algorithms that enable AI to dissect and interpret vast amounts of biological data. At The University of Texas at Arlington (UTA), a team of data scientists is harnessing the power of AI to explore the intricate mechanisms that drive disease development, immune responses, and therapeutic efficacy. Their work is reshaping how biomedical researchers approach and understand cellular dynamics in health and disease.
Leading this pioneering initiative is Xinlei (Sherry) Wang, the Jenkins Garrett professor of statistics and data science in UTA’s Department of Mathematics. Recently, Dr. Wang secured a prestigious four-year federal grant totaling $1.28 million to advance her project titled “Statistical and Deep Generative Modeling for Enhanced CyTOF Data Interpretation and Discovery.” This grant reflects the critical importance and potential impact of her interdisciplinary research that melds advanced statistics, deep learning, and biomedical science.
At the core of Wang’s research lies CyTOF (Cytometry by Time-Of-Flight), a cutting-edge technology that simultaneously quantifies dozens of proteins across thousands of individual cells. CyTOF generates extremely complex datasets that capture heterogeneity at the single-cell level, a granularity needed to unravel biological phenomena that are invisible in bulk analyses. However, this complexity also presents a daunting challenge: how to transform high-dimensional data into clear, interpretable insights usable by biomedical scientists who may lack computational expertise.
To tackle this challenge, Wang’s team employs a Bayesian statistical framework—an approach grounded in probability theory that quantifies uncertainty and integrates prior knowledge with observed data. Bayesian methods are particularly suited for biological data because they provide transparent and interpretable models, allowing researchers to infer biologically meaningful parameters such as protein expression differences between diseased and healthy cells. Wang’s group is developing unified statistical models that mechanistically characterize CyTOF data generation processes, thereby illuminating the hidden patterns and relationships embedded within.
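To make the idea of combining prior knowledge with observed data concrete, here is a minimal sketch of a textbook conjugate normal model for the difference in mean expression of one protein between diseased and healthy cells. It is not the team’s model: the function name `posterior_mean_difference`, the known noise level, and the prior settings are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist, fmean


def posterior_mean_difference(diseased, healthy, noise_sd=1.0,
                              prior_mean=0.0, prior_sd=10.0):
    """Posterior for delta = mean(diseased) - mean(healthy).

    Illustrative assumptions (not taken from the article):
    - measurements are normal with a known, shared noise_sd
    - delta has a Normal(prior_mean, prior_sd) prior
    """
    # Observed difference in sample means and its sampling variance.
    diff = fmean(diseased) - fmean(healthy)
    diff_var = noise_sd ** 2 * (1 / len(diseased) + 1 / len(healthy))

    # Conjugate update: precisions (inverse variances) add, and the
    # posterior mean is a precision-weighted average of prior and data.
    prior_prec = 1 / prior_sd ** 2
    data_prec = 1 / diff_var
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * diff)
    return NormalDist(post_mean, math.sqrt(post_var))


# Toy expression values for a single protein marker (made up for the demo).
diseased_cells = [2.0, 2.0, 2.0, 2.0]
healthy_cells = [1.0, 1.0, 1.0, 1.0]

posterior = posterior_mean_difference(diseased_cells, healthy_cells)
lower, upper = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
prob_higher = 1 - posterior.cdf(0.0)
print(f"posterior mean difference: {posterior.mean:.3f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
print(f"P(expression higher in diseased cells): {prob_higher:.3f}")
```

The output is a full distribution rather than a single number, so the uncertainty around the estimated difference can be read directly from the credible interval. That is the kind of interpretability the paragraph above describes.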
The integration of artificial intelligence within this Bayesian framework dramatically enhances scalability and speed. Traditional computational techniques for single-cell data analysis can take several days to process millions of cells, impeding rapid discovery. By combining AI with Bayesian statistics, Wang’s models yield rigorous and reliable results within seconds. This acceleration doesn’t sacrifice interpretability; instead, it complements the statistical rigor, enabling both hypothesis generation and testing in real time.
Crucially, Wang’s approach synthesizes data from single-cell transcriptomics and CyTOF protein profiling. Single-cell transcriptomics catalogs gene expression at an unprecedented scale, providing information complementary to the protein-level data generated by CyTOF. The joint analysis empowers researchers to capture a more comprehensive molecular portrait of cellular states and transitions. This data integration is pivotal for decoding the complex biological circuits underlying diseases such as cancer, autoimmune disorders, and infectious diseases.
The resulting AI-driven toolkit can analyze millions of cells simultaneously, each characterized by 40 to 100 protein markers or tens of thousands of gene expression values. It excels at identifying distinct cell subtypes and comparing their molecular signatures across healthy and pathological conditions. By distinguishing subtle cellular differences, these models open new avenues for precision medicine, enabling tailored therapeutic interventions and improved prognostic assessments.
Wang’s team has already garnered significant recognition for their innovative contributions. Kevin Wang, a recent doctoral graduate mentored by Dr. Wang, received the Best PhD Poster Award at the 2025 Conference of Texas Statisticians for presenting their preliminary findings. This accolade underscores the growing impact of their work within the statistical community and its potential to transform biomedical research.
Further emphasizing their commitment to translational impact, the group recently published a study in the journal Nature Communications introducing BIT (Bayesian Identification of Transcriptional Regulators from Epigenomics-Based Query Region Sets). BIT enhances the precision of identifying gene regulatory mechanisms by leveraging epigenomic data, a testament to the team’s expertise in bridging statistical modeling with cutting-edge genomic technologies.
The collaborative nature of Wang’s research extends beyond UTA. Key members include Li Wang, an associate professor of mathematics; Yike Shen, an assistant professor of earth and environmental sciences; and researchers at UT Southwestern such as Yuqiu Yang and Andy Xiao. Together, they form a multidisciplinary consortium advancing the frontiers of AI-driven biomedical data interpretation.
A critical innovation Wang highlights is the creation of user-friendly, open-source software packages that encapsulate the team’s complex algorithms while remaining accessible to end users. This democratization of technology ensures that researchers without extensive computational backgrounds can harness powerful AI tools on standard laptops. Existing methods often falter when confronted with big biological datasets, but Wang’s framework integrates statistical rigor, uncertainty quantification, and scalability to overcome these limitations seamlessly.
Dr. Wang aptly observes that although AI is potent, it is often a “black box” where decision-making processes are obscured. By embedding AI within transparent Bayesian models, her research restores interpretability, enabling users to understand the biological significance of algorithmic outputs and fostering trust in AI-driven discoveries.
The implications of this work are profound. As biomedical datasets continue to expand exponentially in both size and complexity, sophisticated analytical frameworks capable of delivering fast, interpretable, and scalable insights will be indispensable. Wang’s research not only addresses this need but sets a benchmark for integrating statistical theory, machine learning, and biological domain knowledge into a cohesive, practical system.
As the University of Texas at Arlington celebrates its 130th anniversary in 2025, this project exemplifies the institution’s growing stature as a Carnegie R-1 research university and its commitment to producing innovative solutions that impact health and society. With over 42,700 students and a significant economic influence in the Dallas-Fort Worth metroplex, UTA continues to foster groundbreaking research that pushes the boundaries of knowledge and technology.
In conclusion, Xinlei (Sherry) Wang’s research embodies the transformative potential at the nexus of AI, statistics, and biomedical science. By harnessing Bayesian methodologies and deep generative models to interpret CyTOF and transcriptomic data, her team is unveiling the cellular mysteries fundamental to disease processes. Their work offers a pathway to accelerated, interpretable, and large-scale biological discovery, with far-reaching implications for diagnosis, treatment, and prevention in medicine.
Subject of Research: Artificial intelligence application in Bayesian statistical modeling for single-cell CyTOF and transcriptomic data analysis.
Article Title: Statistical and Deep Generative Modeling for Enhanced CyTOF Data Interpretation and Discovery.
News Publication Date: Not explicitly stated (context suggests early 2025).
Web References: http://dx.doi.org/10.1038/s41467-025-60269-4
References:
Wang et al. (2025). “Bayesian Identification of Transcriptional Regulators from Epigenomics-Based Query Region Sets.” Nature Communications.
Image Credits: UT Arlington
Keywords: Artificial intelligence, Bayesian statistics, CyTOF, single-cell analysis, deep generative modeling, transcriptomics, bioinformatics, statistical modeling, biomedical data interpretation, uncertainty quantification, scalable AI, cancer research.