Researchers at the Technical University of Denmark (DTU), in collaboration with global partners, have unveiled PathogenFinder2, an artificial intelligence tool designed to strengthen infectious disease prevention and pandemic preparedness. The system predicts the pathogenic potential of unknown bacteria from genome-wide analysis, a step beyond traditional microbiological methods that rely on similarity to known pathogens.
PathogenFinder2 uses protein language models, AI frameworks trained on millions of protein sequences, to interpret whole bacterial genomes. Unlike conventional techniques, which often falter when they encounter previously uncharacterized bacterial species, this approach decodes the “language” of proteins and can identify subtle biochemical patterns indicative of disease-causing potential even in bacteria that have never been seen before. It offers a proactive strategy for identifying microbial threats before they manifest as infections, helping health authorities to preempt outbreaks rather than merely react to them.
The impetus for developing PathogenFinder2 stems from the critical challenge scientists face in distinguishing harmful bacteria amid a vast and expanding microbial universe. With climate change accelerating ecosystem shifts and researchers cataloging an ever-growing diversity of bacterial species—many entirely undocumented—traditional laboratory experiments prove too slow and costly. These methods also rely heavily on genetic similarity to known pathogens, limiting their ability to detect truly novel threats. PathogenFinder2 addresses this limitation by applying AI-driven protein language models to uncover pathogenic signatures invisible to earlier computational tools.
Beyond prediction, PathogenFinder2 excels in interpretability. The system not only assesses bacterial genomes for risk but also highlights specific proteins that most strongly influence its conclusions. These proteins often encompass classical virulence factors such as toxins and adhesion molecules but may also include previously uncharacterized proteins, hinting at novel mechanisms of bacterial pathogenicity. This capacity for detailed insight opens powerful new avenues for research in diagnostic development, vaccine design, and understanding bacterial infection strategies, profoundly enhancing our capacity to combat microbial threats.
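One way to picture how a single model can both score a genome and point to the proteins behind that score is attention-style pooling over per-protein embeddings. The Python sketch below is purely illustrative and is not the published PathogenFinder2 architecture: the function `score_genome`, the two-dimensional toy embeddings, and the parameters `attention_vector` and `classifier_weights` are all hypothetical stand-ins for what a trained protein language model and classifier would supply.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_genome(protein_embeddings, attention_vector, classifier_weights, bias=0.0):
    """Return (pathogenicity probability, proteins ranked by weight).

    Hypothetical illustration: every parameter here is an assumption,
    not a value or interface taken from PathogenFinder2.
    """
    names = list(protein_embeddings)
    # 1. Score each protein's relevance, then normalize across the genome.
    relevance = [dot(protein_embeddings[n], attention_vector) for n in names]
    weights = softmax(relevance)
    # 2. Pool protein embeddings into one genome vector using those weights.
    dim = len(attention_vector)
    pooled = [
        sum(w * protein_embeddings[n][i] for n, w in zip(names, weights))
        for i in range(dim)
    ]
    # 3. Logistic classifier on the pooled genome vector.
    logit = dot(pooled, classifier_weights) + bias
    probability = 1.0 / (1.0 + math.exp(-logit))
    # 4. The same weights double as a per-protein importance ranking.
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return probability, ranked


# Toy genome with made-up protein names and embeddings.
toy_genome = {
    "toxin_like": [2.0, 0.0],
    "ribosomal": [0.0, 1.0],
    "hypothetical": [1.0, 0.5],
}
probability, ranked = score_genome(toy_genome, [1.0, 0.0], [1.0, -1.0])
print(f"predicted pathogenic probability: {probability:.3f}")
for name, weight in ranked:
    print(f"{name}: weight {weight:.3f}")
```

In this toy setup the weights that pool the proteins are the same numbers used to rank them, which is why a prediction of this kind can come with a list of the proteins that contributed most to it.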
Integral to the development of this tool was the assembly of a dataset comprising more than 21,000 bacterial genomes. These genomes encompass a broad spectrum of bacterial life, including pathogens from human infections, members of the healthy human microbiome, probiotics employed in food production, and extremophiles thriving in inhospitable environments. This broad training set gave PathogenFinder2 a robust foundation for discriminating between pathogenic and non-pathogenic bacteria with high accuracy, and it improved the model's ability to generalize to novel species.
Remarkably, PathogenFinder2 also enabled the construction of the first-ever Bacterial Pathogenic Capacity Landscape—a multidimensional map illustrating the relationships among thousands of bacteria based on shared disease-associated features. This landscape reveals clusters of bacteria with similar tissue tropisms and metabolic strategies, offering unprecedented insight into microbial evolutionary pathways and interspecies interactions. Such a panoramic overview holds transformative potential for microbiology, epidemiology, and evolutionary biology.
The significance of PathogenFinder2 extends beyond academia; it directly addresses urgent global health needs. The tool is freely available as part of the Global Pathogen Analysis Platform (GPAP), so researchers and public health officials worldwide can use it. By analyzing samples from sewage systems, as well as from healthy humans and animals, it can flag emerging pathogens before they cause outbreaks, strengthening early-warning systems and informing the timely development of diagnostic tests, vaccines, and therapeutic interventions.
This leap forward in microbial threat detection exemplifies the immense promise of marrying AI with genomics. As Frank Møller Aarestrup, head of the Genomic Epidemiology group at DTU, articulates, the ability to anticipate pathogenic risks posed by bacteria previously unknown to science represents a paradigm shift in infectious disease control. Instead of retroactively managing epidemics, we now possess tools to proactively monitor and mitigate emerging threats on a genomic scale.
Alfred Ferrer Florensa, the lead researcher who conducted his PhD work on PathogenFinder2, underscores the tool’s transformative scope: it redefines the detection of bacterial danger by using sophisticated machine learning that “listens” to protein sequences to discern pathogenic signals that are undetectable by traditional methods. This sophistication not only enhances prediction accuracy but also enriches our molecular understanding of virulence, catapulting pathogen surveillance into a new era.
The study detailing this work, entitled “Whole-genome prediction of bacterial pathogenic capacity on novel bacteria using protein language models with PathogenFinder2,” was published in the prestigious journal Bioinformatics. It was supported by significant funding from the European Union’s Horizon 2020 program, the U.S. National Institute of Allergy and Infectious Diseases, and the Novo Nordisk Foundation, reflecting the high international priority of pandemic preparedness research.
Looking ahead, the implications of PathogenFinder2 stretch beyond pathogen prediction. By uncovering novel virulence-related proteins, the model inspires targeted experiments that may yield new therapeutic targets. Furthermore, its underlying AI methodologies promise to catalyze analogous breakthroughs in studying other microbes and infectious agents, widening the horizon of biological data interpretation and translational application.
This research reshapes how microbial threats are assessed. By analyzing whole bacterial genomes with protein language models, scientists can anticipate and counteract emerging bacterial threats earlier and with greater precision. PathogenFinder2 thus marks a step toward a safer, more pandemic-resilient future.
Subject of Research: Prediction of bacterial pathogenic capacity using AI and protein language models
Article Title: Whole-genome prediction of bacterial pathogenic capacity on novel bacteria using protein language models with PathogenFinder2
News Publication Date: 20-Mar-2026
References:
Ferrer Florensa, A., Almagro Armenteros, J. J., Aarestrup, F. M., et al. (2026). Whole-genome prediction of bacterial pathogenic capacity on novel bacteria using protein language models with PathogenFinder2. Bioinformatics. DOI: 10.1093/bioinformatics/btag129
Image Credits: Lene Hundborg Koss

