In an era dominated by digital communication and social media, the ability to extract meaningful health-related insights from unstructured online conversations represents a frontier in public health surveillance. A new artificial intelligence (AI) tool named “Waldo” has been developed to harness this potential, offering an automated solution for detecting adverse events linked to consumer health products through social media data mining. Recently published in the open-access journal PLOS Digital Health, this research by John Ayers of the University of California, San Diego, and his colleagues shows how Waldo ushers in a transformative approach to post-market health surveillance.
Post-market monitoring of consumer products, including medications, supplements, and cannabis-derived items, is crucial for ensuring safety after regulatory approval. Traditionally, adverse event (AE) detection relies heavily on voluntary reporting systems where physicians and manufacturers submit data to agencies such as the U.S. Food and Drug Administration. However, this passive system often suffers from underreporting and latency, leaving many potential harms undetected. The rise in consumer health products, especially those outside strict regulatory frameworks, has created an urgent need for advanced, proactive mechanisms that can automatically identify safety signals from vast and dynamic data sources.
Waldo addresses this gap by leveraging machine learning techniques to parse unstructured text, such as the natural language found in Reddit posts, and detect mentions of adverse effects related to consumer health products. The team’s development process involved training the system on a dataset of annotated social media text, teaching it to discern nuanced health information within the everyday experiences users share online. What distinguishes Waldo is its precision, which allows it to dramatically outperform even sophisticated general-purpose AI models such as ChatGPT at the task of AE detection.
Quantitative evaluations demonstrate Waldo’s exceptional accuracy of 99.7% when benchmarked against human annotations of Reddit posts describing adverse experiences with cannabis-derived products. Extending its application, the AI processed more than 437,000 Reddit posts and flagged nearly 29,000 as potential reports of harm. Manual verification of a random subset confirmed 86% of these flagged posts as genuine adverse events, underscoring the tool’s reliability and practical value. By automating this labor-intensive surveillance task, Waldo offers unprecedented scalability and responsiveness, vital for keeping pace with the constantly evolving landscape of consumer health product use.
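To make those headline figures concrete, the short back-of-the-envelope sketch below (in Python) shows how they relate to one another. The values are the rounded numbers quoted in the study summary, not the paper’s exact counts, so the results should be read as approximations.

```python
# Rough reading of the reported figures (rounded values from the study summary;
# the exact counts appear in the published paper).
posts_screened = 437_000          # Reddit posts processed by Waldo
flagged_as_potential_ae = 29_000  # posts flagged as potential harm reports
verified_precision = 0.86         # share of a manually checked random subset confirmed genuine

estimated_genuine_reports = flagged_as_potential_ae * verified_precision
flag_rate = flagged_as_potential_ae / posts_screened

print(f"Estimated genuine adverse-event reports: ~{estimated_genuine_reports:,.0f}")
print(f"Share of screened posts flagged: {flag_rate:.1%}")
```

In other words, roughly 25,000 of the flagged posts would be expected to describe genuine adverse events, drawn from about 7% of the screened posts.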
Beyond cannabis-derived products, the developers envision Waldo’s adaptable framework being applied broadly to detect safety concerns linked to other health products that lack rigorous pre- and post-market regulatory oversight, such as dietary supplements and wellness items. This cross-domain versatility is grounded in Waldo’s robust natural language processing architecture, which can be fine-tuned for varying linguistic patterns, product categories, and health contexts. By democratizing access through open-source availability, Waldo empowers researchers, clinicians, and regulatory authorities worldwide to harness social media as a real-time source of patient-reported safety data.
From a technical perspective, Waldo builds upon RoBERTa, a state-of-the-art transformer-based language model renowned for its contextual understanding of text. The research team curated annotated training data and fine-tuned the model to recognize mentions of adverse experiences with high sensitivity and specificity, outperforming generic chatbot approaches that are not specifically optimized for adverse event detection. This specialization enables Waldo to cut through the “noise” of social chatter and identify meaningful safety signals in user narratives, a critical challenge given the complexity and ambiguity inherent in informal health discussions online.
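For readers curious about what fine-tuning a RoBERTa classifier for this kind of task looks like in practice, the sketch below uses the open-source Hugging Face libraries to train a binary adverse-event detector. It is a minimal illustration of the general technique rather than the published Waldo pipeline: the example posts, labels, and hyperparameters are hypothetical, and the authors’ actual training data and configuration are described in the paper and open-source release.

```python
# A minimal sketch (not the authors' released pipeline) of fine-tuning a
# RoBERTa binary classifier for adverse-event (AE) mentions with the
# Hugging Face `transformers` and `datasets` libraries.
from transformers import (RobertaTokenizerFast, RobertaForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Hypothetical annotated examples: 1 = post describes an adverse event, 0 = it does not.
examples = {
    "text": [
        "Took the gummies last night and woke up with a racing heart and nausea.",
        "Honestly the best sleep I've had in months, no side effects at all.",
    ],
    "label": [1, 0],
}

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

def tokenize(batch):
    # Truncate long posts to a fixed length so the default collator can batch them.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = Dataset.from_dict(examples).map(tokenize, batched=True)

# Two output labels: adverse event vs. no adverse event.
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

training_args = TrainingArguments(
    output_dir="ae-detector",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

# In practice the training set would contain thousands of annotated posts plus a
# held-out split for measuring precision and recall against human labels.
trainer = Trainer(model=model, args=training_args, train_dataset=dataset)
trainer.train()
```

Once fine-tuned, such a classifier can be run over large volumes of posts to flag candidates for human review, mirroring the manual-verification step described above.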
The implications for public health surveillance are profound. Healthcare providers and regulatory agencies have historically grappled with delayed or incomplete reporting that hampers timely intervention. Tools like Waldo offer a paradigm shift by tapping into the vast reservoir of user-generated content to capture early warnings of product-related harms that would otherwise remain invisible. This enhanced visibility can facilitate preemptive safety measures, inform clinical guidelines, and ultimately protect patient safety on a global scale.
Lead author Karan Desai emphasizes that online health discourse should not be dismissed as idle chatter but recognized as a valuable repository of real-world evidence. “Waldo shows that the health experiences people share online are not just noise, they’re valuable safety signals. By capturing these voices, we can surface real-world harms that are invisible to traditional reporting systems,” Desai explains. Such insights complement traditional pharmacovigilance methods, enriching our understanding of how products perform outside controlled clinical environments.
John Ayers remarks on the broader potential of digital health technologies, stating, “This project highlights how digital health tools can transform post-market surveillance. By making Waldo open-source, we’re ensuring that anyone, from regulators to clinicians, can use it to protect patients.” This commitment aligns with the increasing movement toward transparency, collaborative science, and rapid dissemination of tools that accelerate public health advances.
Second author Vijay Tiyyala notes the significance of leveraging specialized AI models for public health tasks: “From a technical perspective, we demonstrated that a carefully trained model like RoBERTa can outperform state-of-the-art chatbots for AE detection. Waldo’s accuracy was surprising and encouraging.” This accomplishment illustrates the critical importance of tailoring AI applications to specific domains rather than relying solely on generalized solutions, which may lack the precision needed for sensitive health-related analyses.
The research team hopes their open-source release of Waldo will catalyze innovation and facilitate community-driven enhancements. By opening the tool to academic, clinical, and regulatory stakeholders, they aspire to build an ecosystem of data-driven safety surveillance powered by artificial intelligence and communal knowledge. This participatory approach promises to accelerate the detection of potential hazards and foster smarter, safer consumer health ecosystems.
As social media platforms continue to be a popular forum for sharing personal health experiences, tools like Waldo exemplify the transformative potential of AI in converting massive streams of informal text into actionable intelligence. For the first time, public health surveillance can move beyond traditional passive reporting systems, embracing a future where emerging safety concerns are detected in near real-time, based on the lived experiences of everyday individuals. This convergence of digital technology, data science, and health research heralds a new chapter in protecting society from the risks of consumer health products.
With increasing global reliance on digital health solutions, Waldo sets a precedent for how machine learning can revolutionize post-market surveillance and consumer safety. As AI models continue to evolve and social media data proliferates, the integration of automated detection tools into public health infrastructures will become an indispensable component of modern healthcare regulation and patient advocacy. Waldo’s advent is a compelling demonstration that the voices of online communities can be powerful allies in the ongoing effort to safeguard health.
Subject of Research:
Not applicable
Article Title:
Waldo: Automated discovery of adverse events from unstructured self reports
News Publication Date:
September 30, 2025
Web References:
http://dx.doi.org/10.1371/journal.pdig.0001011
References:
Desai KS, Tiyyala VM, Tiyyala P, Yeola A, Gallegos-Rangel A, Montiel-Torres A, et al. (2025) Waldo: Automated discovery of adverse events from unstructured self reports. PLOS Digit Health 4(9): e0001011.
Image Credits:
Ralph Olazo, Unsplash (CC0)
Keywords:
Artificial intelligence, adverse event detection, social media surveillance, machine learning, RoBERTa, cannabis-derived products, post-market safety, pharmacovigilance, digital health, natural language processing, consumer health products, Reddit