An initiative led by researchers at the Smithsonian Tropical Research Institute (STRI) is changing how pollen is identified. By digitizing an extensive pollen collection representing over 18,000 plant species, most of them tropical, scientists are harnessing machine learning to automate a painstakingly slow and expertise-heavy process. Traditionally, identifying pollen grains demanded hours of meticulous work under microscopes by experienced palynologists, but the new digital database and AI-powered system promise to dramatically accelerate and democratize pollen analysis across multiple scientific disciplines.
The core of this ambitious project is the digitization of the Smithsonian’s vast pollen collection, one of the largest in the world, with more than 18,000 species primarily from tropical regions. The initiative, known as PollenGEO, has created a repository of over 40 million high-resolution photos of pollen grains drawn from meticulously curated specimens. Whereas conventional identification relies heavily on subjective manual comparison with illustrated handbooks, this digital archive serves as a foundational dataset for training AI models capable of recognizing subtle morphological differences among pollen species.
Pollen grains possess a remarkable durability that can preserve their structure for hundreds of millions of years, making them invaluable to a range of scientific fields including paleontology, ecology, and forensic science. Each species produces uniquely structured pollen, allowing precise identification when the grains are examined with suitable tools. PollenGEO’s digitization therefore not only facilitates contemporary botanical research but also unlocks new potential for studying ancient ecosystems and evolutionary processes through fossilized pollen.
Historically, the complexity and sheer number of species, especially in biodiverse tropical settings, made pollen identification a daunting and time-intensive task. Manual identification is particularly challenging in tropical regions, where many species remain undescribed and many of those found in fossil samples are extinct. Moreover, fossil pollen samples are often degraded or ambiguous, further complicating interpretation. The PollenGEO database addresses these issues by consolidating an unprecedented amount of verified palynological data into a single digital platform, so that AI models can learn from many examples and identify pollen more accurately and quickly.
Leading the digitization efforts is a team of over 30 specialists under the direction of staff palynologist Carlos Jaramillo. Their work extends beyond image scanning; detailed metadata for each pollen specimen has also been transcribed and digitized, assisted by approximately 100 volunteers from the Smithsonian Transcription Center. This comprehensive approach integrates imagery with identification data, habitat information, and collection context, creating a rich, multifaceted dataset essential for robust machine learning applications.
The main source of samples is the Graham Palynological Collection, donated to STRI in 2008, which includes more than 23,000 microscope slides and is widely regarded as one of the most significant tropical pollen archives. Supplementary collections augment the database, such as the Joan Nowicke collection, the Barro Colorado Island collection, the Amazonian samples collected by Paul Colinvaux, and fossil specimens from the Smithsonian’s National Museum of Natural History. Collectively, these collections provide a comprehensive representation of tropical pollen diversity, past and present.
From a technological standpoint, the project exemplifies interdisciplinary collaboration, integrating expertise from botany, paleontology, and computer science. Associate Professor Surangi Punyasena from the University of Illinois Urbana-Champaign has been instrumental in developing the AI environment necessary to process this voluminous data. Advanced image recognition algorithms are being refined to detect intricate patterns and structural features within pollen grains that are difficult for human analysts to discern, features critical for distinguishing species with high morphological similarity.
The successful deployment of AI in pollen identification heralds a transformative shift in palynology, turning what was once a solitary endeavor at the microscope into a scalable, digital, and universally accessible scientific process. This advancement has profound implications, facilitating rapid pollen diagnostics that can be applied to allergen detection and monitoring, to forensic investigations in which pollen traces link suspects or objects to specific geographic locations, and to geochronology, where pollen dating helps unravel the timelines of hydrocarbon deposits and ancient environmental changes.
The PollenGEO project also contributes to a larger multidisciplinary initiative, the Trans-Amazon Drilling project, which seeks to reconstruct the ecological and climatic history of the Amazon basin through analysis of sediment cores. By providing accurate and swift pollen identifications through AI-assisted tools, PollenGEO equips researchers with crucial data needed to understand ecological shifts spanning millennia, improving models of forest response to past climate fluctuations and informing projections of future trends.
The digitization effort has been supported by a broad coalition of funders, including the Smithsonian Institution, the Anders Foundation, philanthropy from Gregory D. and Jennifer Walston Johnson, and the Smithsonian Women’s Committee, among others. Their investment underscores the value placed on creating open-access scientific resources capable of advancing global biodiversity knowledge and fostering interdisciplinary research.
Ultimately, the PollenGEO database and its AI-driven identification platform embody a new frontier in biological sciences, leveraging big data and computational power to unlock the full potential of palynology. Researchers hope that, by vastly increasing efficiency and accessibility, this effort will stimulate innovations across diverse fields, from medicine to environmental science, and show how integrating technology with traditional disciplines can foster transformative scientific breakthroughs.
A webinar presented by Andrés Díaz further explores the technical details behind the massive digitization project, demonstrating the fusion of microscopy, data science, and machine learning that enables this leap forward in pollen research. As PollenGEO becomes publicly available online, it sets a precedent for other natural history collections to digitize their holdings and adopt AI tools, charting a course for future digital repositories that accelerate discovery and broaden participation in scientific inquiry.
The marriage of high-resolution imaging, exhaustive metadata, and cutting-edge artificial intelligence promises to redefine pollen identification as a digital science. This profound shift will reduce reliance on scarce human expertise, democratize access to palynological data, and unlock new avenues for understanding the world’s botanical diversity, past, present, and future.
Subject of Research: Digitization and machine learning-based identification of tropical pollen collections
Article Title: Digitizing Collections to Unlock the Full Potential of Palynology: A Case Study with the Smithsonian Palynology Collection
News Publication Date: Not stated; the referenced article was published in 2025
Web References:
- Smithsonian Tropical Research Institute: https://stri.si.edu/
- Smithsonian’s National Museum of Natural History: https://www.si.edu/museums/natural-history-museum
- Webinar on digitizing pollen images: https://stri.si.edu/story/microscopic-science
References:
Jaramillo, C., et al. 2025. Digitizing collections to unlock the full potential of palynology: A case study with the Smithsonian palynology collection. Plants, People, Planet.
Image Credits: Dominique Hämmerli and Carlos Jaramillo
Keywords:
Paleobiology, Pollen, Digital data, Digital recording, Paleontology