In the vast expanse of chemical possibilities, the quest to discover new molecules with practical applications remains one of modern science’s most formidable challenges. Researchers from Universitat Rovira i Virgili (URV) have taken a significant step forward by creating an artificial intelligence (AI) system capable of generating millions of novel molecules: structures that do not yet appear in the scientific literature but that fully comply with chemical rules. Published in Nature Machine Intelligence, this work promises to change how chemists explore the nearly infinite realm of molecular structures.
The newly developed AI model, dubbed CoCoGraph, adopts an approach analogous to state-of-the-art generative AI tools used for text or image synthesis, such as ChatGPT and DALL-E. However, instead of producing sentences or pictures, CoCoGraph fabricates molecular structures that look chemically plausible. Unlike typical AI models that may generate invalid or nonsensical chemical compounds, CoCoGraph ensures that every molecule it crafts respects fundamental chemical constraints, such as correct valency and bonding patterns.
Currently, CoCoGraph doesn’t design molecules on demand—that is, the system is not yet capable of tailoring molecules to exhibit particular desired properties like solubility, toxicity levels, or target-specific bioactivity. Rather, its primary function is to generate vast libraries of plausible molecular candidates from a given chemical formula. This foundational ability is a crucial first step, given the staggering number of molecular possibilities: estimates suggest there could be as many as 10^60 unique molecules, an astronomically large number compared to the tiny fraction cataloged by science today.
To accomplish its molecular generation feats, CoCoGraph employs a diffusion model—a machine learning architecture originally developed for image synthesis. The process involves systematically “disordering” known molecules by breaking and rearranging their atomic bonds, then training the AI to reverse this disruption, restoring chemically coherent structures. This method effectively teaches the model how to navigate the complex landscape of chemical space by understanding how to reconstruct valid molecular graphs.
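The "disordering" step can be illustrated with a minimal Python sketch. It assumes one particular kind of rearrangement, a two-bond swap that leaves every atom's bond count unchanged; the article only says that bonds are broken and rearranged, so the move itself, the function names, and the bond-list representation are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import Counter


def atom_degrees(bonds):
    """Count bond ends per atom; a double bond appears as two entries."""
    degrees = Counter()
    for u, v in bonds:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def swap_two_bonds(bonds, rng):
    """Break bonds a-b and c-d and rewire them as a-c, b-d (or a-d, b-c).

    Returns a new bond list. If the two sampled bonds share an atom, the
    list is returned unchanged, because rewiring would either create a
    self-loop or change nothing.
    """
    bonds = list(bonds)
    i, j = rng.sample(range(len(bonds)), 2)
    (a, b), (c, d) = bonds[i], bonds[j]
    if len({a, b, c, d}) < 4:
        return bonds
    if rng.random() < 0.5:
        c, d = d, c  # pick the other of the two possible rewirings
    bonds[i], bonds[j] = (a, c), (b, d)
    return bonds


def disorder(bonds, steps, seed=0):
    """Apply `steps` random swaps; return every intermediate bond list."""
    rng = random.Random(seed)
    trajectory = [list(bonds)]
    for _ in range(steps):
        trajectory.append(swap_two_bonds(trajectory[-1], rng))
    return trajectory


# Heavy-atom skeleton of 1-butene, C0=C1-C2-C3 (the double bond is listed twice).
butene = [(0, 1), (0, 1), (1, 2), (2, 3)]
trajectory = disorder(butene, steps=5)
print(all(atom_degrees(g) == atom_degrees(butene) for g in trajectory))
```

In a setup like this, consecutive entries of the trajectory could serve as training pairs for a model that learns to undo one swap at a time. Note that this simple move does not keep the molecule connected, which a real system would also have to handle.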
Molecules differ fundamentally from images; they are discrete entities defined by specific atoms connected via bonds, rather than continuous pixel grids. This discrete nature introduces considerable mathematical complexity, making the application of diffusion-based generative models far more challenging. CoCoGraph overcomes this by embedding the essence of chemical valency rules directly into its generative process, ensuring that every output molecule adheres strictly to chemical logic.
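To make the valency constraint concrete, here is a small Python sketch of the kind of check a molecular graph must pass. The valence table is a deliberately simplified assumption (neutral atoms only), and the helper name and data layout are hypothetical; the article does not describe how CoCoGraph encodes these rules internally.

```python
from collections import Counter

# Simplified maximum valences for neutral atoms. Charged and hypervalent
# species (for example sulfur in sulfates) are ignored in this sketch.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def valence_violations(elements, bonds):
    """Return {atom_index: (used, allowed)} for atoms with too many bonds.

    `elements` lists one element symbol per atom; `bonds` lists atom-index
    pairs, with a double bond written as the same pair twice.
    """
    used = Counter()
    for u, v in bonds:
        used[u] += 1
        used[v] += 1
    return {
        i: (used[i], MAX_VALENCE[el])
        for i, el in enumerate(elements)
        if used[i] > MAX_VALENCE[el]
    }


# Formaldehyde, H2C=O: the C=O double bond is listed twice.
formaldehyde = (["C", "O", "H", "H"], [(0, 1), (0, 1), (0, 2), (0, 3)])
# Same atoms with one extra C-O bond: carbon now has five bonds, oxygen three.
overbonded = (["C", "O", "H", "H"], [(0, 1), (0, 1), (0, 1), (0, 2), (0, 3)])

print(valence_violations(*formaldehyde))
print(valence_violations(*overbonded))
```

A generator that only ever applies moves preserving each atom's bond count never needs to run such a check after the fact, which is one way to read the claim that the rules are built into the generative process.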
One of CoCoGraph’s most significant advantages over previous models is its efficiency. By reducing the number of parameters required and optimizing computational resource use, it not only speeds up molecule generation but also requires less powerful hardware. This balance between accuracy and efficiency is crucial for practical applications, where researchers need rapid iteration cycles while maintaining chemical validity.
The research team meticulously evaluated CoCoGraph by comparing it against contemporary state-of-the-art molecular generation algorithms. They analyzed 36 physicochemical properties—ranging from solubility and molecular weight to structural complexity—across millions of generated molecules. Results revealed that CoCoGraph’s outputs are chemically more realistic for about two-thirds of these properties, outperforming other models in generating plausible chemical structures that align better with known molecular distributions.
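One way to quantify how closely a generated property distribution matches a reference one is a two-sample distance between empirical distributions. The Python sketch below uses the Kolmogorov-Smirnov statistic purely as an illustration; the article does not state which metric the authors used, and the molecular-weight values are made-up placeholders, not results from the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical molecular weights (g/mol) for a reference set and two generators.
reference = [120.1, 151.2, 180.2, 206.3, 230.3, 267.4]
model_x = [118.0, 149.9, 183.5, 210.0, 228.7, 270.1]
model_y = [310.4, 333.0, 352.8, 371.5, 390.2, 412.9]

print(ks_statistic(reference, model_x))
print(ks_statistic(reference, model_y))
```

A smaller value means the generated molecules follow the reference distribution more closely for that property; repeating the comparison for each property gives a per-property tally of which model comes closer.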
To validate the plausibility of its generated molecules in a real-world setting, the researchers conducted a blind assessment involving 121 chemistry experts from the university. Each expert was presented with pairs of molecules: one genuine, documented molecule and one generated by CoCoGraph. The participants mistook the AI-generated molecules for real ones approximately 40% of the time. This degree of indistinguishability marks a milestone for AI in chemical design, highlighting the realism of the model’s output.
While CoCoGraph is currently limited to general-purpose molecular generation, its developers have begun exploratory experiments toward functionally targeted design. For example, they identified molecules with physicochemical properties akin to paracetamol within the vast libraries generated by the system. They also trialed molecular “tweaking” techniques (incremental chemical modifications of existing molecules) to produce viable variants that retain desirable characteristics, a promising approach for drug optimization.
The researchers emphasize that their work represents the foundational phase of a much larger vision: the creation of AI systems that can design bespoke molecules tailored to precise specifications. According to researchers on the team, including Roger Guimerà, the long-term goal is to enable chemists to input specific property requirements (such as non-toxicity, targeted solubility, or interaction with a biological receptor) and receive custom-designed molecules fulfilling those criteria. Achieving this would revolutionize pharmaceutical development, materials science, and chemical engineering.
Such transformative potential is underscored by the sheer enormity of chemical space—so vast that traditional experimental and computational methods struggle to explore it comprehensively. CoCoGraph’s AI-driven generative capabilities offer a scalable, efficient means to navigate this complexity, serving as an indispensable tool in accelerating molecular discovery and innovation across myriad domains.
In sum, the advent of CoCoGraph symbolizes a formidable convergence between machine learning and chemistry. By creating chemically valid, high-quality molecules at scale, this AI model ushers in a new era where synthetic molecular design may transcend human intuition and traditional trial-and-error, unlocking unprecedented opportunities in science and technology.
Subject of Research: Cells
Article Title: A collaborative constrained graph diffusion model for the generation of realistic synthetic molecules
Web References: 10.1038/s42256-026-01229-5
Image Credits: URV
Keywords: Algorithms

