In a major step for synthetic biology, researchers have unveiled Evo 2, a generative DNA model capable of designing genome-scale sequences from diverse life forms with unprecedented accuracy. Compared with its predecessor, Evo 1, Evo 2 demonstrates a broader capacity to generate complete DNA sequences that mirror the complexity and functionality of natural genomes spanning bacteria, archaea, fungi, protists, plants, and animals. This advancement heralds a new era in genome-scale generation, opening avenues for synthetic biology, genetic engineering, and evolutionary research.
Evo 2 operates through unconstrained autoregressive generation, starting from partial genomic sequences and autonomously completing gene sequences with remarkable fidelity. Researchers tested Evo 2’s capability by prompting the model with 1,000 base pairs of upstream genomic context along with the initial 500 to 1,000 base pairs of a target gene. Their findings revealed that Evo 2 consistently outperforms Evo 1, achieving higher amino acid sequence recovery rates that improve with model scale. The 40-billion and 7-billion parameter versions of Evo 2 not only demonstrated superior gene completion but also maintained accuracy throughout sequences requiring long contextual understanding.
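The gene-completion test described above can be sketched in a few lines. This is a minimal illustration, not the researchers' evaluation code: the function names (`build_prompt`, `translate`, `amino_acid_recovery`) are hypothetical, the recovery score is a simple position-wise identity with no alignment step, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def build_prompt(genome, gene_start, upstream=1000, gene_prefix=500):
    """Return upstream context plus the first bases of the target gene.

    Hypothetical helper: mirrors the 1,000 bp upstream + 500 to 1,000 bp
    gene prefix described in the article.
    """
    start = max(0, gene_start - upstream)
    return genome[start:gene_start + gene_prefix]


def translate(dna):
    """Translate in frame 0 and stop at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def amino_acid_recovery(native_dna, generated_dna):
    """Fraction of native residues matched position by position (assumed metric)."""
    native = translate(native_dna)
    generated = translate(generated_dna)
    if not native:
        return 0.0
    matches = sum(a == b for a, b in zip(native, generated))
    return matches / len(native)
```

Because the score is computed on the translated protein, a generated gene that differs from the native one only by synonymous codons still counts as fully recovered.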
While Evo 2 excels broadly, its performance on viral genomes, particularly those of viruses that infect humans, remains poor. Tests showed essentially random sequence recovery in these cases, which limits the model’s capacity to generate human viral proteins, whether by accident or through unconstrained prompting. The researchers present this weakness as an inherent safeguard. Viruses also pose distinctive sequence prediction challenges because of their rapid mutation rates and diverse evolutionary pressures.
The scale of Evo 2’s generative prowess was dramatically displayed through its ability to replicate the entire human mitochondrial genome. The model generated over 250 unique 16-kilobase sequences prompted from human mitochondrial DNA fragments. These artificial mitochondrial genomes, when analyzed via the annotation toolkit MitoZ, showed the correct number and distribution of coding sequences (CDSs), transfer RNAs (tRNAs), and ribosomal RNAs (rRNAs), faithfully reproducing natural mitochondrial gene synteny and organization. There was notable sequence similarity with native mitochondrial genes, accompanied by appropriate codon usage patterns aligning closely with authentic human mitochondrial DNA.
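One simple way to compare codon usage between a generated and a native coding sequence is to count in-frame codons and measure how closely the two frequency profiles agree. The sketch below assumes cosine similarity as the comparison; the article does not say which statistic the researchers used, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def codon_frequencies(cds):
    """Relative frequency of each in-frame codon in a coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Skip codons containing ambiguous bases such as N.
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two codon frequency dictionaries."""
    dot = sum(p[c] * q.get(c, 0.0) for c in p)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```

A value near 1 means the two sequences favor the same codons in similar proportions; a value near 0 means they share almost none.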
Further structural validations employed AlphaFold to predict the 3D conformations of proteins generated by Evo 2 from these artificial mitochondrial sequences. Remarkably, many predicted proteins formed multimeric complexes structurally analogous to their natural counterparts, suggesting that Evo 2 can design protein-coding regions that potentially fold into biologically relevant structures. This structural fidelity is crucial for future applications aiming to produce functional biomolecules from computationally designed genomes.
Pushing the envelope on genomic scale, Evo 2 was tasked with generating prokaryotic genomes, focusing on Mycoplasma genitalium, a bacterium with one of the smallest known genomes at approximately 580 kilobases. By seeding the model with a 10.5-kilobase prompt from the reference genome, the researchers generated ten complete genome-length sequences. Gene prediction with Prodigal revealed that about 70% of the genes predicted in these synthetic genomes contained statistically significant Pfam domain hits, a substantial leap from Evo 1’s 18%. The distribution of gene lengths and predicted protein secondary structures closely mirrored those observed in natural M. genitalium proteins, suggesting that Evo 2 can faithfully reproduce minimal bacterial genome architecture.
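The headline figure here, the share of predicted genes with a significant Pfam hit, reduces to a simple count once domain search results are in hand. The sketch below assumes the results have already been parsed into a mapping from gene identifier to a list of hit E-values; the input layout, the function name, and the 1e-3 cutoff are all illustrative assumptions rather than details taken from the study.

```python
def fraction_with_significant_hit(hits_by_gene, e_value_cutoff=1e-3):
    """Share of genes with at least one domain hit at or below the cutoff.

    hits_by_gene: dict mapping gene id -> list of hit E-values (assumed layout).
    e_value_cutoff: illustrative significance threshold, not from the article.
    """
    if not hits_by_gene:
        return 0.0
    significant = sum(
        1
        for e_values in hits_by_gene.values()
        if any(e <= e_value_cutoff for e in e_values)
    )
    return significant / len(hits_by_gene)


# Toy example: two of four predicted genes carry a significant hit.
example = {"gene1": [1e-10], "gene2": [0.5], "gene3": [], "gene4": [1e-5, 2.0]}
print(fraction_with_significant_hit(example))
```

Genes with no hits at all stay in the denominator, so the fraction reflects every predicted gene and not only those that matched something.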
In addition to prokaryotes, Evo 2’s genomic generation extended into eukaryotic complexity with sequences modeled on Saccharomyces cerevisiae (baker’s yeast) chromosome III. Prompted with just 10.5 kilobases of native sequence, Evo 2 extrapolated to produce 330-kilobase sequences, far longer than the prompt itself. These synthetic chromosomes included essential genetic elements such as tRNAs, promoters, and genes with authentic intronic structures, although feature densities such as tRNA and gene counts were somewhat lower than in the native yeast genome. Even so, the predicted proteins displayed length distributions akin to those of natural yeast proteins, with varying degrees of predicted structural similarity, underscoring Evo 2’s potential as a tool for complex eukaryotic genome reconstruction.
Evo 2’s ability to capture phylogenetic signals was also tested through tetranucleotide usage deviation (TUD) analyses, a metric commonly used to assess genomic relatedness. Synthetic sequences generated for S. cerevisiae showed TUD patterns that correlated with those of native genomes, an effect more pronounced in larger Evo 2 models. This phylogenetic fidelity suggests that the model has learned, and can reproduce, evolutionary constraints at the DNA sequence level, hinting at Evo 2’s use in studying genome evolution and species diversification computationally.
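To make the TUD comparison concrete, the sketch below uses one common formulation: for each of the 256 tetranucleotides, divide its observed count by the count expected from single-base frequencies alone, then correlate the resulting profiles of two sequences. This is an assumption for illustration. The article does not specify the exact definition used, other formulations exist (for example, ones based on higher-order expectations or on both strands), and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tud_profile(seq):
    """Observed/expected ratio for each tetranucleotide (zero-order expectation)."""
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in "ACGT")
    total_bases = sum(base_counts.values())
    kmer_counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total_kmers = sum(kmer_counts[k] for k in TETRAMERS)
    if total_bases == 0 or total_kmers == 0:
        return [0.0] * len(TETRAMERS)
    base_freq = {b: base_counts[b] / total_bases for b in "ACGT"}
    profile = []
    for kmer in TETRAMERS:
        expected = total_kmers * prod(base_freq[b] for b in kmer)
        profile.append(kmer_counts[kmer] / expected if expected > 0 else 0.0)
    return profile


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)
```

Under this formulation, `pearson(tud_profile(generated), tud_profile(native))` gives a single number summarizing how closely a generated sequence tracks the tetranucleotide signature of its native counterpart.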
Despite these remarkable in silico successes, the researchers acknowledge critical limitations. Most notably, the computational metrics and annotations do not guarantee that Evo 2-generated genomes are functionally viable or capable of autonomous replication. Essential genomic elements, such as undetected regulatory sequences or critical but unannotated genes, may be missing. Realizing fully functional synthetic genomes will require extensive experimental validation and iterative refinement through sophisticated biotechnological platforms.
The evolutionary insights gleaned from Evo 2’s generative capabilities extend beyond mere recreation of known genomes. The model exhibits the capacity to diversify sequence compositions while maintaining structural and functional coherence, as evidenced by AlphaFold structural predictions showing protein variants with high structural similarity but diverse amino acid arrangements. Such findings raise exciting prospects for directed protein evolution and synthetic biology, where novel protein scaffolds with desired functions can be computationally designed.
Evo 2’s genomic generation spans the tree of life, establishing a comprehensive foundational platform for genome engineering. It holds vast potential in biotechnology, from constructing minimal synthetic cells and organelles to enabling precision gene therapies and the design of novel biomolecules. Furthermore, by harnessing Evo 2 for genome synthesis, researchers can systematically dissect the grammar of genomic information, unraveling hidden rules governing biological sequence function and evolution.
Importantly, Evo 2 also illustrates the power of scaling in artificial intelligence models applied to biological data. Larger parameter models consistently outperform smaller ones in sequence recovery and genome completeness, revealing the importance of computational capacity in capturing the intricate dependencies inherent in genetic material. This highlights a generalizable trend relevant across domains where large-scale AI methods intersect with complex biological datasets.
As this pioneering technology progresses towards experimental validation, the scientific community stands at the cusp of revolutionary advancements in genome design and synthetic life creation. Evo 2’s ability to generate organellar, prokaryotic, and eukaryotic genomes with nuanced complexity positions it as a harbinger of future integrative bioengineering approaches, merging computational prowess with molecular biology to reshape our understanding of life at its most fundamental level.
In sum, Evo 2 is not merely a predictive model but a transformative generative system that redefines the boundaries of biological sequence design. Its versatility across domains of life and congruence with natural genomic patterns underscore its potential as a universal genome engineering tool, inspiring ongoing research into the applications and implications of artificial genome synthesis.
Subject of Research: Genome modeling and design using generative AI across all domains of life.
Article Title: Genome modelling and design across all domains of life with Evo 2.
Article References:
Brixi, G., Durrant, M.G., Ku, J. et al. Genome modelling and design across all domains of life with Evo 2. Nature (2026). https://doi.org/10.1038/s41586-026-10176-5

