Can language models read the genome? This one decoded mRNA to make better vaccines.

April 5, 2024

Mengdi Wang in her Princeton office

Credit: Photo by Sameer A. Khan/Fotobuddy

The same class of artificial intelligence that made headlines coding software and passing the bar exam has learned to read a different kind of text — the genetic code.

That code contains instructions for all of life’s functions and follows rules not unlike those that govern human languages. Each sequence in a genome adheres to an intricate grammar and syntax, the structures that give rise to meaning. Just as changing a few words can radically alter the impact of a sentence, small variations in a biological sequence can make a huge difference in the forms that sequence encodes.

Now Princeton University researchers led by machine learning expert Mengdi Wang are using language models to home in on partial genome sequences and optimize those sequences to study biology and improve medicine. And that work is already underway.

In a paper published April 5 in the journal Nature Machine Intelligence, the authors detail a language model that used its powers of semantic representation to design more effective mRNA vaccines, such as those used to protect against COVID-19.

Found in Translation

Scientists have a simple way to summarize the flow of genetic information. They call it the central dogma of biology. Information moves from DNA to RNA to proteins. Proteins create the structures and functions of living cells.

Messenger RNA, or mRNA, converts the information into proteins in that final step, called translation. But mRNA is interesting. Only part of it holds the code for the protein. The rest is not translated but controls vital aspects of the translation process.

Governing the efficiency of protein production is a key mechanism by which mRNA vaccines work. The researchers focused their language model there, on the untranslated region, to see how they could optimize efficiency and improve vaccines.
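
To make that anatomy concrete, here is a minimal sketch, in Python, of how an mRNA string can be split into its 5′ untranslated region, coding sequence and 3′ untranslated region. The toy sequence and the simple first-AUG, first-in-frame-stop rule are illustrative only, not taken from the study; the untranslated portion at the front is the region the Princeton model focuses on.

# Minimal sketch: splitting an mRNA sequence into 5' UTR, coding sequence (CDS),
# and 3' UTR. The example sequence and the simple "first AUG, first in-frame stop"
# rule are illustrative only; real annotation pipelines are more involved.

STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(seq: str) -> dict:
    """Return the 5' UTR, CDS, and 3' UTR of an mRNA string (RNA alphabet)."""
    start = seq.find("AUG")                      # first start codon
    if start == -1:
        raise ValueError("no start codon found")
    for i in range(start, len(seq) - 2, 3):      # scan codon by codon, in frame
        if seq[i:i + 3] in STOP_CODONS:
            end = i + 3                          # include the stop codon in the CDS
            break
    else:
        raise ValueError("no in-frame stop codon found")
    return {"utr5": seq[:start], "cds": seq[start:end], "utr3": seq[end:]}

if __name__ == "__main__":
    mrna = "GGGACUCCGGCGAUGGCUUACCGAUAAGCGCAUU"   # toy sequence
    parts = split_mrna(mrna)
    print(parts["utr5"])   # 5' UTR: the region this study's model is trained on
    print(parts["cds"])    # coding sequence, from AUG through the stop codon
    print(parts["utr3"])   # 3' UTR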

After training the model on a small variety of species, the researchers generated hundreds of new optimized sequences and validated those results through lab experiments. The best sequences outperformed several leading benchmarks for vaccine development, including a 33% increase in the overall efficiency of protein production.

Increasing protein production efficiency by even a small amount provides a major boost for emerging therapeutics, according to the researchers. Beyond COVID-19, mRNA vaccines promise to protect against many infectious diseases and cancers.

Wang, a professor of electrical and computer engineering and the principal investigator in this study, said the model’s success also pointed to a more fundamental possibility. Trained on mRNA from a handful of species, it was able to decode nucleotide sequences and reveal something new about gene regulation. Scientists believe gene regulation, one of life’s most basic functions, holds the key to unlocking the origins of disease and disorder. Language models like this one could provide a new way to probe it.

Wang’s collaborators include researchers from the biotech firm RVAC Medicines as well as the Stanford University School of Medicine.

The Language of Disease

The new model differs in degree, not kind, from the large language models that power today’s AI chatbots. Instead of being trained on billions of pages of text from the internet, this model was trained on a few hundred thousand sequences. It was also trained to incorporate additional knowledge about the production of proteins, including structural and energy-related information.
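
The article does not spell out the architecture, but the general recipe it describes, treating nucleotide sequences as text and pretraining a Transformer to fill in masked positions, can be sketched in a few lines of Python. Everything below, from the layer sizes to the masking rate, is an illustrative default rather than the configuration used in the paper.

# Sketch of the general recipe: treat nucleotide sequences as text and pretrain
# a small Transformer with masked-token prediction. Sizes and masking rate are
# illustrative defaults, not the configuration from the Nature Machine Intelligence paper.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "U": 3, "<mask>": 4, "<pad>": 5}

def encode(seq: str) -> torch.Tensor:
    return torch.tensor([VOCAB[ch] for ch in seq], dtype=torch.long)

class NucleotideMLM(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, len(VOCAB))  # predict the hidden nucleotide

    def forward(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        h = self.embed(tokens) + self.pos(positions)
        return self.head(self.encoder(h))

# One illustrative training step: hide about 15% of positions and ask the model
# to recover them, the standard masked-language-model objective.
model = NucleotideMLM()
seq = encode("GGGACAUUUGCUUCUGACACAACUGUGUUCACUAGC").unsqueeze(0)  # toy 5' UTR
mask = torch.rand(seq.shape) < 0.15
mask[0, 0] = True                               # ensure at least one masked position
inputs = seq.clone()
inputs[mask] = VOCAB["<mask>"]
logits = model(inputs)
loss = nn.functional.cross_entropy(logits[mask], seq[mask])
loss.backward()
print(float(loss))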

The research team used the trained model to create a library of 211 new sequences. Each was optimized for a desired function, primarily an increase in the efficiency of translation, which determines how much protein an mRNA produces. Those proteins, like the spike protein targeted by COVID-19 vaccines, drive the immune response to infectious disease.
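
One way to picture that design step, purely as a sketch and not the paper’s actual procedure, is to generate candidate 5′ UTR variants and keep the ones a trained predictor scores highest for translation efficiency. The predict_efficiency function below is a hypothetical stand-in, reduced here to a trivial GC-content proxy so the example runs on its own.

# Sketch of a generate-and-rank selection step. `predict_efficiency` is a
# hypothetical stand-in for a learned translation-efficiency predictor; here it
# is a toy GC-content score so the example is self-contained.
import random

NUCLEOTIDES = "ACGU"

def mutate(utr: str, n_mutations: int = 3) -> str:
    """Return a copy of `utr` with a few random single-nucleotide substitutions."""
    seq = list(utr)
    for pos in random.sample(range(len(seq)), n_mutations):
        seq[pos] = random.choice(NUCLEOTIDES)
    return "".join(seq)

def predict_efficiency(utr: str) -> float:
    """Dummy stand-in for a learned translation-efficiency predictor."""
    return sum(1 for ch in utr if ch in "GC") / len(utr)

base_utr = "GGGACAUUUGCUUCUGACACAACUGUGUUCACUAGC"    # toy starting 5' UTR
candidates = {mutate(base_utr) for _ in range(500)}   # candidate variants
ranked = sorted(candidates, key=predict_efficiency, reverse=True)

for utr in ranked[:5]:                                # keep the top-scoring designs
    print(f"{predict_efficiency(utr):.3f}  {utr}")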

Previous studies have created language models to decode various biological sequences, including proteins and DNA, but this was the first language model to focus on the untranslated region of mRNA. In addition to a boost in overall efficiency, it was also able to predict how well a sequence would perform at a variety of related tasks.

Wang said the real challenge in creating this language model was in understanding the full context of the available data. Training a model requires not only the raw data with all its features but also the downstream consequences of those features. If a program is designed to filter spam from email, each email it trains on would be labeled “spam” or “not spam.” Along the way, the model develops semantic representations that allow it to determine what sequences of words indicate a “spam” label. Therein lies the meaning.
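
The spam analogy maps directly onto a few lines of code: a handful of labeled emails, a bag-of-words representation and a classifier that learns which word patterns indicate spam. The labeled-sequence setup in the paper follows the same pattern, with nucleotide sequences in place of emails and measured outcomes in place of spam labels. The snippet below is a generic toy example, not anything from the study.

# The spam analogy as a runnable toy: labeled emails, word-count features,
# and a classifier that learns which patterns indicate "spam".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

emails = [
    "win a free prize now",          # spam
    "claim your free reward today",  # spam
    "meeting notes attached",        # not spam
    "schedule for next week",        # not spam
]
labels = [1, 1, 0, 0]                # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(emails)   # each email becomes a word-count vector
model = LogisticRegression().fit(features, labels)

test = vectorizer.transform(["free prize waiting"])
print(model.predict(test))                    # should flag it as spam (label 1)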

Wang said looking at one narrow dataset and developing a model around it was not enough to be useful for life scientists. She needed to do something new. Because this model was working at the leading edge of biological understanding, the data she found was all over the place.

“Part of my dataset comes from a study where there are measures for efficiency,” Wang said. “Another part of my dataset comes from another study [that] measured expression levels. We also collected unannotated data from multiple resources.” Organizing those parts into one coherent and robust whole — a multifaceted dataset that she could use to train a sophisticated language model — was a massive challenge.

“Training a model is not only about putting together all those sequences, but also putting together sequences with the labels that have been collected so far. This had never been done before.”
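
The data-assembly problem Wang describes can be pictured as a merge across studies: sequences arrive with different labels, translation efficiency in one study, expression level in another, plus unannotated sequences, and must be combined into one table the model can train on. The sequences and values below are invented for illustration.

# Sketch of merging heterogeneous sequence data: labels a study did not measure
# stay missing (NaN), so each training objective can simply skip those rows.
import pandas as pd

efficiency_study = pd.DataFrame({
    "utr":        ["GGGACAUUUGCU", "CCAUCGGUAACG"],
    "efficiency": [1.8, 0.6],                        # measured translation efficiency
})
expression_study = pd.DataFrame({
    "utr":        ["CCAUCGGUAACG", "AUGCCCGUUAAC"],
    "expression": [220.0, 35.0],                     # measured expression level
})
unannotated = pd.DataFrame({
    "utr": ["GCGCAUUACGUA", "UUACGGCAUCCG"],          # no labels: usable for pretraining
})

dataset = efficiency_study.merge(expression_study, on="utr", how="outer")
dataset = pd.concat([dataset, unannotated], ignore_index=True)
print(dataset)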

The paper, “A 5′ UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions,” was published in Nature Machine Intelligence. Additional authors include Dan Yu, Yupeng Li, Yue Shen and Jason Zhang, from RVAC Medicines; Le Cong from Stanford; and Yanyi Chu and Kaixuan Huang from Princeton.



Journal

Nature Machine Intelligence

DOI

10.1038/s42256-024-00823-9

Method of Research

Experimental study

Subject of Research

Cells

Article Title

A 5′ UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions

Article Publication Date

5-Apr-2024
