A unified objective for dynamics model and policy learning in model-based reinforcement learning

September 4, 2024

Model-based reinforcement learning has recently come to be regarded as a crucial approach for applying reinforcement learning in the physical world, primarily because it uses samples efficiently. However, a dynamics model learned by supervised fitting, which generates the rollouts used for policy optimization, introduces compounding errors that hinder policy performance. To address this problem, a research team led by Yang YU published new research on 15 August 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
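The compounding-error problem can be seen in a few lines of Python. This is an illustrative sketch, not taken from the paper: the scalar linear dynamics and the 5% model error are assumptions chosen for clarity. The model's one-step error, measured from true states as supervised fitting does, stays constant, but when the model is rolled out on its own predictions, the gap from the real trajectory grows with the horizon.

```python
# Hypothetical scalar dynamics: s_{t+1} = gain * s_t (illustrative assumption).
TRUE_GAIN = 1.00   # real environment
MODEL_GAIN = 1.05  # learned model with a 5% one-step error


def rollout(gain, s0, horizon):
    """Roll the dynamics forward, feeding each prediction back in as the next state."""
    states, s = [], s0
    for _ in range(horizon):
        s = gain * s
        states.append(s)
    return states


HORIZON = 20
real = rollout(TRUE_GAIN, 1.0, HORIZON)
model = rollout(MODEL_GAIN, 1.0, HORIZON)

# One-step error: the model predicts s_{t+1} from the TRUE state s_t.
true_prev = [1.0] + real[:-1]
one_step = [abs(MODEL_GAIN * s - TRUE_GAIN * s) for s in true_prev]

# Multi-step error: the model predicts from its OWN previous predictions.
multi_step = [abs(m - r) for m, r in zip(model, real)]

for k in (1, 5, 10, 20):
    print(f"horizon {k:2d}: one-step error {one_step[k - 1]:.3f}, "
          f"rollout error {multi_step[k - 1]:.3f}")
```

The one-step error stays at 0.050 while the rollout error reaches about 0.276, 0.629 and 1.653 at horizons 5, 10 and 20. This growth is one reason Dyna-style methods often keep model rollouts short.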

The team proposes a novel model-based learning approach that unifies the objectives of model learning and policy learning. Building on this idea, the research introduces the Model Gradient algorithm (MG), which learns the model by directly maximizing the policy's performance in the real world. Compared with existing model-based methods, this approach achieves both higher sample efficiency and better performance.

The research identifies a limitation of current model-based reinforcement learning methods that learn their dynamics models by supervised fitting: model inaccuracies lead to compounding errors. The authors propose to address the problem by changing the model-learning objective. Supervised model learning may not help the policy achieve better performance, because its objective (predicting transitions accurately) is not aligned with the ultimate goal of reinforcement learning, namely maximizing the policy's real-world performance. The research therefore unifies the objectives of model learning and policy learning, starting from the policy gradient. By maximizing the real-world performance of the policy learned in the model, the authors derive a gradient for the model that points in the direction of policy improvement; this gradient takes the form of increasing the similarity between the policy gradient in the real environment and the policy gradient in the model. Using this model update, the authors develop a new model-based reinforcement learning algorithm, the Model Gradient algorithm (MG).
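To make the gradient-matching idea more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the one-step problem, the additive dynamics model s1 = a + psi, the Gaussian policy, the REINFORCE estimator and the squared difference used as the "similarity" between gradients are all assumptions made for illustration, and every name is hypothetical. The real-environment policy gradient g_real is estimated from sampled interactions, the model parameter psi is moved to minimize (g_model - g_real)^2, and the policy mean theta is then improved using the gradient computed inside the model.

```python
import random

# --- Hypothetical toy problem (assumptions, not from the paper) ---
SIGMA = 0.5        # fixed std of the Gaussian policy a ~ N(theta, SIGMA^2)
TRUE_OFFSET = 0.5  # hidden real dynamics: s1 = a + TRUE_OFFSET
TARGET = 1.0       # reward r(s1) = -(s1 - TARGET)^2


def reward(s1):
    return -(s1 - TARGET) ** 2


def real_policy_gradient(theta, n, rng):
    """REINFORCE estimate of dJ/dtheta from n interactions with the real system."""
    total = 0.0
    for _ in range(n):
        a = rng.gauss(theta, SIGMA)
        score = (a - theta) / SIGMA ** 2  # d/dtheta of log N(a; theta, SIGMA^2)
        total += score * reward(a + TRUE_OFFSET)
    return total / n


def model_policy_gradient(theta, psi):
    """Exact dJ/dtheta inside the model s1 = a + psi.

    J_model(theta) = -((theta + psi - TARGET)^2 + SIGMA^2), so the derivative
    is -2 * (theta + psi - TARGET); its derivative w.r.t. psi is the constant -2.
    """
    return -2.0 * (theta + psi - TARGET)


def train(iterations=200, n=2000, model_steps=5, lr_model=0.1, lr_policy=0.1, seed=0):
    rng = random.Random(seed)
    theta, psi = 0.0, 0.0  # initial policy mean and model parameter
    for _ in range(iterations):
        g_real = real_policy_gradient(theta, n, rng)
        # Model update: gradient descent on (g_model - g_real)^2 w.r.t. psi,
        # i.e. make the policy gradient in the model match the real one.
        for _ in range(model_steps):
            diff = model_policy_gradient(theta, psi) - g_real
            psi -= lr_model * 2.0 * diff * (-2.0)
        # Policy update: gradient ascent using the gradient computed in the model.
        theta += lr_policy * model_policy_gradient(theta, psi)
    return theta, psi


theta, psi = train()
print(f"theta = {theta:.3f} (real optimum 0.5), psi = {psi:.3f} (true offset 0.5)")
```

With the default seed, both theta and psi should end up close to 0.5. In this toy the model class contains the true dynamics, so plain supervised regression would also recover the offset; the sketch only shows the mechanics of updating a model so that its policy gradient agrees with the real one, not the settings where MG's advantage over supervised fitting appears.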

Experimental results show that MG outperforms model-based reinforcement learning baselines that rely on supervised model fitting across multiple continuous-control tasks. MG is especially stable on sparse-reward tasks, even when compared with state-of-the-art Dyna-style model-based reinforcement learning methods that use short-horizon rollouts.

In future work, the authors consider extending this formulation to other policy-optimization methods, such as off-policy algorithms.

DOI: 10.1007/s11704-023-3150-5

Figure: The processing flow of Model Gradient. Credit: Chengxing JIA, Fuxiang ZHANG, Tian XU, Jing-Cheng PANG, Zongzhang ZHANG, Yang YU

Journal

Frontiers of Computer Science

DOI

10.1007/s11704-023-3150-5

Method of Research

Experimental study

Subject of Research

Not applicable

Article Publication Date

15-Aug-2024
