SharpeRatio@k: novel metric for evaluation of risk-return tradeoff in off-policy evaluation

April 23, 2024
in Technology and Engineering

Reinforcement learning (RL) is a machine learning technique that trains software by mimicking the trial-and-error process through which humans learn. It has achieved considerable success in many areas that involve sequential decision-making. However, training RL models through real-world online tests is often undesirable because it can be risky, time-consuming and, importantly, unethical. Consequently, using offline datasets that are naturally collected through past operations is becoming an increasingly popular way to train and evaluate RL and bandit policies.

SharpeRatio@k: Off-Policy Evaluation Using Novel Risk-Return Tradeoff and Efficiency Assessment

Credit: Tokyo Institute of Technology

In practical applications, Off-Policy Evaluation (OPE) is typically used first to screen the most promising candidate policies, called “top-k policies,” using an offline logged dataset; more reliable real-world tests, called online A/B tests, are then used to choose the final policy. To evaluate the effectiveness of different OPE estimators, researchers have primarily relied on metrics such as the mean-squared error (MSE), rank correlation (RankCorr) and Regret. However, these metrics focus solely on the accuracy of OPE estimators and fail to evaluate the risk-return tradeoff during online policy deployment. Specifically, MSE and RankCorr cannot distinguish whether near-optimal policies are underestimated or poorly performing policies are overestimated, while Regret considers only the best policy and overlooks the possibility of harming the system by deploying sub-optimal policies in online A/B tests.
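
To make these shortcomings concrete, here is a minimal Python sketch on hypothetical data: the four true policy values and the outputs of the two estimators (`est_a`, `est_b`) are invented for illustration and are not results from the study. It computes MSE, RankCorr (assumed here to be Spearman's rank correlation, using the no-ties formula) and Regret@k (assumed here to be the gap between the true best policy and the best true value among the top-k). Both estimators score a Regret@2 of zero, even though estimator B's top-2 contains the poorly performing policy 3, which would then be deployed in an online A/B test.

```python
"""Conventional OPE metrics on a hypothetical four-policy example."""

# Hypothetical true values of four candidate policies (unknown in real offline settings).
true_values = [1.0, 0.9, 0.8, 0.2]
# Hypothetical value estimates produced by two OPE estimators.
est_a = [0.9, 1.0, 0.7, 0.1]  # small errors on every policy
est_b = [1.0, 0.3, 0.4, 0.9]  # heavily overestimates the poor policy 3


def mse(true_vals, estimates):
    """Mean-squared error between estimated and true policy values."""
    return sum((e - t) ** 2 for t, e in zip(true_vals, estimates)) / len(true_vals)


def ranks(values):
    """Rank of each entry (0 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def rank_corr(true_vals, estimates):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(true_vals)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(true_vals), ranks(estimates)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def top_k(estimates, k):
    """Indices of the k policies with the highest estimated values."""
    return sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)[:k]


def regret_at_k(true_vals, estimates, k):
    """True best value minus the best true value among the selected top-k."""
    return max(true_vals) - max(true_vals[i] for i in top_k(estimates, k))


if __name__ == "__main__":
    for name, est in [("A", est_a), ("B", est_b)]:
        worst = min(true_values[i] for i in top_k(est, 2))  # worst policy A/B tested
        print(name, "MSE", round(mse(true_values, est), 4),
              "RankCorr", round(rank_corr(true_values, est), 2),
              "Regret@2", regret_at_k(true_values, est, 2),
              "worst in top-2", worst)
```

Regret@2 is identical for both estimators because each top-2 set contains the true best policy; only by looking at the whole top-k set does it become visible that B would expose users to a policy with a true value of 0.2.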

Addressing this issue, a team of researchers from Japan led by Professor Kazuhide Nakata of Tokyo Institute of Technology developed a new evaluation metric for OPE estimators. “Risk-return measurement is crucial in ensuring safety in risk-sensitive scenarios such as finance. Inspired by the design principle of the financial risk assessment metric, the Sharpe ratio, we developed SharpeRatio@k, which measures both potential risk and return in top-k policy selection,” explains Prof. Nakata. The study was published in the proceedings of the ICLR 2024 conference.

SharpeRatio@k treats the top-k policies selected by an OPE estimator as a policy portfolio, analogous to a financial portfolio, and measures the risk, return and efficiency of the estimator using statistics of that portfolio. A policy portfolio is considered efficient when it contains policies that greatly improve performance (high return) without including poorly performing policies that negatively affect learning in online A/B tests (low risk). An estimator with a high SharpeRatio@k therefore achieves high return at low risk, which allows the metric to identify the safest and most efficient estimator.
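
The portfolio view can be sketched in a few lines of Python. The sketch follows our reading of the paper's definition, which should be checked against the ICLR 2024 paper and the SCOPE-RL documentation: return is the best true policy value among the top-k minus the value J(π_b) of the behavior policy that collected the logged data, risk is the standard deviation of the true values in the top-k portfolio, and SharpeRatio@k = (best@k − J(π_b)) / std@k. The use of the population standard deviation, the behavior value of 0.5, the function names and all policy values are hypothetical choices made for illustration.

```python
"""Minimal SharpeRatio@k sketch (our reading of the definition; hypothetical data)."""
import statistics


def top_k(estimates, k):
    """Indices of the k policies an OPE estimator ranks highest."""
    return sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)[:k]


def sharpe_ratio_at_k(true_values, estimates, behavior_value, k):
    """(best@k - J(pi_b)) / std@k for the top-k portfolio chosen by `estimates`.

    Assumption: std@k is the population standard deviation of the true values
    of the selected policies.
    """
    portfolio = [true_values[i] for i in top_k(estimates, k)]
    ret = max(portfolio) - behavior_value  # return: gain over the behavior policy
    risk = statistics.pstdev(portfolio)    # risk: spread of the deployed policies' values
    if risk == 0:
        # e.g. k = 1, or all selected policies have identical true values
        raise ValueError("zero risk: SharpeRatio@k is undefined for this portfolio")
    return ret / risk


# Hypothetical true values, estimator outputs and behavior-policy value J(pi_b).
true_values = [1.0, 0.9, 0.8, 0.2]
est_a = [0.9, 1.0, 0.7, 0.1]  # top-2 contains only strong policies
est_b = [1.0, 0.3, 0.4, 0.9]  # top-2 contains the poor policy 3
behavior_value = 0.5

if __name__ == "__main__":
    print(sharpe_ratio_at_k(true_values, est_a, behavior_value, 2))  # about 10
    print(sharpe_ratio_at_k(true_values, est_b, behavior_value, 2))  # about 1.25
```

Both estimators place the true best policy in their top-2, so their return is the same (0.5); a best-policy metric such as Regret cannot separate them. SharpeRatio@2 does, because estimator B's portfolio also contains a policy with a true value of 0.2, which inflates the risk term.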

The researchers demonstrated the capabilities of this metric through example scenarios and benchmark experiments, comparing it with existing metrics. The tests revealed that SharpeRatio@k effectively measures the risk, return and overall efficiency of different estimators under varying online evaluation budgets (the number k of policies that can be tested online), whereas existing metrics fail to do so. It also accounts for both the overestimation and the underestimation of policy values. Interestingly, although SharpeRatio@k agrees with existing metrics in some scenarios, a better value on those metrics does not always translate into a better SharpeRatio@k.
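
Both observations, the dependence on the budget k and the partial disagreement with existing metrics, can be reproduced on a toy example. This hypothetical Python sketch assumes SharpeRatio@k = (best@k − J(π_b)) / std@k with a population standard deviation, and uses invented policy values. Estimator C (`est_c`) has the smaller MSE but overestimates the poor policy enough to place it near the top, while estimator D (`est_d`) adds a uniform bias of +0.5 that inflates its MSE yet preserves the true ranking.

```python
"""Hypothetical example: a lower MSE does not guarantee a higher SharpeRatio@k."""
import statistics


def mse(true_values, estimates):
    """Mean-squared error of the estimated policy values."""
    return sum((e - t) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)


def sharpe_ratio_at_k(true_values, estimates, behavior_value, k):
    """Assumed definition: (best@k - J(pi_b)) / population std of the top-k true values.

    Requires k >= 2 and a portfolio whose true values are not all equal,
    otherwise the standard deviation is zero.
    """
    chosen = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)[:k]
    portfolio = [true_values[i] for i in chosen]
    return (max(portfolio) - behavior_value) / statistics.pstdev(portfolio)


true_values = [1.0, 0.9, 0.8, 0.2]  # hypothetical true policy values
behavior_value = 0.5                # hypothetical J(pi_b)
est_c = [0.95, 0.72, 0.7, 0.75]     # small errors, but policy 3 is overestimated
est_d = [1.5, 1.4, 1.3, 0.7]        # uniform +0.5 bias, ranking preserved

if __name__ == "__main__":
    print("MSE  C:", round(mse(true_values, est_c), 4), " D:", round(mse(true_values, est_d), 4))
    for k in (2, 3, 4):  # online evaluation budget: number of policies A/B tested
        sr_c = sharpe_ratio_at_k(true_values, est_c, behavior_value, k)
        sr_d = sharpe_ratio_at_k(true_values, est_d, behavior_value, k)
        print(f"k={k}: SharpeRatio@k  C={sr_c:.2f}  D={sr_d:.2f}")
```

With these numbers, D has the higher SharpeRatio@k at k = 2 and k = 3 despite its larger MSE, because C's portfolio includes the weak policy. At k = 4 every candidate is deployed regardless of the estimates, so both estimators score identically; how much an estimator's ranking matters therefore depends on the budget.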

Based on these benchmarks, the researchers also outlined several directions for future OPE research, including adopting SharpeRatio@k to assess the efficiency of OPE estimators and developing new estimators and estimator-selection methods that account for the risk-return tradeoff. They have also implemented the metric in open-source software, SCOPE-RL, to enable quick, accurate and insightful evaluation of OPE.

Highlighting the importance of the study, Prof. Nakata concludes, “Our study shows that SharpeRatio@k can identify the appropriate estimator to use in terms of its efficiency under different behaviour policies, providing useful insight for a more appropriate estimator evaluation and selection in both research and practice.”

Overall, this study enhances policy selection through OPE, paving the way for improved reinforcement learning.

###

Related link:

SCOPE-RL document

SCOPE-RL open link

###

About Tokyo Institute of Technology

Tokyo Tech stands at the forefront of research and higher education as the leading university for science and technology in Japan. Tokyo Tech researchers excel in fields ranging from materials science to biology, computer science, and physics. Founded in 1881, Tokyo Tech hosts over 10,000 undergraduate and graduate students per year, who develop into scientific leaders and some of the most sought-after engineers in industry. Embodying the Japanese philosophy of “monotsukuri,” meaning “technical ingenuity and innovation,” the Tokyo Tech community strives to contribute to society through high-impact research.



Method of Research

Experimental study

Subject of Research

Not applicable

Article Title

Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

© 2025 Scienmag - Science Magazine
