Scaling Offline RL via Efficient and Expressive Shortcut Models

Nicolas Espinosa Dice¹, Yiyi Zhang¹, Yiding Chen¹, Bradley Guo¹, Owen Oertell¹, Gokul Swamy², Kianté Brantley³, Wen Sun¹

¹Cornell University, ²Carnegie Mellon University, ³Harvard University


Abstract

Diffusion and flow models have emerged as powerful generative approaches capable of modeling diverse and multimodal behavior. However, applying these models to offline reinforcement learning (RL) remains challenging due to the iterative nature of their noise sampling processes, making policy optimization difficult. In this paper, we introduce Scalable Offline Reinforcement Learning (SORL), a new offline RL algorithm that leverages shortcut models – a novel class of generative models – to scale both training and inference. SORL’s policy can capture complex data distributions and can be trained simply and efficiently in a one-stage training procedure. At test time, SORL introduces both sequential and parallel inference scaling by using the learned Q-function as a verifier. We demonstrate that SORL achieves strong performance across a range of offline RL tasks and exhibits positive scaling behavior with increased test-time compute.
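The two test-time scaling axes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the policy exposes a shortcut-style step function `step_fn(state, a, t, d)` that returns an update direction for step size `d`, and that a learned Q-function `q_fn(state, a)` is available; both names, and the toy stand-ins used to exercise them, are hypothetical. Sequential scaling corresponds to raising `num_steps`, and parallel scaling corresponds to raising `num_samples` and letting the Q-function pick the best candidate.

```python
import numpy as np

def sample_action(step_fn, state, action_dim, num_steps, rng):
    """Sequential scaling: turn noise into an action using `num_steps` steps of size 1/num_steps."""
    a = rng.standard_normal(action_dim)  # start from Gaussian noise
    d = 1.0 / num_steps
    for k in range(num_steps):
        t = k * d
        # Assumed update rule: move along the predicted direction for a step of size d.
        a = a + d * step_fn(state, a, t, d)
    return a

def best_of_n(step_fn, q_fn, state, action_dim, num_steps, num_samples, rng):
    """Parallel scaling: draw several candidate actions, keep the one the Q-function scores highest."""
    candidates = [
        sample_action(step_fn, state, action_dim, num_steps, rng)
        for _ in range(num_samples)
    ]
    scores = [q_fn(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Toy stand-ins (assumptions for illustration only, not trained models).
def toy_step_fn(state, a, t, d):
    # Pulls the sample part of the way toward `state`, so some noise remains.
    return 0.5 * (state - a)

def toy_q_fn(state, a):
    # Scores actions higher the closer they are to `state`.
    return -float(np.sum((a - state) ** 2))

if __name__ == "__main__":
    state = np.array([0.5, -0.25])
    rng = np.random.default_rng(0)
    action = best_of_n(toy_step_fn, toy_q_fn, state, action_dim=2,
                       num_steps=4, num_samples=8, rng=rng)
    print(action, toy_q_fn(state, action))
```

In this sketch the Q-function acts purely as a verifier over already-sampled actions, so increasing `num_samples` costs extra forward passes but needs no additional training.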

Blog Post

Scaling Offline Reinforcement Learning at Test Time
Kempner Institute Deeper Learning Blog
We introduce a novel approach to scaling reinforcement learning (RL) during training and inference. Inspired by the recent work on LLM test-time scaling, we demonstrate how greater test-time compute can be leveraged to improve the performance of expressive, flow-based policies in RL.

Thread

by incorporating self-consistency during offline RL training, we unlock three orthogonal directions of scaling:

1. efficient training (i.e. limit backprop through time)
2. expressive model classes (e.g. flow matching)
3. inference-time scaling (sequential and parallel)

which,… pic.twitter.com/65yh4wTyjn
— Nicolas Espinosa Dice (@nico_espinosa_d) June 12, 2025

Citation

@article{espinosa2025scaling,
  title={Scaling Offline RL via Efficient and Expressive Shortcut Models},
  author={Espinosa-Dice, Nicolas and Zhang, Yiyi and Chen, Yiding and Guo, Bradley and Oertell, Owen and Swamy, Gokul and Brantley, Kiante and Sun, Wen},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}
