Nicolas Espinosa Dice

I am a second-year PhD student at Cornell University, where I am advised by Wen Sun. My research focuses on reinforcement learning, imitation learning, and generative models.

Prior to Cornell, I received a B.S. in Mathematics and Computer Science from Harvey Mudd College, where I was advised by George D. Montanez and Dagan Karp. I worked with George D. Montanez in the AMISTAD Lab and with Weiqing Gu at Dasion.

Last updated: March 2025

Research

Efficient Imitation Under Misspecification
Nicolas Espinosa Dice, Sanjiban Choudhury, Wen Sun, Gokul Swamy
ICLR 2025
Paper / Code
We consider the problem of imitation learning under misspecification: settings where the learner is fundamentally unable to replicate expert behavior everywhere. Building on prior work on inverse reinforcement learning via computationally efficient local search procedures, we first prove that under a novel structural condition we term reward-agnostic policy completeness, these local-search-based IRL algorithms avoid compounding errors, even in the misspecified setting. We then consider where local search should be performed in the first place, given that the learner may not be able to “walk on a tightrope” as well as the expert in the misspecified setting. We prove that in the misspecified setting it is beneficial to broaden the set of states on which local search is performed to include states reachable by good policies that the learner can actually play. Finally, we experimentally explore a variety of sources of misspecification and show how offline data can be used to broaden the set of states from which local search is performed.
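
As a rough illustration of that last idea, here is a minimal, hypothetical sketch of sampling local-search reset states from a mixture of expert demonstrations and offline data. This is not the paper's algorithm; all names below (`broadened_reset_sampler`, `local_policy_search`, `update_policy`) are illustrative placeholders.

```python
# Hypothetical sketch: "broadening where local search resets from" by mixing
# expert-visited states with states from an offline dataset.
import random

def broadened_reset_sampler(expert_states, offline_states, mix=0.5):
    """Return a sampler over reset states: with probability `mix` pick an
    expert-visited state, otherwise a state from the offline dataset
    (states that good-but-imperfect policies can actually reach)."""
    def sample():
        pool = expert_states if random.random() < mix else offline_states
        return random.choice(pool)
    return sample

def local_policy_search(reset_sampler, update_policy, policy, n_iters=100):
    """Generic local-search loop: reset to a sampled state and apply one
    (placeholder) local improvement step of the policy from that state."""
    for _ in range(n_iters):
        state = reset_sampler()
        policy = update_policy(policy, state)
    return policy

# Toy usage with integer states and a no-op update rule.
expert_states = [0, 1, 2]
offline_states = [0, 1, 2, 3, 4, 5]
sampler = broadened_reset_sampler(expert_states, offline_states)
policy = local_policy_search(sampler, update_policy=lambda pi, s: pi, policy={})
```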

Efficient Inverse Reinforcement Learning Without Compounding Errors
Nicolas Espinosa Dice, Gokul Swamy, Sanjiban Choudhury, Wen Sun
RLC 2024 (RLSW and RLBRew workshops)
Project Page / Paper / Code
There are two seemingly contradictory desiderata for IRL algorithms: (a) preventing the compounding errors that stymie offline approaches like behavioral cloning and (b) avoiding the worst-case exploration complexity of reinforcement learning (RL). Prior work has been able to achieve either (a) or (b) but not both simultaneously. We prove that, under a novel structural condition we term reward-agnostic policy completeness, efficient IRL algorithms do avoid compounding errors, giving us the best of both worlds.
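
For intuition on what “compounding errors” means here, the classical horizon-dependence contrast (in the style of standard behavioral cloning analyses, not this paper's theorem) can be written as follows, with per-step imitation error ε and horizon H:

```latex
% Classical intuition for compounding errors (background, not this paper's bound):
% behavioral cloning's suboptimality can grow quadratically in the horizon H,
% while interactive imitation methods can keep it linear in H.
J(\pi_E) - J(\hat{\pi}_{\mathrm{BC}}) = O(\varepsilon H^2)
\qquad \text{vs.} \qquad
J(\pi_E) - J(\hat{\pi}_{\mathrm{interactive}}) = O(\varepsilon H)
```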