Nicolas Espinosa Dice

I am a second-year PhD student at Cornell University, where I am advised by Wen Sun. My research focuses on reinforcement learning, imitation learning, and generative models.

Prior to Cornell, I received a B.S. in Mathematics and Computer Science from Harvey Mudd College, where I was advised by George D. Montanez and Dagan Karp. I worked with George D. Montanez in the AMISTAD Lab and with Weiqing Gu at Dasion.

Last updated: March 2025

Research

Efficient Imitation Under Misspecification
Nicolas Espinosa Dice, Sanjiban Choudhury, Wen Sun, Gokul Swamy
ICLR 2025
Paper / Code
We consider the problem of imitation learning under misspecification: settings where the learner is fundamentally unable to replicate expert behavior everywhere. Building on prior work on inverse reinforcement learning via computationally efficient local search procedures, we first prove that under a novel structural condition we term reward-agnostic policy completeness, these local-search-based IRL algorithms avoid compounding errors, even in the misspecified setting. We then consider where local search should be performed in the first place, given that the learner may not be able to “walk on a tightrope” as well as the expert in the misspecified setting. We prove that in the misspecified setting it is beneficial to broaden the set of states on which local search is performed to include states reachable by good policies that the learner can actually play. Finally, we experimentally explore a variety of sources of misspecification and show how offline data can be used to broaden the set of states from which local search is performed.
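
As a rough illustration of that last idea, here is a minimal, hypothetical sketch of sampling local-search reset states from a mixture of expert demonstrations and offline data. This is not the paper's algorithm; all names below (`broadened_reset_sampler`, `local_policy_search`, `update_policy`) are illustrative placeholders.

```python
# Hypothetical sketch: "broadening where local search resets from" by mixing
# expert-visited states with states from an offline dataset.
import random

def broadened_reset_sampler(expert_states, offline_states, mix=0.5):
    """Return a sampler over reset states: with probability `mix` pick an
    expert-visited state, otherwise a state from the offline dataset
    (states that good-but-imperfect policies can actually reach)."""
    def sample():
        pool = expert_states if random.random() < mix else offline_states
        return random.choice(pool)
    return sample

def local_policy_search(reset_sampler, update_policy, policy, n_iters=100):
    """Generic local-search loop: reset to a sampled state and apply one
    (placeholder) local improvement step of the policy from that state."""
    for _ in range(n_iters):
        state = reset_sampler()
        policy = update_policy(policy, state)
    return policy

# Toy usage with integer states and a no-op update rule.
expert_states = [0, 1, 2]
offline_states = [0, 1, 2, 3, 4, 5]
sampler = broadened_reset_sampler(expert_states, offline_states)
policy = local_policy_search(sampler, update_policy=lambda pi, s: pi, policy={})
```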

Efficient Inverse Reinforcement Learning Without Compounding Errors
Nicolas Espinosa Dice, Gokul Swamy, Sanjiban Choudhury, Wen Sun
RLC 2024 (RLSW and RLBRew workshops)
Project Page / Paper / Code
There are two seemingly contradictory desiderata for IRL algorithms: (a) preventing the compounding errors that stymie offline approaches like behavioral cloning and (b) avoiding the worst-case exploration complexity of reinforcement learning (RL). Prior work has been able to achieve either (a) or (b) but not both simultaneously. We prove that, under a novel structural condition we term reward-agnostic policy completeness, efficient IRL algorithms do avoid compounding errors, giving us the best of both worlds.
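
For intuition on what “compounding errors” means here, the classical horizon-dependence contrast (in the style of standard behavioral cloning analyses, not this paper's theorem) can be written as follows, with per-step imitation error ε and horizon H:

```latex
% Classical intuition for compounding errors (background, not this paper's bound):
% behavioral cloning's suboptimality can grow quadratically in the horizon H,
% while interactive imitation methods can keep it linear in H.
J(\pi_E) - J(\hat{\pi}_{\mathrm{BC}}) = O(\varepsilon H^2)
\qquad \text{vs.} \qquad
J(\pi_E) - J(\hat{\pi}_{\mathrm{interactive}}) = O(\varepsilon H)
```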