Efficient Inverse Reinforcement Learning
without Compounding Errors
RLC 2024 RLSW and RLBRew Workshops


Nicolas Espinosa Dice¹   Gokul Swamy²   Sanjiban Choudhury¹   Wen Sun¹

¹Cornell University
²Carnegie Mellon University







Abstract

Inverse reinforcement learning (IRL) is an on-policy approach to imitation learning (IL) that allows the learner to observe the consequences of their actions at train-time. Accordingly, there are two seemingly contradictory desiderata for IRL algorithms: (a) preventing the compounding errors that stymie offline approaches like behavioral cloning and (b) avoiding the worst-case exploration complexity of reinforcement learning (RL). Prior work has been able to achieve either (a) or (b) but not both simultaneously. In our work, we first prove a negative result showing that, without further assumptions, there are no efficient IRL algorithms that avoid compounding errors in the worst case. We then provide a positive result: under a novel structural condition we term reward-agnostic policy completeness, we prove that efficient IRL algorithms do avoid compounding errors, giving us the best of both worlds. We also propose a principled method for using sub-optimal data to further improve the sample-efficiency of efficient IRL algorithms.
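For reference, the compounding-errors contrast described above is commonly formalized as follows (a standard illustration in the style of the classical behavioral cloning and interactive imitation analyses, not a bound taken from this paper): if the learned policy $\hat{\pi}$ incurs an expected per-state imitation error of $\epsilon$ relative to the expert $\pi^\star$ over a horizon of $H$ steps, offline behavioral cloning can lose return quadratically in the horizon, whereas an on-policy learner that observes the consequences of its own actions can keep the loss linear. Here $J(\cdot)$ denotes expected return; the exact constants and assumptions vary across analyses.

\[
J(\pi^\star) - J(\hat{\pi}_{\mathrm{BC}}) \le O(\epsilon H^2),
\qquad
J(\pi^\star) - J(\hat{\pi}_{\mathrm{on\text{-}policy}}) \le O(\epsilon H).
\]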




Paper

Efficient Inverse Reinforcement Learning without Compounding Errors

Nicolas Espinosa Dice, Gokul Swamy, Sanjiban Choudhury, Wen Sun

In the ICML 2024 Workshop on Models of Human Feedback for AI Alignment (MHFAIA).

@inproceedings{dice2024efficient,
  title={Efficient Inverse Reinforcement Learning without Compounding Errors},
  author={Dice, Nicolas Espinosa and Swamy, Gokul and Choudhury, Sanjiban and Sun, Wen},
  booktitle={ICML 2024 Workshop on Models of Human Feedback for AI Alignment},
  year={2024}
}



Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful project, and inherits the modifications made by Jason Zhang. The code can be found here.