On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning

Publication
NeurIPS 2025
