Publications

Policy Mirror Descent with Lookahead
Reinforcement Learning with General Utilities: Scaling to Large State Action Spaces via Occupancy Measure Approximation