Anas Barakat
Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi
February 2026
arXiv
Type: Conference paper
Publication: Under review
Reinforcement Learning
Related
On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning
Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning
Policy Mirror Descent with Lookahead
Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies