Scalable Reinforcement Post-Training Beyond Static Human Prompts: Evolving Alignment via Asymmetric Self-Play

Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc V. Le, Qijun Tan, Yuan Liu

CoRR (2024)
