Trust, but Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He, Jiahao Xu, Wenxuan Wang, Pinjia He, Zhaopeng Tu, Haitao Mi, Dong Yu
arXiv (2025)