DAPO: an Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang
arXiv (2025)
