Step-KTO: Optimizing Mathematical Reasoning Through Stepwise Binary Feedback
Yen-Ting Lin, Di Jin, Tengyu Xu, Tianhao Wu, Sainbayar Sukhbaatar, Chen Zhu, Yun He, Yun-Nung Chen, Jason Weston, Yuandong Tian, Arash Rahnama, Sinong Wang, Hao Ma, Han Fang
CoRR (2025)