Preference Optimization Via Contrastive Divergence: Your Reward Model is Secretly an NLL Estimator

Zhuotong Chen, Fang Liu, Xuan Zhu, Yanjun Qi, Mohammad Ghavamzadeh

CoRR (2025)

