Helping or Herding? Reward Model Ensembles Mitigate but Do Not Eliminate Reward Hacking
COLM 2024 (2024)
Keywords
Language Modeling, Topic Modeling, Interpretable Models, Responsibility in AI, Model Interpretability