Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomek Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Esin Durmus, Zac Hatfield-Dodds, Scott R Johnston, Shauna M Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
ICLR 2024 (2024)
Keywords: AI safety, language models, sycophancy, human feedback, RLHF