The Capacity for Moral Self-Correction in Large Language Models

Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan

CoRR (2023)

Citations: 175 | Views: 1338

Keywords: moral, language, models, self-correction