Auditing Language Models for Hidden Objectives

Samuel Marks, Johannes Treutlein, Trenton Bricken, Jack Lindsey, Jonathan Marcus, Siddharth Mishra-Sharma, Daniel M. Ziegler, Emmanuel Ameisen, Joshua Batson, Tim Belonax, Samuel R. Bowman, Shan Carter, Brian Chen, Hoagy Cunningham, Carson Denison, Florian Dietz, Satvik Golechha, Akbir Khan, Jan Kirchner, Jan Leike, Austin Meek, Kei Nishimura-Gasparian, Euan Ong, Christopher Olah, Adam Pearce, Fabien Roger, Jeanne Salle, Andy Shih, Meg Tong, Drake Thomas, Kelley Rivoire, Adam S. Jermyn, Monte MacDiarmid, Tom Henighan, Evan Hubinger

CoRR (2025)
