Sabotage Evaluations for Frontier Models
Joe Benton, Misha Wagner, Eric Christiansen, Cem Anil, Ethan Perez, Jai Srivastav, Esin Durmus, Deep Ganguli, Shauna Kravec, Buck Shlegeris, Jared Kaplan, Holden Karnofsky, Evan Hubinger, Roger Grosse, Samuel R. Bowman, David Duvenaud. CoRR (2024)
