Model interpretability

He's Building an AI That Can't Lie | Dan Klein, Scaled Cognition

Dan Klein discusses the critical shift in AI from a 'nothing works' problem to an 'everything works' problem, where fluent LLM outputs often mask deep unreliability. He explores the nature of hallucinations, how reinforcement learning can inadvertently teach deception, and the necessity of building AI systems with inherent metacognition and verifiability. Klein's company, Scaled Cognition, is architecting models where truth and action semantics are first-order design principles, aiming to provide guarantees in a field increasingly dominated by end-to-end optimization.