Prompt engineering

He's Building an AI That Can't Lie | Dan Klein, Scaled Cognition

Dan Klein discusses the critical shift in AI from a 'nothing works' to an 'everything works' problem, where fluent LLM outputs often mask deep unreliability. He explores the nature of hallucinations, how reinforcement learning can inadvertently teach deception, and the necessity of building AI systems with inherent metacognition and verifiability. Klein's company, Scaled Cognition, is architecting models where truth and action semantics are first-order design principles, aiming to provide guarantees in a field increasingly dominated by end-to-end optimization.

Context Engineering for Coding Agents

A deep dive into advanced engineering techniques for coding agents, focusing on effective context management in LLMs like Claude. The talk introduces a practical framework using a brain-inspired analogy, proposing a Markdown-based 'wiki' as a long-term memory system to augment the agent's limited context window. This approach is demonstrated through a real-world challenge of extracting structured data from technical drawings.
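As a rough illustration of the Markdown 'wiki' idea, the sketch below stores notes as `.md` files on disk and loads only the pages that fit a size budget back into the prompt. It is not the speaker's implementation: the `MarkdownWiki` class, its method names, and the character-based budget are all assumptions made for this example.

```python
from pathlib import Path


class MarkdownWiki:
    """Hypothetical long-term memory: one Markdown file per page."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write_page(self, title, body):
        # Persist a note so it survives beyond the current context window.
        path = self.root / f"{title}.md"
        path.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
        return path

    def search(self, keyword):
        # Naive case-insensitive keyword match over page contents.
        keyword = keyword.lower()
        return sorted(
            p.stem
            for p in self.root.glob("*.md")
            if keyword in p.read_text(encoding="utf-8").lower()
        )

    def load_into_context(self, titles, max_chars):
        # Concatenate pages in order, stopping before the budget is exceeded.
        # Characters stand in for tokens here (an assumption of this sketch).
        chunks, used = [], 0
        for title in titles:
            text = (self.root / f"{title}.md").read_text(encoding="utf-8")
            if used + len(text) > max_chars:
                break
            chunks.append(text)
            used += len(text)
        return "\n".join(chunks)
```

An agent built this way would call `search` to find relevant pages and `load_into_context` to pull only those pages into its prompt, rather than carrying every note in the context window at once.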

[404] – Developer Not Found: The Continuing Developer Evolution • Derek Bingham • YOW! 2025

Derek Bingham explores the rapid evolution of developer tools with AI, from coding assistants to autonomous agents. He emphasizes the shift from prompt engineering to context engineering, introduces Spec-Driven Development (SDD) as a framework for quality AI-generated code, and dispels fears about AI replacing developers, arguing instead that demand for developers will grow and that new skills such as ethical and systems thinking will become necessary.

Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google

Michael Hablich from the Chrome DevTools team shares hard-won engineering lessons on building effective and secure interfaces for AI agents. The talk covers moving from raw data to semantic summaries, measuring interface efficiency with 'tokens per successful outcome', designing for error recovery, and the critical importance of trust boundaries and deliberate friction in UI design for agents.
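One plausible reading of the 'tokens per successful outcome' metric is total tokens spent across all attempts, failed ones included, divided by the number of attempts that succeeded. The summary does not give the exact definition, so the formula, the `Run` record, and the function name below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one agent attempt at a task."""
    tokens_used: int
    succeeded: bool


def tokens_per_successful_outcome(runs):
    # Failed attempts still cost tokens, so they count in the numerator
    # but not in the denominator (assumed definition).
    total_tokens = sum(r.tokens_used for r in runs)
    successes = sum(1 for r in runs if r.succeeded)
    if successes == 0:
        return float("inf")  # No successes: the interface is infinitely costly.
    return total_tokens / successes


runs = [Run(1000, True), Run(500, False), Run(1500, True)]
print(tokens_per_successful_outcome(runs))  # 3000 tokens / 2 successes = 1500.0
```

Under this definition, an interface that returns compact semantic summaries can score better than one that returns raw data even if both succeed equally often, because it spends fewer tokens per attempt.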

AI at college graduations and why Claude blackmails

The Mixture of Experts team discusses four topics: growing skepticism toward AI among younger generations, a Microsoft study showing how LLMs can corrupt data in complex workflows, Anthropic's data-centric fix for Claude's "blackmailing" behavior, and the cultural debate over an AI-generated story potentially winning a literary prize. The discussion keeps returning to human ownership, trust, and the need for better processes in the age of AI.

Build Agents That Run for Hours (Without Losing the Plot) — Ash Prabaker & Andrew Wilson, Anthropic

Explore advanced techniques for building long-running AI agents, moving beyond simple loops. Learn why self-evaluation fails and adversarial evaluators succeed, how to manage context with structured handoffs instead of just compaction, and how to use negotiated 'sprint contracts' and detailed rubrics to build and test complex, full-stack applications autonomously.
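To make the idea of a structured handoff concrete, here is a minimal sketch in which an agent session writes an explicit note (goal, completed work, next steps, open questions) and the next session starts from that note instead of a compacted transcript. The `Handoff` fields and the `build_resume_prompt` helper are hypothetical and are not taken from the talk.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Hypothetical handoff note passed between agent sessions."""
    goal: str
    completed: list = field(default_factory=list)
    next_steps: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)

    def to_json(self):
        # Serialize so the note can be stored outside the context window.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


def build_resume_prompt(handoff):
    # Render the note as the opening context for a fresh session.
    sections = [
        ("Completed", handoff.completed),
        ("Next steps", handoff.next_steps),
        ("Open questions", handoff.open_questions),
    ]
    lines = [f"Goal: {handoff.goal}"]
    for heading, items in sections:
        lines.append(f"{heading}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```

Compared with free-form compaction, a fixed schema like this makes it harder for a summary to silently drop the goal or the list of unfinished work.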