LLM

Context Engineering: Lessons Learned from Scaling CoCounsel

Jake Heller, founder of Casetext, shares a pragmatic framework for turning powerful large language models like GPT-4 into reliable, professional-grade products. He details a rigorous, evaluation-driven approach to prompt and context engineering, emphasizing iterative testing, the critical role of high-quality context, and advanced techniques like reinforcement fine-tuning and strategic model selection.

Iterating on Your AI Evals // Mariana Prazeres // Agents in Production 2025

Moving an AI agent from a promising demo to a reliable product is challenging. This talk presents a startup-friendly, iterative process for building robust evaluation frameworks, emphasizing that you must iterate on the evaluations themselves—the metrics and the data—not just the prompts and models. It outlines a practical "crawl, walk, run" approach, starting with simple heuristics and scaling to an advanced system with automated checks and human-in-the-loop validation.

Building an Agentic Platform — Ben Kus, CTO Box

Ben Kus, CTO of Box, outlines the technical evolution of their AI platform, detailing the transition from a promising but fragile LLM-based metadata extraction system to a robust, scalable agentic architecture. He explains why this shift was necessary to handle enterprise-level complexity and the key lessons learned.

Five hard earned lessons about Evals — Ankur Goyal, Braintrust

Building successful AI applications requires a sophisticated engineering approach that goes beyond prompt engineering. This involves creating intentionally engineered evaluations (evals) that reflect user feedback, focusing on "context engineering" to optimize tool definitions and outputs, and maintaining a flexible, model-agnostic architecture to adapt to the rapidly evolving AI landscape.

How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock

BlackRock engineers Vaibhav Page and Infant Vasanth introduce a modular, Kubernetes-native AI framework designed to accelerate the development of custom knowledge applications for investment operations, reducing deployment time from months to days.

Interpretability: Understanding how AI models think

Members of Anthropic's interpretability team discuss their research into the inner workings of large language models. They explore the analogy of studying AI as a biological system, the surprising discovery of internal "features" or concepts, and why this research is critical for understanding model behaviors such as hallucination, sycophancy, and long-term planning, with the ultimate aim of making AI systems safe.