Prompt engineering

KDD '25 AI Reasoning Day keynote: Improving AI Reasoning through Intent, Interaction, and Inspection

KDD '25 AI Reasoning Day keynote: Improving AI Reasoning through Intent, Interaction, and Inspection

A deep dive into practical strategies for improving AI reasoning in code and structured tasks. The talk covers capturing richer user intent through examples, enabling collaborative interaction, and using automated inspection for iterative refinement, illustrated with real-world applications from Microsoft.

DSPy: The End of Prompt Engineering - Kevin Madura, AlixPartners

DSPy: The End of Prompt Engineering - Kevin Madura, AlixPartners

An in-depth guide to DSPy, a framework for programming with language models, not just prompting them. Learn its core concepts—Signatures, Modules, Adapters, and Optimizers—and see real-world examples of building robust, testable, and transferable AI applications for the enterprise.

Shipping AI That Works: An Evaluation Framework for PMs – Aman Khan, Arize

Shipping AI That Works: An Evaluation Framework for PMs – Aman Khan, Arize

This talk provides a practical framework for product managers to move beyond simple "vibe checks" to implement rigorous, data-driven evaluation for LLM-powered products. Using a live demo of a multi-agent AI trip planner, the speaker breaks down essential methodologies, including human feedback, code-based checks, and LLM-as-a-judge systems, and demonstrates how to iterate on both prompts and the evals themselves to ensure consistent quality and build user trust.

How Claude Code Works - Jared Zoneraich, PromptLayer

How Claude Code Works - Jared Zoneraich, PromptLayer

An unofficial deep dive into the architecture of modern coding agents like Claude Code. Jared Zoneraich of PromptLayer explains the shift towards simpler, model-centric designs, detailing the core components like the master loop, tool calling (especially `bash`), and context management strategies. The talk also contrasts Claude's philosophy with other agents like Codex, AMP, and Cursor, offering practical takeaways for building your own AI agents.

Shaping Model Behavior in GPT-5.1— the OpenAI Podcast Ep. 11

Shaping Model Behavior in GPT-5.1— the OpenAI Podcast Ep. 11

OpenAI's Christina Kim (Research) and Laurentia Romaniuk (Product) discuss the development of GPT-5.1, detailing the shift to universal "reasoning models" to enhance both IQ and EQ. They explore the nuances of "model personality," the technical challenges of balancing steerability with safety, and how features like Memory create a more personalized, context-aware user experience.

Inside the AI Black Box

Inside the AI Black Box

Emmanuel Ameisen of Anthropic's interpretability team explains the inner workings of LLMs, drawing analogies to biology. He covers surprising findings on how models plan, represent concepts across languages, and the mechanistic causes of hallucinations, offering practical advice for developers on evaluation and post-training strategies.