Prompt engineering

Shaping Model Behavior in GPT-5.1— the OpenAI Podcast Ep. 11

Shaping Model Behavior in GPT-5.1— the OpenAI Podcast Ep. 11

OpenAI's Christina Kim (Research) and Laurentia Romaniuk (Product) discuss the development of GPT-5.1, detailing the shift to universal "reasoning models" to enhance both IQ and EQ. They explore the nuances of "model personality," the technical challenges of balancing steerability with safety, and how features like Memory create a more personalized, context-aware user experience.

Inside the AI Black Box

Inside the AI Black Box

Emmanuel Ameisen of Anthropic's interpretability team explains the inner workings of LLMs, drawing analogies to biology. He covers surprising findings on how models plan, represent concepts across languages, and the mechanistic causes of hallucinations, offering practical advice for developers on evaluation and post-training strategies.

I’m Teaching AI Self-Improvement Techniques

I’m Teaching AI Self-Improvement Techniques

Aman Khan from Arize discusses the challenges of building reliable AI agents and introduces a novel technique called "metaprompting". This method uses continuous, natural language feedback to optimize an agent's system prompt, effectively training its "memory" or context, leading to significant performance gains even for smaller models.

Prompt Engineering for LLMs, PDL, & LangChain in Action

Prompt Engineering for LLMs, PDL, & LangChain in Action

Martin Keen explains the evolution of prompt engineering from an art to a software engineering discipline. He introduces LangChain and Prompt Declaration Language (PDL) as tools to manage the probabilistic nature of LLMs, ensuring reliable, structured JSON output through concepts like contracts, control loops, and observability.

How to Future-Proof Your Career in the Age of AI (with Sheamus McGovern)

How to Future-Proof Your Career in the Age of AI (with Sheamus McGovern)

Sheamus McGovern outlines a multi-tiered skills hierarchy for AI and data professionals to navigate the future of work. He argues against fear-mongering, providing a practical roadmap that progresses from foundational GenAI prompting and advanced engineering to orchestration, human-centered skills, and the meta-skill of continuous learning, emphasizing the need to sunset old skills and build a personal brand.

Evals Aren't Useful? Really?

Evals Aren't Useful? Really?

A deep dive into the critical importance of robust evaluation for building reliable AI agents. The summary covers bootstrapping evaluation sets, advanced testing techniques like multi-turn simulations and red teaming, and the necessity of integrating traditional software engineering and MLOps practices into the agent development lifecycle.