Prompt engineering

I’m Teaching AI Self-Improvement Techniques

Aman Khan from Arize discusses the challenges of building reliable AI agents and introduces a novel technique called "metaprompting". This method uses continuous, natural language feedback to optimize an agent's system prompt, effectively training its "memory" or context, leading to significant performance gains even for smaller models.
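The loop described above can be sketched in a few lines. This is a hypothetical illustration, not Arize's implementation: `run_agent`, `critique`, and `revise_prompt` are stand-ins for real LLM calls, showing how natural-language feedback is folded back into the system prompt so the prompt itself acts as trainable memory.

```python
def run_agent(system_prompt: str, task: str) -> str:
    """Stand-in for a real LLM call; returns the agent's answer."""
    return f"[answer to {task!r} given a {len(system_prompt)}-char prompt]"

def critique(answer: str) -> str:
    """Stand-in for human or LLM-as-judge feedback in natural language."""
    return "Be more concise and cite sources."

def revise_prompt(system_prompt: str, feedback: str) -> str:
    """Stand-in for an optimizer LLM that folds feedback into the prompt."""
    return system_prompt + f"\n- Lesson learned: {feedback}"

# Each iteration "trains" the prompt instead of the model weights.
system_prompt = "You are a helpful research assistant."
for task in ["summarize paper A", "summarize paper B"]:
    answer = run_agent(system_prompt, task)
    feedback = critique(answer)
    system_prompt = revise_prompt(system_prompt, feedback)

print(system_prompt)
```

Because the accumulated lessons travel with the prompt, even a smaller model can benefit without any fine-tuning.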

Prompt Engineering for LLMs, PDL, & LangChain in Action

Martin Keen explains the evolution of prompt engineering from an art to a software engineering discipline. He introduces LangChain and Prompt Declaration Language (PDL) as tools to manage the probabilistic nature of LLMs, ensuring reliable, structured JSON output through concepts like contracts, control loops, and observability.
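The contract-and-control-loop idea can be sketched generically (this is not the PDL or LangChain API; `call_llm` and `REQUIRED_KEYS` are placeholder names): call the model, validate the output against a contract, and retry on failure instead of trusting a single probabilistic response.

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    """Stand-in for a real model call; flaky on the first attempt."""
    if attempt == 0:
        # Extra prose around the JSON violates the output contract.
        return 'Sure! Here is the JSON: {"name": "Ada"}'
    return '{"name": "Ada", "age": 36}'

REQUIRED_KEYS = {"name", "age"}  # the output "contract"

def get_structured_output(prompt: str, max_retries: int = 3) -> dict:
    """Control loop: call the model, validate, retry until the contract holds."""
    for attempt in range(max_retries):
        raw = call_llm(prompt, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # observability hook: log the malformed output here
        if REQUIRED_KEYS <= data.keys():
            return data
    raise ValueError("model never satisfied the output contract")

print(get_structured_output("Extract the person as JSON."))
```

The loop treats a bad response as a recoverable event to be logged and retried, which is the shift from prompt-as-art to prompt-as-engineered-software that the episode describes.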

How to Future-Proof Your Career in the Age of AI (with Sheamus McGovern)

Sheamus McGovern outlines a multi-tiered skills hierarchy for AI and data professionals navigating the future of work. Arguing against fear-mongering, he offers a practical roadmap that progresses from foundational GenAI prompting and advanced engineering to orchestration, human-centered skills, and the meta-skill of continuous learning, emphasizing the need to sunset outdated skills and build a personal brand.

Evals Aren't Useful? Really?

A deep dive into the critical importance of robust evaluation for building reliable AI agents. The episode covers bootstrapping evaluation sets, advanced testing techniques such as multi-turn simulations and red teaming, and the necessity of integrating traditional software engineering and MLOps practices into the agent development lifecycle.

Building with MCP and the Claude API

A discussion with Anthropic engineers Alex Albert, John Welsh, and Michael Cohen about the Model Context Protocol (MCP). They cover its origins as an open standard, best practices for tool design and prompt engineering, and the future of the ecosystem where high-quality MCP servers will become a key competitive advantage.

Evals in Action: From Frontier Research to Production Applications

An overview of OpenAI's approach to AI evaluation, covering the GDPval benchmark for frontier models and the practical tools available for developers to evaluate their own custom agents and applications.