LLM optimization

Build Hour: Prompt Caching
Explore how prompt caching can significantly reduce latency and cost for your AI applications. This guide breaks down the mechanics of KV caching, best practices for maximizing cache hits using `prompt_cache_key` and the Responses API, and real-world implementation insights from Warp, an agentic development platform.
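The KV-caching mechanics behind prompt caching can be sketched as a toy prefix cache: requests that share a stored prompt prefix skip recomputation for that prefix and only pay for the new suffix. This is an illustrative model only, not the OpenAI API; `PrefixCache` and its methods are hypothetical names.

```python
import hashlib


class PrefixCache:
    """Toy model of prompt (KV) caching: prefixes of previously served
    prompts are remembered, so a later request reuses the longest
    matching cached prefix and only computes the remaining tokens."""

    def __init__(self):
        self._store = set()  # hashes of cached token prefixes

    def _key(self, tokens):
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def store(self, tokens):
        # Cache every prefix boundary (real systems cache in fixed-size
        # token blocks, so hits land on block boundaries).
        for cut in range(1, len(tokens) + 1):
            self._store.add(self._key(tokens[:cut]))

    def lookup(self, tokens):
        # Return how many leading tokens hit the cache (0 = full miss).
        for cut in range(len(tokens), 0, -1):
            if self._key(tokens[:cut]) in self._store:
                return cut
        return 0


cache = PrefixCache()
static_prefix = ["SYSTEM:", "You", "are", "a", "helpful", "assistant."]

# First request: nothing cached yet, everything is computed and stored.
cache.store(static_prefix + ["USER:", "hello"])

# Second request shares the static prefix plus "USER:", so only the
# new suffix ("different", "question") must be recomputed.
hit_len = cache.lookup(static_prefix + ["USER:", "different", "question"])
```

This is why the usual best practice is to put stable content (system instructions, tool definitions, few-shot examples) first and variable content last: a stable prefix maximizes the cached span, and a consistent `prompt_cache_key` helps route repeat requests to the machine holding that cache.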

How to Optimize AI Agents in Production
Engineers building AI agents face a combinatorial explosion of configuration choices (prompts, models, parameters), leading to guesswork and suboptimal results. This talk introduces a structured, data-driven approach that uses multi-objective optimization to systematically explore this vast design space. Learn how the Traigent SDK helps engineers efficiently identify optimal tradeoffs among cost, latency, and accuracy, yielding significant quality improvements and cost reductions without manual trial and error.