Prompt caching

CAG vs Long Context: How AI Models Use and Remember Information

Martin Keen explains how long context and cache-augmented generation (CAG) can serve as alternatives to RAG for supplying external knowledge to LLMs. This summary covers the mechanics of each approach, the role of the KV cache, practical application through prompt caching, and the trade-offs in performance, cost, and latency for real-world AI workloads.

Build Hour: Prompt Caching

Explore how prompt caching can reduce latency and cost for AI applications. This guide covers the mechanics of KV caching, best practices for maximizing cache hits with `prompt_cache_key` and the Responses API, and real-world implementation insights from Warp, an agentic development platform.
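
As a rough illustration of the cache-hit advice, the sketch below builds two request payloads that share an identical static prefix and the same `prompt_cache_key`, with only the trailing user input varying. It makes no network call. The instruction text, the `cache_key_for` and `build_request` helpers, the key-derivation convention, and the model name are all assumptions made for illustration; only the `prompt_cache_key`, `instructions`, `input`, and `model` field names are meant to mirror Responses API parameters.

```python
import hashlib

# Hypothetical static prefix. Keeping it byte-for-byte identical across
# requests is what allows a prefix cache to be reused.
STATIC_INSTRUCTIONS = (
    "You are a support assistant for a command-line tool.\n"
    "Answer concisely and name the relevant command.\n"
)


def cache_key_for(prefix: str) -> str:
    """Derive a stable key from the shared prefix (an illustrative convention)."""
    digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    return "prefix-" + digest[:16]


def build_request(question: str, model: str = "gpt-4.1") -> dict:
    """Return a request payload with static content first, dynamic content last.

    The model name is a placeholder assumption, not a recommendation.
    """
    return {
        "model": model,
        "instructions": STATIC_INSTRUCTIONS,  # unchanging prefix
        "input": question,                    # varies per request
        "prompt_cache_key": cache_key_for(STATIC_INSTRUCTIONS),
    }


a = build_request("How do I reset my config?")
b = build_request("How do I list installed plugins?")
print(a["prompt_cache_key"] == b["prompt_cache_key"])
```

With the OpenAI Python SDK, a payload like this could presumably be passed as `client.responses.create(**a)`, assuming an SDK version that accepts these parameters. The design point is only that anything that changes per request belongs at the end of the prompt, so the shared prefix stays stable.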