Long context

CAG vs Long Context: How AI Models Use and Remember Information

Martin Keen explains how Long Context and Cache-Augmented Generation (CAG) serve as alternatives to RAG for providing external knowledge to LLMs. This summary covers the mechanics of each approach, the role of the KV cache, practical application through prompt caching, and the trade-offs in performance, cost, and latency for real-world AI workloads.

Is RAG Still Needed? Choosing the Best Approach for LLMs

Martin Keen compares Retrieval-Augmented Generation (RAG) with the emerging long context window approach in LLMs. He weighs the pros and cons of each, from infrastructure simplicity and retrieval accuracy to computational cost and the 'needle in a haystack' problem, and offers guidance on when to use each.

How We Built a Leading Reasoning Model (Olmo 3)

An overview of the full process behind building Olmo 3 Think, from pre-training architecture and data selection to the post-training recipe involving SFT and DPO, plus a deep dive into the infrastructure for scaling Reinforcement Learning (RL). The summary also reflects on the challenges and nuances of evaluating modern reasoning models.

GPT-OSS vs. Qwen vs. Deepseek: Comparing Open Source LLM Architectures

A technical comparison of the architectures, training methodologies, and post-training techniques of three leading open-source models: OpenAI's GPT-OSS, Alibaba's Qwen-3, and DeepSeek V3. The summary explores their different approaches to Mixture-of-Experts, long context, and attention mechanisms.