LLM Training

Efficient Reinforcement Learning – Rhythm Garg & Linden Li, Applied Compute

At Applied Compute, efficient Reinforcement Learning is critical for delivering business value. This talk explores the transition from inefficient synchronous RL to a high-throughput asynchronous 'Pipeline RL' system. The core challenge is managing 'staleness'—a side effect of in-flight weight updates that can destabilize training. The speakers detail their first-principles systems model, based on the Roofline model, used to simulate and find the optimal allocation of GPU resources between sampling and training, balancing throughput with algorithmic stability and achieving significant speedups.
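The allocation problem the speakers describe can be sketched as a tiny search: split a fixed GPU pool between samplers and trainers, maximize end-to-end throughput, and reject splits whose staleness exceeds a cap. All rates and the staleness bound below are hypothetical placeholders, not numbers from the talk, and the talk's actual model is a Roofline-based simulator rather than this toy.

```python
# Toy model: split a fixed GPU pool between RL samplers and trainers.
# Every constant here is an assumed placeholder, not from the talk.

TOTAL_GPUS = 16
SAMPLE_TOKENS_PER_GPU = 2_000  # tokens/s generated per sampling GPU (assumed)
TRAIN_TOKENS_PER_GPU = 6_000   # tokens/s consumed per training GPU (assumed)
MAX_STALENESS = 4.0            # tolerated trainer/sampler rate ratio (assumed)

def pipeline_throughput(sampler_gpus: int) -> int:
    """End-to-end tokens/s: the pipeline runs at the slower stage's rate."""
    trainer_gpus = TOTAL_GPUS - sampler_gpus
    return min(sampler_gpus * SAMPLE_TOKENS_PER_GPU,
               trainer_gpus * TRAIN_TOKENS_PER_GPU)

def staleness_proxy(sampler_gpus: int) -> float:
    """Crude staleness proxy: trainers outpacing samplers means in-flight
    samples lag more weight updates before they are consumed."""
    trainer_gpus = TOTAL_GPUS - sampler_gpus
    return (trainer_gpus * TRAIN_TOKENS_PER_GPU) / (sampler_gpus * SAMPLE_TOKENS_PER_GPU)

def best_allocation():
    """Maximize throughput over splits that respect the staleness cap."""
    feasible = [s for s in range(1, TOTAL_GPUS)
                if staleness_proxy(s) <= MAX_STALENESS]
    best = max(feasible, key=pipeline_throughput)
    return best, pipeline_throughput(best)
```

With these placeholder rates the search lands on 12 samplers and 4 trainers; the point is the method (model the pipeline, then allocate), not the specific numbers.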

Zai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai

Yuxuan Zhang from Z.ai details the technical roadmap behind the GLM-4.6 model series, which has achieved top performance on the LMSYS Chatbot Arena. The talk covers their 15T-token data recipe, the SLIME framework for efficient agent RL, key lessons in single-stage long-context training, and the architecture of the multimodal GLM-4.5V model.

Quantized LLM Training at Scale with ZeRO++ // Guanhua Wang // AI in Production 2025

Guanhua Wang from Microsoft's DeepSpeed team explains ZeRO++, a system that tackles the communication bottleneck in large-scale LLM training. By quantizing weights and gradients, ZeRO++ reduces communication volume by 4x, leading to training speedups of over 2x, particularly in low-bandwidth and small-batch-size environments.
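The communication saving comes from sending low-bit integers plus a few scales instead of full-precision values. As a rough illustration only (not DeepSpeed's implementation), here is a block-wise symmetric int8 quantizer in plain Python; the block size and the fp16 baseline are assumptions, and the talk's 4x figure additionally relies on 4-bit weight quantization, which this sketch does not show.

```python
import random

BLOCK = 256  # values per quantization block (assumed block size)

def quantize_blockwise(values):
    """Symmetric int8 quantization: one float scale per block of values."""
    q, scales = [], []
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        scale = max(abs(v) for v in block) / 127.0 or 1.0  # avoid zero scale
        scales.append(scale)
        q.extend(max(-127, min(127, round(v / scale))) for v in block)
    return q, scales

def dequantize_blockwise(q, scales):
    """Reconstruct approximate values from int8 codes and per-block scales."""
    return [q[i] * scales[i // BLOCK] for i in range(len(q))]

random.seed(0)
grad = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in gradient shard
q, scales = quantize_blockwise(grad)
recovered = dequantize_blockwise(q, scales)

# fp16 gradient: 2 bytes/value; int8 payload: 1 byte/value plus per-block scales.
fp16_bytes = 2 * len(grad)              # 8192
quant_bytes = len(q) + 2 * len(scales)  # 4096 int8 codes + 16 fp16 scales
```

Per-block scales keep the quantization error proportional to each block's own magnitude, which is why block-based schemes tolerate the uneven value distributions typical of gradients.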

The Startup Powering The Data Behind AGI

Edwin Chen, founder and CEO of Surge AI, shares the company's origin story, its rapid, bootstrapped growth, and its research-driven philosophy on data. He critiques traditional data labeling, explains why metrics like inter-annotator agreement fail for complex tasks, and offers a sharp analysis of benchmark hacking. Chen also details the future of data, from multimodal and agentic reasoning in rich RL environments to the need for hyper-specialized expertise for scientific discovery.

How Reinforcement Learning can Improve your Agent

This talk addresses the unreliability of current AI agents, arguing that prompting is insufficient. It posits that Reinforcement Learning (RL) is the most promising solution, delving into the mechanisms of RLHF and RLVR. The core challenge identified is 'reward hacking', and the discussion explores future directions to overcome it, such as RLAIF, data augmentation, and the development of interactive, online models that can learn in real-time.
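The distinction between RLHF and RLVR hinges on the reward source: a learned preference model can be gamed, while a programmatic verifier checks an objective criterion. A minimal sketch of a verifiable reward, assuming a hypothetical "Answer:" output convention not specified in the talk:

```python
import re

def rlvr_reward(completion: str, ground_truth: str) -> float:
    """Binary verifiable reward: 1.0 iff the completion's final 'Answer:'
    matches the known ground truth. The 'Answer:' format is an assumed
    convention for this sketch, not something from the talk."""
    match = re.search(r"Answer:\s*(\S+)", completion)
    if match is None:
        return 0.0  # no parseable answer: no reward
    return 1.0 if match.group(1).rstrip(".") == ground_truth else 0.0
```

Because the check is exact, the model cannot earn reward by merely sounding confident, though verifiable rewards only apply where a ground truth exists, which is why the talk turns to directions like RLAIF for the remaining tasks.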

AI Changed Stack Overflow for the Better

Stack Overflow CEO Prashanth Chandrashekar discusses the platform's evolution in the AI era, focusing on licensing its trusted Q&A corpus to major AI labs, expanding beyond Q&A to include discussions and live chat, and the critical role of its enterprise solution in powering internal AI agents. A key insight from their upcoming developer survey reveals that while AI adoption for coding is rising, developer trust in AI-generated output is declining, reinforcing Stack Overflow's position as a vital source of human-curated, reliable knowledge.