Optimization

Efficient Reinforcement Learning – Rhythm Garg & Linden Li, Applied Compute

A deep dive into the challenges and solutions for efficient Reinforcement Learning (RL) in enterprise settings. The talk contrasts synchronous and asynchronous RL, explains the critical trade-off of "staleness" versus stability, and details a first-principles system model used to optimize GPU allocation for maximum throughput.
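
To make the staleness trade-off concrete, here is a minimal toy sketch (not the speakers' actual system) contrasting a synchronous loop, where the trainer always sees rollouts from the current policy, with an asynchronous loop, where sampler workers run ahead and the trainer may consume rollouts generated by older weights. All names here are illustrative placeholders.

```python
# Toy illustration of synchronous vs. asynchronous RL and "staleness".
# Staleness = (trainer's current policy version) - (version that generated a rollout).
# Higher staleness improves GPU utilization/throughput but risks unstable updates.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Rollout:
    policy_version: int        # version of the policy that generated this trajectory
    data: list = field(default_factory=list)  # placeholder for trajectories


def generate_rollout(policy_version: int) -> Rollout:
    # Stand-in for sampling trajectories on inference GPUs.
    return Rollout(policy_version=policy_version)


def train_step(rollout: Rollout) -> None:
    # Stand-in for a gradient update on trainer GPUs.
    pass


def synchronous_rl(num_steps: int) -> None:
    version = 0
    for _ in range(num_steps):
        rollout = generate_rollout(version)  # trainer idles while sampling
        train_step(rollout)                  # samplers idle while training
        version += 1                         # staleness is always zero


def asynchronous_rl(num_steps: int, samplers: int = 4, max_staleness: int = 2) -> None:
    version = 0
    queue: deque = deque()
    for _ in range(num_steps):
        # Sampler workers keep producing with whatever weights they last pulled.
        for _ in range(samplers):
            queue.append(generate_rollout(version))
        # The trainer consumes one rollout per update without waiting for the rest.
        rollout = queue.popleft()
        staleness = version - rollout.policy_version
        if staleness > max_staleness:
            continue  # drop (or down-weight) overly stale data to protect stability
        train_step(rollout)
        version += 1  # newer weights; everything still queued grows one step staler


if __name__ == "__main__":
    synchronous_rl(10)
    asynchronous_rl(10)
```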

I’m Teaching AI Self-Improvement Techniques

Aman Khan from Arize discusses the challenges of building reliable AI agents and introduces a novel technique called "metaprompting". This method uses continuous, natural language feedback to optimize an agent's system prompt, effectively training its "memory" or context, leading to significant performance gains even for smaller models.
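
A hedged sketch of the metaprompting loop as described above: natural-language feedback on the agent's outputs is passed to an optimizer model that rewrites the agent's system prompt. The function names and the `call_llm` helper are hypothetical placeholders, not an Arize API.

```python
# Hypothetical metaprompting loop: feedback in plain natural language (rather
# than a numeric reward) is used to iteratively rewrite the agent's system prompt.
def call_llm(system_prompt: str, user_message: str) -> str:
    # Placeholder for any chat-completion call (hosted API or local model).
    raise NotImplementedError


def run_agent(system_prompt: str, task: str) -> str:
    return call_llm(system_prompt, task)


def collect_feedback(task: str, output: str) -> str:
    # Feedback can come from humans, evals, or a judge model.
    return call_llm(
        "You are a strict reviewer. Point out concrete problems.",
        f"Task: {task}\nAgent output: {output}",
    )


def metaprompt_update(system_prompt: str, feedback: list[str]) -> str:
    # The optimizer model folds the lessons into the prompt, acting as the
    # agent's "memory" for future tasks.
    notes = "\n".join(f"- {f}" for f in feedback)
    return call_llm(
        "You improve system prompts. Return only the revised prompt.",
        f"Current system prompt:\n{system_prompt}\n\nFeedback to address:\n{notes}",
    )


def optimize(system_prompt: str, tasks: list[str], rounds: int = 3) -> str:
    for _ in range(rounds):
        feedback = [collect_feedback(t, run_agent(system_prompt, t)) for t in tasks]
        system_prompt = metaprompt_update(system_prompt, feedback)
    return system_prompt
```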

Solving AI Video: How Fal.ai is making AI Video Generation Faster & Easier

Fal co-founder Burkay Gur and head of engineering Batuhan Taskaya discuss their journey building a high-performance generative media cloud. They cover their strategic pivot to media models, core optimization principles born from early GPU scarcity, and the development of a customer-obsessed culture to navigate the fast-paced AI model landscape.

Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten

A deep dive into SGLang, an open-source serving framework for LLMs. This summary covers its core features, history, performance optimization techniques like CUDA Graph and Eagle 3 speculative decoding, and how to contribute to the project.
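
As a quick orientation, here is the typical SGLang serving workflow: launch the server, then query its OpenAI-compatible endpoint. The flags and model name below are illustrative; check the SGLang documentation for the exact options supported by your version.

```python
# Illustrative SGLang serving example (flags and model name are assumptions;
# consult the SGLang docs for your installed version).
#
# 1) Launch the server from a shell:
#    python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
#
# 2) Query the OpenAI-compatible endpoint it exposes:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain speculative decoding in one sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```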