Distillation

You Might Not Need 50 Diffusion Steps — Ziv Ilan, Nvidia

Ziv Ilan from NVIDIA details how the latency of video diffusion models can be cut far enough to reach real-time generation. He presents a layered approach: dynamic quantization for memory and speed, chunk-based caching to skip redundant denoising computations, and, most critically, step distillation, which trains models to produce high-quality output in significantly fewer steps. These techniques, packaged in the open-source FastGen repository, offer additive performance gains and enable real-time video on a single Blackwell B200 GPU.
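
To see why step count dominates sampling latency, here is a minimal sketch of a generic Euler-style sampler that counts denoiser calls. It does not implement distillation and is not taken from FastGen; `sample`, `toy_denoise`, and `TARGET` are hypothetical names, and the "network" is a toy function standing in for an expensive model.

```python
TARGET = 2.0  # hypothetical "clean" value the toy denoiser pulls toward


def toy_denoise(x, t):
    """Toy stand-in for a denoising network: direction from the target to x."""
    return x - TARGET


def sample(denoise, x, num_steps):
    """Integrate from t=1 (noise) to t=0 with one denoiser call per step."""
    calls = 0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        x = x - dt * denoise(x, t)  # one Euler update per network evaluation
        calls += 1
    return x, calls


# A 50-step schedule versus a 4-step schedule, as a distilled model might use.
_, teacher_calls = sample(toy_denoise, 0.0, 50)
_, student_calls = sample(toy_denoise, 0.0, 4)
print(teacher_calls, student_calls)
```

Each step costs one full network evaluation, so going from 50 steps to 4 removes most of the per-sample compute. The hard part, which this sketch leaves out, is training the few-step model so that quality holds up.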

Building Generative Image & Video models at Scale - Sander Dieleman (Veo and Nano Banana)

Sander Dieleman from Google DeepMind provides a behind-the-scenes look at the key components of training large-scale diffusion models for audio-visual data. The talk covers the entire pipeline, from the critical role of data curation and latent representations to the mechanics of diffusion, network architectures, sampling with guidance, and advanced control signals.
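
The summary does not say which form of guidance the talk covers. As one common example, assuming classifier-free guidance is meant, the sampler combines a conditional and an unconditional prediction at every step. The sketch below shows only that combination rule; `cfg_combine` and `guidance_scale` are hypothetical names, and plain lists stand in for model outputs.

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond_pred = [0.0, 1.0]  # toy prediction without the prompt
cond_pred = [1.0, 3.0]    # toy prediction with the prompt

print(cfg_combine(uncond_pred, cond_pred, 0.0))  # → [0.0, 1.0]
print(cfg_combine(uncond_pred, cond_pred, 1.0))  # → [1.0, 3.0]
print(cfg_combine(uncond_pred, cond_pred, 2.0))  # → [2.0, 5.0]
```

A scale of 0 ignores the condition, 1 reproduces the conditional prediction, and values above 1 push the sample further toward the condition.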