Latent Space

Tokenless

Apr 21, 2026

Building Generative Image & Video models at Scale - Sander Dieleman (Veo and Nano Banana)

Sander Dieleman from Google DeepMind provides a behind-the-scenes look at the key components of training large-scale diffusion models for audio-visual data. The talk covers the entire pipeline, from the critical role of data curation and latent representations to the mechanics of diffusion, network architectures, sampling with guidance, and advanced control signals.