Model Optimization

Jun 16, 2026

You Might Not Need 50 Diffusion Steps — Ziv Ilan, Nvidia

Ziv Ilan from NVIDIA details how to cut latency in video diffusion models far enough to reach real-time generation. He presents a layered approach: dynamic quantization to reduce memory use and increase speed, chunk-based caching to skip redundant denoising computations, and, most critically, step distillation, which trains models to produce high-quality output in significantly fewer steps. These techniques are packaged in the open-source FastGen repository, and their performance gains are additive, enabling real-time video on a single Blackwell B200 GPU.

May 20, 2026

The Future of AI – Key Trends Shaping What’s Next • Ekaterina Sirazitdinova • YOW! 2025

Ekaterina Sirazitdinova from NVIDIA gives a high-level overview of the latest trends shaping the future of AI. She covers the evolution from early deep learning to the rise of agentic and physical AI, then examines the optimization techniques required to deploy these powerful models efficiently.

Sep 01, 2025

Small Language Models are the Future of Agentic AI Reading Group

This paper challenges the prevailing "bigger is better" narrative in AI, arguing that Small Language Models (SLMs) are not just sufficient but often superior for agentic AI tasks due to their efficiency, speed, and specialization. The discussion explores the paper's core arguments, counterarguments, and the practical implications of adopting a hybrid LLM-SLM approach.