Real-Time AI

You Might Not Need 50 Diffusion Steps — Ziv Ilan, Nvidia

Ziv Ilan of NVIDIA explains how to cut latency in video diffusion models far enough to reach real-time generation. He presents a layered approach: dynamic quantization to reduce memory use and speed up inference, chunk-based caching to skip redundant denoising computations, and, most critically, step distillation, which trains models to produce high-quality output in far fewer steps. The techniques are packaged in the open-source FastGen repository, and their gains are additive, enabling real-time video on a single Blackwell B200 GPU.
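
To make the caching idea concrete, here is a minimal toy sketch in plain Python. It is not FastGen's API or the method from the talk: the function names, the tolerance rule, and the toy "model" are all assumptions chosen for illustration. The loop reuses a chunk's last model output whenever that chunk's input has barely changed since the output was computed, which is one simple way to skip redundant denoising work.

```python
from typing import Callable, Dict, List, Tuple

Chunk = List[float]


def toy_model(x: Chunk) -> Chunk:
    # Hypothetical stand-in for an expensive denoiser: predicts "noise" equal to x.
    return list(x)


def denoise_with_cache(
    chunks: List[Chunk],
    model: Callable[[Chunk], Chunk],
    num_steps: int,
    step_size: float = 0.5,
    tol: float = 0.05,
) -> Tuple[List[Chunk], int]:
    """Run a toy denoising loop, reusing cached model outputs per chunk.

    Returns the final chunks and the number of real model calls made.
    """
    chunks = [list(c) for c in chunks]  # work on a copy
    last_input: Dict[int, Chunk] = {}
    last_output: Dict[int, Chunk] = {}
    model_calls = 0

    for _ in range(num_steps):
        for i, x in enumerate(chunks):
            prev = last_input.get(i)
            unchanged = prev is not None and max(
                abs(a - b) for a, b in zip(x, prev)
            ) < tol
            if unchanged:
                # Input is close to the one we last evaluated: reuse its output.
                pred = last_output[i]
            else:
                pred = model(x)
                model_calls += 1
                last_input[i] = list(x)
                last_output[i] = pred
            # Simple update rule: subtract a fraction of the predicted noise.
            chunks[i] = [a - step_size * p for a, p in zip(x, pred)]

    return chunks, model_calls


final_chunks, calls = denoise_with_cache([[1.0]], toy_model, num_steps=10)
print(f"model calls: {calls} of 10 possible")
```

In a real system the tolerance would trade quality against speed, and the cached quantity and similarity test would depend on the model architecture.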

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind

Thor Schaeff from Google DeepMind demos Gemini's audio stack, starting with a single API call to Gemini for rich transcription (speaker names, emotions, translation). He then shows speech generation steered by "director's notes" rather than a voice catalog, the real-time, sound-to-sound Gemini 1.5 Flash Live model, and a live demo in which Gemini Live uses the Lyria 2 model as a tool to generate a full song on stage.

This Startup Built the Infrastructure Powering Voice AI

In a YC Founder Fireside chat, AssemblyAI founder Dylan Fox discusses his journey from solo founder in 2017 to leading a major voice AI infrastructure platform. He covers the early challenges, the technological shifts that fueled growth, the development of intelligent, promptable voice models, and the lessons learned in scaling a deep-tech company.

Real-Time Voice Agents in Production

Panos Stravopodis, CTO of Elyos AI, shares the infrastructure and orchestration challenges of building production-ready voice AI agents. He details four pillars for success (latency, consistency, context, and recovery) and provides engineering patterns for error handling, context management, and achieving conversational coherence in real-time systems.
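
As an illustration of the "recovery" pillar, the sketch below shows one common error-handling pattern: bound each call with a timeout, retry a limited number of times, then fall back to a safe response so the conversation keeps moving. This is not Elyos AI's implementation; `call_with_recovery`, its parameters, and the stub functions are hypothetical names used only for this example.

```python
import asyncio
from typing import Awaitable, Callable


async def call_with_recovery(
    primary: Callable[[], Awaitable[str]],
    fallback: Callable[[], Awaitable[str]],
    timeout_s: float = 0.5,
    retries: int = 1,
) -> str:
    """Try `primary` under a latency budget; fall back if it keeps failing."""
    for _ in range(retries + 1):
        try:
            # wait_for cancels the call if it exceeds the latency budget.
            return await asyncio.wait_for(primary(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            continue  # retry until attempts are exhausted
    return await fallback()


async def slow_llm() -> str:
    # Hypothetical stand-in for a model call that is too slow for live speech.
    await asyncio.sleep(1.0)
    return "a late answer"


async def apologize() -> str:
    # Hypothetical canned response used when the primary path fails.
    return "Sorry, could you say that again?"


reply = asyncio.run(call_with_recovery(slow_llm, apologize, timeout_s=0.05))
print(reply)
```

The timeout and retry count here are arbitrary; in a voice agent they would be set from the end-to-end latency the conversation can tolerate.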

Tavus: The AI Human Platform

Founders Hassaan Raza and Quinn Favret detail Tavus's evolution from a personalized video tool to an AI research lab building real-time, agentic AI humans. They explore the foundational models for perception and rendering, the launch of Tavus PALs, and their vision for AI humans as the next major computing interface.

Full Workshop: Realtime Voice AI — Mark Backman, Daily

An in-depth look at building real-time, production-grade voice AI agents using the open-source Pipecat framework. This summary covers the core concepts of voice AI pipelines, the shift to speech-to-speech models like Gemini Live, and advanced techniques for managing latency, context, and turn-taking.
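
The general shape of a cascaded voice AI pipeline (speech-to-text, then a language model, then text-to-speech) can be sketched with chained async generators. The code below does not use Pipecat's API; every function is a hypothetical stub that only tags strings, so the example shows how frames stream through the stages rather than how any real service is called.

```python
import asyncio
from typing import AsyncIterator, List


async def mic(chunks: List[str]) -> AsyncIterator[str]:
    # Hypothetical audio source: yields one "audio chunk" at a time.
    for chunk in chunks:
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield chunk


async def fake_stt(audio: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stub speech-to-text stage.
    async for chunk in audio:
        yield f"text({chunk})"


async def fake_llm(texts: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stub language-model stage.
    async for text in texts:
        yield f"reply-to[{text}]"


async def fake_tts(replies: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stub text-to-speech stage.
    async for reply in replies:
        yield f"audio<{reply}>"


async def run_pipeline(chunks: List[str]) -> List[str]:
    """Stream each input chunk through STT, LLM, and TTS in order."""
    out: List[str] = []
    async for frame in fake_tts(fake_llm(fake_stt(mic(chunks)))):
        out.append(frame)
    return out


frames = asyncio.run(run_pipeline(["hello", "bye"]))
print(frames)
```

A speech-to-speech model collapses the three middle stages into one, which removes the hand-offs between them but also the points where a cascaded pipeline can inspect or edit intermediate text.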