Voice agents

Why TTS Models Now Look Like LLMs — Samuel Humeau, Mistral

Why TTS Models Now Look Like LLMs — Samuel Humeau, Mistral

Samuel Humeau from Mistral explains the dominant architecture for modern text-to-speech (TTS) systems, which mirrors large language models. He details how neural audio codecs solve the information density problem, the autoregressive transformer backbone for generation, and the streaming techniques used to achieve low perceived latency in voice agents. The talk uses Mistral's open-weight TTS model as a practical example.

Building Voice Agents Just Got Easier

Building Voice Agents Just Got Easier

Anoop Dawar from Deepgram discusses the evolution of voice AI, from basic transcription to sophisticated, real-time voice agents. He covers the key technical challenges in production, such as latency and interruption handling, and introduces Deepgram's Flux system. The talk concludes with a look at the future of speech-to-speech models that can understand emotional nuance, moving closer to passing the audio Turing Test.

Build Hour: Voice Agents

Build Hour: Voice Agents

A deep dive into building sophisticated voice agents using OpenAI's Realtime API and Agents SDK. The session covers architectural patterns like chained vs. end-to-end models, the use of multi-agent systems with handoffs for specialized tasks, and best practices for production including debugging with traces, implementing guardrails, and creating robust evaluations.

Full Workshop: Realtime Voice AI — Mark Backman, Daily

Full Workshop: Realtime Voice AI — Mark Backman, Daily

An in-depth look at building real-time, production-grade voice AI agents using the open-source Pipecat framework. This summary covers the core concepts of voice AI pipelines, the shift to speech-to-speech models like Gemini Live, and advanced techniques for managing latency, context, and turn-taking.