TTS

Let's go Bananas with GenMedia — Guillaume Vernade, Google DeepMind

Guillaume Vernade from Google DeepMind demonstrates a full generative media pipeline in which Gemini reads a public-domain book and acts as a master prompt engineer for the other models. Imagen generates character portraits, Veo animates scenes into video, Lyria composes a unique soundtrack for each chapter, and a clever TTS trick creates a multi-character audiobook.

Why TTS Models Now Look Like LLMs — Samuel Humeau, Mistral

Samuel Humeau from Mistral explains the dominant architecture for modern text-to-speech (TTS) systems, which mirrors large language models. He details how neural audio codecs solve the information density problem, the autoregressive transformer backbone for generation, and the streaming techniques used to achieve low perceived latency in voice agents. The talk uses Mistral's open-weight TTS model as a practical example.
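
To make the information density problem concrete, here is a rough back-of-the-envelope sketch. The sample rate, codec frame rate, and codebook count below are illustrative assumptions, not figures from the talk or from Mistral's model; the point is only that a codec turns tens of thousands of waveform values per second into a token stream short enough for an autoregressive transformer to predict.

```python
# Illustrative numbers only: these are assumptions, not values from the talk.
SAMPLE_RATE_HZ = 24_000   # assumed raw waveform samples per second
FRAME_RATE_HZ = 12.5      # assumed codec frames per second
CODEBOOKS = 8             # assumed discrete tokens emitted per codec frame


def codec_tokens_per_second(frame_rate_hz: float, codebooks: int) -> float:
    """Discrete tokens the language-model backbone must produce per second of audio."""
    return frame_rate_hz * codebooks


def compression_ratio(sample_rate_hz: float, frame_rate_hz: float, codebooks: int) -> float:
    """How many raw samples each codec token stands in for, on average."""
    return sample_rate_hz / codec_tokens_per_second(frame_rate_hz, codebooks)


if __name__ == "__main__":
    tokens = codec_tokens_per_second(FRAME_RATE_HZ, CODEBOOKS)
    ratio = compression_ratio(SAMPLE_RATE_HZ, FRAME_RATE_HZ, CODEBOOKS)
    print(f"raw samples/sec:  {SAMPLE_RATE_HZ}")
    print(f"codec tokens/sec: {tokens}")
    print(f"samples per token: {ratio}")
```

Under these assumed settings the model predicts on the order of a hundred tokens per second of speech rather than tens of thousands of samples, which is what makes an LLM-style backbone practical.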

Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil Dwyer, Gabber

An in-depth look at Gabber's experience deploying the Orpheus text-to-speech model to production, covering latency optimization, high-fidelity LoRA-based voice cloning, and a cost-effective inference stack using vLLM and a consistent hash ring for load balancing.
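
For readers unfamiliar with the technique, the following is a minimal sketch of a consistent hash ring, not Gabber's implementation. The `HashRing` class, the server names, and the idea of using a voice ID as the routing key are all hypothetical; routing by voice is assumed here only because it would plausibly keep requests for one cloned voice on a server that already has that voice's LoRA adapter loaded.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash. Python's built-in hash() is salted per process,
    # so it cannot be used to place keys consistently across servers.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point on ring, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["gpu-0", "gpu-1", "gpu-2"])  # hypothetical server names
    for voice_id in ["voice-alice", "voice-bob", "voice-carol"]:
        print(voice_id, "->", ring.lookup(voice_id))
```

The property that matters for serving is that removing or adding a server only remaps the keys that belonged to that server; every other key keeps its assignment, so most warm state stays where it is.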