Music Generation

Jun 09, 2026

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind

Thor Schaeff from Google DeepMind demos Gemini's audio AI stack, starting with a single API call to Gemini for rich transcription (speaker names, emotions, translation). He then showcases speech generation steered by "director's notes" rather than a voice catalog, the real-time, sound-to-sound Gemini 1.5 Flash Live model, and a live demo in which Gemini Live uses the Lyria 2 model as a tool to generate a full song on stage.