Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind
Patrick Löber from Google DeepMind provides a technical walkthrough of the Gemini API's "any-to-any" capabilities. The session covers multimodal understanding of complex documents, video, and audio; an agentic loop using function calling to trigger native image and speech generation; and the real-time, audio-to-audio Live API.