Multimodal AI

Why Tejal Patwardhan stopped underestimating the models - Episode 21

Tejal Patwardhan, head of OpenAI's frontier evals team, discusses the critical evolution of AI evaluations. She explains why traditional benchmarks fail as models become more capable, how OpenAI develops realistic, long-horizon tests (including groundbreaking wet lab experiments), and the implications of rapidly advancing multimodal and reasoning models for scientific discovery and the future of human work.

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind

Thor Schaeff from Google DeepMind demos the advanced audio AI stack, starting with a single API call to Gemini for rich transcription (speaker names, emotions, translation). He showcases speech generation directed by "director's notes" instead of a voice catalog, the real-time, sound-to-sound Gemini 1.5 Flash Live model, and a live demo of Gemini Live using the Lyria 2 model as a tool to generate a full song on stage.

Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind

Patrick Löber from Google DeepMind provides a technical walkthrough of the Gemini API's "any-to-any" capabilities. The session covers multimodal understanding of complex documents, video, and audio; an agentic loop using function calling to trigger native image and speech generation; and the real-time, audio-to-audio Live API.

The Future of AI – Key Trends Shaping What’s Next • Ekaterina Sirazitdinova • YOW! 2025

Ekaterina Sirazitdinova from NVIDIA provides a high-level overview of the latest trends shaping the future of AI, covering the evolution from early deep learning to the rise of agentic and physical AI, and diving deep into the critical optimization techniques required to deploy these powerful models efficiently.

Physical AI Forum | Builders Reveal the New Moat & Playbook | Creator & Founder's Cut | Mar 2026 | 4K

In a live panel at the Physical AI Builders Forum, founders and operators in computer vision, robotics, and multimodal AI share their 2026 playbooks. The discussion covers the architectural differences between physical and generative AI, the strategic shift from frame AI to scene AI for enterprise value, and the critical skills needed to build and scale a modern AI business.

MLX Genmedia — Prince Canuma, Arcee

A tour of MLX, the on-device AI framework for Apple Silicon. This talk explores real-world applications from real-time vision and multimodal omni models to sub-100ms speech synthesis and video generation, all running locally. It highlights breakthrough techniques like Turbo Quant for 1M context and showcases community projects in robotics and native apps, arguing for a future where powerful AI runs without the cloud.