On-Device AI | Tokenless

On-Device AI

Jun 10, 2026

Sovereign Escape Velocity: Ownership w Open Models — Gus Martins & Ian Ballantyne, Google DeepMind

Google DeepMind's Ian Ballantyne and Gus Martins introduce Gemma 4, a family of open models delivering state-of-the-art performance with remarkable size efficiency. They discuss how the 31B variant outperforms competitors 2 to 20 times its size while running on a single GPU, the shift to an Apache 2.0 license to foster sovereignty and adoption, and the new economics of running powerful agentic workloads on hardware ranging from a Pixel phone to a single enterprise GPU.

May 24, 2026

⚡️ Google's Open AI Strategy — Omar Sanseviero, Google DeepMind

An in-depth look at Gemma 4's novel transformer architecture with per-layer embeddings, enabling efficient parameter offloading for on-device inference. The discussion also covers its native multimodality, the state of fine-tuning, text-based diffusion models, and the growing intersection of research and engineering.

May 23, 2026

Prompt to Pipeline: Building with Google's Gen Media Stack — Paige & Guillaume, Google DeepMind

A comprehensive overview of Google DeepMind's latest advancements, featuring Paige Bailey demonstrating Gemini 1.5 Flash's cost-effective video analysis and AI Studio's single-prompt app generation. Guillaume Vernade showcases a full generative media pipeline, turning a public-domain book into an illustrated, animated, and scored project using Gemini, Nano Banana, Veo, and Lyria. Ian Ballantyne closes with the power of Gemma 4, demonstrating on-device, multi-agent code generation and debugging without cloud APIs.

May 22, 2026

AI on Android: Ask me Anything — Florina Muntenescu & Oli Gaymond, Google DeepMind

Android's on-device AI strategy centers on AICore, which manages the on-device Gemini Nano model. Developers can access it through the ML Kit GenAI APIs, with a hybrid inference option that falls back to the cloud for broader device support, balancing performance and reach.

May 11, 2026

MLX Genmedia — Prince Canuma, Arcee

A tour of MLX, the on-device AI framework for Apple Silicon. This talk explores real-world applications from real-time vision and multimodal omni models to sub-100ms speech synthesis and video generation, all running locally. It highlights breakthrough techniques like TurboQuant for 1M-token context and showcases community projects in robotics and native apps, arguing for a future where powerful AI runs without the cloud.

May 05, 2026

Accelerating AI on Edge — Chintan Parikh and Weiyi Wang, Google DeepMind

A deep dive into Google's AI Edge stack for on-device AI, covering the new Gemma 4 models, the LiteRT framework for cross-platform deployment, and practical use cases in agent skills, tool calling, and hardware acceleration on CPUs, GPUs, and NPUs.