On-Device AI | Tokenless


May 03, 2026

TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google

Cormac Brick from Google's AI Edge team details the dual trends of on-device AI: large, system-level models like Gemma 4 enabling complex agent skills, and fine-tuned tiny LLMs for high-performance, in-app tasks. The summary covers the architecture of on-device function calling, the engineering trade-offs for edge deployment, and the practical workflow for fine-tuning and deploying models under 1B parameters on platforms like Android and iOS.

Apr 29, 2026

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

Maxime Labonne from Liquid AI shares a playbook for post-training frontier small models (under 1GB) for on-device deployment. The talk breaks down the LFM2.5 recipe, which includes on-policy preference alignment and agentic reinforcement learning. It also addresses challenges unique to the 1B scale, such as capability interference and "doom loops", and offers concrete solutions for building efficient models for tasks like data extraction and tool use.

Apr 27, 2026

Open Models at Google DeepMind — Cassidy Hardin, Google DeepMind

Cassidy Hardin from Google DeepMind introduces Gemma 4, a new family of open-weight models with significant architectural and performance improvements. This summary covers the four new models (31B Dense, 26B MoE, and two "Effective" on-device models), takes a deep dive into architectural changes such as mixed global/local attention and Per-Layer Embeddings (PLE), and details the new native multimodal capabilities for vision and audio.

Apr 20, 2026

Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI

Adrien Grondin, developer of the Locally AI app, gives a technical walkthrough of running large language models like Google's Gemma on an iPhone using Apple's MLX framework. The talk covers the necessary tools, performance expectations, the importance of quantization, and the growing MLX ecosystem.

Apr 20, 2026

Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind

A deep dive into Google DeepMind's Gemma 4, the latest family of open models. This summary covers new architectural features such as per-layer embeddings, on-device agentic capabilities, multimodal support, and the growing ecosystem of fine-tuned applications, from medicine to sovereign AI.

Jan 16, 2026

Claude Cowork analysis & Apple picks Gemini

The panel discusses Anthropic's Claude Cowork and the challenge of user trust in AI agents for everyday tasks. They then analyze the Apple-Google partnership to integrate Gemini into Siri, debating its implications for edge AI, privacy, and hardware limitations. Finally, they explore Linus Torvalds' use of AI for "vibe coding", weighing its impact on hobbyist programming and entrepreneurship against AI's current limitations in producing production-ready software.
