Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI
Adria Grondin, developer of the Locally AI app, provides a technical walkthrough on running large language models like Google's Gemma on an iPhone using Apple's MLX framework. The talk covers the necessary tools, performance expectations, the importance of quantization, and the growing MLX ecosystem.