Apple silicon

Frontier AI at Home — Alex Cheema, EXO Labs

Alex Cheema from EXO Labs explores the path to a 100x improvement in the price-performance of running frontier AI models locally. The talk covers full-stack optimization strategies, including kernel fusion for a 30% performance boost, RDMA for scalable tensor parallelism, and a novel approach that splits the prefill and decode phases across heterogeneous hardware (e.g., an RTX GPU and Mac Studios) to significantly speed up inference on large prompts.

MLX Genmedia — Prince Canuma, Arcee

A tour of MLX, the on-device AI framework for Apple silicon. The talk explores real-world applications, from real-time vision and multimodal omni models to sub-100 ms speech synthesis and video generation, all running locally. It highlights techniques such as Turbo Quant for 1M-token context and showcases community projects in robotics and native apps, arguing for a future where powerful AI runs without the cloud.

Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI

Adrien Grondin, developer of the Locally AI app, gives a technical walkthrough of running large language models such as Google's Gemma on an iPhone with Apple's MLX framework. The talk covers the required tools, realistic performance expectations, why quantization matters, and the growing MLX ecosystem.
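To show why quantization matters on a phone, here is a minimal back-of-the-envelope sketch of weight memory at different bit widths. It assumes a hypothetical 4-billion-parameter model (not a published Gemma 4 size) and counts only the weights, ignoring the KV cache, activations, and the per-group scales that real quantization schemes store alongside the weights.

```python
# Approximate weight memory for an LLM at several quantization levels.
# Assumptions: HYPOTHETICAL_PARAMS is an illustrative figure, not a real
# model size; only raw weights are counted (no KV cache, activations,
# or quantization scale/bias overhead).

def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    total_bytes = num_params * bits_per_weight / 8  # bits -> bytes
    return total_bytes / 2**30                      # bytes -> GiB


HYPOTHETICAL_PARAMS = 4_000_000_000

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = weight_memory_gib(HYPOTHETICAL_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {gib:.2f} GiB")
```

Each halving of the bit width halves the weight footprint, which is what lets a model that would not fit in a phone's memory at 16-bit precision load at 4-bit. Fewer bytes per weight also means less memory traffic per generated token, which is one reason quantization tends to help decode speed as well.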