Heterogeneous computing

Frontier AI at Home — Alex Cheema, EXO Labs

Alex Cheema from EXO Labs explores the path to a 100x improvement in the price-performance of running frontier AI models locally. The talk covers full-stack optimization strategies, including kernel fusion for a 30% performance boost, RDMA for scalable tensor parallelism, and a novel approach that splits the prefill and decode phases across heterogeneous hardware (e.g., an RTX GPU and Mac Studios) to significantly speed up inference on large prompts.

Scaling the Next Paradigm of Heterogeneous Intelligence — Adrian Bertagnoli, Callosum

Adrian Bertagnoli from Callosum argues that the era of scaling monolithic models on homogeneous GPU clusters is ending. He introduces "heterogeneous intelligence," a new paradigm in which model architectures, chip types, and workflows are optimized together. By routing each subtask to the most efficient model and hardware, this approach achieves significant performance gains, demonstrated by two key results: a 7x cost reduction on recursive reasoning tasks using Cerebras, and state-of-the-art performance on the Video Web Arena benchmark, outperforming leading GPT and Gemini models at a fraction of their cost and time.