Tensor parallelism

Frontier AI at Home — Alex Cheema, EXO Labs

Alex Cheema of EXO Labs explores the path to a 100x improvement in the price-performance of running frontier AI models locally. The talk covers full-stack optimization strategies, including kernel fusion (for a 30% performance boost), RDMA for scalable tensor parallelism, and a novel approach that splits the prefill and decode phases across heterogeneous hardware (e.g., an RTX GPU and Mac Studios) to significantly speed up inference on large prompts.
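The summary mentions tensor parallelism without showing what it involves. The following is a minimal, illustrative sketch of the general technique for a single linear layer `y = x @ W`, not EXO's implementation. It splits the weight matrix across simulated "devices" in two standard ways: column-parallel, where each device computes a slice of the output columns and the slices are concatenated (an all-gather), and row-parallel, where each device computes a partial sum and the partials are added (an all-reduce). All function names are hypothetical. The devices are plain list entries, and the sketch assumes the sharded dimension divides evenly by the number of devices. In a real cluster, the gather and reduce steps would cross the network, and that traffic is what a fast interconnect such as RDMA is meant to accelerate.

```python
# Illustrative sketch of tensor parallelism for one linear layer y = x @ W.
# "Devices" are simulated as list entries; in a real cluster each shard would
# live on a separate machine and the gather/reduce would run over the network.
# All function names are hypothetical. Assumes the sharded dimension divides
# evenly by `parts`.

def matmul(a, b):
    """Plain-Python matrix multiply: a is (n x k), b is (k x m)."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)] for row in a]

def split_columns(w, parts):
    """Split w (k x m) into `parts` shards of m // parts columns each."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def split_rows(w, parts):
    """Split w (k x m) into `parts` shards of k // parts rows each."""
    step = len(w) // parts
    return [w[p * step:(p + 1) * step] for p in range(parts)]

def column_parallel(x, w, parts):
    """Each device computes x @ W_shard; outputs are concatenated (all-gather)."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    # Concatenate each row's column slices in device order.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

def row_parallel(x, w, parts):
    """Each device holds some rows of W plus the matching columns of x;
    the partial products are summed elementwise (all-reduce)."""
    step = len(w) // parts
    partials = []
    for p, shard in enumerate(split_rows(w, parts)):
        x_slice = [row[p * step:(p + 1) * step] for row in x]
        partials.append(matmul(x_slice, shard))
    m = len(w[0])
    return [[sum(p[i][j] for p in partials) for j in range(m)] for i in range(len(x))]

if __name__ == "__main__":
    x = [[1, 2, 3, 4], [0, 1, 0, 1]]
    w = [[1, 0, 2, 1], [0, 1, 1, 0], [3, 1, 0, 2], [1, 1, 1, 1]]
    print(matmul(x, w))              # single-device reference
    print(column_parallel(x, w, 2))  # same result, sharded by columns
    print(row_parallel(x, w, 2))     # same result, sharded by rows
```

Both sharded variants reproduce the single-device result. The trade-off is communication: column-parallel exchanges output slices, while row-parallel exchanges full-size partial sums. This is why interconnect bandwidth and latency matter as a model is spread across more machines.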