Local llm

Frontier AI at Home — Alex Cheema, EXO Labs

Frontier AI at Home — Alex Cheema, EXO Labs

Alex Cheema from EXO Labs explores the path to a 100x improvement in the price-performance of running frontier AI models locally. The talk covers full-stack optimization strategies, including kernel fusion for a 30% performance boost, RDMA for scalable tensor parallelism, and a novel approach of splitting prefill and decode phases across heterogeneous hardware (e.g., an RTX GPU and Mac Studios) to significantly speed up large-prompt inference.

Build a Local LLM App in Python with Just 2 Lines of Code

Build a Local LLM App in Python with Just 2 Lines of Code

Distinguished Engineer Chris Hay demonstrates how to run and program Large Language Models (LLMs) locally in just two lines of Python code. The tutorial covers setting up a local environment with Ollama and UV, using a custom library for simplified interaction, and explores advanced topics like asynchronous streaming, persona customization with system prompts, and managing multi-turn conversations.

From 3 Months to 4 Days: How Dell Pro AI Studio Speeds AI Development (with Dell’s Experts)

From 3 Months to 4 Days: How Dell Pro AI Studio Speeds AI Development (with Dell’s Experts)

Dell's Shirish Gupta and Ish Shah discuss the complexities developers face in leveraging on-device accelerators like NPUs and GPUs. They introduce Dell ProAI Studio, a solution designed to abstract away hardware-specific toolchains, enabling developers to easily run AI workloads locally for benefits like speed, cost, security, and offline capability.