Transformers

What are State Space Models? Redefining AI & Machine Learning with Data

State Space Models (SSMs) are emerging as a powerful and efficient alternative to Transformers for handling sequential data. Aaron Baughman explains the core concepts of SSMs, their mathematical foundations, and how architectures like S4 and Mamba address the memory and scalability challenges inherent in Transformers, leading to a new generation of faster, more intelligent hybrid AI models.
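
For orientation, the recurrence behind S4 and Mamba is a discretized linear state space model: a fixed-size latent state is updated at every step and summarizes the entire history, which is where the memory and scaling advantages over attention come from. Below is a minimal NumPy sketch of that recurrence, with random placeholder matrices and none of the structured or input-dependent parameterization that S4 and Mamba actually use.

    import numpy as np

    # Discrete linear state space model:
    #   x_{k+1} = A x_k + B u_k      (state update)
    #   y_k     = C x_k + D u_k      (readout)
    # A fixed-size state x carries everything the model remembers about the
    # past, so each step costs the same regardless of sequence length.
    def ssm_scan(A, B, C, D, u):
        x = np.zeros(A.shape[0])
        ys = []
        for u_k in u:                              # one step per sequence element
            x = A @ x + B * u_k
            ys.append(C @ x + D * u_k)
        return np.array(ys)

    rng = np.random.default_rng(0)
    N = 8                                          # state size (placeholder)
    A = rng.normal(scale=0.1, size=(N, N))         # state transition
    B = rng.normal(size=N)                         # input projection
    C = rng.normal(size=N)                         # output projection
    y = ssm_scan(A, B, C, 0.0, rng.normal(size=100))  # scalar input sequence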

The Limits of Today’s AI Models

Karan Goel, CEO of Cartesia, discusses the fundamental limitations of Transformer architectures, arguing they behave more like retrieval systems than learning systems. He explains how State Space Models (SSMs) enable compression and abstraction, and why Cartesia is tackling multimodal intelligence by first solving for voice AI, aiming to develop a transferable 'recipe' for end-to-end representation learning.

The Mathematical Foundations of Intelligence [Professor Yi Ma]

Professor Yi Ma challenges our understanding of intelligence, proposing a unified mathematical theory based on two principles: parsimony and self-consistency. He argues that current large models merely memorize statistical patterns in already-compressed human knowledge (like text) rather than achieving true understanding. This framework re-contextualizes deep learning as a process of compression and denoising, allowing for the derivation of Transformer architectures like CRATE from first principles, paving the way for a more interpretable, white-box approach to AI.

The Mathematical Foundations of Intelligence [Professor Yi Ma]

Professor Yi Ma presents a unified mathematical theory of intelligence built on two principles: parsimony and self-consistency. He challenges the notion that large language models (LLMs) truly understand, arguing they are sophisticated memorization systems, and demonstrates how architectures like the Transformer can be derived from the first of these principles, compression.

The Mathematical Foundations of Intelligence [Professor Yi Ma]

Professor Yi Ma presents a unified mathematical theory of intelligence based on two principles: parsimony and self-consistency. He argues that current AI, particularly LLMs, excels at memorization by compressing already-compressed human knowledge (text), but fails at true abstraction and understanding. His framework, centered on maximizing the coding rate reduction of data, provides a first-principles derivation for Transformer-like architectures such as CRATE and explains phenomena like the effectiveness of gradient descent through the concept of benign non-convex landscapes.
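
The "coding rate reduction" referred to here is, in the maximal coding rate reduction (MCR²) line of work from Ma's group, the gap between the coding rate of the full set of representations and the summed rates of its groups. A commonly cited form (paraphrased from that literature, not transcribed from the talk, with Z the d-by-n matrix of features, \Pi_j the diagonal membership matrix of group j, and \varepsilon the allowed distortion) is:

    \Delta R(Z, \Pi) \;=\; \frac{1}{2}\log\det\!\Big(I + \tfrac{d}{n\varepsilon^{2}}\, Z Z^{\top}\Big)
      \;-\; \sum_{j} \frac{\operatorname{tr}(\Pi_j)}{2n}\,\log\det\!\Big(I + \tfrac{d}{\operatorname{tr}(\Pi_j)\,\varepsilon^{2}}\, Z \Pi_j Z^{\top}\Big)

Maximizing this quantity pushes the overall representation to fill as large a volume as possible while compressing each group onto its own low-dimensional piece, which is the sense in which learning becomes compression.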

Sparse Activation is the Future of AI (with Adrian Kosowski)

Adrian Kosowski from Pathway explains their groundbreaking research on sparse activation in AI, moving beyond the dense architectures of transformers. Their model, Baby Dragon Hatchling (BDH), mimics the brain's efficiency by activating only a small fraction of its artificial neurons, enabling a new, more scalable, and compositional approach to reasoning that isn't confined by the vector space limitations of current models.
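
Sparse activation, in the generic sense used here, means that only a small, input-dependent subset of units fires on any given input. A simple way to picture it is a top-k gate on a hidden layer; the NumPy sketch below shows that generic idea under our own assumptions, not the BDH architecture itself.

    import numpy as np

    def topk_sparse_layer(x, W, b, k):
        # Dense pre-activations, but only the k largest units stay active;
        # the rest are zeroed, so downstream work scales with k rather than
        # with the full layer width. (Generic illustration, not BDH.)
        pre = W @ x + b
        idx = np.argpartition(pre, -k)[-k:]       # indices of the k largest values
        out = np.zeros_like(pre)
        out[idx] = np.maximum(pre[idx], 0.0)      # ReLU on the surviving units only
        return out

    rng = np.random.default_rng(0)
    width, dim, k = 4096, 128, 32                 # under 1% of units active per input
    W = rng.normal(scale=1 / np.sqrt(dim), size=(width, dim))
    y = topk_sparse_layer(rng.normal(size=dim), W, np.zeros(width), k)
    print(int((y != 0).sum()), "of", width, "units active")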