State space models

What are State Space Models? Redefining AI & Machine Learning with Data

State Space Models (SSMs) are emerging as a powerful and efficient alternative to Transformers for handling sequential data. Aaron Baughman explains the core concepts of SSMs, their mathematical foundations, and how architectures like S4 and Mamba address the memory and scalability challenges inherent in Transformers, leading to a new generation of faster, more intelligent hybrid AI models.
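To make the core idea concrete, here is a minimal sketch of the linear state space recurrence that underlies these models: a fixed-size hidden state is updated once per input step, so memory stays constant regardless of sequence length. The matrices, sizes, and names below are illustrative assumptions, not the actual parameterization of S4 or Mamba (which use structured matrices and input-dependent dynamics).

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run x_{k+1} = A x_k + B u_k, y_k = C x_k over a scalar input sequence u.

    Illustrative only: A, B, C here are dense and fixed, unlike S4/Mamba.
    """
    state_dim = A.shape[0]
    x = np.zeros(state_dim)          # fixed-size hidden state
    ys = []
    for u_k in u:
        x = A @ x + B * u_k          # state update: constant memory per step
        ys.append(C @ x)             # linear readout
    return np.array(ys)

rng = np.random.default_rng(0)
N = 4                                # hidden state size (arbitrary)
A = 0.9 * np.eye(N)                  # stable, decaying dynamics
B = rng.normal(size=N)
C = rng.normal(size=N)
u = rng.normal(size=16)              # length-16 input sequence

y = ssm_scan(A, B, C, u)
print(y.shape)                       # one output per input step: (16,)
```

Unlike a Transformer, which attends over the entire input at every step, this recurrence carries everything it knows in the state vector `x`, which is the source of the memory and scalability advantages discussed above.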

The Limits of Today’s AI Models

Karan Goel, CEO of Cartesia, discusses the fundamental limitations of Transformer architectures, arguing they behave more like retrieval systems than learning systems. He explains how State Space Models (SSMs) enable compression and abstraction, and why Cartesia is tackling multimodal intelligence by first solving for voice AI, aiming to develop a transferable 'recipe' for end-to-end representation learning.

Granite 4.0: Small AI Models, Big Efficiency

IBM's Granite 4.0 models introduce a groundbreaking hybrid architecture combining Mamba-2 and Transformer blocks with a Mixture of Experts (MoE) design. This approach enables smaller models to achieve superior performance, speed, and memory efficiency, even outperforming much larger models on key enterprise tasks while running on consumer-grade hardware.
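The Mixture of Experts idea mentioned above can be sketched in a few lines: a gating function scores a set of expert sub-networks per token, and only the top-k experts actually run, so active compute stays small even as total parameter count grows. This is a generic top-k MoE routing sketch; the expert count, k, and dimensions are made-up values, not Granite 4.0's actual configuration.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, k=2):
    """Route a token vector x to its top-k experts and mix their outputs."""
    logits = gate_weights @ x                      # one gate score per expert
    top = np.argsort(logits)[-k:]                  # indices of the k best experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                           # softmax over selected experts
    # Only the chosen experts execute, so compute scales with k,
    # not with the total number of experts (parameters).
    return sum(p * (expert_weights[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(1)
d, n_experts = 8, 4
x = rng.normal(size=d)                             # a single token embedding
experts = rng.normal(size=(n_experts, d, d))       # each expert: a d x d linear map
gate = rng.normal(size=(n_experts, d))

y = moe_layer(x, experts, gate)
print(y.shape)                                     # (8,)
```

This sparsity is what lets an MoE model carry many more parameters than it activates per token, which is how smaller active footprints can still deliver strong performance on the enterprise tasks described above.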

929: Dragon Hatchling: The Missing Link Between Transformers and the Brain — with Adrian Kosowski

Adrian Kosowski from Pathway introduces the Baby Dragon Hatchling (BDH), a groundbreaking, post-transformer architecture inspired by neuroscience. BDH leverages sparse, positive activation to mimic brain function, offering a path to limitless context, superior reasoning, and unprecedented computational efficiency, potentially solving key limitations of current large language models.