Transformers

The Mathematical Foundations of Intelligence [Professor Yi Ma]

Professor Yi Ma presents a unified mathematical theory of intelligence built on two principles: parsimony and self-consistency. He challenges the notion that large language models truly understand, arguing that they are sophisticated memorization systems that merely learn statistical patterns in already-compressed human knowledge such as text, rather than achieving genuine abstraction. His framework recasts deep learning as a process of compression and denoising, centered on maximizing the coding rate reduction of the data. From this first principle he derives Transformer architectures such as CRATE, paving the way for a more interpretable, white-box approach to AI, and explains phenomena like the effectiveness of gradient descent through the concept of benign non-convex landscapes.
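
The summary above mentions "maximizing the coding rate reduction of the data" without spelling it out. As background, the sketch below follows the maximal coding rate reduction (MCR²) objective published by Ma's group, with Z denoting the matrix of n learned d-dimensional representations, Π_j diagonal matrices encoding group membership, and ε the allowed distortion; treat this as standard notation from those papers rather than anything stated verbatim in the episode.

```latex
% Coding rate of the whole representation matrix Z (up to distortion eps):
\[
  R(Z) \;=\; \tfrac{1}{2}\,\log\det\!\Big(I + \tfrac{d}{n\epsilon^{2}}\,Z Z^{\top}\Big)
\]
% Rate when each group (encoded by the diagonal membership matrix Pi_j) is coded separately:
\[
  R_c(Z \mid \Pi) \;=\; \sum_{j}\frac{\operatorname{tr}(\Pi_j)}{2n}\,
  \log\det\!\Big(I + \tfrac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\,Z \Pi_j Z^{\top}\Big)
\]
% The learning objective maximizes the difference, the "rate reduction":
\[
  \Delta R(Z, \Pi) \;=\; R(Z) - R_c(Z \mid \Pi)
\]
```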

Sparse Activation is the Future of AI (with Adrian Kosowski)

Adrian Kosowski from Pathway explains their groundbreaking research on sparse activation in AI, moving beyond the dense architectures of transformers. Their model, Baby Dragon Hatchling (BDH), mimics the brain's efficiency by activating only a small fraction of its artificial neurons, enabling a new, more scalable, and compositional approach to reasoning that isn't confined by the vector space limitations of current models.
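
The episode describes the mechanism only at a high level. The snippet below is a minimal, hypothetical numpy sketch of top-k sparse activation in general (nothing here is taken from Pathway's BDH code), just to make concrete what "activating only a small fraction of its artificial neurons" means.

```python
import numpy as np

def topk_sparse_activation(pre_activations: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest pre-activations per example; zero out the rest.

    Toy illustration of sparse activation (a small fraction of units fire per
    input). This is NOT the BDH architecture, only a generic top-k gate.
    """
    out = np.zeros_like(pre_activations)
    # Indices of the k largest values in each row (unsorted within the top-k).
    top_idx = np.argpartition(pre_activations, -k, axis=-1)[..., -k:]
    rows = np.arange(pre_activations.shape[0])[:, None]
    out[rows, top_idx] = pre_activations[rows, top_idx]
    return out

# Example: a layer of 10,000 units where only 1% are active for each input.
x = np.random.randn(4, 10_000)          # 4 inputs, 10k hidden pre-activations
h = topk_sparse_activation(x, k=100)    # ~99% of entries are exactly zero
print((h != 0).mean(axis=-1))           # fraction of active units per input
```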

The Day AI Solves My Puzzles Is The Day I Worry (Prof. Cristopher Moore)

Professor Cristopher Moore of the Santa Fe Institute discusses the surprising effectiveness of AI, arguing it stems from the rich, non-random structure of the real world. He explores the limits of current models, the nature of intelligence as creative problem-solving and abstraction, the importance of grounding and shared reality, and the profound implications of computational irreducibility and the need for algorithmic transparency in high-stakes applications.

The Moonshot Podcast Deep Dive: Jeff Dean on Google Brain’s Early Days

Google DeepMind’s Chief Scientist Jeff Dean discusses the origins of his work on scaling neural networks, the founding of the Google Brain team, the technical breakthroughs that enabled training massive models, the development of TensorFlow and TPUs, and his perspective on the evolution and future of artificial intelligence.