Mixture of Experts

Granite 4.0: Small AI Models, Big Efficiency

IBM's Granite 4.0 models introduce a hybrid architecture that combines Mamba-2 and Transformer blocks with a Mixture of Experts (MoE) design. The approach lets smaller models match or even outperform much larger models on key enterprise tasks, with lower memory use and higher speed, while running on consumer-grade hardware.
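
To make the MoE part of that design concrete, here is a minimal sketch of a top-k token-routing expert layer in PyTorch. All names and sizes here (TopKMoE, num_experts, d_ff) are illustrative assumptions, not Granite 4.0's actual implementation.

```python
# Minimal sketch of a top-k Mixture-of-Experts layer (illustrative only;
# not IBM's actual Granite 4.0 implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # per-expert routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.router(tokens)                    # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)      # pick k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x)

layer = TopKMoE(d_model=64, d_ff=256)
y = layer(torch.randn(2, 10, 64))  # output shape: (2, 10, 64)
```

Because only k of the experts run per token, compute and active memory scale with k rather than with total parameter count, which is the efficiency lever the summary highlights.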

This week in AI models: Granite 4.0, Claude 4.5, Sora 2

A deep dive into the latest AI model releases, including IBM's hyper-efficient Granite 4.0, Anthropic's code-focused Claude 4.5, and OpenAI's consumer-centric Sora 2. The discussion covers the strategic differentiation between major AI labs, the future of open-source, the rise of AI e-commerce agents, and the emerging cybersecurity challenges of social engineering AI.

Upwork's Radical Bet on Reinforcement Learning: Building RLEF from Scratch | Andrew Rabinovich (CTO)

Andrew Rabinovich, CTO and Head of AI at Upwork, details their strategy for building AI agents for digital work. He introduces a custom reinforcement learning approach called RLEF (Reinforcement Learning from Experience), explains why digital work marketplaces are ideal training grounds, and shares his vision for a future where AI delivers finished projects, orchestrated by a meta-agent named Uma.
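
The episode describes RLEF only at a high level, so the sketch below shows a generic REINFORCE-style loop that learns a policy from scalar outcome rewards. It illustrates "learning from experience" in general; it is not Upwork's actual RLEF implementation, and the environment and reward are hypothetical stand-ins.

```python
# Generic REINFORCE-style loop: update a policy from scalar outcome rewards.
# Purely illustrative of learning from experience; not Upwork's RLEF.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 3))  # 3 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def reward(state: torch.Tensor, action: int) -> float:
    """Stand-in environment: reward choosing the largest state feature."""
    return float(action == int(state[:3].argmax()))

for step in range(200):
    state = torch.randn(4)
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    r = reward(state, action.item())
    loss = -dist.log_prob(action) * r  # policy gradient on the outcome reward
    opt.zero_grad()
    loss.backward()
    opt.step()
```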

7 AI Terms You Need to Know: Agents, RAG, ASI & More

A deep dive into seven essential AI concepts shaping the future of intelligent systems, including Agentic AI, RAG, Mixture of Experts (MoE), and the theoretical frontier of Artificial Superintelligence (ASI).
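
Of these terms, RAG is the most directly implementable, so a toy sketch of the retrieve-then-generate pattern follows. The embed() and generate() functions are hypothetical stand-ins for a real embedding model and LLM call, not any particular library's API.

```python
# Toy sketch of Retrieval-Augmented Generation (RAG): retrieve the most
# relevant documents, then condition generation on them.
import math
from collections import Counter

DOCS = [
    "Mixture of Experts routes each token to a few specialist subnetworks.",
    "Mamba-2 is a state-space model that scales linearly with sequence length.",
    "RAG grounds model answers in retrieved documents to reduce hallucination.",
]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model here."""
    return f"[model answer conditioned on a prompt of {len(prompt)} chars]"

query = "How does RAG reduce hallucination?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```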

GPT-OSS vs. Qwen vs. Deepseek: Comparing Open Source LLM Architectures

A technical breakdown and comparison of the architectures, training methodologies, and post-training techniques of three leading open-source models: OpenAI's GPT-OSS, Alibaba's Qwen-3, and DeepSeek V3. The summary explores their differing approaches to Mixture-of-Experts, long-context handling, and attention mechanisms.
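
One concrete axis such attention comparisons cover is how many key/value heads a model keeps. The sketch below illustrates grouped-query attention (GQA), where several query heads share one KV head to shrink the KV cache; the shapes and names are illustrative assumptions, not taken from any of the three models.

```python
# Illustrative grouped-query attention (GQA): n_q query heads share
# n_kv < n_q key/value heads, shrinking the KV cache for long contexts.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q=8, n_kv=2):
    # x: (batch, seq, d_model); wq/wk/wv project into per-head subspaces
    b, s, d = x.shape
    hd = d // n_q                                        # per-head dimension
    q = (x @ wq).view(b, s, n_q, hd).transpose(1, 2)     # (b, n_q, s, hd)
    k = (x @ wk).view(b, s, n_kv, hd).transpose(1, 2)    # (b, n_kv, s, hd)
    v = (x @ wv).view(b, s, n_kv, hd).transpose(1, 2)
    group = n_q // n_kv
    k = k.repeat_interleave(group, dim=1)                # share KV across groups
    v = v.repeat_interleave(group, dim=1)
    attn = F.softmax(q @ k.transpose(-2, -1) / hd**0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, s, d)

d_model, n_q, n_kv = 64, 8, 2
hd = d_model // n_q
x = torch.randn(2, 16, d_model)
wq = torch.randn(d_model, n_q * hd)
wk = torch.randn(d_model, n_kv * hd)  # KV projections are 4x smaller here
wv = torch.randn(d_model, n_kv * hd)
out = grouped_query_attention(x, wq, wk, wv)  # (2, 16, 64)
```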

OpenAI dropped GPT-5, is AGI here?

In this analysis, experts Bryan Casey, Mihai Criveti, and Chris Hay dissect the OpenAI GPT-5 release, comparing its capabilities against Anthropic's Claude Opus 4.1. While GPT-5 introduces significant improvements in accessibility, agentic capabilities, and reliability, the consensus is that it does not yet dethrone Claude as the daily driver for developers due to key differences in user experience and workflow management.