Benchmarks

OpenAI, Oracle & AMD shake up AI

The panel discusses the shifting AI hardware landscape as Oracle and OpenAI bet on AMD, challenging Nvidia's dominance. They also analyze a US government report on the risks of the DeepSeek model, debate the viability of Reflection AI's new $2B open-source venture, and dissect the story of a VC fund replacing its analysts with AI agents.

The Startup Powering The Data Behind AGI

Edwin Chen, founder and CEO of Surge AI, shares the company's origin story, its rapid bootstrapped growth, and its research-driven philosophy on data. He critiques traditional data labeling, explains why metrics like inter-annotator agreement fail for complex tasks, and offers a sharp analysis of benchmark hacking. Chen also details the future of data, from multimodal and agentic reasoning in rich RL environments to the hyper-specialized expertise needed for scientific discovery.

AGI progress, surprising breakthroughs, and the road ahead — the OpenAI Podcast Ep. 5

OpenAI's Chief Scientist Jakub Pachocki and researcher Szymon Sidor discuss the rapid progress towards AGI, focusing on the shift from traditional benchmarks to real-world capabilities like automating scientific discovery. They share insights into recent breakthroughs in mathematical and programmatic reasoning, highlighted by successes in competitions like the International Math Olympiad (IMO), and explore what's next for scaling and long-horizon problem-solving.

Building Better Language Models Through Global Understanding

Dr. Marzieh Fadaee discusses the critical challenges in multilingual AI, including data imbalances and flawed evaluation methodologies. She argues that tackling these difficult multilingual problems is not only essential for global accessibility but also a catalyst for fundamental AI innovation, much as machine translation research led to the Transformer architecture. The talk introduces new, more culturally aware evaluation benchmarks, such as Global MMLU and INCLUDE, as a path toward building more robust and globally representative language models.