User feedback

What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench

Despite benchmarks showing relentless progress, many users remain dissatisfied with LLM responses in real-world scenarios. This summary explores two analyses, a custom 'nonsense question' benchmark and trends from Chatbot Arena's 'dislike both' data, to reveal persistent gaps in model reasoning, reliability, and domain-specific understanding.

Shaping Model Behavior in GPT-5.1— the OpenAI Podcast Ep. 11

Researchers Christina Kim and Laurentia Romaniuk discuss the development of GPT-5.1, focusing on the shift to making reasoning models the default. They explore the nuanced concept of model "personality" as a blend of response style and the entire user experience, and detail the ongoing work of balancing model steerability with safety.

Shaping Model Behavior in GPT-5.1— the OpenAI Podcast Ep. 11

OpenAI's Christina Kim (Research) and Laurentia Romaniuk (Product) discuss the development of GPT-5.1, detailing the shift to universal "reasoning models" to enhance both IQ and EQ. They explore the nuances of "model personality," the technical challenges of balancing steerability with safety, and how features like Memory create a more personalized, context-aware user experience.