Chatbot arena

What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench

Despite benchmarks showing relentless progress, many users remain dissatisfied with LLM responses in real-world use. This summary covers two analyses: a custom benchmark of nonsense questions (BullshitBench) and trends in Chatbot Arena's "dislike both" votes. Together they reveal persistent gaps in model reasoning, reliability, and domain-specific understanding.