Model reliability

Inside the AI Black Box

Emmanuel Ameisen of Anthropic's interpretability team explains the inner workings of LLMs, drawing analogies to biology. He covers surprising findings on how models plan ahead, how they represent concepts across languages, and the mechanistic causes of hallucinations, and offers practical advice for developers on evaluation and post-training strategies.

Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith

Traditional benchmarks and leaderboards are insufficient for production AI. This summary details a practical, multi-layered evaluation strategy that moves from foundational system performance to factual accuracy and, finally, to safety and bias, using open-source tools such as GuideLLM, lm-eval-harness, and Promptfoo.
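
As a rough illustration of the accuracy layer in such a strategy (a sketch, not taken from the talk), a benchmark run through lm-eval-harness's Python API might look like the following; the checkpoint and task names are placeholders chosen for the example.

```python
# Minimal sketch of an accuracy-layer eval with lm-eval-harness
# (EleutherAI/lm-evaluation-harness, v0.4+). The checkpoint and
# task names below are placeholders, not recommendations.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                             # Hugging Face backend
    model_args="pretrained=gpt2",           # placeholder checkpoint
    tasks=["hellaswag", "truthfulqa_mc2"],  # example benchmark tasks
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (accuracy, etc.) are reported under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```

A safety-and-bias layer would typically sit on top of this, for example as Promptfoo test suites run against the same model endpoint.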