Evals

Sep 03, 2025

Build Hour: Reinforcement Fine-Tuning

A deep dive into Reinforcement Fine-Tuning (RFT), covering how to set up tasks, design effective graders, and run efficient training loops to improve model reasoning, based on a live demonstration from OpenAI's Build Hours.

Aug 23, 2025

Perceptual Evaluations: Evals for Aesthetics — Diego Rodriguez, Krea.ai

Krea.ai cofounder Diego Rodriguez argues that current AI evaluation metrics fail critically at capturing human perception and aesthetics. He advocates for a new paradigm of personalized, perceptually aware evals.

Jul 22, 2025

AI Agent Development Tradeoffs You NEED to Know

Sherwood Callaway of 11X discusses the architecture of "Alice," an AI Sales Development Representative. He covers the practical decision to use LangGraph for its reliability in production and the infrastructure and observability challenges of hosted agent platforms. He also describes the team's methodology for running evals, which mitigates hallucinations by comparing generated content against source data.
