Red teaming

915: How to Jailbreak LLMs (and How to Prevent It) — with Michelle Yi

Tech leader and investor Michelle Yi discusses the critical technical aspects of building trustworthy AI systems. She delves into adversarial attack and defense mechanisms, including red teaming, data poisoning, prompt stealing, and "slop squatting," and explores how advanced concepts like Constitutional AI and World Models can create safer, more reliable AI.

912: In Case You Missed It in July 2025 — with Jon Krohn (@JonKrohnLearns)

A review of five key interviews covering the importance of data-centric AI (DMLR) in specialized fields like law, the challenges of AI benchmarking, strategies for domain-specific model selection using red teaming, the power of AI in predicting human behavior, and the shift towards building causal AI models.

How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco

A security analysis of YC AI agents reveals that the most critical vulnerabilities are not in the LLM itself, but in the surrounding infrastructure. This breakdown of a red teaming exercise, where 7 out of 16 agents were compromised, highlights three common and severe security flaws: cross-user data access (IDOR), remote code execution via insecure sandboxes, and server-side request forgery (SSRF).
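The cross-user data access (IDOR) flaw named above boils down to trusting a client-supplied object id without checking who owns the object. The sketch below is a minimal, hypothetical Python illustration of the missing ownership check, assuming a simple document store; the `Document` and `fetch_document` names are invented for this example and are not taken from Casco's write-up or any of the audited agents.

```python
# Hypothetical sketch of an ownership check that closes an IDOR hole.
# Not code from the talk: names and data are illustrative only.
from dataclasses import dataclass


@dataclass
class Document:
    id: str
    owner_id: str
    body: str


# Stand-in data store keyed by document id.
_DOCS = {
    "doc-1": Document(id="doc-1", owner_id="user-a", body="user A's private notes"),
}


class Forbidden(Exception):
    pass


def fetch_document(doc_id: str, requesting_user_id: str) -> Document:
    """Return a document only if the authenticated requester owns it.

    The IDOR pattern is looking up an object purely by a client-supplied id;
    the fix is to authorize against the server-side identity of the caller,
    never against anything the client sends.
    """
    doc = _DOCS.get(doc_id)
    if doc is None:
        raise KeyError(doc_id)
    if doc.owner_id != requesting_user_id:
        raise Forbidden(f"user {requesting_user_id} may not read {doc_id}")
    return doc


if __name__ == "__main__":
    print(fetch_document("doc-1", "user-a").body)  # owner: allowed
    try:
        fetch_document("doc-1", "user-b")          # cross-user access: blocked
    except Forbidden as exc:
        print("blocked:", exc)
```

The same principle generalizes to the other two flaw classes: sandboxes should not execute attacker-controlled code with host privileges, and outbound requests should be validated against an allowlist before the server fetches a user-supplied URL (the SSRF case).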