Red Teaming

Sep 25, 2025

When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs

Hanna Kim from KAIST explores the significant cybersecurity risks posed by web-enabled Large Language Model (LLM) agents. The research investigates how these agents, equipped with web search and navigation tools, can be misused to automate and scale cyberattacks involving personal data, such as PII collection, impersonation, and spear-phishing, while easily bypassing existing safety measures.

Sep 22, 2025

Ethical Hacking in Action: Red Teaming, Pen Testing, & Cybersecurity

Explore the core tasks of ethical hacking, from vulnerability scanning to red teaming. This guide covers engagement structure, hacker methodologies, key frameworks like MITRE ATT&CK, and the essential tools for cybersecurity professionals.

Aug 19, 2025

915: How to Jailbreak LLMs (and How to Prevent It) — with Michelle Yi

Tech leader and investor Michelle Yi discusses the critical technical aspects of building trustworthy AI systems. She delves into adversarial attack and defense mechanisms, including red teaming, data poisoning, prompt stealing, and "slop squatting," and explores how advanced concepts like Constitutional AI and World Models can create safer, more reliable AI.

Aug 08, 2025

912: In Case You Missed It in July 2025 — with Jon Krohn (@JonKrohnLearns)

A review of five key interviews covering the importance of data-centric AI (DMLR) in specialized fields like law, the challenges of AI benchmarking, strategies for domain-specific model selection using red teaming, the power of AI in predicting human behavior, and the shift towards building causal AI models.

Jul 30, 2025

How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco

A security analysis of YC AI agents reveals that the most critical vulnerabilities are not in the LLM itself, but in the surrounding infrastructure. This breakdown of a red teaming exercise, where 7 out of 16 agents were compromised, highlights three common and severe security flaws: cross-user data access (IDOR), remote code execution via insecure sandboxes, and server-side request forgery (SSRF).

← Previous