Agents of Chaos: Red-Teaming Study on AI Agent Security

A research team from Northeastern University, MIT, Harvard, Stanford, and other institutions published a red-teaming study on AI agent security. They deployed agents with persistent memory, email accounts, Discord access, filesystems, and shell execution capabilities in a live environment for a two-week security testing period.

Key Findings

The research team documented 11 representative case studies that revealed major security issues.

Experimental Setup

The study used the OpenClaw framework, deploying agents in isolated virtual machine environments. Each agent had persistent memory, email and Discord access, a filesystem, and shell execution capabilities.

Key Insights

The study found that agents operate at Mirsky’s L2 autonomy level: they can autonomously execute sub-tasks (sending emails, executing commands), but they lack the self-model needed to recognize when a task exceeds their competence or to reliably determine when to hand control back to humans.

These findings reveal security, privacy, and governance vulnerabilities in realistic AI agent deployments, calling for urgent attention from legal scholars, policymakers, and researchers across disciplines.

Read the full report: https://agentsofchaos.baulab.info/report.html
