Top 5 Takeaways
- Reaper Is Built for Agents to Test Software - Reaper is a fuzzing tool designed to be used by agents to autonomously explore and stress-test software systems.
- It Moves Beyond Static Input Fuzzing - Unlike traditional fuzzers, Reaper allows for context-aware, adaptive input mutation to track evolving software states and decision paths, perfect for modern, dynamic apps.
- Agents Can Simulate Attacks at Scale - Reaper allows AI agents to simulate real-world adversarial pressure: modifying inputs, triggering edge behavior, and learning over time, just like a red team would.
- It Enables Proactive, Autonomous Hardening - With Reaper, AI agents can autonomously detect misfires, logic flaws, and state corruption, helping teams catch vulnerabilities before deployment without human orchestration.
- It Represents a New Security Paradigm - Reaper exemplifies a shift toward purpose-built security tools for AI-native environments, marking a new era where agents not only defend, but also rigorously test and validate software themselves.
Introduction
Fuzzing has long been a cornerstone of application security and an essential technique for uncovering edge cases, crashes, and unpredictable behaviors in software. But traditional fuzzers were built for a different era, and they were never designed for use by dynamic, agent-driven systems.
That’s where Josh Larsen’s Reaper project, unveiled at the Security Frontiers virtual event, comes in. Reaper isn’t another fuzzer designed for humans. It’s built for machines. More specifically, it’s a fuzzing platform intended to be used by autonomous agent tools that reason, plan, and act on their own. Reaper’s architecture allows these agents to simulate edge cases, mutate inputs, and run tests independently without manual intervention.
This shift isn’t just technical. It’s philosophical. Reaper reflects a new paradigm in which software testing isn’t performed on agents but by them. It’s a purpose-built, agent-operable fuzzer for modern systems, and one of the clearest examples of how security tools are adapting to meet the AI-native future.
Why Traditional Fuzzing Falls Short
Legacy fuzzers were designed for a simpler world with predictable inputs and deterministic outputs. Their strength is flooding binaries, APIs, or form fields with malformed or randomized data to surface crashes and exceptions. But those tactics don’t translate well to the dynamic complexity of today’s software environments.
Modern systems often rely on autonomous agents to orchestrate tasks, trigger workflows, and make decisions. These agents don’t just react to inputs; they reason over time, adjust based on memory, and interact with other tools. Inputs aren’t static strings but evolving chains of tasks, conversations, or API interactions that defy traditional fuzzing models.
And while testing these modern systems is increasingly important, the existing crop of fuzzers wasn’t built to support agents as users. They lack the architecture to expose relevant hooks, persist state, or adapt inputs based on evolving context, all capabilities an intelligent agent would need to conduct meaningful tests.
That’s what makes Reaper so significant. It’s a fuzzer designed to be driven by agents, allowing them to apply fuzzing techniques to complex, adaptive systems. Doing so opens the door for automated testing that keeps pace with the systems it’s meant to secure.
What Reaper Does Differently
Most fuzzers are built to be driven by humans, with static workflows that throw inputs at APIs or binaries and wait for something to break. Reaper flips that model by being designed to be usable by AI agents. It exposes structured interfaces and feedback loops that make it operable by autonomous systems capable of planning, adapting, and responding in real time.
Rather than randomly mutating inputs and logging crashes, Reaper enables agents to construct and explore sequences of actions in dynamic environments, from multi-step workflows to decision trees and API-based interactions. It supports persistent state tracking, so an agent can see how a system’s memory evolves over time, probe edge cases, and decide what to test next.
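To make that concrete, here is a minimal sketch of what an agent-driven fuzz loop with persistent state might look like. Every name in it (`mutate_step`, `run_sequence`, the seed requests) is a hypothetical illustration of the pattern described above, not Reaper’s actual interface:

```python
import random

# Hypothetical sketch of an agent-driven fuzzing loop with persistent
# state. Names and seeds are illustrative, not Reaper's API.

SEEDS = ["GET /api/v1/tasks", "POST /api/v1/tasks {\"name\": \"x\"}"]

def mutate_step(step: str) -> str:
    """Apply one of several simple mutations to a workflow step."""
    mutations = [
        lambda s: s + "A" * 1024,            # oversized payload
        lambda s: s.replace("v1", "v999"),   # unexpected API version
        lambda s: s[: len(s) // 2],          # truncated request
    ]
    return random.choice(mutations)(step)

def run_sequence(steps):
    """Stand-in for executing a multi-step workflow against a target.

    A real harness would call the system under test and capture its
    response and state changes; here, unknown versions 'fail'."""
    ok = all("v999" not in s for s in steps)
    return ok, {"steps": steps, "ok": ok}

# Persistent history of observations lets the "agent" decide what to
# try next based on what it has already seen, not blind mutation.
history = []
queue = [[s] for s in SEEDS]

while queue:
    sequence = queue.pop(0)
    ok, observation = run_sequence(sequence)
    history.append(observation)
    if not ok:
        print("interesting failure:", sequence)
    elif len(sequence) < 3:
        # Extend promising sequences with a mutated next step.
        queue.append(sequence + [mutate_step(sequence[-1])])

print(f"observed {len(history)} workflow runs")
```

The key difference from a classic fuzz loop is that the unit of work is a sequence of steps plus an observation log, not a single malformed input.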
Where legacy fuzzers stop at malformed inputs, Reaper empowers its human or machine operators to uncover subtler, systems-level issues: prompt drift, task misalignment, or cascading logic failures. It’s not built to simulate an agent’s behavior. It’s built to extend an agent’s capabilities, letting it break things more intelligently.
While Reaper is optimized for AI-native environments, its architecture makes it useful in any system where reasoning, not just code execution, drives the outcome. The goal isn’t just to crash things. It’s to catch fragile logic before it hits production and give intelligent systems the means to do that independently.
View Josh’s presentation of Reaper at Security Frontiers below.
Technical Architecture Highlights
Reaper is engineered to enable deep, adaptive testing driven by agents across modern, dynamic software environments. Its architecture is built around long-lived state tracking, allowing an autonomous agent to observe how a system evolves across tasks, API calls, memory changes, and decision loops. That persistent context is key to spotting subtle issues like logic drift or brittle edge conditions.
Instead of relying on blind input mutation, Reaper exposes interfaces that let agents craft intelligent test paths based on observed system behavior. Agents can respond to low-confidence outputs, identify repeated failure patterns, or trigger probes when workflows stall, bringing a red team mindset to fuzzing, but doing it autonomously and programmatically.
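One way to picture that feedback loop is a planner that weights mutation strategies by how often they have surfaced anomalies, so repeated failure patterns pull future probes toward the strategies that find them. Everything below, including the strategy names and the `AdaptivePlanner` class, is a hypothetical illustration of the idea, not Reaper’s interface:

```python
import random
from collections import defaultdict

# Hypothetical illustration of feedback-driven test planning: mutation
# strategies that have surfaced more anomalies get chosen more often,
# a simple stand-in for an agent adapting its plan to observations.

class AdaptivePlanner:
    def __init__(self, strategies):
        self.strategies = strategies
        self.hits = defaultdict(lambda: 1)  # smoothed anomaly counts

    def choose(self):
        """Pick a strategy, weighted by past anomaly discoveries."""
        names = list(self.strategies)
        weights = [self.hits[n] for n in names]
        return random.choices(names, weights=weights, k=1)[0]

    def record(self, name, anomalous):
        """Feed back one observation from a completed probe."""
        if anomalous:
            self.hits[name] += 1

planner = AdaptivePlanner({"oversize", "truncate", "type_confusion"})

def probe(strategy):
    """Stand-in for one probe; pretend 'truncate' keeps producing
    low-confidence or anomalous responses from the target."""
    return strategy == "truncate" and random.random() < 0.8

for _ in range(200):
    s = planner.choose()
    planner.record(s, probe(s))

# After enough feedback, 'truncate' dominates the plan.
print(sorted(planner.hits.items(), key=lambda kv: -kv[1]))
```

A production system would use richer signals than a hit counter, but the loop of choose, probe, record is the core of the adaptive behavior described above.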
The system accommodates deterministic and probabilistic testing models, which is essential when working with systems where outcomes vary based on reasoning rather than hardcoded logic. And because Reaper is built for modern security operations, it fits cleanly into containerized workflows, API-connected environments, and local orchestration frameworks.
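Supporting probabilistic models matters because a single run of a reasoning system proves little. One hedged sketch of how a tester might handle that, with all names invented for illustration, is to repeat each probe and classify the distribution of outcomes rather than trust one verdict:

```python
import random

# Hypothetical sketch: when outcomes vary with reasoning rather than
# hardcoded logic, repeat each probe and judge the success rate.
# The `judge` function and thresholds are illustrative only.

def judge(probe, runs: int = 30, threshold: float = 0.9) -> str:
    """Classify a probe by its success rate across repeated runs."""
    rate = sum(bool(probe()) for _ in range(runs)) / runs
    if rate >= threshold:
        return "stable"
    if rate <= 1 - threshold:
        return "consistent-failure"
    return "flaky"  # nondeterministic behavior worth deeper inspection

# Deterministic probes classify cleanly; a noisy one lands in between.
print(judge(lambda: True))                   # "stable"
print(judge(lambda: random.random() < 0.5))  # usually "flaky"
```

The “flaky” bucket is the interesting one here: it flags exactly the reasoning-dependent variability that deterministic pass/fail testing would misreport.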
The result is a fuzzer that doesn’t test agents, but empowers agents to test the systems around them more intelligently, more iteratively, and with greater autonomy than traditional tooling allows.
Why It Matters for Security Teams
AI agents aren’t just on the horizon; they’re already here. Security teams are deploying them in SOCs, remediation workflows, and threat intel analysis pipelines. These agents increasingly act autonomously, making decisions, interacting with APIs, and updating systems independently. But while their capabilities grow, our ability to test the environments they operate in has lagged behind.
Reaper is software fuzzing built for the agentic AI era: It’s a tool that agents can use to test the software systems they’re working with. Whether probing the reliability of a cloud configuration API or fuzzing inputs in an automated response pipeline, Reaper gives agents the means to simulate pressure, explore edge cases, and uncover failures independently.
For security teams, this matters. The traditional testing model assumes human control and oversight. But in agent-led operations, we need testing infrastructure that works at machine speed and scale. Reaper empowers agents to do exactly that: probing the systems they depend on and proactively hardening workflows before fragile logic leads to real-world failure.
Broader Impact: Security Tools for a New Paradigm
Reaper offers a glimpse into how security infrastructure must evolve. As AI-native systems become more common, the industry needs tooling that doesn’t just accommodate agents but empowers them. Reaper was designed from the ground up for that world, a fuzzer that agents can use to probe, test, and validate the systems they interact with.
That’s a major shift. Traditional fuzzers assume a human at the helm. Reaper assumes the opposite: that a machine might be driving the testing process. It introduces new primitives for agent-aligned security work: tools that think in sequences, reason about outcomes, and adapt over time.
This theme ran throughout the Security Frontiers event: builders creating tools for problems that didn’t exist a few years ago. Josh Larsen’s Reaper project stood out not because it extended old ideas but because it offered a fresh foundation built with autonomy in mind.
If agents are changing how software behaves, Reaper is changing how we test what they touch, not by inspecting the agent, but by handing it the keys to test everything else.
Build, Break, Repeat
Security innovation has always depended on a healthy cycle of building new systems, then rigorously testing those new creations to see what breaks. That cycle is more critical than ever in the AI era, where software is being built and updated faster than ever.
Reaper was born from that philosophy, but with a twist: it’s a tool designed for autonomous agents to wield. It equips agents with the ability to fuzz new software, exploring APIs, workflows, and connected systems with the same scrutiny a human red teamer might apply.
It’s not just a new way to test; it’s a shift in who’s doing the testing. Reaper gives agents the capability to question, prod, and validate the world they operate in. That’s a provocative idea, and exactly the kind of thinking the security community needs now: not more dashboards, but infrastructure that keeps pace with the autonomy we’re deploying.
If you’re building agent systems, Reaper shouldn’t be an afterthought. It helps ensure those systems behave safely and reliably, with a red team mindset built in.
Want to see Reaper in action? Watch Josh Larsen’s full session and explore more innovations from Security Frontiers, where the security community isn’t just talking about the future. They’re building it.