TL;DR

LLM-driven exploit generation has fundamentally changed the economics of cyberattacks, making initial access faster, cheaper, and scalable at machine speed. At the same time, internal breach detection rates have barely improved, forcing security teams to operationalize an “assume breach” mindset focused on post-compromise detection. This blog explores why modern SOCs need both AI-driven alert investigation and continuous threat hunting to catch the threats that never trigger an alert.

Key Takeaways

Assume-breach in 2026 is more important than ever. It's the math: Anthropic's Claude Mythos Preview produced a 90x jump in successful exploit generation, while internal breach detection barely improved according to the latest M-Trends report. Assuming attackers are already inside your network means you need to emphasize detection in two ways: AI SOC Analyst clears the alert queue, and AI Threat Hunter runs hypothesis-driven hunts on the alerts that never fire.

  • Expertise is now scalable. Anthropic's Claude Mythos Preview produced 181 working Firefox JS shell exploits on the same benchmark where Opus 4.6 produced just 2, a roughly 90x jump in a single model generation. Initial access is now a matter of how much compute budget attackers have.
  • Detection has stalled. Mandiant's 2026 M-Trends report shows internal detection ticked up from 43% in 2024 to 52% in 2025, while the global median dwell time climbed from 11 days to 14, and dwell on cyber-espionage cases ran 122 days.
  • Reactive and proactive detection. AI SOC Analyst clears the alert queue you already have, while AI Threat Hunter and AI TI Analyst run hypothesis-driven, federated hunts to find the threats that never fired an alert.

Introduction

Anthropic's Frontier Red Team published a chart earlier this year that should be every CISO's lock screen. On the same Firefox JavaScript-engine benchmark, Opus 4.6 produced just 2 working exploits across several hundred attempts. Mythos Preview produced 181, plus 29 more reaching register control: a roughly 90x jump in success. The expertise bottleneck on initial access is gone as attackers will be increasingly limited by their compute budget, not by how many skilled operators they have. The post-compromise math still favors defenders if they can detect malicious actors before they achieve their goals.

Attackers Got Faster. Defenders Didn't.

Why Exploiting Zero Days Will Become More Common

For two decades, the attacker economics in your environment ran on social engineering. Someone had to click on a phishing email or hand over a credential, and everything downstream waited on a human making a mistake. The 2020 Verizon DBIR put social engineering at 22% of breaches, almost all of it phishing-led and gated by a human click.

Phishing will still be a threat going forward, but in the Mythos era, attackers will be able to more easily find and exploit zero-day vulnerabilities. That means they have more means of gaining initial access. 

Anthropic's Frontier Red Team put it bluntly in their February 2026 zero-days post: "language models are already capable of identifying novel vulnerabilities, and may soon exceed the speed and scale of even expert human researchers." Two months later, the Mythos Preview benchmark backed the warning.

On the same Firefox 147 JavaScript-engine target, Opus 4.6 produced just 2 working exploits across several hundred attempts; Mythos Preview produced 181, plus 29 more reaching register control. Anthropic's engineers describe asking Mythos Preview to find remote code execution vulnerabilities overnight and waking up to working exploits the following morning, work that expert penetration testers said would have taken weeks.

If you've been telling your CISO this wave is still hypothetical, Anthropic's November 2025 disclosure is the one that ends that conversation. Anthropic disrupted a Chinese state-sponsored campaign that used Claude Code for reconnaissance, exploitation, lateral movement, and exfiltration. The campaign profile was unmistakably machine-scale:

  • 80–90% of tactical operations executed by Claude
  • ~30 target entities across the campaign
  • 4–6 critical decision points where human operators stepped in

Starting in late December 2025, attackers running a Claude instance whose guardrails they had bypassed compromised ten Mexican government bodies and a financial institution and exfiltrated roughly 150 GB of data covering an estimated 195 million identities. Initial access has flipped from a probability problem into a tooling problem, and the tooling now operates at machine speed against named targets.

Why Has Internal Detection Stalled?

Half of the time, someone else has to tell you that you’ve been breached. Per Mandiant's M-Trends 2026 report:

  • Internal detection ticked up from 43% to 52% (2024 to 2025), the single best year-over-year improvement Mandiant has logged in years, but …
  • External parties still account for nearly half of all 2025 breach notifications (48%). This means that half of the time, victims are notified by someone else.
  • Global median dwell climbed from 11 to 14 days

Let’s see how the calculus has changed: Attacker capability moves visibly with every frontier model release, meaning that they have more opportunities to breach your network. But internal detection rates move by a few points per year, sometimes in the wrong direction.

The threats that never trip an alert are part of the job that has been quietly underfunded for a decade. The concept of “assume breach” has been around for more than a decade, but it’s more important than ever. 

To understand why, let’s look at the intruder’s dilemma.

One Initial Access. Eleven Chances to Detect.

Why Is Initial Access Now the Easy Part?

The familiar "defenders have to be right 100% of the time, attackers have to be right once" framing is true, but that doesn’t mean everything is easy for attackers once they’re in. The intruder’s dilemma is that in order for them to achieve their goals, they have a lot of chances to set off alarms. 

Throughout an entire post-compromise sequence, defenders need only detect one of the attacker's actions to respond and prevent real damage. The problem is, as we covered with the M-Trends report above, land-and-pivot works for attackers because perimeters are generally hard while interiors are soft. Many organizations need to drastically improve their internal detection capabilities to prevent damage once attackers are able to enter the network more easily. That asymmetry only widens once initial access gets cheap with more powerful AI models.

To improve your detection and response, the MITRE ATT&CK helps and turns into a checklist you can actually defend against. Past initial access, the framework lays out eleven tactic categories where defenders have a detection chance:

  1. Execution
  2. Persistence
  3. Privilege Escalation
  4. Defense Evasion
  5. Credential Access
  6. Discovery
  7. Lateral Movement
  8. Collection
  9. Command and Control
  10. Exfiltration
  11. Impact

That's eleven chances for the attacker to fall over a tripwire on the way to whatever they came for.

What Do You Do When a Threat Doesn’t Trigger an Alert?

The threats most likely to hurt you in 2026 aren't necessarily the ones screaming in your queue. Lateral movement off legitimate credentials and identity-provider drift survives by not looking like activity in the first place. If they can, attackers will use common tools and protocols to hide their activity. They never trigger an alert because they never break a rule. That’s what threat hunting is for: finding the threats that slipped past your detections.

An operational assume-breach posture has to cover both surfaces: the queue you have, and the queue you don't.

What Does Operational Assume-Breach Actually Look Like?

Hunt the Queue You Have

Tier 1 triage and investigation is where reactive surface-clearing earns its keep, and where Dropzone's AI SOC Analyst spends its day: every alert in your queue receives a consistent investigation that takes about 7 minutes on average.

ECS, a top-five MSSP in North America (ranked #4 on MSSP Alert's Top 250 for 2025), handles roughly 30,000 alerts per month in its SOC thanks to Dropzone AI. No human team clears that queue without something going uninvestigated, which is exactly where land-and-pivot activity usually hides. Clearing it at machine speed and with consistent investigation depth frees up the time budget for the proactive work that comes next. Read the ECS case study.

That extra SOC capacity unlocks a closer look: alerts your team would normally triage out under pressure get the same full investigation as Urgent ones. When the Axios npm package was compromised in March 2026, the resulting activity appeared in customer environments only as a medium-priority Microsoft Defender alert.

AI SOC Analyst pulled that medium-priority alert through a full investigation, traced endpoint telemetry to a stealthy PowerShell execution beaconing to infrastructure already flagged as malicious by multiple vendors, and escalated it to a Malicious Urgent verdict before human analysts could triage.

The medium-severity label was hiding a real initial-access foothold from Sapphire Sleet, a North Korean state actor that ordinary triage probably wouldn’t have chased, and the same pattern lives in every low- or medium-priority ticket your team triaged out under headcount pressure.

Hunt the Queue You Don't Have

The threats that never tripped a rule are the ones you actually need to find. AI Threat Hunter runs hypothesis-driven, federated hunts across SIEM, EDR, identity, and network data, compressing up to 40 hours of manual hunting into roughly an hour.

Behind the hunts sits the Hunt Pack Catalog, organized into five categories:

  • Emerging threats
  • Threat actors
  • Vulnerabilities
  • ATT&CK techniques
  • Operational anomalies

To respond quickly to emerging threats, the AI TI Analyst feeds the catalog by monitoring 2,500+ OSINT sources, from CVE disclosures and vendor advisories to security blogs and dark web forums, extracting TTPs and IOCs, and packaging the results into hunt packs that the Threat Hunter can actually execute.

Let’s look at an example: A February 2026 Five Eyes Cisco SD-WAN advisory (CISA Emergency Directive ED-26-03) shows the cadence with Dropzone AI:

  1. At 00:00, a third-party advisory drops
  2. By 04:00, a behavioral hunt pack is built from the extracted TTPs
  3. By 05:30, a federated hunt report lands in your team's inbox

Even hunts that come back clean return value:

  • Policy violations surfaced
  • Visibility gaps identified
  • Misconfigurations flagged
  • Detection-engineering opportunities you can action the next morning

Conclusion

Frontier model releases land every few months now, and each one is going to get a bit better at finding and exploiting vulnerabilities. You still need to focus on speeding up your patching cycle, but the reality is that there is too much operational friction to keep up. By 2027, you will absolutely need your detection and response function to pick up the slack. Just assume that attackers have already breached your network. You have chances to catch the intruder, but only if something is hunting on every one of them. See for yourself how Dropzone does this in our self-guided demo.

FAQs

What does "assume breach" actually mean in 2026?
Assume-breach in 2026 is the operating assumption that adversaries are already active in your environment, paired with the practical capability to detect and contain them post-compromise. The 2015-era version of the same phrase was mostly a posture statement that translated to "buy more EDR." LLM-driven exploit generation has made perimeter prevention math so hard that post-compromise detection is now the part of the program that actually drives outcomes.
How is LLM-driven exploit generation different from traditional vulnerability discovery?
Traditional discovery is bound by human expertise: weeks of effort by trained experts using significant tooling. LLM-driven generation collapses both inputs simultaneously, producing working exploits in hours and letting non-experts direct the work. The practical consequence for vulnerability management is that the time between disclosure and exploitation now compresses faster than typical patch cycles, which forces patch prioritization based on probability of exploitation rather than pure CVSS severity.
Why hasn't internal breach detection improved over the past decade?
The structural answer is that headcount doesn't scale into the no-alert surface. SOCs that doubled or tripled their Tier 1 staff over the last decade got faster ticket triage, not better hunting, because the bottleneck never lived in the queue. The threats that escape detection are precisely the ones that never tripped a rule, and chasing them requires hypothesis-driven work that the alert backlog crowds out first.
What's the difference between threat hunting and alert triage?
Triage is reactive work driven by alerts that have already fired. Hunting is hypothesis-driven work that looks for adversary behavior that didn't (or couldn't) trigger a rule. Hunting has historically been the first thing cut when the Tier 1 backlog grows, which is why most internal detection improvements have stalled. AI Threat Hunter compresses manual hunt cycles from up to 40 hours to about 1 hour, making continuous hunting tractable for normal-sized teams.
How does AI TI Analyst differ from a traditional threat intelligence platform?
Traditional TIPs aggregate feeds and stop there, leaving analysts to manually translate intel into hunts. AI TI Analyst closes that gap: it sits complementary to your existing TIP, automating the analyst layer that turns intel into a hunt pack the Threat Hunter can execute the same day. The feed layer wasn't broken; the analyst layer was missing.
A man with a beard and a green shirt.
Tyson Supasatit
Principal Product Marketing Manager

Tyson Supasatit is Principal Product Marketing Manager at Dropzone AI where he helps cybersecurity defenders understand what is possible with AI agents. Previously, Tyson worked at companies in the supply chain, cloud, endpoint, and network security markets. Connect with Tyson on Mastodon at https://infosec.exchange/@tsupasat

Self-Guided Demo

Test drive our hands-on interactive environment. Experience our AI SOC analyst autonomously investigate security alerts in real-time, just as it would in your SOC.
Self-Guided Demo
A screenshot of a dashboard with a purple background and the words "Dropzone AI" in the top left corner.