TL;DR

Traditional threat modeling faces severe scalability challenges, with a single security engineer often supporting 30-40 developers through hours-long STRIDE reviews. An AI-powered copilot built on Claude 3.7 and RAG achieves 50-55% accuracy in generating baseline threat models, cutting review time while keeping security engineers in control of validation and refinement.

Key Takeaways:

  1. Traditional threat modeling doesn't scale. Long review sessions and resource gaps make keeping pace with fast-moving development difficult.
  2. An AI copilot can ease the load. By automating baseline threat models, engineers gain a head start while staying in control of final outputs.
  3. Data quality is critical. Clear diagrams and well-structured inputs directly impact the accuracy of AI-generated results.
  4. Privacy matters. Building the copilot in-house on AWS kept sensitive product data secure and simplified compliance and audit needs.
  5. The future is self-service. Developers will generate first-pass threat models with AI, while security engineers guide, validate, and refine.

Introduction

Threat modeling sits at the heart of secure software development. It's where teams pause to ask the hard questions:

  • What are we building?
  • What could go wrong?
  • How do we protect against it?

The answers shape design decisions and reduce risk, but anyone who has been through the process knows how heavy it can feel. Sessions stretch for hours, security engineers are outnumbered by developers, and every new product or feature means starting over again.

The result is a practice that's vital for reducing vulnerabilities but notoriously time-consuming and difficult to scale.

That's the challenge one product security engineer set out to solve. Drawing from experience supporting multiple teams across a fast-moving enterprise, Murat Zhumagali began experimenting with an AI-powered copilot, an assistant designed to take on the repetitive legwork of threat modeling while keeping humans in the loop for oversight and decision-making.

The idea wasn't just to automate for efficiency's sake. It was about rethinking how generative AI could make threat modeling more practical and accessible:

  • Fast enough to keep pace with development cycles
  • Rigorous enough to deliver results security teams can trust

This story explores how copilots built with GenAI are beginning to transform a process long seen as a bottleneck into something far more scalable.

What Are the Challenges of Traditional Threat Modeling?

Threat modeling often follows the STRIDE framework:

  1. Define the scope
  2. Identify threats
  3. Map mitigations
  4. Record their status

It's a structured process that quickly becomes a drain on time and resources.

Each STRIDE review requires long sessions with developers, dissecting architectures and data flows in detail. For many teams, a single security engineer supports 30–40 developers, making deep reviews hard to sustain.

The problem compounds in companies that grow through mergers and acquisitions, as each new business brings its own tech stack, leaving security teams juggling multiple environments simultaneously.

The outcome is a scalability bottleneck. Threat modeling remains critical but struggles to keep up with modern development velocity. It risks becoming:

  • A bottleneck that slows releases
  • A check-the-box exercise that overlooks real risks

How Was the AI Copilot Built?

Faced with these limits, Murat set out to see whether a copilot could help. Part of the motivation was personal: catching up with the wave of GenAI tools and separating genuine value from hype. But there were practical concerns too.

Off-the-shelf copilots require sending sensitive data to external providers, a non-starter for product security. The solution had to stay private, running entirely inside the company's environment.

The prototype came together with a simple stack:

  • Streamlit interface provided the front end
  • Amazon Bedrock served as the backbone
  • Claude 3.7 powered the reasoning
  • Retrieval-augmented generation (RAG) combined Titan embeddings, OpenSearch, and an S3 bucket

The workflow was straightforward:

  1. User enters product name, description, and architecture diagram
  2. Copilot builds tailored prompt using STRIDE and OWASP Top 10 principles
  3. Model generates draft threat model with components, threats, severity, impacts, and mitigations
  4. Human reviewer validates results and adjusts where needed

The AI took on the repetitive legwork, while engineers kept final say. The goal wasn't to replace security expertise but to provide a strong starting point that made reviews faster, more consistent, and easier to scale.
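
To make the workflow concrete, here is a minimal sketch of what the generation step (steps 2 and 3 above) might look like on this stack. It is an illustration only: the function names, prompt wording, and Bedrock model ID are assumptions, not details from the actual copilot.

```python
# Hypothetical sketch of the copilot's generation step.
# Function names, prompt wording, and the model ID are illustrative assumptions.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative Claude 3.7 Sonnet identifier; the exact ID depends on the region
# and whether an inference profile is used.
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"


def build_prompt(product_name: str, description: str, diagram_text: str, context: str) -> str:
    """Assemble a STRIDE/OWASP-framed prompt from user inputs and retrieved context."""
    return (
        "You are a product security engineer. Using STRIDE and the OWASP Top 10, "
        "produce a draft threat model for the product below. For each component, "
        "list threats, severity, impact, and candidate mitigations.\n\n"
        f"Product: {product_name}\n"
        f"Description: {description}\n"
        f"Architecture (textual form):\n{diagram_text}\n\n"
        f"Relevant internal context:\n{context}\n"
    )


def generate_draft_threat_model(prompt: str) -> str:
    """Invoke Claude on Bedrock using the Anthropic messages payload format."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4000,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    result = json.loads(response["body"].read())
    # Claude returns a list of content blocks; the draft is the first text block.
    return result["content"][0]["text"]
```

In practice, a Streamlit form collecting the product name, description, and diagram would feed these functions, and the resulting draft would land in front of a human reviewer for step 4.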

Watch Murat’s presentation at the 2025 Security Frontiers event.

What Lessons Were Learned from the Pilot?

The first runs of the copilot showed just how important iteration would be. In its earliest form, which relied only on prompt-stuffing, the outputs were barely usable, maybe five percent of them relevant to the actual product.

Adding more context into the prompts improved things, nudging accuracy into the 30–35 percent range, but still left too much noise. The real breakthrough came with retrieval-augmented generation.

Accuracy progression:

  • Prompt-stuffing only: 5% relevant
  • Enhanced prompts: 30-35% relevant
  • RAG implementation: 40-45% relevant
  • Claude 3.7 upgrade: 50-55% relevant

By storing artifacts in S3, embedding them, and pulling in context through OpenSearch, accuracy climbed significantly. When Claude 3.7 was introduced, it added another bump, bringing results closer to the target range.
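
As one hedged illustration of that retrieval layer, the sketch below embeds artifacts with Titan, stores the vectors in an OpenSearch k-NN index, and pulls back the closest matches as context. Index names, field names, and the endpoint are placeholders, and the S3 download and index-mapping setup are omitted for brevity.

```python
# Hypothetical sketch of the RAG layer: index name, field names, and endpoint are
# placeholders. Assumes the index was created with a knn_vector mapping and that
# artifacts have already been pulled down from S3 as plain text.
import json

import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
opensearch = OpenSearch(
    hosts=[{"host": "search-threat-models.example.com", "port": 443}],
    use_ssl=True,
)


def embed(text: str) -> list:
    """Create a vector with Amazon Titan Text Embeddings."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]


def index_artifact(doc_id: str, text: str) -> None:
    """Store an artifact and its embedding in the k-NN index."""
    opensearch.index(
        index="threat-model-artifacts",
        id=doc_id,
        body={"text": text, "vector": embed(text)},
    )


def retrieve_context(query: str, k: int = 5) -> str:
    """Return the k most similar artifacts, joined for inclusion in the prompt."""
    hits = opensearch.search(
        index="threat-model-artifacts",
        body={"size": k, "query": {"knn": {"vector": {"vector": embed(query), "k": k}}}},
    )["hits"]["hits"]
    return "\n\n".join(hit["_source"]["text"] for hit in hits)
```

In a pipeline like this, the output of retrieve_context is what would fill the context parameter in the earlier generation sketch.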

Along the way, the team uncovered several practical lessons:

  1. Garbage in, garbage out: If diagrams were incomplete or poorly formatted, the model's reasoning followed suit
  2. RAG scales better: Far more effective than giant prompts, especially as the knowledge base grew
  3. Diagram conversion matters: Converting complex architecture maps into a DSL format improved results, though it introduced more prep work (a hypothetical example follows this list)
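
The article does not name the DSL the team used, but to illustrate the idea, here is a hypothetical textual form of an architecture diagram. Flattening boxes and arrows into explicit components, flows, and trust boundaries gives the model something far easier to reason over than a raw image.

```python
# Purely illustrative DSL; the actual format used by the team is not specified.
ARCHITECTURE_DSL = """
component web_app      type=frontend  zone=public
component api_gateway  type=gateway   zone=dmz
component orders_db    type=database  zone=private

flow web_app -> api_gateway    protocol=https  data=order_details
flow api_gateway -> orders_db  protocol=tcp    data=order_records

trust_boundary public | dmz
trust_boundary dmz | private
"""
```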

For developers, the outputs were good enough to serve as a baseline, something they could quickly review and refine. For security engineers, however, the bar is higher.

Until the system consistently hits closer to seventy percent accuracy, it will remain a supplement rather than a wholesale shift in how threat modeling is done. Still, the steady gains showed that, with the right data and workflow, an AI copilot could move from experiment to a genuinely useful tool.

Why Does In-House Deployment Matter?

One reason the project stayed viable was the decision to build it in-house. Privacy and compliance concerns ruled out sending sensitive product data to external LLM providers.

By keeping everything inside AWS, where the company was already hosting customer workloads, the team could ensure that models ran within their own security perimeter. No designs, diagrams, or data flows ever left their environment.

This approach offered more than peace of mind. It gave the team direct control over:

  • Which models to use
  • How often to upgrade them
  • How to tune context retrieval for their needs

They weren't locked into someone else's roadmap or forced to accept opaque changes in performance. Each improvement, from embedding strategies to diagram handling, could be tested and adjusted on its own terms.

The benefits also extended to governance. In the event of a compliance review or legal audit, it was far easier to show:

  • Where data lived
  • How it was processed
  • Who had access

Instead of debating whether a vendor might have stored training data or logged requests, the team could point to a contained system operating entirely within their AWS footprint. For security engineers, that clarity wasn't just a technical win; it was a strategic one, ensuring that innovation didn't come at the expense of trust or accountability.

What Comes Next for AI-Powered Threat Modeling?

The next phase is about refinement. First, the knowledge base needs to handle multimodal inputs so diagrams no longer require heavy manual prep. Feeding in past threat models will also help fine-tune accuracy and consistency, giving the copilot more of the context security engineers rely on.

Model choice is another area to explore. Testing alternatives to Claude will ensure flexibility and reveal performance trade-offs. At the same time, adding feedback loops where engineers can adjust severity, mitigations, or scenarios will make the tool more interactive and trustworthy.

Rollout will begin with simpler products, proving value quickly before tackling more complex systems. From there, the vision is broader:

  • Self-service threat modeling with developers taking the first pass
  • Security engineers refining results and ensuring quality
  • Faster iteration cycles without sacrificing rigor

Final Thoughts

Threat modeling remains one of the most valuable practices in security, but it simply won't scale if we rely only on traditional methods. The hours required, the imbalance between security engineers and developers, and the growing complexity of modern environments all point to the same conclusion: we need new approaches.

AI copilots won't replace the expertise of security professionals, but they can change the balance of work. With the copilot providing a reliable baseline, engineers can focus on higher-value tasks:

  • Refining mitigations
  • Validating edge cases
  • Guiding strategy

At the same time, developers gain a self-service tool that helps them think about threats earlier and more often, without waiting on scarce security resources.

The future of threat modeling may not be about sitting in marathon review sessions, but about giving every team access to an AI partner that makes the process faster, more consistent, and easier to adopt.

Security engineers remain in the loop, shaping outputs and ensuring quality, but the heavy lifting shifts to automation.

If you'd like to see more stories like this and hear directly from the people building the next generation of security tools, catch the full session and others on Security Frontiers.

FAQs

What is threat modeling, and why is it important?
Threat modeling is the process of identifying risks, possible attack paths, and mitigations during software design. It helps teams address vulnerabilities early, before they become costly issues in production. Using frameworks like STRIDE, teams systematically analyze components, data flows, and potential threats to build more secure systems from the ground up.
Why is traditional threat modeling hard to scale?
It requires long sessions with developers, relies heavily on scarce security engineers, and becomes even harder when companies manage multiple tech stacks after mergers or acquisitions. A typical security engineer supports 30-40 developers, making deep STRIDE reviews nearly impossible to sustain across fast-moving development cycles without creating bottlenecks.
How does an AI copilot support threat modeling?
The copilot automates the baseline work by ingesting product details and diagrams, applying frameworks like STRIDE and OWASP Top 10, and generating draft threat models that engineers can review and refine. Using retrieval-augmented generation (RAG) with Claude 3.7, the system achieves 50-55% accuracy, providing developers with useful starting points while security engineers focus on validation and strategic refinement.
What role do humans play if AI is involved?
Security engineers remain in the loop. They validate AI outputs, adjust severity and mitigations, and ensure nuanced risks aren't overlooked. AI provides speed and consistency by handling repetitive baseline work, while humans provide judgment, experience, and strategic thinking that no model can replicate. The goal is augmentation, not replacement.
Why build an in-house copilot instead of using public LLMs?
Keeping the solution in-house avoids sending sensitive product data like architecture diagrams, data flows, and design documents to external providers. Building on AWS within the company's existing security perimeter gives teams more control over privacy, compliance, model selection, and audit trails. This approach eliminates vendor lock-in and ensures sensitive security information never leaves the organization's environment.
Tyson Supasatit
Principal Product Marketing Manager

Tyson Supasatit is Principal Product Marketing Manager at Dropzone AI where he helps cybersecurity defenders understand what is possible with AI agents. Previously, Tyson worked at companies in the supply chain, cloud, endpoint, and network security markets. Connect with Tyson on Mastodon at https://infosec.exchange/@tsupasat
