TL;DR

Kyle Polley's PII Detective uses AI to analyze metadata instead of raw data, scanning thousands of tables for roughly $5 in compute. The approach combines LLM inference with human review for practical, auditable privacy protection without the operational overhead of traditional scanning methods.

Protecting sensitive data is a non-negotiable priority for any organization, but detecting personally identifiable information (PII) at scale is rarely simple. As data volumes explode and privacy regulations grow more stringent, security teams are under increasing pressure to locate and lock down sensitive fields across sprawling cloud environments. Unfortunately, most existing tools are slow, expensive, and operationally heavy, often requiring full data scans that strain compute budgets and increase exposure risk.

At the Security Frontiers 2025 virtual conference, Kyle Polley offered a refreshing alternative: PII Detective. This open-source, AI-powered tool sidesteps the traditional pitfalls of PII discovery by never touching the data itself. Instead, it leverages large language models to analyze metadata only, making smart inferences about what columns are likely to contain sensitive information based on names, patterns, and structure.

PII Detective is a great example of how security practitioners are using AI to dramatically improve the efficiency of day-to-day security operations tasks.

The Problem with Traditional PII Detection

For most organizations, detecting PII means throwing compute at the problem. Legacy tools operate under the assumption that more scanning equals better coverage, relying on brute-force methods like regex sweeps across entire datasets. These scans dig deep into raw data, combing through tables row by row in search of anything resembling sensitive information—names, email addresses, IDs, and beyond.
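
To make the cost of that brute-force model concrete, here is a minimal sketch of a regex sweep. It is not taken from any particular product: the two patterns, the `scan_rows` helper, and the sample rows are all illustrative assumptions. The point is that every cell of every row is read, so the work grows with the size of the data rather than the size of the schema.

```python
import re

# Illustrative patterns only (assumptions); real scanners ship far larger rule sets.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_rows(rows):
    """Inspect every cell of every row.

    Returns (hits, cells_read), where hits maps a column name to the set of
    pattern names that matched at least one of its values.
    """
    hits = {}
    cells_read = 0
    for row in rows:
        for column, value in row.items():
            cells_read += 1
            for name, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    hits.setdefault(column, set()).add(name)
    return hits, cells_read


# Hypothetical sample data standing in for a warehouse table.
rows = [
    {"id": 1, "contact": "ana@example.com", "note": "renewal due"},
    {"id": 2, "contact": "raj@example.com", "note": "ref 123-45-6789"},
]
hits, cells_read = scan_rows(rows)
```

Two rows of three columns already mean six cell reads. The same loop over billions of rows is where the compute bill, and the need for read access to raw data, comes from.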

However, as Kyle Polley pointed out in his Security Frontiers talk, this approach misses the point. “What if the solution isn’t more compute?” he asked. “What if it’s just better context?”

The current model introduces significant challenges. First, it’s expensive. Running intensive data scans across cloud storage platforms like Snowflake or BigQuery can rack up staggering compute bills. Second, it’s noisy. These tools often generate high volumes of false positives, flagging columns with vague names or harmless values as risks, leaving privacy teams with piles of results that require manual review. And third, it’s inflexible. Full-data scans often involve elevated permissions, creating operational headaches and increased risk if something goes wrong.

The PII Detective Approach

Rather than parsing through terabytes of raw data, PII Detective flips the model on its head. Kyle Polley’s tool asks a simple question: what if the information we need is already in front of us in the structure, not the contents?

PII Detective scans only metadata, such as column names, data types, and structural patterns. Because it avoids raw data entirely, both compute costs and privacy risks drop dramatically. At its core is a large language model that infers sensitivity from naming conventions, schema logic, and contextual cues. For example, it knows that a column named ssn or employee_id is probably sensitive, while invoice_number likely isn’t.
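
The metadata-only idea can be sketched in a few lines. This is not PII Detective's actual code or prompt: `ColumnMeta`, `build_prompt`, `classify`, and the reply format are assumptions made for illustration, and `fake_llm` is a keyword-matching stand-in so the sketch runs without a model or network access. In practice `ask_llm` would wrap a call to whichever LLM provider a team uses.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    """Schema-level facts about one column. No row values are stored."""
    table: str
    name: str
    data_type: str


def build_prompt(columns):
    """Describe columns by metadata only (hypothetical prompt wording)."""
    lines = [f"{c.table}.{c.name} ({c.data_type})" for c in columns]
    return (
        "For each column below, answer whether it likely holds PII.\n"
        'Reply with a JSON object mapping "table.column" to true or false.'
        "\n\n" + "\n".join(lines)
    )


def classify(columns, ask_llm):
    """Send the metadata prompt to `ask_llm` and parse its JSON reply."""
    reply = json.loads(ask_llm(build_prompt(columns)))
    keys = [f"{c.table}.{c.name}" for c in columns]
    # Columns missing from the reply default to False (not flagged).
    return {key: bool(reply.get(key, False)) for key in keys}


def fake_llm(prompt):
    """Stand-in for a real model call: simple keyword matching, offline."""
    hints = ("ssn", "email", "employee_id")
    _, _, body = prompt.partition("\n\n")
    verdicts = {}
    for line in body.splitlines():
        key = line.split(" (")[0]
        verdicts[key] = any(hint in key.lower() for hint in hints)
    return json.dumps(verdicts)


columns = [
    ColumnMeta("employees", "ssn", "VARCHAR"),
    ColumnMeta("employees", "employee_id", "NUMBER"),
    ColumnMeta("billing", "invoice_number", "VARCHAR"),
]
flags = classify(columns, fake_llm)
```

With the stand-in model, `flags` marks `employees.ssn` and `employees.employee_id` as likely PII and leaves `billing.invoice_number` unflagged. Note that the prompt is built from the schema alone, so the size of the request depends on the number of columns, not the number of rows.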

The system doesn’t try to be perfect. Instead, it’s designed to move quickly and cheaply, flagging likely PII candidates and handing them off for human review as a final step. It’s a clean balance: AI handles the bulk, and humans make the call.
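
One way to picture the handoff is a small review queue, sketched below under stated assumptions. `Status`, `Candidate`, and `ReviewQueue` are hypothetical names, not part of PII Detective. Every flagged column starts as pending, nothing counts as PII until a named reviewer confirms it, and a decision can be overwritten later.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class Candidate:
    """One column the model flagged, plus the human decision on it."""
    column: str
    status: Status = Status.PENDING
    reviewer: Optional[str] = None


class ReviewQueue:
    """Holds model-flagged columns until a person decides on each one."""

    def __init__(self, flagged_columns):
        self._items = {col: Candidate(col) for col in flagged_columns}

    def pending(self):
        return [c.column for c in self._items.values()
                if c.status is Status.PENDING]

    def confirmed(self):
        return [c.column for c in self._items.values()
                if c.status is Status.CONFIRMED]

    def decide(self, column, reviewer, is_pii):
        """Record who made the call; calling again overwrites the decision."""
        item = self._items[column]
        item.status = Status.CONFIRMED if is_pii else Status.REJECTED
        item.reviewer = reviewer
        return item


queue = ReviewQueue(["employees.ssn", "employees.employee_id"])
queue.decide("employees.ssn", reviewer="alice", is_pii=True)
queue.decide("employees.employee_id", reviewer="alice", is_pii=False)
```

After the two decisions above, `queue.pending()` is empty and `queue.confirmed()` contains only `employees.ssn`. Keeping the reviewer's name on each record is what makes the result auditable.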

And the efficiency is hard to ignore. In Kyle’s benchmarks, PII Detective could scan thousands of tables for roughly $5 in total compute. No complex infrastructure, no massive cloud bills, just smart inference and fast iteration, all while respecting the boundaries of data privacy.

Best of all, PII Detective is open-source and available on GitHub.

Why It Matters: Practical AI, Not Hype

Kyle Polley’s project captured one of Security Frontiers’ core themes: real tools for solving real problems, built with practicality in mind rather than spectacle. 

PII Detective avoided the complexity and cost of full data access by anchoring its logic in metadata and layering on a lightweight, LLM-powered inference model. It’s a reminder that effective AI doesn’t have to be massive. It just has to be smart.

It also embraced a human-in-the-loop design, reinforcing trust and accountability. Instead of trying to eliminate human oversight, the tool builds around it, automating what can be automated and surfacing what needs review. The result is a high signal-to-noise ratio and a process that is faster and more accurate than traditional methods.

Perhaps most importantly, PII Detective doesn’t try to reinvent the stack. Its AI logic is composable and can slot into whatever data catalog, privacy tooling, or compliance workflow an organization already has. That flexibility is key, especially when so many enterprise-grade solutions arrive bloated, overpriced, and over-engineered, delivering only marginal gains in exchange for massive effort.

What Teams Can Learn from This

PII Detective is a reminder that simplicity is a feature, especially in a field that too often confuses complexity with capability. The tool delivers useful results with minimal privacy exposure by pairing metadata with LLM inference. It doesn’t overreach, and it doesn’t ask for unnecessary access. 

Another key insight is that trust in AI comes from explainability and reversibility. PII Detective’s results are auditable, reviewable, and easy to adjust. That makes it not just efficient but usable.

And the approach isn’t limited to PII detection. The same pattern of metadata + LLM + human review has enormous potential across security and compliance. From automated data classification to access reviews to Tier 1 alert triage and investigation, this model offers a path forward that balances automation with oversight.

Lessons Worth Taking Home

PII Detective proves that solving big security problems doesn’t always require big tools. Sometimes, all it takes is a sharper question, a smaller footprint, and the right mix of automation and oversight. It’s the kind of work that quietly changes how teams think, and that’s exactly what Security Frontiers was all about.

If you’re interested in how AI is actually being used to solve real security problems, Security Frontiers is the place to see it in action.

FAQs

Why is PII discovery needed for compliance?
PII discovery is essential for compliance because it helps organizations locate and secure personally identifiable information (PII) across their environments, an obligation under nearly all modern data protection laws. Regulations like GDPR, CCPA, HIPAA, and others require companies to know what sensitive data they have, where it resides, and how it’s protected.
What are the traditional options for PII discovery?
Traditional privacy data scanning tools typically rely on full-content inspection methods to identify sensitive information like names, email addresses, Social Security numbers, and more. These tools fall into a few main categories: regex-based scanners, data discovery platforms, DLP tools, and manual review or sampling.
What makes PII Detective different from traditional PII detection tools?
Unlike traditional tools that rely on resource-heavy, full-content regex scans, PII Detective skips raw data access entirely. It uses metadata and contextual cues to flag likely PII columns and then passes those results to humans for validation. This approach is cheaper, faster, less noisy, and doesn't require elevated permissions or complex infrastructure.
How does PII Detective detect sensitive data without scanning the actual content?
PII Detective uses large language models (LLMs) to analyze metadata only—such as column names, data types, and structural patterns—without ever touching raw data. It makes smart inferences about which fields are likely to contain sensitive information, dramatically reducing compute costs and privacy risks compared to traditional full-data scans.
Is PII Detective expensive to run at scale?
No. PII Detective is extremely cost-efficient. In benchmarks shared by Kyle Polley, it was able to scan thousands of tables for around $5 in compute costs. This is a 99% reduction in cost compared to conventional scanning methods.
How does PII Detective balance automation with human oversight?
PII Detective follows a human-in-the-loop design. The AI flags likely sensitive columns, but it’s up to human reviewers to make the final call. This design increases trust and accountability while still automating the bulk of the work.

Tyson Supasatit
Principal Product Marketing Manager

Tyson Supasatit is Principal Product Marketing Manager at Dropzone AI where he helps cybersecurity defenders understand what is possible with AI agents. Previously, Tyson worked at companies in the supply chain, cloud, endpoint, and network security markets. Connect with Tyson on Mastodon at https://infosec.exchange/@tsupasat
