PromptGuard AI: Enterprise LLM Data Loss Prevention
An AI security platform for reducing sensitive data leakage into public LLMs. The system combines a Chrome extension for browser-level prompt interception, a FastAPI policy engine on Google Cloud Run, and a React dashboard for analytics, audit review, and rule configuration.
- Context: WMG / Google / NatWest Secure Intelligence Frontier hackathon
- Outcome: Advanced to grand final at Google London
- Surface: Browser to public LLM prompts
- Stack: Chrome MV3 · FastAPI · Cloud Run · React
Overview
PromptGuard AI is an applied AI security platform that sits between an organisation's users and public LLMs (ChatGPT, Gemini, Claude and others) and stops sensitive data from leaving the building. It was built during the WMG / Google / NatWest Secure Intelligence Frontier hackathon and the team advanced to the grand final at Google London.
The core product decision is that the prompt should be inspected before it leaves the browser. By moving enforcement closer to the user action, the system can provide immediate feedback, preserve an audit trail and reduce reliance on controls that only operate after data has already reached a third-party service.
Problem framing
A typical day for a knowledge worker now includes pasting fragments of customer data, internal source code, contracts, credentials, account numbers and unreleased product specs into a chat box owned by a third party. From a security and regulatory perspective, this is a data exfiltration channel hiding inside a productivity tool.
Existing DLP tooling does not generally see the prompt. It sees an HTTPS request to a benign-looking domain. Existing LLM safety tooling sits on the provider side, where the customer data has already arrived. PromptGuard takes the position that DLP for LLM use must live in the user's browser, with a server-side policy engine and audit trail behind it.
Architecture
1. User browser: ChatGPT / Gemini / Claude tab
2. Chrome Extension: Manifest V3 · MutationObserver
3. Pre-send hook: intercept submit event
4. FastAPI on Cloud Run: /scan endpoint
5. Detection layer: regex + LLM-assisted classifier
6. Policy engine: allow · redact · block
7. Audit log: structured events
8. React dashboard: analytics + rules
Every prompt the user attempts to send is intercepted in the browser, sent to the policy engine for a synchronous decision, and then either allowed, modified or blocked before it leaves the device. The user sees an inline explanation, not just a refusal.
Chrome Extension (Manifest V3)
- Manifest V3 service-worker architecture with no long-running background pages; ephemeral state held in `chrome.storage.session`, persistent configuration in `chrome.storage.sync`.
- Content script attaches a `MutationObserver` to the chat composer, detects the submit event (Enter / Send button), pauses submission, and runs a synchronous round-trip to the backend before releasing the request.
- Inline UX: an unobtrusive shield indicator in the composer, a contextual warning banner when redaction or blocking fires, and a tooltip explaining which rule fired and why. No silent blocking.
- Per-tenant configuration. The extension authenticates against the backend with a tenant-scoped token so different organisations can ship different rules without rebuilding the extension.
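The tenant-scoped token check described above can be sketched on the backend side. This is an illustrative sketch only: the `TENANT_KEYS` table, the HMAC-over-tenant-ID token format, and the function names are assumptions, not the actual scheme.

```python
# Illustrative sketch of backend-side tenant-scoped token validation.
# Token format (HMAC of the tenant ID) and the key table are assumptions.
import hashlib
import hmac

TENANT_KEYS: dict[str, bytes] = {"acme": b"acme-signing-key"}  # hypothetical per-tenant keys

def verify_token(tenant_id: str, token: str) -> bool:
    """A token is only valid for the tenant whose key signed it,
    so one organisation's extension build cannot use another's rules."""
    key = TENANT_KEYS.get(tenant_id)
    if key is None:
        return False
    expected = hmac.new(key, tenant_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking the expected token.
    return hmac.compare_digest(expected, token)
```

Because the token is scoped per tenant, rotation through the dashboard only invalidates one organisation's extensions at a time.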
FastAPI backend on Google Cloud Run
- Stateless FastAPI app deployed to Google Cloud Run for on-demand horizontal autoscaling, which fits bursty prompt traffic tied to working hours.
- Two primary endpoints: `POST /scan` for synchronous prompt evaluation (latency-sensitive, on the user's critical path) and `GET /events` for the admin dashboard (analytics, paginated, read-heavy).
- Pydantic schemas at the API boundary so every prompt payload is validated before reaching detection logic.
- Structured JSON logging with correlation IDs that flow from the extension through the backend into the audit store.
- Cold starts on Cloud Run stay cheap because the policy engine is intentionally CPU-bound rather than GPU-bound, which keeps scaling inexpensive and predictable.
Detection layer
The detection layer is deliberately a hybrid. Pure regex is fast and explainable but has a brittle recall ceiling on free-form prose. Pure LLM classification is flexible but slow, expensive and hallucinatory. Combining the two gives most of the recall with most of the precision.
- Pattern layer. Regex / token rules for high-confidence structured PII such as credit-card-shaped strings, IBANs, sort-codes, internal ticket IDs, AWS / GCP credentials, API keys.
- LLM-assisted layer. A lightweight classifier invoked for ambiguous cases such as "is this paragraph a customer record?", "does this excerpt look like internal source code?", "is this a confidential contract clause?".
- False-positive damping. A scoring layer that combines pattern hits and classifier verdicts, so a single weak signal does not block a legitimate prompt. This was important for usability because security controls only work when users understand and trust the decision.
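The hybrid scoring described above can be sketched as follows. The pattern set, the weights, and the idea of a single combined score are illustrative assumptions, not the team's actual rules or tuning.

```python
# Sketch of combining high-confidence pattern hits with a soft classifier
# score, so one weak signal alone cannot block a legitimate prompt.
# Patterns and weights are illustrative assumptions.
import re
from dataclasses import dataclass, field

PATTERNS = {
    # Card-number-shaped: four groups of four digits, optionally spaced/dashed.
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # UK sort codes: three pairs of digits separated by dashes.
    "sort_code": re.compile(r"\b\d{2}-\d{2}-\d{2}\b"),
}

@dataclass
class Verdict:
    score: float
    hits: list[str] = field(default_factory=list)

def score_prompt(prompt: str, classifier_score: float = 0.0) -> Verdict:
    """Pattern hits are strong evidence; the classifier adds a weaker,
    continuous signal for free-form prose the regexes cannot see."""
    hits = [name for name, rx in PATTERNS.items() if rx.search(prompt)]
    score = min(1.0, 0.6 * len(hits) + 0.4 * classifier_score)
    return Verdict(score=score, hits=hits)
```

With this shape of damping, a lone classifier verdict of 0.5 yields a score of 0.2, well below any plausible block threshold, while a pattern hit plus classifier agreement climbs quickly.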
React admin dashboard
- React frontend for security operators covering analytics, rule configuration and live event review.
- Charts on prompt volume, block / redact / allow ratios, top-firing rules and per-user heatmaps.
- Inline rule editor where pattern rules can be authored, tested against a prompt sample, and rolled out to specific tenants without redeploying the extension.
- Drill-down on individual events with full payload inspection, redacted by default and unlockable behind explicit operator action, which is itself an audit event.
Policy & enforcement
- Allow. Prompt passes all rules. Forwarded to the LLM unchanged. Logged at a low retention tier so the audit trail exists without flooding storage.
- Redact. High-confidence sensitive segments (e.g. card numbers, credentials) are replaced with typed placeholders ([REDACTED:CARD]) before the prompt leaves the browser. The user is shown the redaction inline and can accept, edit or cancel.
- Block. For severe categories (e.g. customer PII paste, source-code dump beyond a length threshold) the prompt is blocked outright with a link to the relevant policy. The user is never silently stopped. The rule and the rationale are always shown.
- Audit. Every decision (allow / redact / block) is written to an audit log with correlation IDs, the firing rule, the classifier confidence and a hashed prompt fingerprint. The full payload is only stored when a block fires and retention is policy-controlled.
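Typed-placeholder redaction and the hashed prompt fingerprint can be sketched as below. The [REDACTED:CARD] placeholder comes from the text above; the card regex and the per-tenant salt handling are illustrative assumptions.

```python
# Sketch of typed-placeholder redaction before the prompt leaves the
# browser, plus the salted fingerprint stored in audit events instead of
# the raw prompt. Regex and salt handling are illustrative assumptions.
import hashlib
import re

CARD_RX = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def redact(prompt: str) -> str:
    """Replace card-shaped strings with a typed placeholder so the user
    can see exactly what was removed and why."""
    return CARD_RX.sub("[REDACTED:CARD]", prompt)

def fingerprint(prompt: str, tenant_salt: bytes = b"per-tenant-salt") -> str:
    """Salted hash lets audit events correlate repeated prompts without
    persisting the prompt text itself."""
    return hashlib.sha256(tenant_salt + prompt.encode()).hexdigest()
```

The fingerprint is what makes "only store full payloads on block" workable: allow and redact events remain linkable across time without retaining content.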
Observability
A control product that nobody trusts is worse than no control product. The team treated observability as a first-class deliverable, not an afterthought.
- Logging: structured JSON, with correlation IDs end-to-end.
- Metrics: per-rule fire rates, to catch rule drift.
- Dashboards: Cloud Run built-ins plus custom, covering latency / errors / volume.
- Alerting: threshold-based, on fire-rate spikes.
Security model
- Tenant-scoped API tokens, rotated through the dashboard. No long-lived shared secrets in the extension package.
- Prompt payloads in transit are TLS-only; at rest, only blocked-event payloads are persisted, and only behind operator-controlled retention.
- Extension permissions are scoped to the LLM domains the tenant configures rather than blanket access to all browsing.
- The detection layer is treated as untrusted-content territory: the LLM-assisted classifier runs on a non-overridable system prompt, and its verdicts are scored, not executed as policy.
Hackathon outcome
Built as a small team during the WMG / Google / NatWest Secure Intelligence Frontier hackathon. The combination of an opinionated end-to-end product slice (extension + backend + dashboard, all working), a clear regulatory framing and a hybrid detection layer that performed reliably during live demos was enough to advance the team to the grand final at Google London.
The lesson worth carrying into other AI-security work: the biggest differentiator was not detection complexity. It was UX and observability. A DLP tool that operators trust is one whose decisions they can read, explain and override.
Future work
- Native enterprise-browser integration (Edge for Business, Chrome Enterprise policy) to remove per-user extension install friction.
- Streaming evaluation to score and redact prompts token by token while the user types, instead of only at submit time.
- Reverse-direction DLP: scan model responses for inadvertent internal context leakage when used inside corporate copilots.
- SIEM integration (Splunk / Chronicle / Sentinel) so PromptGuard events land in the same incident pipeline as the rest of corporate security.