ZeyadKhalil
LLM Security · Data Loss Prevention · Google London Final

PromptGuard AI: Enterprise LLM Data Loss Prevention

An AI security platform for reducing sensitive data leakage into public LLMs. The system combines a Chrome Extension for browser-level prompt interception, a FastAPI policy engine on Google Cloud Run, and a React dashboard for analytics, audit review and rule configuration.

  • Context: WMG / Google / NatWest Secure Intelligence Frontier hackathon
  • Outcome: advanced to the grand final at Google London
  • Surface: browser-to-public-LLM prompts
  • Stack: Chrome MV3 · FastAPI · Cloud Run · React

Overview

PromptGuard AI is an applied AI security platform that sits between an organisation's users and public LLMs (ChatGPT, Gemini, Claude and others) and stops sensitive data from leaving the building. It was built during the WMG / Google / NatWest Secure Intelligence Frontier hackathon and the team advanced to the grand final at Google London.

The core product decision is that the prompt should be inspected before it leaves the browser. By moving enforcement closer to the user action, the system can provide immediate feedback, preserve an audit trail and reduce reliance on controls that only operate after data has already reached a third-party service.

Problem framing

A typical day for a knowledge worker now includes pasting fragments of customer data, internal source code, contracts, credentials, account numbers and unreleased product specs into a chat box owned by a third party. From a security and regulatory perspective, this is a data exfiltration channel hiding inside a productivity tool.

Existing DLP tooling does not generally see the prompt. It sees an HTTPS request to a benign-looking domain. Existing LLM safety tooling sits on the provider side, where the customer data has already arrived. PromptGuard takes the position that DLP for LLM use must live in the user's browser, with a server-side policy engine and audit trail behind it.

Architecture

  1. User browser · ChatGPT / Gemini / Claude tab
  2. Chrome Extension · Manifest V3, MutationObserver
  3. Pre-send hook · intercepts the submit event
  4. FastAPI on Cloud Run · /scan endpoint
  5. Detection layer · regex + LLM-assisted classifier
  6. Policy engine · allow / redact / block
  7. Audit log · structured events
  8. React dashboard · analytics + rules
Every prompt the user attempts to send is intercepted in the browser, sent to the policy engine for a synchronous decision, and then either allowed, modified or blocked before it leaves the device. The user sees an inline explanation, not just a refusal.

Chrome Extension (Manifest V3)

  • Manifest V3 service-worker architecture with no long-running background pages: ephemeral state lives in chrome.storage.session, persistent configuration in chrome.storage.sync.
  • Content script attaches a MutationObserver to the chat composer, detects the submit event (Enter / Send button), pauses submission, and runs a synchronous round-trip to the backend before releasing the request.
  • Inline UX: an unobtrusive shield indicator in the composer, a contextual warning banner when redaction or blocking fires, and a tooltip explaining which rule fired and why. No silent blocking.
  • Per-tenant configuration. The extension authenticates against the backend with a tenant-scoped token so different organisations can ship different rules without rebuilding the extension.

FastAPI backend on Google Cloud Run

  • Stateless FastAPI app deployed to Google Cloud Run for on-demand horizontal autoscaling, which fits bursty prompt traffic tied to working hours.
  • Two primary endpoints: POST /scan for synchronous prompt evaluation (latency-sensitive, on the user's critical path) and GET /events for the admin dashboard (analytics, paginated, read-heavy).
  • Pydantic schemas at the API boundary so every prompt payload is validated before reaching detection logic.
  • Structured JSON logging with correlation IDs that flow from the extension through the backend into the audit store.
  • Cold starts stay cheap because the policy engine is intentionally CPU-bound rather than GPU-bound, which keeps scaling inexpensive and predictable.
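The correlation-ID logging described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual schema; the field names (`corr_id`, `rule`, `verdict`) and logger name are assumptions.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object per line.

    Field names (corr_id, rule, verdict) are illustrative assumptions,
    not PromptGuard's real audit schema.
    """
    def format(self, record):
        event = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Pick up extra fields passed via `extra=...` so the same
        # correlation ID can flow from the extension into the audit store.
        for key in ("corr_id", "rule", "verdict"):
            if hasattr(record, key):
                event[key] = getattr(record, key)
        return json.dumps(event)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("promptguard")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("scan decision",
            extra={"corr_id": "c-123", "rule": "iban", "verdict": "redact"})
```

Because every record is a single JSON line keyed by `corr_id`, a dashboard or log sink can join the extension's request, the /scan decision and the audit event without any shared state.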

Detection layer

The detection layer is deliberately a hybrid. Pure regex is fast and explainable but hits a brittle recall ceiling on free-form prose. Pure LLM classification is flexible but slow, expensive and prone to hallucination. Combining the two recovers most of the recall while keeping most of the precision.

  • Pattern layer. Regex / token rules for high-confidence structured PII such as credit-card-shaped strings, IBANs, sort-codes, internal ticket IDs, AWS / GCP credentials, API keys.
  • LLM-assisted layer. A lightweight classifier called for ambiguous cases such as "is this paragraph a customer record?", "does this excerpt look like internal source code?", "is this a confidential contract clause?".
  • False-positive damping. A scoring layer that combines pattern hits and classifier verdicts, so a single weak signal does not block a legitimate prompt. This was important for usability because security controls only work when users understand and trust the decision.

React admin dashboard

  • React frontend for security operators covering analytics, rule configuration and live event review.
  • Charts on prompt volume, block / redact / allow ratios, top-firing rules and per-user heatmaps.
  • Inline rule editor where pattern rules can be authored, tested against a prompt sample, and rolled out to specific tenants without redeploying the extension.
  • Drill-down on individual events with full payload inspection, redacted by default and unlockable behind explicit operator action, which is itself an audit event.

Policy & enforcement

  • Allow. The prompt passes all rules and is forwarded to the LLM unchanged. Logged at a low retention tier so the audit trail exists without flooding storage.
  • Redact. Sensitive spans are replaced before the prompt is forwarded; a banner tells the user which rule fired and why.
  • Block. The prompt is stopped before it leaves the device, with an inline explanation rather than a silent refusal.
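The allow / redact / block decision can be sketched as a pure function of the detection score and the matched spans. Thresholds, field names and the `[REDACTED]` placeholder are illustrative assumptions, not the shipped policy.

```python
import re
from dataclasses import dataclass

# Illustrative thresholds -- not PromptGuard's tuned values.
BLOCK_THRESHOLD = 0.9
REDACT_THRESHOLD = 0.6

@dataclass
class Decision:
    verdict: str   # "allow" | "redact" | "block"
    prompt: str    # prompt as it will be forwarded (empty when blocked)
    reason: str    # shown to the user inline -- no silent blocking

def enforce(prompt: str, score: float, spans: list) -> Decision:
    """Map a sensitivity score and matched regex spans to a verdict."""
    if score >= BLOCK_THRESHOLD:
        return Decision("block", "", f"blocked: sensitivity score {score:.2f}")
    if score >= REDACT_THRESHOLD:
        redacted = prompt
        # Replace spans right-to-left so earlier offsets stay valid.
        for m in sorted(spans, key=lambda m: m.start(), reverse=True):
            redacted = redacted[:m.start()] + "[REDACTED]" + redacted[m.end():]
        return Decision("redact", redacted,
                        "sensitive spans replaced before forwarding")
    return Decision("allow", prompt, "passed all rules")
```

Keeping enforcement as a pure function of (prompt, score, spans) is what makes each verdict explainable: the `reason` string the user sees is produced in the same place the decision is made.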

Observability

A control product that nobody trusts is worse than no control product. The team treated observability as a first-class deliverable, not an afterthought.

  • Logging. Structured JSON with correlation IDs end-to-end.
  • Metrics. Per-rule fire rates, to catch rule drift.
  • Dashboards. Cloud Run built-ins plus custom dashboards covering latency, errors and volume.
  • Alerting. Threshold-based, triggered on fire-rate spikes.
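A threshold-based fire-rate spike check of the kind listed above can be sketched in a few lines. The 3x multiplier, the minimum-hits floor and the function shape are illustrative assumptions, not the project's tuned alerting values.

```python
from collections import Counter

def fire_rate_spikes(baseline: Counter, baseline_prompts: int,
                     window: Counter, window_prompts: int,
                     factor: float = 3.0, min_hits: int = 10):
    """Flag rules whose fire rate in the current window exceeds
    `factor` times their baseline rate (or that never fired before).

    factor and min_hits are hypothetical defaults, not tuned values.
    """
    spikes = []
    for rule, hits in window.items():
        if hits < min_hits:
            continue  # too little data in this window to alert on
        rate = hits / window_prompts
        base_rate = baseline.get(rule, 0) / max(baseline_prompts, 1)
        if base_rate == 0 or rate > factor * base_rate:
            spikes.append(rule)
    return spikes
```

A spike on a single rule usually means rule drift (the pattern started matching benign traffic) rather than a sudden behaviour change across the whole user base, which is why the alert is keyed per rule.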

Security model

  • Tenant-scoped API tokens, rotated through the dashboard. No long-lived shared secrets in the extension package.
  • Prompt payloads in transit are TLS-only; at rest, only blocked-event payloads are persisted, and only behind operator-controlled retention.
  • Extension permissions are scoped to the LLM domains the tenant configures rather than blanket access to all browsing.
  • The detection layer is treated as untrusted-content territory: the LLM-assisted classifier runs on a non-overridable system prompt, and its verdicts are scored, not executed as policy.
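A tenant-scoped token check consistent with the model above can be sketched with the standard library. This assumes the server stores only a hash of each rotated token and compares in constant time; the tenant name and token values are hypothetical.

```python
import hashlib
import hmac

# Hypothetical store: the server keeps only SHA-256 hashes of tenant
# tokens, never the tokens themselves.
TENANT_TOKEN_HASHES = {
    "acme": hashlib.sha256(b"tok-acme-rotated-2024").hexdigest(),
}

def authenticate(tenant: str, presented_token: str) -> bool:
    """True only when the presented token hashes to the stored value.

    hmac.compare_digest keeps the comparison constant-time, so the
    check does not leak how many leading characters matched.
    """
    expected = TENANT_TOKEN_HASHES.get(tenant)
    if expected is None:
        return False
    digest = hashlib.sha256(presented_token.encode()).hexdigest()
    return hmac.compare_digest(digest, expected)
```

Because the extension only ever holds a tenant-scoped token and the backend only holds its hash, rotating a token through the dashboard invalidates the old credential without touching the extension package.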

Hackathon outcome

Built as a small team during the WMG / Google / NatWest Secure Intelligence Frontier hackathon. The combination of an opinionated end-to-end product slice (extension + backend + dashboard, all working), a clear regulatory framing and a hybrid detection layer that performed reliably during live demos was enough to advance the team to the grand final at Google London.

The lesson worth carrying into other AI-security work: the biggest differentiator was not detection complexity. It was UX and observability. A DLP tool that operators trust is one whose decisions they can read, explain and override.

Future work

  • Native enterprise-browser integration (Edge for Business, Chrome Enterprise policy) to remove per-user extension install friction.
  • Streaming evaluation to score and redact prompts token by token while the user types, instead of only at submit time.
  • Reverse-direction DLP: scan model responses for inadvertent internal context leakage when used inside corporate copilots.
  • SIEM integration (Splunk / Chronicle / Sentinel) so PromptGuard events land in the same incident pipeline as the rest of corporate security.