Paste any text. The filter scans for the five categories an enterprise LLM system has to guard against: prompt injection, PII, secrets leakage, profanity, and off-policy keywords. It returns a verdict (block / review / allow) and a redacted preview suitable for safe display.
Try a scenario
Prompt-injection patterns
Catches "ignore previous instructions", role swaps, system-prompt extraction, "developer mode" tricks.
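A minimal sketch of how pattern-based injection detection could look, assuming one case-insensitive regular expression per trick named above. The pattern names and the exact phrasings matched are illustrative assumptions, not the demo's actual rule set.

```python
import re

# Hypothetical patterns, one per trick named above (not the demo's real rules).
INJECTION_PATTERNS = {
    "ignore_instructions": re.compile(
        r"\b(?:ignore|disregard|forget)\s+(?:all\s+)?(?:the\s+)?"
        r"(?:previous|prior|above)\s+(?:instructions|rules|prompts?)\b",
        re.IGNORECASE,
    ),
    "role_swap": re.compile(
        r"\b(?:you\s+are\s+now|act\s+as|pretend\s+to\s+be)\b", re.IGNORECASE
    ),
    "system_prompt_extraction": re.compile(
        r"\b(?:reveal|show|print|repeat)\s+(?:me\s+)?(?:your|the)\s+"
        r"(?:system|initial|hidden)\s+(?:prompt|instructions)\b",
        re.IGNORECASE,
    ),
    "developer_mode": re.compile(r"\bdeveloper\s+mode\b", re.IGNORECASE),
}


def find_injections(text: str) -> list[str]:
    """Return the name of every pattern that matches somewhere in the text."""
    return [name for name, rx in INJECTION_PATTERNS.items() if rx.search(text)]
```

Phrase lists like these are easy to evade with paraphrasing, so they work best as a cheap first pass rather than the only defense.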
PII detection
Regular expressions for email addresses, phone numbers, credit card numbers, US SSNs, and IP addresses.
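As an illustration, the PII pass could be sketched as a table of regular expressions applied in order, each match replaced by a placeholder for the redacted preview. The patterns below are deliberately simple assumptions (IPv4 only, US-style phone numbers, no checksum validation on card numbers) and the names are hypothetical.

```python
import re

# Illustrative, simplified patterns; assumptions, not the demo's actual rules.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "us_phone": re.compile(
        r"(?<!\d)(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    ),
}


def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [KIND] placeholder; return text and kinds found."""
    found = []
    for kind, rx in PII_PATTERNS.items():
        text, count = rx.subn(f"[{kind.upper()}]", text)
        if count:
            found.append(kind)
    return text, found
```

For example, `redact_pii("Mail jane.doe@example.com")` returns `("Mail [EMAIL]", ["email"])`. A checksum such as the Luhn algorithm on card candidates would cut false positives on long digit runs.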
API keys / secrets
Catches common API key formats: OpenAI (sk-), GitHub PAT, Slack tokens, AWS access keys.
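A rough sketch of prefix-based secret detection for the four key families listed above. The prefixes (`sk-`, `ghp_`, `github_pat_`, `xox?-`, `AKIA`/`ASIA`) are the publicly known ones; the length bounds are approximations chosen for this example, and the masking scheme (keep the first four characters) is an assumption.

```python
import re

# Prefixes are public conventions; length bounds here are approximate assumptions.
SECRET_PATTERNS = {
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "github_pat": re.compile(
        r"\b(?:ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{22,})\b"
    ),
    "slack_token": re.compile(r"\bxox[abprs]-[A-Za-z0-9-]{10,}"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"),
}


def mask_secrets(text: str) -> tuple[str, list[str]]:
    """Mask each detected secret, keeping its first four characters for triage."""
    kinds = []
    for kind, rx in SECRET_PATTERNS.items():
        text, count = rx.subn(lambda m: m.group(0)[:4] + "****", text)
        if count:
            kinds.append(kind)
    return text, kinds
```

Keeping a short prefix in the preview lets a reviewer tell which provider the key belongs to without exposing the key itself.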
Profanity & off-policy
Small built-in word lists for demo purposes. Production deployments use lists calibrated per industry vertical.
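One way to sketch word-list matching is to compile each list into a single alternation with word boundaries, so a listed term does not fire inside a longer word. The terms below are mild placeholders and hypothetical policy phrases, not the demo's lists.

```python
import re

PROFANITY = {"darn", "heck"}                        # placeholder words
OFF_POLICY = {"legal advice", "medical diagnosis"}  # hypothetical policy terms


def _compile(terms: set[str]) -> re.Pattern:
    # Longest terms first so multi-word phrases are preferred;
    # \b keeps a term from matching inside another word.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


WORD_LISTS = {"profanity": _compile(PROFANITY), "off_policy": _compile(OFF_POLICY)}


def match_word_lists(text: str) -> dict[str, list[str]]:
    """Return the distinct lowercased terms hit in each list, sorted."""
    return {
        name: sorted({m.group(0).lower() for m in rx.finditer(text)})
        for name, rx in WORD_LISTS.items()
    }
```

The word-boundary anchors are what keep "heck" from matching inside "checkerboard", a common source of false positives with naive substring checks.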
Verdict logic
Any high-severity finding → block. Any other finding → review. No findings → allow.
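The three rules above can be sketched as a short function. The `Finding` type and the severity labels are assumptions for illustration; which categories count as high severity is a policy decision the page does not specify.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    category: str  # e.g. "prompt_injection", "pii" (hypothetical labels)
    severity: str  # "high", "medium", or "low" (assumed scale)


def verdict(findings: list[Finding]) -> str:
    """Map a list of findings to block / review / allow."""
    if any(f.severity == "high" for f in findings):
        return "block"   # at least one high-severity finding
    if findings:
        return "review"  # findings exist, none of them high severity
    return "allow"       # nothing was detected
```

Checking for high severity first matters: a mix of high and low findings must still resolve to block.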
Production swap
In production, the same JSON contract sits in front of NeMo Guardrails, Guardrails AI, or Maxim, which add ML-based detectors for tone, hallucination, and brand safety.
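The page does not spell out the JSON contract, so the following is only a guess at its shape, built from the two outputs the filter is described as returning (a verdict and a redacted preview) plus a list of findings. All key names are hypothetical.

```python
import json

ALLOWED_VERDICTS = {"block", "review", "allow"}


def build_response(verdict: str, findings: list[dict], redacted_preview: str) -> str:
    """Serialize a filter result; the key names are illustrative assumptions."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    payload = {
        "verdict": verdict,
        "findings": findings,
        "redacted_preview": redacted_preview,
    }
    return json.dumps(payload, sort_keys=True)
```

If callers depend only on a stable shape like this, the detectors behind it can be replaced without changing the calling code.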
We deploy production-grade input + output guardrails using NeMo Guardrails, custom policy-as-code, and real-time monitoring with alerts to your security team — calibrated to your false-positive budget.
Book a 30-minute consultation. We will walk through the use case, sketch the value case, and tell you honestly whether we can help.