What Is Content Shield?

Content Shield detects sensitive content in player messages — such as self-harm or suicidal intent — and silently escalates the conversation to a human agent. When triggered, zero automated messages reach the player. No greeting, no AI response, no transfer message. Only a human communicates.
Content Shield runs on every inbound message, not just the first. A player might start with a normal question and later express distress — Content Shield catches it regardless of when it appears.

How It Works

  1. Player sends a message (first or subsequent)
  2. Before any AI processing, Content Shield checks the message
  3. If sensitive content is detected → conversation is silently escalated to a human agent
  4. If not detected → normal flow continues (greeting, AI response, etc.)
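The flow above can be sketched in a few lines of Python. This is an illustrative sketch only: the helpers `detect_sensitive`, and the return labels, are invented for this example and are not part of any real API.

```python
# Minimal sketch of the Content Shield gate. detect_sensitive() is a
# hypothetical stand-in for the real safety classifier.

def detect_sensitive(message: str) -> bool:
    # Stand-in check; a real deployment calls a safety model.
    return "hurt myself" in message.lower()

def handle_inbound(ticket: dict, message: str) -> str:
    # Step 2: the check runs before ANY AI processing.
    if detect_sensitive(message):
        # Step 3: silent escalation -- no automated reply is ever sent.
        ticket["suppressOutbound"] = True
        ticket["escalated"] = True
        return "escalated_silently"
    # Step 4: normal flow continues (greeting, AI response, etc.).
    return "normal_flow"

ticket: dict = {}
print(handle_inbound(ticket, "I want to hurt myself"))  # escalated_silently
print(handle_inbound({}, "How do I withdraw funds?"))   # normal_flow
```

Because the gate sits in front of the AI pipeline, a triggered ticket never produces a greeting or transfer message; the only state change is the escalation itself.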

What the Player Experiences

  • If Content Shield triggers: Nothing automated. A human agent picks up the conversation.
  • If Content Shield doesn’t trigger: Normal experience — greeting, AI response, etc.

What the Operator Sees

  • Ticket is flagged in the escalation queue
  • Detection metadata shows the type, confidence score, and matched reference
  • The suppressOutbound flag prevents any automated outbound for the ticket’s lifetime
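One way to picture the `suppressOutbound` flag is as a guard on every automated send. The sketch below assumes a hypothetical `send_automated` helper; only the flag name comes from this document.

```python
# Sketch of how suppressOutbound could gate every automated outbound
# message for a ticket's lifetime. send_automated() is hypothetical.

def send_automated(ticket: dict, text: str, outbox: list) -> bool:
    """Return True if the message was actually delivered."""
    if ticket.get("suppressOutbound"):
        # Flag set at escalation time: drop the message silently.
        return False
    outbox.append(text)
    return True

outbox: list = []
flagged = {"suppressOutbound": True}
send_automated(flagged, "Hi! How can I help?", outbox)  # suppressed
send_automated({}, "Hi! How can I help?", outbox)       # delivered
print(len(outbox))  # 1
```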

Setting Up Content Shield

Content Shield is configured through Automation Rules using a special trigger type.
1. Create an Automation Rule

Go to Settings → Automation Rules and create a new rule:
  • Trigger: Content Detected
  • Detection Type: Self-harm
  • Action: Escalate to Human (with silent mode enabled)
Silent mode ensures no automated messages are sent to the player.
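Represented as data, the rule above might look like the following. The storage format is an assumption made for illustration; the field values mirror the UI settings listed above.

```python
# Illustrative representation of the Content Shield automation rule.
# The key names are assumptions; the values mirror the UI fields.
rule = {
    "trigger": "content_detected",
    "detection_type": "self_harm",
    "action": "escalate_to_human",
    "silent_mode": True,  # no automated messages reach the player
}
print(rule["silent_mode"])  # True
```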
2. Assign an Agent Team

Make sure a team of human agents is configured to receive escalated conversations; these agents handle every flagged ticket.
3. Test

Send a test message through your chat widget to verify detection is working correctly. Content Shield will evaluate the message and trigger escalation if sensitive content is detected.

Detection

Content Shield uses purpose-built safety classification that works across languages. Detection is fast — messages are evaluated in real time with no noticeable delay to the player.

Key Behaviors

Not just the first message. A player might start with a withdrawal question and later express distress. Content Shield evaluates every inbound message.
Once a ticket is flagged, Content Shield won’t re-trigger on subsequent messages in the same conversation. The flag is permanent for that ticket’s lifetime.
Workspaces without a Content Detected automation rule skip detection entirely. No API calls, no latency.
The player’s message is saved to the transcript for completeness, even when intercepted. Only the AI processing is skipped.
Content Shield works on all supported channels — LiveChat, Zendesk, Zoho, Respond.io, Intercom, Web Messenger, and Web — with full first-message coverage.
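Three of the behaviors above can be captured in one sketch: detection is skipped when no Content Detected rule exists, a flagged ticket never re-triggers, and the player's message is always saved to the transcript. All function and key names here are hypothetical stand-ins.

```python
# Sketch of key behaviors. classify() is a stand-in for the real
# safety classifier; workspace/ticket keys are illustrative.

def classify(message: str) -> bool:
    return "hurt myself" in message.lower()

def should_run_detection(workspace: dict, ticket: dict) -> bool:
    if not workspace.get("has_content_rule"):
        return False  # no rule configured: no API call, no latency
    if ticket.get("flagged"):
        return False  # flag is permanent for the ticket's lifetime
    return True

def process(workspace: dict, ticket: dict, message: str) -> None:
    # The message is saved to the transcript even when intercepted;
    # only the AI processing is skipped.
    ticket.setdefault("transcript", []).append(message)
    if should_run_detection(workspace, ticket) and classify(message):
        ticket["flagged"] = True

ws = {"has_content_rule": True}
t: dict = {}
process(ws, t, "I want to hurt myself")
process(ws, t, "ok")  # no re-trigger, but still transcribed
print(t["flagged"], len(t["transcript"]))  # True 2
```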

Limitations

  • Single detection type — currently only self-harm. Additional types (harassment, threats) can be added.
  • No per-brand thresholds — a single threshold applies across the workspace.
  • Very terse messages may not match — short phrases like “end it” with no context may fall below the detection threshold.
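The threshold limitation can be illustrated with a toy scorer. Both the scoring heuristic and the 0.7 threshold below are invented for this example; they are not the real values used by Content Shield.

```python
# Illustration of the single-threshold limitation. The scoring
# function and 0.7 cutoff are invented, not real product values.

WORKSPACE_THRESHOLD = 0.7  # one threshold, no per-brand override

def safety_score(message: str) -> float:
    # Toy heuristic: explicit phrasing plus surrounding context
    # pushes the score up; terse fragments stay low.
    base = 0.9 if "hurt myself" in message.lower() else 0.3
    context_bonus = min(len(message.split()) / 20, 0.1)
    return min(base + context_bonus, 1.0)

def triggers(message: str) -> bool:
    return safety_score(message) >= WORKSPACE_THRESHOLD

print(triggers("I want to hurt myself tonight"))  # True
print(triggers("end it"))  # False: too terse, falls below threshold
```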
Content Shield is a safety layer, not a replacement for comprehensive responsible gaming detection. It specifically targets self-harm content as an immediate safety measure. For broader RG detection (problem gambling, financial hardship, self-exclusion), the AI agent handles classification during normal conversation flow.