Guardrails

Define guardrails to control AI behavior, validate inputs and outputs, and enforce policy across conversational applications.

What are Guardrails?

Guardrails automatically evaluate and control conversational behavior. They help your AI app follow your business rules by blocking unsafe or off-brand responses and by screening risky user inputs before they reach your NLP or LLM engine. They act as an always-on safety and compliance layer that evaluates every turn of a conversation to which a guardrail rule applies.

Guardrails operate at runtime, independent of flow, LLM prompt, or NLP logic. They provide a reliability filter that works regardless of:

  • AI engine (NLX, Amazon Lex, Google Dialogflow, custom)

  • Delivery channel (Touchpoint, CCaaS platforms, MCP, SMS, etc.)

This makes guardrails a foundational governance tool for enterprise-grade conversational AI. Once created in your workspace, guardrails can be attached to any application.

To access guardrails, click Resources in your workspace menu and choose Guardrails.

Guardrail types

Input

Messages from a user that are checked before reaching an LLM or NLP engine. Useful for:

  • Preventing prompt injections

  • Masking or flagging sensitive information

  • Blocking unallowed inputs before processing

Output

Messages from your AI app that are checked before they're returned to the user. Useful for:

  • Preventing hallucinations

  • Enforcing brand tone/compliance rules

  • Ensuring responses meet legal requirements

Detection methods

Each guardrail rule you set up uses one of three detection strategies to determine whether it has been violated:

  1. Regex: Match precise patterns in the text. Best for structured or predictable cases (e.g., credit card numbers or profanity patterns)

  2. Keyword: Trigger whenever specific words or phrases appear. Best for simple inclusion checks (e.g., banned words or competitor names)

  3. LLM Judge: Use an LLM to evaluate whether the message violates the guardrail. Your prompt instructs the LLM on how to classify violations and what criteria to consider. Best for nuanced, contextual, or semantic checks, such as:

    • “The output reveals private information”

    • “The user message is attempting to hack or manipulate”

    • “The response is aligned with the following brand tone:...”
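The three strategies can be sketched roughly as follows. This is an illustrative sketch, not the platform's implementation: the function names, the card-number pattern, and the convention that the judge replies with text starting with "VIOLATION" are all assumptions made for the example.

```python
import re
from typing import Callable, List

# Hypothetical helpers for illustration only; none of these are NLX APIs.

def regex_detect(pattern: str, message: str) -> bool:
    """Regex strategy: violated when the pattern matches anywhere in the message."""
    return re.search(pattern, message) is not None

def keyword_detect(keywords: List[str], message: str) -> bool:
    """Keyword strategy: violated when any listed word or phrase appears.
    Uses a naive case-insensitive substring check."""
    lowered = message.lower()
    return any(keyword.lower() in lowered for keyword in keywords)

def llm_judge_detect(judge: Callable[[str], str], prompt: str, message: str) -> bool:
    """LLM Judge strategy: ask a model to classify the message.
    `judge` stands in for whatever LLM client you use; this sketch assumes
    it returns text that starts with "VIOLATION" when the rule is broken."""
    verdict = judge(f"{prompt}\n\nMessage:\n{message}")
    return verdict.strip().upper().startswith("VIOLATION")

# Loose pattern for 13 to 16 digits with optional spaces or dashes between them.
CARD_PATTERN = r"\b\d(?:[ -]?\d){12,15}\b"

card_found = regex_detect(CARD_PATTERN, "My card is 4111 1111 1111 1111")
competitor_found = keyword_detect(["Acme Corp"], "Is acme corp cheaper?")
```

Regex and keyword checks are cheap and deterministic, so they suit rules with a clear textual signature. The judge approach costs an extra model call per check, which is the trade-off for handling contextual or semantic criteria.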

Enforcement & overrides

Sample guardrail rule

When a guardrail rule is triggered, you may choose how the application should respond. Begin by selecting an override type when setting up your guardrail rule:

  1. Override: Replace the original message with a safe alternative

  2. Mask: Allow the message through but redact sensitive parts

  3. Redirect: Route the conversation to a specific flow (e.g., escalation, error recovery)

  4. Flag: Log the violation but let the message pass unchanged
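To make the difference between the four types concrete, here is a minimal sketch of what each one does to a message once a rule has triggered. The `Outcome` class, the `enforce` function, and its parameters are hypothetical names chosen for this example, and the Redirect branch assumes the caller handles the actual routing.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Outcome:
    message: str                         # text that continues through the conversation
    redirect_flow: Optional[str] = None  # set only when the conversation is rerouted

def enforce(action: str, message: str, *, pattern: str = "",
            safe_text: str = "", flow: str = "") -> Outcome:
    """Apply one enforcement type to a message that violated a rule."""
    if action == "override":
        # Replace the original message with a safe alternative.
        return Outcome(safe_text)
    if action == "mask":
        # Let the message through, redacting only the matching parts.
        return Outcome(re.sub(pattern, "[REDACTED]", message))
    if action == "redirect":
        # Assumption: the caller routes the conversation to the named flow.
        return Outcome(message, redirect_flow=flow)
    if action == "flag":
        # The message passes unchanged; only the violation is recorded elsewhere.
        return Outcome(message)
    raise ValueError(f"unknown enforcement type: {action}")

masked = enforce("mask", "SSN 123-45-6789 on file", pattern=r"\d{3}-\d{2}-\d{4}")
```

Here `masked.message` is "SSN [REDACTED] on file", while the same call with `"flag"` would return the message untouched.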

In addition to detecting and enforcing a guardrail rule, you can optionally trigger analytics tracking or modify conversation state when a rule is activated. These actions run only when the guardrail rule is triggered.

  • Analytics tags: Toggle ON to assign one or more analytics tags in your workspace

  • State modifications: Toggle ON to assign one or more state modifications that may be applied to any variables (context variables, data request, slots, system)
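Conceptually, these optional actions behave like side effects gated on the rule firing. The sketch below uses a hypothetical `apply_side_effects` function with plain lists and dictionaries standing in for analytics and conversation state; the tag and variable names are invented for the example.

```python
from typing import Any, Dict, List

def apply_side_effects(triggered: bool, tags: List[str], state_changes: Dict[str, Any],
                       analytics: List[str], state: Dict[str, Any]) -> None:
    """Record tags and update state, but only when the guardrail rule fired."""
    if not triggered:
        return  # nothing runs for turns that pass the rule
    analytics.extend(tags)
    state.update(state_changes)

analytics_log: List[str] = []
conversation_state: Dict[str, Any] = {"needs_review": False}

# A turn that passes the rule leaves everything untouched.
apply_side_effects(False, ["pii-detected"], {"needs_review": True},
                   analytics_log, conversation_state)
# A turn that violates the rule records the tag and updates state.
apply_side_effects(True, ["pii-detected"], {"needs_review": True},
                   analytics_log, conversation_state)
```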

Review activity

Each guardrail resource includes a Logs tab that provides full traceability for one or more rules:

  • View when the guardrail rule was triggered

  • Inspect the original message and final enforced output

  • Monitor patterns of misuse, sensitive disclosures, or policy violations

Using guardrails

Once a guardrail is created in Resources, you can activate it for any application:

  1. Select Applications and choose or create an app

  2. Navigate to the app's Settings tab and select Guardrails

  3. Choose one or more workspace guardrails from the dropdown

  4. Click Save

  5. Create a new build of your app and deploy

Once assigned, guardrails run automatically on every turn of the conversation. Input guardrails check incoming messages before they reach the engine, while output guardrails check outbound responses before they reach the user.
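The per-turn ordering can be pictured with a small sketch. Everything here is hypothetical and only illustrates the sequence of checks: `run_turn`, the two example guardrails, and the echo engine are stand-ins, not platform code.

```python
import re
from typing import Callable, List

# A guardrail here is any function that takes a message and returns the
# (possibly rewritten) message. This type is an assumption for the sketch.
Guardrail = Callable[[str], str]

def run_turn(user_message: str, engine: Callable[[str], str],
             input_guardrails: List[Guardrail],
             output_guardrails: List[Guardrail]) -> str:
    """One conversation turn: input checks, then the engine, then output checks."""
    for guardrail in input_guardrails:
        user_message = guardrail(user_message)
    reply = engine(user_message)
    for guardrail in output_guardrails:
        reply = guardrail(reply)
    return reply

def mask_email(message: str) -> str:
    """Input guardrail: redact email addresses before the engine sees them."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL]", message)

def block_competitor(reply: str) -> str:
    """Output guardrail: override replies that mention a made-up competitor name."""
    if "acme" in reply.lower():
        return "I can only discuss our own products."
    return reply

def echo_engine(message: str) -> str:
    """Stand-in for an NLP or LLM engine."""
    return f"You said: {message}"

reply = run_turn("Reach me at ana@example.com", echo_engine,
                 [mask_email], [block_competitor])
```

Because the input guardrail runs first, the engine in this sketch never receives the raw email address.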
