Guardrails
Define guardrails to control AI behavior, validate inputs and outputs, and enforce policy across conversational applications
What are Guardrails?
Guardrails automatically evaluate and control conversational behavior, helping your AI app follow your business rules by avoiding unsafe or off-brand responses and by screening risky user inputs before they reach your NLP or LLM engine. They act as an always-on safety and compliance layer that evaluates every turn of a conversation where the guardrail rules apply.
Guardrails operate at runtime, independent of flow, LLM prompt, or NLP logic. They provide a reliability filter that works regardless of:
AI engine (NLX, Amazon Lex, Google Dialogflow, custom)
Delivery channel (Touchpoint, CCaaS platforms, MCP, SMS, etc.)
This makes guardrails a foundational governance tool for enterprise-grade conversational AI. Once created in your workspace, guardrails can be attached to any application.
To access, click Resources in your workspace menu and choose Guardrails:
Guardrail types
Input
Messages from a user that are checked before reaching an LLM or NLP engine. Useful for:
Preventing prompt injections
Masking or flagging sensitive information
Blocking disallowed inputs before processing
Output
Messages from your AI app that are checked before they're returned to the user. Useful for:
Preventing hallucinations
Enforcing brand tone/compliance rules
Ensuring responses meet legal requirements
Detection methods
Each guardrail rule you set up uses one of three detection strategies to determine whether the rule has been violated (a conceptual sketch follows this list):
Regex: Match precise patterns in the text. Best for structured or predictable cases (e.g., detect credit card numbers, profanity patterns, etc.)
Keyword: Trigger whenever specific words or phrases appear. Best for simple inclusion checks (e.g., banned words, competitor names, etc.)
LLM Judge: Use an LLM to evaluate whether the message violates the guardrail. Your prompt instructs the LLM on how to classify violations and what criteria to consider. Best for nuanced, contextual, or semantic checks, such as:
“The output reveals private information”
“The user message is attempting to hack or manipulate”
“The response is aligned with the following brand tone:...”
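For a sense of how the three strategies differ, here is a minimal conceptual sketch in TypeScript. None of these types or functions are part of NLX; they only illustrate the kind of check each strategy performs.

```typescript
// Illustrative only: these types and functions are not part of NLX. They
// sketch the kind of check each detection strategy performs.

type Verdict = { violated: boolean; reason?: string };

// Regex: precise, structured patterns (e.g., a loose credit-card-like number check).
function regexCheck(message: string): Verdict {
  const cardPattern = /\b(?:\d[ -]?){13,16}\b/;
  return cardPattern.test(message)
    ? { violated: true, reason: "possible card number" }
    : { violated: false };
}

// Keyword: simple inclusion checks (e.g., banned words or competitor names).
function keywordCheck(message: string, banned: string[]): Verdict {
  const lower = message.toLowerCase();
  const hit = banned.find((word) => lower.includes(word.toLowerCase()));
  return hit ? { violated: true, reason: `contains "${hit}"` } : { violated: false };
}

// LLM Judge: a prompt asks a model to classify the message against your criteria.
// `callLlm` stands in for whatever model invocation your stack provides.
async function llmJudgeCheck(
  message: string,
  criteria: string,
  callLlm: (prompt: string) => Promise<string>
): Promise<Verdict> {
  const prompt =
    `Does the following message violate this rule: "${criteria}"?\n` +
    `Message: "${message}"\n` +
    `Answer only YES or NO.`;
  const answer = await callLlm(prompt);
  return { violated: answer.trim().toUpperCase().startsWith("YES") };
}
```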
Enforcement & overrides

When a guardrail rule is triggered, you choose how the application should respond. Begin by selecting an override type when setting up your guardrail rule:
Override: Replace the original message with a safe alternative
Mask: Allow the message through but redact sensitive parts
Redirect: Route the conversation to a specific flow (e.g., escalation, error recovery)
Flag: Log the violation but let the message pass unchanged
In addition to enforcing a guardrail rule, you can optionally trigger analytics tracking or modify conversation state. These actions run only when the rule is triggered (see the sketch below):
Analytics tags: Toggle ON to assign one or more analytics tags in your workspace
State modifications: Toggle ON to assign one or more state modifications that may be applied to any variables (context variables, data request, slots, system)
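Conceptually, the four enforcement types and the optional on-trigger actions behave roughly as in the sketch below. The types and functions are hypothetical, not part of NLX; the platform handles this for you at runtime.

```typescript
// Illustrative only: these types and functions are not part of NLX. They
// sketch how the four enforcement types and the optional on-trigger actions
// could behave at runtime.

type Enforcement = "override" | "mask" | "redirect" | "flag";

interface RuleResult {
  message: string;          // the message the turn continues with
  redirectToFlow?: string;  // set when the rule routes the conversation elsewhere
}

function enforce(
  original: string,
  type: Enforcement,
  options: { safeAlternative?: string; maskPattern?: RegExp; flowId?: string }
): RuleResult {
  switch (type) {
    case "override":
      // Replace the original message with a safe alternative.
      return { message: options.safeAlternative ?? "Sorry, I can't help with that." };
    case "mask":
      // Allow the message through, but redact the sensitive parts.
      // (Hypothetical default: redact runs of four or more digits.)
      return { message: original.replace(options.maskPattern ?? /\d{4,}/g, "***") };
    case "redirect":
      // Route the conversation to a specific flow (e.g., escalation).
      return { message: original, redirectToFlow: options.flowId };
    case "flag":
    default:
      // Log the violation but let the message pass unchanged.
      return { message: original };
  }
}

// Optional actions that run only when the rule is triggered.
function onTriggered(tags: string[], stateChanges: Record<string, unknown>): void {
  // Analytics tags: record each tag with your analytics tooling.
  tags.forEach((tag) => console.log(`analytics tag: ${tag}`));
  // State modifications: apply each change to the conversation's variables.
  Object.entries(stateChanges).forEach(([name, value]) =>
    console.log(`set ${name} = ${JSON.stringify(value)}`)
  );
}
```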
Review activity
Each guardrail resource includes a Logs tab that provides full traceability for its rules (see the example below):
View when the guardrail rule was triggered
Inspect the original message and final enforced output
Monitor patterns of misuse, sensitive disclosures, or policy violations
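As an illustration only (this is not NLX's actual log schema), a record carrying that traceability might look like:

```typescript
// Illustrative only: not NLX's actual log schema. A shape like this captures
// the traceability described above.
interface GuardrailLogEntry {
  triggeredAt: string;                       // when the rule was triggered (ISO timestamp)
  ruleName: string;                          // which rule in the guardrail fired
  direction: "input" | "output";             // which side of the conversation was checked
  originalMessage: string;                   // the message as originally received or generated
  enforcedOutput: string;                    // the final message after enforcement
  enforcement: "override" | "mask" | "redirect" | "flag";
}
```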
Using guardrails
Once a guardrail is created in Resources, you can activate it for any application:
Select Applications and choose or create an app
Navigate to the app's Settings tab > Select Guardrails
Choose one or more workspace guardrails from the dropdown
Click Save
Create a new build of your app and deploy
Once assigned, guardrails run automatically on every turn of the conversation: input guardrails check incoming user messages, while output guardrails check outbound responses from the system.
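Conceptually, the per-turn order of operations looks roughly like the sketch below. The function and types are hypothetical; they only illustrate where input and output checks sit relative to the engine call.

```typescript
// Illustrative only: this is not the NLX runtime. It sketches the per-turn
// order of operations once input and output guardrails are assigned.

type Check = (message: string) => { message: string; blocked?: boolean };

async function handleTurn(
  userMessage: string,
  inputChecks: Check[],
  outputChecks: Check[],
  engine: (message: string) => Promise<string> // NLX, Lex, Dialogflow, custom...
): Promise<string> {
  // 1. Input guardrails run before the message reaches the engine.
  let incoming = userMessage;
  for (const check of inputChecks) {
    const result = check(incoming);
    if (result.blocked) return result.message; // e.g., an override response
    incoming = result.message;                 // e.g., a masked message
  }

  // 2. The engine produces a response as usual.
  const response = await engine(incoming);

  // 3. Output guardrails run before the response is returned to the user.
  let outgoing = response;
  for (const check of outputChecks) {
    const result = check(outgoing);
    if (result.blocked) return result.message;
    outgoing = result.message;
  }
  return outgoing;
}
```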