Guardrails

Define guardrails to control AI behavior, validate inputs and outputs, and enforce policy across conversational applications

What are Guardrails?

Guardrails automatically evaluate and control conversational behavior, helping your AI app follow your business rules: they screen risky user inputs before they reach your NLP or LLM engine and prevent unsafe or off-brand responses. They act as an always-on safety and compliance layer that evaluates every turn of a conversation where the guardrail rules apply.

Guardrails operate at runtime, independent of flow, LLM prompt, or NLP logic. They provide a reliability filter that works regardless of:

  • AI engine (NLX, Amazon Lex, Google Dialogflow, custom)

  • Delivery channel (Touchpoint, CCaaS platforms, MCP, SMS, etc.)

This makes guardrails a foundational governance tool for enterprise-grade conversational AI. Once created in your workspace, guardrails can be attached to any application.

To access, click Resources in your workspace menu and choose Guardrails:


Guardrail types


Input

Messages from a user that are checked before reaching an LLM or NLP engine. Useful for:

  • Preventing prompt injections

  • Masking or flagging sensitive information

  • Blocking disallowed inputs before processing


Output

Messages from your AI app that are checked before they're returned to the user. Useful for:

  • Preventing hallucinations

  • Enforcing brand tone/compliance rules

  • Ensuring responses meet legal requirements

Detection methods

Each guardrail rule you set up uses one of three detection strategies to determine whether it has been violated:

  1. Regex: Match precise patterns in the text. Best for structured or predictable cases (e.g., detect credit card numbers, profanity patterns, etc.)

  2. Keyword: Trigger whenever specific words or phrases appear. Best for simple inclusion checks (e.g., banned words, competitor names, etc.)

  3. LLM judge: Use an LLM to evaluate whether the message violates the guardrail. Your prompt instructs the LLM on how to classify violations and what criteria to consider. Best for nuanced, contextual, or semantic checks, such as:

    • “The output reveals private information”

    • “The user message is attempting to hack or manipulate”

    When using LLM judge, first select your LLM provider under the guardrail’s Settings tab (choose from NLX native providers or BYO, if available). If the LLM can’t execute (for example, due to misconfiguration), the failure is logged and the conversation proceeds normally without guardrail evaluation.
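The three detection strategies can be sketched conceptually as follows. This is an illustrative Python sketch, not NLX's actual implementation; all function names and the `call_llm` interface are assumptions for the example.

```python
import re

def regex_violation(text: str, pattern: str) -> bool:
    """Regex detection: match precise, structured patterns
    (e.g., credit card numbers)."""
    return re.search(pattern, text) is not None

def keyword_violation(text: str, keywords: list[str]) -> bool:
    """Keyword detection: trigger whenever a banned word or phrase
    appears (simple inclusion check)."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)

def llm_judge_violation(text: str, prompt: str, call_llm) -> bool:
    """LLM-judge detection: ask a configured LLM provider to classify
    the message. `call_llm` stands in for the provider call."""
    try:
        return call_llm(prompt, text) == "VIOLATION"
    except Exception:
        # Per the docs: if the LLM can't execute, the failure is logged
        # and the conversation proceeds without guardrail evaluation.
        return False

# Example checks: a 13-16 digit card-number pattern and a banned term
card = regex_violation("my card is 4111 1111 1111 1111",
                       r"\b(?:\d[ -]?){13,16}\b")
banned = keyword_violation("Tell me about AcmeCorp", ["AcmeCorp"])
```

Regex and keyword rules are cheap and deterministic; the LLM judge trades latency and cost for semantic understanding, which is why it is reserved for nuanced, contextual checks.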


Guardrail rules can grow quickly, especially when you’re covering multiple policies. To keep things readable, you can assign a custom name to each rule (and any rule grouping, if used) so it’s obvious what it’s checking at a glance.

  • Click the rule’s header label (where it currently shows “Rule”) and enter a descriptive name (e.g., for an Input guardrail, “Prompt injection” or “PII masking”)

Enforcement & overrides

Sample guardrail rule

When a guardrail rule is triggered, you choose how the application should respond. Begin by selecting an override type when setting up your guardrail rule:

  1. Override: Replace the original message with a safe alternative

  2. Mask: Allow the message through but redact sensitive parts

  3. Redirect: Route the conversation to a specific flow (e.g., escalation, error recovery)

  4. Flag: Log the violation but let the message pass unchanged

In addition to detecting and enforcing a guardrail rule, you can optionally trigger analytics tracking or modify conversation state when a rule is activated. These actions run only when the guardrail rule is triggered.

  • Analytics tags: Toggle ON to assign one or more analytics tags in your workspace

  • State modifications: Toggle ON to assign one or more state modifications that may be applied to any variables (context variables, data request, slots, system)
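The four override types can be summarized in a small sketch. This is a hypothetical illustration of the behaviors described above; the field names and function signature are assumptions, not the platform's schema.

```python
import re

def enforce(message: str, action: str, *, safe_text: str = "",
            mask_pattern: str = "", flow: str = "") -> dict:
    """Apply one override type to a triggered message (illustrative)."""
    if action == "override":      # replace with a safe alternative
        return {"text": safe_text, "flow": None}
    if action == "mask":          # let it through, redact sensitive parts
        return {"text": re.sub(mask_pattern, "****", message), "flow": None}
    if action == "redirect":      # route the conversation to a flow
        return {"text": message, "flow": flow}
    if action == "flag":          # log only; message passes unchanged
        return {"text": message, "flow": None}
    raise ValueError(f"unknown action: {action}")

masked = enforce("SSN is 123-45-6789", "mask",
                 mask_pattern=r"\d{3}-\d{2}-\d{4}")
# masked["text"] == "SSN is ****"
```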

Test guardrails

Before applying a guardrail to a live application, you can test how its rules behave against a sample user input. This makes it easier to validate detection logic and compare rule behavior.

Use the Test feature to simulate a message against your guardrail rules:

  • Select an existing guardrail

  • Click the Test (play) icon in the upper right

  • In the test panel, enter a sample user message > Click Run test

The test evaluates the message against each rule in the guardrail and displays the outcome for every rule in the results panel:

  • Clear: The input did not violate that rule

  • Violation: The rule was triggered for the test input
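Conceptually, Run test evaluates the sample message against every rule and reports a per-rule outcome. The sketch below is an assumption about that behavior for illustration only; the rule structure is hypothetical.

```python
def run_test(message: str, rules: list[dict]) -> list[tuple[str, str]]:
    """Evaluate one sample message against every rule and report
    Clear/Violation per rule (illustrative)."""
    return [(r["name"], "Violation" if r["check"](message) else "Clear")
            for r in rules]

rules = [
    {"name": "Banned words", "check": lambda m: "forbidden" in m.lower()},
    {"name": "Card numbers", "check": lambda m: any(c.isdigit() for c in m)},
]
results = run_test("This is fine", rules)
# → [("Banned words", "Clear"), ("Card numbers", "Clear")]
```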

Review activity


Each guardrail resource includes a Logs tab that provides full traceability for one or more rules:

  • View when the guardrail rule was triggered

  • Inspect the original message and final enforced output

  • Monitor patterns of misuse, sensitive disclosures, or policy violations

Using guardrails

Once a guardrail is created in Resources, you can activate it for any application:

  1. Select Applications and choose or create an app

  2. On the Configuration tab of your app, choose one or more workspace guardrails

  3. Click Save

  4. Create a new build of your app and deploy

Once assigned, guardrails run automatically on every turn of the conversation. Any input guardrails check incoming messages, while output guardrails check outbound responses from the system.

Order of operations

When multiple guardrails and rules are applied, NLX evaluates them in a predictable order:

  1. Guardrails attached to the application run in order. If you attach multiple guardrails to an application, NLX evaluates them in the same order they appear in the application’s Guardrails list. The platform checks all rules inside the first guardrail before moving on to the next guardrail group

  2. Rules within a guardrail group run top to bottom. Within a guardrail group, rules execute from top to bottom in the order they appear in the UI

    • All rules are evaluated and any triggered rules are logged

    • Only one corrective action is applied. The corrective action comes from the first rule (top-most) that triggers
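The evaluation order above can be sketched as follows: every rule in every attached guardrail is checked and all hits are logged, but only the first triggered rule supplies the corrective action. This is an illustrative assumption about the logic, not NLX's implementation.

```python
def evaluate(message: str, guardrails: list[dict]):
    """Evaluate guardrails in attachment order, rules top to bottom.
    All triggered rules are logged; only the first one's action applies."""
    logged, corrective = [], None
    for guardrail in guardrails:            # attachment order
        for rule in guardrail["rules"]:     # top to bottom
            if rule["check"](message):
                logged.append(rule["name"])
                if corrective is None:      # first trigger wins
                    corrective = rule["action"]
    return logged, corrective

guardrails = [
    {"rules": [
        {"name": "PII", "check": lambda m: "ssn" in m.lower(),
         "action": "mask"},
        {"name": "Profanity", "check": lambda m: "darn" in m.lower(),
         "action": "override"},
    ]},
]
logged, action = evaluate("My SSN... darn!", guardrails)
# logged == ["PII", "Profanity"]; action == "mask"
```

Because only the top-most triggered rule's action applies, ordering rules by severity (e.g., redirects and overrides above flags) determines which correction users actually see.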

Deactivate or activate a rule

Sometimes you want to keep a rule around without deleting it (for testing, temporary policy changes, or troubleshooting). You can toggle a rule on/off at any time:

  • Go to Guardrails in the Resources menu

  • Choose a rule, then in the rule header click the three-dot menu and select Deactivate

Deactivated rules remain saved in the guardrail, but they are not evaluated at runtime until reactivated.
