Your AI Agent Isn't Working Because You Skipped the Guardrails
The Demo Worked. Production Didn't.
We see this every month: a team builds an AI agent that works beautifully in a demo. The CEO is thrilled. Then it hits production and starts hallucinating order numbers, promising refunds it can't issue, and confidently giving wrong answers to paying customers.
The model isn't the problem. The missing guardrails are.
What Guardrails Actually Mean
Guardrails aren't about limiting your AI — they're about making it trustworthy enough to deploy. Think of them as the engineering layer between "cool demo" and "production system."
Input Guardrails
Filter what goes into the model before it processes anything:
```typescript
interface InputGuardrail {
  validateInput(input: string): {
    safe: boolean;
    sanitized: string;
    flags: string[];
  };
}

const inputGuardrail: InputGuardrail = {
  validateInput(input: string) {
    const flags: string[] = [];

    // Detect prompt injection attempts
    if (containsInjectionPattern(input)) {
      flags.push("injection_attempt");
      return { safe: false, sanitized: "", flags };
    }

    // Strip PII before sending to model
    const sanitized = redactPII(input);
    if (sanitized !== input) flags.push("pii_redacted");

    // Check token length
    if (estimateTokens(sanitized) > MAX_INPUT_TOKENS) {
      flags.push("truncated");
    }

    return { safe: true, sanitized, flags };
  },
};
```

Output Guardrails
Validate what comes back before it reaches the user:
```typescript
async function validateAgentResponse(
  response: AgentResponse,
  context: RequestContext
): Promise<ValidatedResponse> {
  // 1. Check for hallucinated data
  const factCheck = await verifyAgainstSource(
    response.claims,
    context.knowledgeBase
  );
  if (factCheck.confidence < 0.85) {
    return {
      action: "escalate",
      reason: "Low confidence on factual claims",
      fallback: "Let me connect you with a team member who can help.",
    };
  }

  // 2. Ensure response stays in scope
  if (!isWithinAgentScope(response, context.allowedActions)) {
    return {
      action: "redirect",
      reason: "Out of scope response",
      fallback: generateScopedResponse(context),
    };
  }

  // 3. Check for unauthorized commitments
  if (containsCommitment(response) && !context.canMakeCommitments) {
    return {
      action: "modify",
      reason: "Unauthorized commitment detected",
      modified: removeCommitments(response),
    };
  }

  return { action: "allow", response };
}
```

The Five Guardrails Every Agent Needs
1. Scope Boundaries
Define exactly what your agent can and cannot do. Not in a prompt — in code:
```typescript
const agentScope = {
  canDo: [
    "answer_product_questions",
    "check_order_status",
    "initiate_return",
    "update_shipping_address",
  ],
  cannotDo: [
    "issue_refunds_over_50",
    "modify_pricing",
    "access_payment_details",
    "make_policy_exceptions",
  ],
  escalateTo: "human_agent",
  escalationTriggers: [
    "customer_frustration_detected",
    "legal_question",
    "complaint_about_agent",
    "three_failed_attempts",
  ],
};
```

2. Confidence Thresholds
Never let an agent act on low-confidence outputs:
| Confidence | Action |
|---|---|
| > 0.90 | Execute automatically |
| 0.70 – 0.90 | Execute with disclaimer |
| 0.50 – 0.70 | Suggest but require confirmation |
| < 0.50 | Escalate to human |
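The table above collapses into a single dispatch function. A minimal sketch (the type and function names are illustrative, and the thresholds should be tuned per use case):

```typescript
type GuardAction =
  | "execute"
  | "execute_with_disclaimer"
  | "confirm"
  | "escalate";

// Map a model confidence score to the policy in the table above.
function actionForConfidence(confidence: number): GuardAction {
  if (confidence > 0.9) return "execute";
  if (confidence >= 0.7) return "execute_with_disclaimer";
  if (confidence >= 0.5) return "confirm";
  return "escalate";
}
```

Keeping this mapping in one place means the thresholds live in code review, not scattered across prompts.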
3. Rate Limiting and Circuit Breakers
Your agent shouldn't be able to take 500 actions per minute, even if the model says to:
```typescript
const circuitBreaker = {
  maxActionsPerMinute: 10,
  maxCostPerHour: 50, // dollars
  maxEscalationsBeforeShutdown: 5,
  cooldownPeriod: "5m",
};
```

4. Audit Logging
Every agent action should be traceable:
- What input triggered the action
- What the model returned (raw)
- What guardrails modified or blocked
- What the user actually received
- Latency and cost per interaction
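The five items above map directly onto a record type. A sketch of what one audit entry might look like (field and function names are assumptions, not a specific logging product's schema):

```typescript
// One audit record per agent interaction; field names are illustrative.
interface AuditRecord {
  timestamp: string;
  input: string;             // what input triggered the action
  rawModelOutput: string;    // what the model returned, unmodified
  guardrailFlags: string[];  // what guardrails modified or blocked
  finalResponse: string;     // what the user actually received
  latencyMs: number;
  costUsd: number;
}

function recordInteraction(entry: Omit<AuditRecord, "timestamp">): AuditRecord {
  const record: AuditRecord = { timestamp: new Date().toISOString(), ...entry };
  // In production, ship this to durable, append-only storage rather than stdout.
  console.log(JSON.stringify(record));
  return record;
}
```

Logging the raw model output alongside the final response is what lets you see, after an incident, exactly which guardrail fired and why.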
5. Human Handoff
The most important guardrail is knowing when to stop. Build graceful handoff that preserves context — don't make the customer repeat themselves.
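A context-preserving handoff can be sketched as a packet the human agent receives alongside the conversation. All names here are illustrative, not a specific vendor's API:

```typescript
// Package everything the human needs so the customer never repeats themselves.
interface HandoffPacket {
  conversationId: string;
  transcript: { role: "user" | "agent"; text: string }[];
  reason: string;             // why the agent escalated
  attemptedActions: string[]; // what the agent already tried
  customerSummary: string;    // short recap for the human agent
}

function buildHandoff(
  conversationId: string,
  transcript: { role: "user" | "agent"; text: string }[],
  reason: string,
  attemptedActions: string[]
): HandoffPacket {
  // Surface the last few customer messages so the human can skim, not scroll.
  const recentUserMessages = transcript
    .filter((turn) => turn.role === "user")
    .slice(-3)
    .map((turn) => turn.text)
    .join(" / ");

  return {
    conversationId,
    transcript,
    reason,
    attemptedActions,
    customerSummary: `Escalated: ${reason}. Recent customer messages: ${recentUserMessages}`,
  };
}
```

The packet travels with the ticket, so the human picks up mid-conversation instead of starting from zero.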
The Cost of Skipping Guardrails
We've cleaned up after companies that shipped agents without guardrails:
- $40k in unauthorized refunds from an agent that learned to say "yes" to everything
- Legal exposure from an agent that gave medical advice it wasn't qualified to give
- Brand damage from an agent that argued with customers about return policies
The model cost $200 a month. The cleanup ran six figures.
Build It Right the First Time
Guardrails aren't a nice-to-have. They're the difference between an AI demo and an AI product. If your agent doesn't have input validation, output verification, scope boundaries, confidence thresholds, and human handoff — it's not ready for production.
It's just a demo with a URL.