
Your AI Agent Isn't Working Because You Skipped the Guardrails

February 7, 2026 · ScaledByDesign · ai · agents · guardrails · production

The Demo Worked. Production Didn't.

We see this every month: a team builds an AI agent that works beautifully in a demo. The CEO is thrilled. Then it hits production and starts hallucinating order numbers, promising refunds it can't issue, and confidently giving wrong answers to paying customers.

The model isn't the problem. The missing guardrails are.

What Guardrails Actually Mean

Guardrails aren't about limiting your AI — they're about making it trustworthy enough to deploy. Think of them as the engineering layer between "cool demo" and "production system."

Input Guardrails

Filter what goes into the model before it processes anything:

interface InputGuardrail {
  validateInput(input: string): {
    safe: boolean;
    sanitized: string;
    flags: string[];
  };
}
 
const inputGuardrail: InputGuardrail = {
  validateInput(input: string) {
    const flags: string[] = [];
 
    // Detect prompt injection attempts
    if (containsInjectionPattern(input)) {
      flags.push("injection_attempt");
      return { safe: false, sanitized: "", flags };
    }
 
    // Strip PII before sending to model
    let sanitized = redactPII(input);
    if (sanitized !== input) flags.push("pii_redacted");

    // Truncate over-length input so the model call stays within budget
    if (estimateTokens(sanitized) > MAX_INPUT_TOKENS) {
      sanitized = truncateToTokens(sanitized, MAX_INPUT_TOKENS);
      flags.push("truncated");
    }

    return { safe: true, sanitized, flags };
  },
};
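The helpers above (containsInjectionPattern, redactPII, estimateTokens, truncateToTokens) are application-specific. A minimal sketch of what they might look like; the regexes and the four-characters-per-token heuristic are illustrative assumptions, not a complete defense:

const MAX_INPUT_TOKENS = 2000; // illustrative budget

// Naive pattern match; real injection detection needs layered checks,
// not a single regex list.
function containsInjectionPattern(input: string): boolean {
  const patterns = [
    /ignore (all )?previous instructions/i,
    /you are now/i,
    /reveal (your )?system prompt/i,
  ];
  return patterns.some((p) => p.test(input));
}

// Redact obvious PII (emails, phone-like numbers) before the model sees it.
function redactPII(input: string): string {
  return input
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[email]")
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[phone]");
}

// Crude token estimate: roughly four characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function truncateToTokens(text: string, maxTokens: number): string {
  return text.slice(0, maxTokens * 4);
}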

Output Guardrails

Validate what comes back before it reaches the user:

async function validateAgentResponse(
  response: AgentResponse,
  context: RequestContext
): Promise<ValidatedResponse> {
  // 1. Check for hallucinated data
  const factCheck = await verifyAgainstSource(
    response.claims,
    context.knowledgeBase
  );
 
  if (factCheck.confidence < 0.85) {
    return {
      action: "escalate",
      reason: "Low confidence on factual claims",
      fallback: "Let me connect you with a team member who can help.",
    };
  }
 
  // 2. Ensure response stays in scope
  if (!isWithinAgentScope(response, context.allowedActions)) {
    return {
      action: "redirect",
      reason: "Out of scope response",
      fallback: generateScopedResponse(context),
    };
  }
 
  // 3. Check for unauthorized commitments
  if (containsCommitment(response) && !context.canMakeCommitments) {
    return {
      action: "modify",
      reason: "Unauthorized commitment detected",
      modified: removeCommitments(response),
    };
  }
 
  return { action: "allow", response };
}
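Wiring this in is one checkpoint between the model call and the user. In a sketch (runAgent, send, and escalateToHuman stand in for your own plumbing):

const raw = await runAgent(userMessage, context); // your model call
const validated = await validateAgentResponse(raw, context);

switch (validated.action) {
  case "allow":
    await send(validated.response);
    break;
  case "modify":
    await send(validated.modified); // commitments stripped
    break;
  case "redirect":
    await send(validated.fallback);
    break;
  case "escalate":
    await send(validated.fallback);
    await escalateToHuman(context);
    break;
}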

The Five Guardrails Every Agent Needs

1. Scope Boundaries

Define exactly what your agent can and cannot do. Not in a prompt — in code:

const agentScope = {
  canDo: [
    "answer_product_questions",
    "check_order_status",
    "initiate_return",
    "update_shipping_address",
  ],
  cannotDo: [
    "issue_refunds_over_50",
    "modify_pricing",
    "access_payment_details",
    "make_policy_exceptions",
  ],
  escalateTo: "human_agent",
  escalationTriggers: [
    "customer_frustration_detected",
    "legal_question",
    "complaint_about_agent",
    "three_failed_attempts",
  ],
};
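A config like this only matters if the execution path actually consults it. A minimal enforcement check (performAction and requestHumanHandoff are placeholders for your own action layer):

async function executeAgentAction(action: string, payload: unknown) {
  // Denylist wins; the allowlist is the only way in.
  const blocked = agentScope.cannotDo.includes(action);
  const allowed = agentScope.canDo.includes(action);
  if (blocked || !allowed) {
    return requestHumanHandoff(agentScope.escalateTo, { action, payload });
  }
  return performAction(action, payload);
}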

2. Confidence Thresholds

Never let an agent act on low-confidence outputs:

Confidence      | Action
> 0.90          | Execute automatically
0.70 – 0.90     | Execute with disclaimer
0.50 – 0.70     | Suggest but require confirmation
< 0.50          | Escalate to human
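In code, the table collapses to a small dispatch function (thresholds are the ones from the table; tune them per use case):

type ConfidenceAction =
  | "execute"
  | "execute_with_disclaimer"
  | "confirm_first"
  | "escalate";

function actionForConfidence(confidence: number): ConfidenceAction {
  if (confidence > 0.9) return "execute";
  if (confidence >= 0.7) return "execute_with_disclaimer";
  if (confidence >= 0.5) return "confirm_first";
  return "escalate";
}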

3. Rate Limiting and Circuit Breakers

Your agent shouldn't be able to take 500 actions per minute, even if the model says to:

const circuitBreaker = {
  maxActionsPerMinute: 10,
  maxCostPerHour: 50, // dollars
  maxEscalationsBeforeShutdown: 5,
  cooldownPeriod: "5m",
};
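The config only helps if something enforces it. A minimal in-memory sketch of the rate-limit piece; a production version would also track cost and escalations, and persist state so a restart doesn't reset the breaker:

class AgentCircuitBreaker {
  private timestamps: number[] = [];
  private openUntil = 0;

  constructor(
    private maxActionsPerMinute: number,
    private cooldownMs: number
  ) {}

  allow(now = Date.now()): boolean {
    if (now < this.openUntil) return false; // tripped, still cooling down

    // Keep only actions from the last 60 seconds
    this.timestamps = this.timestamps.filter((t) => now - t < 60_000);

    if (this.timestamps.length >= this.maxActionsPerMinute) {
      this.openUntil = now + this.cooldownMs; // trip the breaker
      return false;
    }

    this.timestamps.push(now);
    return true;
  }
}

const breaker = new AgentCircuitBreaker(
  circuitBreaker.maxActionsPerMinute,
  5 * 60_000 // "5m" cooldown
);

// Before every agent action:
if (!breaker.allow()) {
  // Stop the agent and page a human instead of executing
}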

4. Audit Logging

Every agent action should be traceable. At minimum, log the following (a minimal record type is sketched after this list):

  • What input triggered the action
  • What the model returned (raw)
  • What guardrails modified or blocked
  • What the user actually received
  • Latency and cost per interaction
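A record type covering those five points might look like this; the field names are illustrative, and console.log stands in for a real log sink:

interface AuditRecord {
  timestamp: string;
  input: string;            // what triggered the action (post-redaction)
  rawModelOutput: string;   // exactly what the model returned
  guardrailFlags: string[]; // what was modified or blocked, and why
  finalResponse: string;    // what the user actually received
  latencyMs: number;
  costUsd: number;
}

function logInteraction(record: AuditRecord): void {
  // Append-only, structured, queryable later
  console.log(JSON.stringify(record));
}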

5. Human Handoff

The most important guardrail is knowing when to stop. Build graceful handoff that preserves context — don't make the customer repeat themselves.
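One way to preserve that context is to hand the human everything the agent already knows, so the conversation continues instead of restarting. A sketch of a handoff payload; the shape and assignToQueue are assumptions, not a standard API:

interface HandoffPayload {
  conversationId: string;
  transcript: { role: "customer" | "agent"; text: string }[];
  issueSummary: string;      // agent-written summary of the problem so far
  attemptedActions: string[];
  escalationReason: string;
}

async function handOffToHuman(payload: HandoffPayload): Promise<void> {
  // assignToQueue is a placeholder for your routing layer
  await assignToQueue("human_agent", payload);
}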

The Cost of Skipping Guardrails

We've cleaned up after companies that shipped agents without guardrails:

  • $40k in unauthorized refunds from an agent that learned to say "yes" to everything
  • Legal exposure from an agent that gave medical advice it wasn't qualified to give
  • Brand damage from an agent that argued with customers about return policies

The model cost was $200/month. The cleanup cost was 6 figures.

Build It Right the First Time

Guardrails aren't a nice-to-have. They're the difference between an AI demo and an AI product. If your agent doesn't have input validation, output verification, scope boundaries, confidence thresholds, and human handoff — it's not ready for production.

It's just a demo with a URL.
