Your AI Agent Isn't Working Because You Skipped the Guardrails
The Demo Worked. Production Didn't.
We see this every month: a team builds an AI agent that works beautifully in a demo. The CEO is thrilled. Then it hits production and starts hallucinating order numbers, promising refunds it can't issue, and confidently giving wrong answers to paying customers.
The model isn't the problem. The missing guardrails are.
What Guardrails Actually Mean
Guardrails aren't about limiting your AI — they're about making it trustworthy enough to deploy. Think of them as the engineering layer between "cool demo" and "production system."
Input Guardrails
Filter what goes into the model before it processes anything:
```typescript
interface InputGuardrail {
  validateInput(input: string): {
    safe: boolean;
    sanitized: string;
    flags: string[];
  };
}

const inputGuardrail: InputGuardrail = {
  validateInput(input: string) {
    const flags: string[] = [];

    // Detect prompt injection attempts
    if (containsInjectionPattern(input)) {
      flags.push("injection_attempt");
      return { safe: false, sanitized: "", flags };
    }

    // Strip PII before sending to model
    const sanitized = redactPII(input);
    if (sanitized !== input) flags.push("pii_redacted");

    // Check token length
    if (estimateTokens(sanitized) > MAX_INPUT_TOKENS) {
      flags.push("truncated");
    }

    return { safe: true, sanitized, flags };
  },
};
```

Output Guardrails
Validate what comes back before it reaches the user:
```typescript
async function validateAgentResponse(
  response: AgentResponse,
  context: RequestContext
): Promise<ValidatedResponse> {
  // 1. Check for hallucinated data
  const factCheck = await verifyAgainstSource(
    response.claims,
    context.knowledgeBase
  );
  if (factCheck.confidence < 0.85) {
    return {
      action: "escalate",
      reason: "Low confidence on factual claims",
      fallback: "Let me connect you with a team member who can help.",
    };
  }

  // 2. Ensure response stays in scope
  if (!isWithinAgentScope(response, context.allowedActions)) {
    return {
      action: "redirect",
      reason: "Out of scope response",
      fallback: generateScopedResponse(context),
    };
  }

  // 3. Check for unauthorized commitments
  if (containsCommitment(response) && !context.canMakeCommitments) {
    return {
      action: "modify",
      reason: "Unauthorized commitment detected",
      modified: removeCommitments(response),
    };
  }

  return { action: "allow", response };
}
```

The Five Guardrails Every Agent Needs
1. Scope Boundaries
Define exactly what your agent can and cannot do. Not in a prompt — in code:
```typescript
const agentScope = {
  canDo: [
    "answer_product_questions",
    "check_order_status",
    "initiate_return",
    "update_shipping_address",
  ],
  cannotDo: [
    "issue_refunds_over_50",
    "modify_pricing",
    "access_payment_details",
    "make_policy_exceptions",
  ],
  escalateTo: "human_agent",
  escalationTriggers: [
    "customer_frustration_detected",
    "legal_question",
    "complaint_about_agent",
    "three_failed_attempts",
  ],
};
```

2. Confidence Thresholds
Never let an agent act on low-confidence outputs:
| Confidence | Action |
|---|---|
| > 0.90 | Execute automatically |
| 0.70 – 0.90 | Execute with disclaimer |
| 0.50 – 0.70 | Suggest but require confirmation |
| < 0.50 | Escalate to human |
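The table above collapses into a single dispatch function. A minimal sketch (the type and function names are illustrative, and the thresholds should be tuned per use case):

```typescript
type GuardAction =
  | "execute"
  | "execute_with_disclaimer"
  | "confirm"
  | "escalate";

// Map a model confidence score to the policy in the table above.
function actionForConfidence(confidence: number): GuardAction {
  if (confidence > 0.9) return "execute";
  if (confidence >= 0.7) return "execute_with_disclaimer";
  if (confidence >= 0.5) return "confirm";
  return "escalate";
}
```

Keeping this mapping in one place means the thresholds live in code review, not scattered across prompts.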
3. Rate Limiting and Circuit Breakers
Your agent shouldn't be able to take 500 actions per minute, even if the model says to:
```typescript
const circuitBreaker = {
  maxActionsPerMinute: 10,
  maxCostPerHour: 50, // dollars
  maxEscalationsBeforeShutdown: 5,
  cooldownPeriod: "5m",
};
```

4. Audit Logging
Every agent action should be traceable:
- What input triggered the action
- What the model returned (raw)
- What guardrails modified or blocked
- What the user actually received
- Latency and cost per interaction
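The five items above map directly onto a record type. A sketch of what one audit entry might look like (field and function names are assumptions, not a specific logging product's schema):

```typescript
// One audit record per agent interaction; field names are illustrative.
interface AuditRecord {
  timestamp: string;
  input: string;             // what input triggered the action
  rawModelOutput: string;    // what the model returned, unmodified
  guardrailFlags: string[];  // what guardrails modified or blocked
  finalResponse: string;     // what the user actually received
  latencyMs: number;
  costUsd: number;
}

function recordInteraction(entry: Omit<AuditRecord, "timestamp">): AuditRecord {
  const record: AuditRecord = { timestamp: new Date().toISOString(), ...entry };
  // In production, ship this to durable, append-only storage rather than stdout.
  console.log(JSON.stringify(record));
  return record;
}
```

Logging the raw model output alongside the final response is what lets you see, after an incident, exactly which guardrail fired and why.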
5. Human Handoff
The most important guardrail is knowing when to stop. Build graceful handoff that preserves context — don't make the customer repeat themselves.
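A context-preserving handoff can be sketched as a packet the human agent receives alongside the conversation. All names here are illustrative, not a specific vendor's API:

```typescript
// Package everything the human needs so the customer never repeats themselves.
interface HandoffPacket {
  conversationId: string;
  transcript: { role: "user" | "agent"; text: string }[];
  reason: string;             // why the agent escalated
  attemptedActions: string[]; // what the agent already tried
  customerSummary: string;    // short recap for the human agent
}

function buildHandoff(
  conversationId: string,
  transcript: { role: "user" | "agent"; text: string }[],
  reason: string,
  attemptedActions: string[]
): HandoffPacket {
  // Surface the last few customer messages so the human can skim, not scroll.
  const recentUserMessages = transcript
    .filter((turn) => turn.role === "user")
    .slice(-3)
    .map((turn) => turn.text)
    .join(" / ");

  return {
    conversationId,
    transcript,
    reason,
    attemptedActions,
    customerSummary: `Escalated: ${reason}. Recent customer messages: ${recentUserMessages}`,
  };
}
```

The packet travels with the ticket, so the human picks up mid-conversation instead of starting from zero.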
The Cost of Skipping Guardrails
We've cleaned up after companies that shipped agents without guardrails:
- $40k in unauthorized refunds from an agent that learned to say "yes" to everything
- Legal exposure from an agent that gave medical advice it wasn't qualified to give
- Brand damage from an agent that argued with customers about return policies
The model cost $200 a month. The cleanup ran six figures.
Build It Right the First Time
Guardrails aren't a nice-to-have. They're the difference between an AI demo and an AI product. If your agent doesn't have input validation, output verification, scope boundaries, confidence thresholds, and human handoff — it's not ready for production.
It's just a demo with a URL.