AI Hallucination Detection in Production — What Actually Works
The Invoice That Never Existed
A client deployed an AI customer support agent for their SaaS platform. Within 48 hours, the agent told a customer they had an outstanding invoice for $4,200 — an invoice that didn't exist. The customer panicked, called their accountant, and almost churned. The AI had hallucinated a specific invoice number, amount, and due date with perfect confidence.
This is the hallucination problem. It's not that the model says "I don't know." It's that it fabricates specific, plausible-sounding information that's completely wrong.
Why Hallucinations Happen
LLMs don't "know" things — they predict the next likely token. When there's no grounding data, they generate what sounds right:
User: "What's the status of order #ORD-7842?"
What the model should do: Look up order #ORD-7842 in the database
What the model often does: Generate a plausible-sounding status
Hallucinated response: "Order #ORD-7842 was shipped on March 2nd via
FedEx tracking #7891234567. Expected delivery: March 5th."
Every detail is fabricated. The order number format looks right.
The dates are reasonable. The tracking number has the right length.
A human reading this would assume it's real.
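The fix for this pattern is structural, not prompt-based: answer order-status questions only from looked-up data, and refuse when the lookup comes back empty. A minimal sketch (the in-memory `orders` map is a hypothetical stand-in for your real database):

```typescript
// Hypothetical stand-in for the real orders database.
const orders = new Map<string, { status: string; carrier: string }>([
  ["ORD-1001", { status: "shipped", carrier: "FedEx" }],
]);

// Answer only from retrieved data; never let the model free-generate a status.
function orderStatus(orderId: string): string {
  const order = orders.get(orderId);
  if (!order) {
    // No grounding data: refuse rather than fabricate.
    return `I can't find order ${orderId} in our system. Let me connect you with our team.`;
  }
  return `Order ${orderId} is ${order.status} via ${order.carrier}.`;
}
```

With this shape, the hallucinated #ORD-7842 response from above becomes impossible: an unknown order ID hits the refusal branch instead of a fabricated tracking number.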
The Three-Layer Detection System
I've built hallucination detection for several production AI systems. Here's the architecture that works:
Layer 1: Grounding Verification
Every factual claim the AI makes should be traceable to a source document:
interface GroundedResponse {
  answer: string;
  citations: Document[]; // source documents that support the answer's claims
  groundingScore: number; // 0-1, fraction of claims grounded in sources
}

async function verifyGrounding(
  response: string,
  sourceDocuments: Document[]
): Promise<GroundedResponse> {
  // Extract factual claims from the response (typically an LLM call)
  const claims = await extractClaims(response);

  // Check each claim against the source documents
  const verified = await Promise.all(
    claims.map(async (claim) => {
      const match = await findSupportingEvidence(claim, sourceDocuments);
      return {
        claim: claim.text,
        supported: match.confidence > 0.8,
        source: match.document,
        confidence: match.confidence,
      };
    })
  );

  // Guard against division by zero when the response makes no factual claims
  const groundingScore =
    verified.length === 0
      ? 1
      : verified.filter((v) => v.supported).length / verified.length;

  return {
    answer: response,
    citations: verified.filter((v) => v.supported).map((v) => v.source),
    groundingScore,
  };
}

If the grounding score drops below 0.7, the response is flagged for human review or replaced with "I don't have that information — let me connect you with our team."
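That gate can be sketched directly. The 0.7 threshold and the fallback wording come from the text above; the review-queue wiring is a hypothetical detail you'd adapt to your own system:

```typescript
// Grounding-score gate: serve the answer, or fall back to a safe reply
// and flag the original for human review.
const GROUNDING_THRESHOLD = 0.7; // tune on your own data

interface GatedResponse {
  text: string;
  needsReview: boolean;
}

function gateByGrounding(answer: string, groundingScore: number): GatedResponse {
  if (groundingScore < GROUNDING_THRESHOLD) {
    return {
      text: "I don't have that information. Let me connect you with our team.",
      needsReview: true, // queue the original answer for human review
    };
  }
  return { text: answer, needsReview: false };
}
```

Keeping the threshold in one named constant matters in practice: you will retune it as you collect labeled examples of grounded vs. hallucinated responses.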
Layer 2: Structural Validation
For responses that include structured data (numbers, dates, IDs), validate against your actual systems:
// `Validator`, `db`, and `verifyDateInContext` are assumed app-level types and helpers
const validators: Record<string, Validator> = {
  orderNumber: {
    pattern: /ORD-\d{4,6}/,
    validate: async (id) => {
      const exists = await db.orders.exists({ id });
      return { valid: exists, type: "order_reference" };
    },
  },
  invoiceAmount: {
    pattern: /\$[\d,]+\.?\d{0,2}/,
    validate: async (amount, context) => {
      if (!context.orderId) return { valid: false, type: "unverifiable" };
      const invoice = await db.invoices.find({ orderId: context.orderId });
      return {
        valid: invoice?.amount === parseFloat(amount.replace(/[$,]/g, "")),
        type: "invoice_amount",
      };
    },
  },
  dateReference: {
    pattern:
      /\b\d{4}-\d{2}-\d{2}\b|(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2}/,
    validate: async (date, context) => {
      // Verify the date appears in related records
      return await verifyDateInContext(date, context);
    },
  },
};

Layer 3: Confidence Calibration
LLMs are notoriously poorly calibrated — they're confident even when wrong. Add an explicit confidence layer:
async function calibrateResponse(response: string, context: RetrievalContext) {
  // Ask the model to self-evaluate (works better than you'd expect)
  const evaluation = await llm.evaluate({
    prompt: `Given ONLY the following source documents, rate your confidence
that the response is factually accurate. Be conservative.
Source documents: ${context.documents}
Response: ${response}
Rate confidence 0-100 and explain any uncertain claims.`,
  });

  // Apply a calibration curve (learned from historical data)
  const calibrated = calibrationCurve(evaluation.rawConfidence);

  return {
    response,
    confidence: calibrated,
    uncertainClaims: evaluation.uncertainClaims,
    action: calibrated > 0.8 ? "serve" : calibrated > 0.5 ? "flag" : "escalate",
  };
}

The Fallback Hierarchy
When hallucination is detected, don't just show an error. Have a graceful fallback:
Confidence > 80%: Serve the response with citations
Confidence 50-80%: Serve with disclaimer: "Based on available information..."
Confidence 30-50%: Offer partial answer + "Would you like me to connect you with our team for the specific details?"
Confidence < 30%: "I don't have reliable information about that. Let me connect you with a specialist."
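The hierarchy above is simple enough to encode as a single dispatch function. This is one possible encoding, not a prescribed API; the thresholds and messages mirror the tiers listed above, and the `FallbackAction` shape is a hypothetical name you'd adapt to your pipeline:

```typescript
// Map a calibrated confidence score to one of the four fallback tiers.
type FallbackAction =
  | { kind: "serve"; withCitations: true }
  | { kind: "serve_with_disclaimer"; prefix: string }
  | { kind: "partial_answer"; offer: string }
  | { kind: "escalate"; message: string };

function chooseFallback(confidence: number): FallbackAction {
  if (confidence > 0.8) return { kind: "serve", withCitations: true };
  if (confidence > 0.5)
    return {
      kind: "serve_with_disclaimer",
      prefix: "Based on available information...",
    };
  if (confidence > 0.3)
    return {
      kind: "partial_answer",
      offer: "Would you like me to connect you with our team for the specific details?",
    };
  return {
    kind: "escalate",
    message: "I don't have reliable information about that. Let me connect you with a specialist.",
  };
}
```

Using a discriminated union here means the rendering layer is forced by the type checker to handle all four tiers, so a new tier can't silently fall through to a raw (possibly hallucinated) answer.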
The brands getting AI right aren't the ones with the smartest models. They're the ones with the best safety nets around the model. Build the detection layers, implement the fallbacks, and treat every AI response as guilty until proven grounded.