Feature Flags for Backend Services — Beyond the Toggle
Flags Aren't Just for the Frontend
Most teams think of feature flags as a way to hide a new button from users. That's the simplest use case. The real power of feature flags is in backend services: gradual rollouts of new algorithms, kill switches for third-party integrations, and operational flags that let you tune system behavior without deploying code.
The Four Types of Backend Flags
1. Release flags (temporary):
→ "Use new payment processor for 10% of orders"
→ Remove after full rollout (days to weeks)
2. Operational flags (long-lived):
→ "Enable circuit breaker for Stripe API"
→ Keep indefinitely for operational control
3. Experiment flags (temporary):
→ "Test new recommendation algorithm for segment A"
→ Remove after experiment concludes
4. Permission flags (long-lived):
→ "Enable bulk import for enterprise customers"
→ Keep until feature is generally available
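One way to make these lifetimes explicit is to encode them next to the flag type. The sketch below is illustrative only: `DEFAULT_TTL_DAYS`, `defaultExpiry`, and the specific day counts are assumptions chosen for the example, not values from any particular flag library.

```typescript
// The four flag types from the list above.
type FlagType = "release" | "operational" | "experiment" | "permission";

// Assumed default lifetimes in days; null means no automatic expiry.
// The numbers are placeholders and should match your own rollout cadence.
const DEFAULT_TTL_DAYS: Record<FlagType, number | null> = {
  release: 30,
  operational: null,
  experiment: 14,
  permission: null,
};

// Returns the date by which a new flag should be reviewed for removal,
// or null for the long-lived types.
function defaultExpiry(type: FlagType, createdAt: Date): Date | null {
  const ttl = DEFAULT_TTL_DAYS[type];
  if (ttl === null) return null;
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(createdAt.getTime() + ttl * msPerDay);
}
```

Assigning the expiry at creation time means a temporary flag cannot be created without a cleanup date, while operational and permission flags stay exempt.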
Implementation Patterns
Pattern 1: Percentage Rollout
interface FeatureFlag {
  key: string;
  enabled: boolean;
  rolloutPercentage: number; // 0-100
  targetSegments?: string[];
  killSwitch: boolean;
}

function isFeatureEnabled(
  flag: FeatureFlag,
  context: { userId: string; segment?: string }
): boolean {
  // Kill switch overrides everything
  if (flag.killSwitch) return false;
  if (!flag.enabled) return false;

  // Segment targeting
  if (flag.targetSegments?.length) {
    if (!context.segment || !flag.targetSegments.includes(context.segment)) {
      return false;
    }
  }

  // Deterministic percentage rollout (same user always gets same result).
  // Assumes murmurhash3 returns a non-negative integer; a signed hash would
  // need `>>> 0` first, because `%` keeps the sign in JavaScript.
  const hash = murmurhash3(`${flag.key}:${context.userId}`);
  const bucket = (hash % 100) + 1; // 1-100
  return bucket <= flag.rolloutPercentage;
}
Pattern 2: Backend Kill Switches
// Wrap third-party calls with operational flags
async function processPayment(order: Order): Promise<PaymentResult> {
  const useNewProcessor = isFeatureEnabled(
    flags.get("new_payment_processor"),
    { userId: order.customerId }
  );
  if (useNewProcessor) {
    try {
      return await newPaymentProcessor.charge(order);
    } catch (error) {
      // If the new processor fails, automatically fall back.
      // Caution: after an ambiguous failure such as a timeout, the first charge
      // may still have gone through, so the retry should be idempotent.
      metrics.increment("payment.new_processor.fallback");
      return await legacyPaymentProcessor.charge(order);
    }
  }
  return await legacyPaymentProcessor.charge(order);
}

// Operational flag for circuit breaking
async function callExternalApi(request: ApiRequest): Promise<ApiResponse> {
  if (flags.get("external_api_circuit_breaker")?.killSwitch) {
    // Return cached/default response when circuit is open
    return getCachedResponse(request) ?? getDefaultResponse(request);
  }
  return await externalApi.call(request);
}
Pattern 3: Gradual Algorithm Rollout
// Test a new recommendation algorithm in production
async function getRecommendations(userId: string): Promise<Product[]> {
  const useNewAlgorithm = isFeatureEnabled(
    flags.get("recommendation_v2"),
    { userId }
  );
  const startTime = Date.now();
  let recommendations: Product[];
  let algorithm: string;

  if (useNewAlgorithm) {
    recommendations = await newRecommendationEngine.getRecommendations(userId);
    algorithm = "v2";
  } else {
    recommendations = await currentRecommendationEngine.getRecommendations(userId);
    algorithm = "v1";
  }

  // Track which algorithm served which user
  metrics.histogram("recommendations.latency", Date.now() - startTime, { algorithm });
  analytics.track("recommendations_served", {
    userId,
    algorithm,
    count: recommendations.length,
    productIds: recommendations.map(r => r.id),
  });
  return recommendations;
}
Flag Lifecycle Management
The biggest problem with feature flags isn't creating them — it's cleaning them up:
// Flag metadata tracks lifecycle
interface FlagMetadata {
  key: string;
  type: "release" | "operational" | "experiment" | "permission";
  rolloutPercentage: number; // 0-100, mirrors FeatureFlag; read by findStaleFlags
  createdBy: string;
  createdAt: Date;
  expiresAt?: Date; // When should this flag be removed?
  owner: string; // Who's responsible for cleanup?
  jiraTicket?: string; // Cleanup ticket
  lastEvaluated?: Date; // When was this flag last evaluated?
}

// Automated stale flag detection
async function findStaleFlags(): Promise<FlagMetadata[]> {
  // Named allMetadata so it does not shadow the global `flags` store
  const allMetadata = await flagStore.getAllMetadata();
  return allMetadata.filter(flag => {
    // Release flags older than 30 days that have reached 100% rollout are stale
    if (flag.type === "release") {
      const age = daysSince(flag.createdAt);
      return age > 30 && flag.rolloutPercentage === 100;
    }
    // Experiment flags that expired
    if (flag.type === "experiment" && flag.expiresAt) {
      return new Date() > flag.expiresAt;
    }
    // Flags not evaluated in the last 90 days
    // (flags with no recorded evaluation are skipped)
    if (flag.lastEvaluated) {
      return daysSince(flag.lastEvaluated) > 90;
    }
    return false;
  });
}
Testing with Feature Flags
// Test both paths: flags introduce branching that needs coverage
describe("Payment processing", () => {
  it("processes payment with new processor when flag is enabled", async () => {
    flags.override("new_payment_processor", { enabled: true, rolloutPercentage: 100 });
    await processPayment(mockOrder);
    expect(newPaymentProcessor.charge).toHaveBeenCalled();
  });

  it("processes payment with legacy processor when flag is disabled", async () => {
    flags.override("new_payment_processor", { enabled: false });
    await processPayment(mockOrder);
    expect(legacyPaymentProcessor.charge).toHaveBeenCalled();
  });

  it("falls back to legacy when new processor fails", async () => {
    flags.override("new_payment_processor", { enabled: true, rolloutPercentage: 100 });
    newPaymentProcessor.charge.mockRejectedValue(new Error("timeout"));
    await processPayment(mockOrder);
    expect(legacyPaymentProcessor.charge).toHaveBeenCalled();
  });
});
Rules for Backend Feature Flags
1. Every release flag has an expiration date and cleanup owner
2. Kill switches should be instant (cached locally, no network call)
3. Flag evaluation must be deterministic (same input = same result)
4. Always have a fallback for the "flag off" path
5. Log which flag variant was used for every decision
6. Test both flag states in your CI/CD pipeline
7. Run a monthly "flag cleanup" — delete stale flags
8. Never nest flags (flag A && flag B already yields four combinations to test, and each added flag doubles that)
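Rule 2 can be sketched as an in-memory snapshot that is refreshed in the background, so the kill-switch check on the request path is a plain map lookup. This is a minimal sketch under stated assumptions: `LocalFlagCache`, `CachedFlag`, and the `FlagLoader` callback are hypothetical names, and the loader stands in for whatever store actually holds your flags.

```typescript
// Minimal shape needed for a kill-switch check (hypothetical).
interface CachedFlag {
  key: string;
  killSwitch: boolean;
}

// Hypothetical loader: fetches all flags from wherever they are stored.
type FlagLoader = () => Promise<CachedFlag[]>;

class LocalFlagCache {
  private snapshot = new Map<string, CachedFlag>();
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly load: FlagLoader;

  constructor(load: FlagLoader) {
    this.load = load;
  }

  // Swap in a fresh snapshot in one step; keep the old one if loading fails.
  async refresh(): Promise<void> {
    try {
      const flags = await this.load();
      this.snapshot = new Map(flags.map((f) => [f.key, f] as const));
    } catch {
      // Stale data is preferred over no data, so the error is swallowed here.
    }
  }

  // Start periodic refreshes off the request path.
  start(intervalMs: number): void {
    this.timer = setInterval(() => void this.refresh(), intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
  }

  // Synchronous read with no I/O. Unknown flags count as "not killed".
  isKilled(key: string): boolean {
    return this.snapshot.get(key)?.killSwitch ?? false;
  }
}
```

The trade-off is propagation delay: a flipped kill switch takes effect only after the next refresh, so the interval bounds how "instant" the switch really is. Treating unknown flags as "not killed" is also a choice; a service that should fail closed would return `true` instead.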
Feature flags in backend services are an operational superpower. They let you deploy code without releasing features, roll back behavior without rolling back code, and test in production with real traffic. But like any powerful tool, they require discipline — especially around cleanup. A codebase with 200 stale flags is harder to maintain than one with no flags at all.