The Caching Strategy That Cut Our Client's AWS Bill by 60%

January 1, 2026 · ScaledByDesign
Tags: caching, aws, performance, infrastructure

$28K/Month on AWS for a Mid-Stage Startup

Our client was processing 2M requests per day with a straightforward stack: Next.js frontend, Node.js API, PostgreSQL database, S3 for assets. Their AWS bill had crept from $4K to $28K over 18 months as traffic grew. The reflexive answer was "optimize the code." The actual answer was caching.

Where the Money Was Going

Monthly AWS breakdown (before):
  RDS (PostgreSQL):    $8,200  (29%)  ← database was the bottleneck
  EC2/ECS (compute):   $7,400  (26%)
  CloudFront + S3:     $4,100  (15%)
  ElastiCache:         $0      (0%)   ← no caching at all
  Data transfer:       $3,800  (14%)
  Other (monitoring):  $4,500  (16%)
  Total:               $28,000

The database was handling 12,000 queries per second. Most of them were identical reads being repeated thousands of times per hour.

The Caching Pyramid

Layer 1: Browser Cache (free, immediate)
  ├── Static assets: Cache for 1 year (immutable hashes)
  ├── API responses: Cache for 60-300 seconds (stale-while-revalidate)
  └── Impact: ~30% of requests never hit your servers

Layer 2: CDN Cache (CloudFront/Vercel Edge)
  ├── HTML pages: Cache for 60 seconds + stale-while-revalidate
  ├── API responses: Cache for 30-300 seconds by route
  └── Impact: ~50% of remaining requests never hit origin

Layer 3: Application Cache (Redis/ElastiCache)
  ├── Database query results: Cache for 60-3600 seconds
  ├── Computed values: Cache for hours/days
  └── Impact: ~80% of database queries eliminated

Layer 4: Database Query Optimization
  ├── Only queries that MUST hit the database reach it
  └── Impact: Remaining queries are fast and efficient
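
The layers compound: each one only sees what the layer above let through. A minimal sketch of that arithmetic, using the hit rates quoted in the pyramid and the client's 2M requests per day (the `Layer` type and `requestsReaching` function are illustrative names, not part of the client's code):

```typescript
// Sketch: how per-layer hit rates compound down the pyramid.
// Hit rates are the figures quoted above; all names here are illustrative.
interface Layer {
  name: string;
  hitRate: number; // fraction of incoming requests absorbed by this layer
}

// Returns how many requests are still travelling after each layer.
function requestsReaching(total: number, layers: Layer[]): number[] {
  const remaining: number[] = [];
  let current = total;
  for (const layer of layers) {
    current = current * (1 - layer.hitRate);
    remaining.push(Math.round(current));
  }
  return remaining;
}

const requestLayers: Layer[] = [
  { name: "browser", hitRate: 0.3 },
  { name: "cdn", hitRate: 0.5 },
];

// 2M requests/day: 1,400,000 leave the browser, 700,000 reach the origin
console.log(requestsReaching(2_000_000, requestLayers));
```

Under these rates roughly 35% of the original requests reach the origin. The Redis layer's 80% figure applies to database queries rather than HTTP requests, so it is left out of this funnel.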

Layer 1: Browser Cache Headers

// Set proper cache headers for different content types
 
// Static assets (JS, CSS, images with hashed filenames)
// Cache forever — the hash changes when content changes
res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
 
// API responses that change infrequently (product catalog)
res.setHeader(
  "Cache-Control",
  "public, max-age=60, stale-while-revalidate=300"
);
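// Fresh for 60s; for the next 300s a stale copy is served while the cache revalidates in the background
// User always gets a fast response
 
// User-specific data (cart, account)
res.setHeader("Cache-Control", "private, no-store");
// Never stored by any cache, so always fresh ("no-cache" would still allow storing and revalidating)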
 
// HTML pages
res.setHeader(
  "Cache-Control",
  "public, max-age=0, s-maxage=60, stale-while-revalidate=300"
);
// Browser always checks, CDN caches for 60s

Impact: 30% of requests eliminated. Browser serves from local cache without any network request.
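
Hand-typed header strings are easy to get subtly wrong. One way to reduce that risk, sketched here under the assumption that a small typed helper is acceptable in the codebase, is to build each `Cache-Control` value from a policy object. The `CachePolicy` type, `cacheControl` function, and `POLICIES` table are hypothetical names. For user data the sketch uses `no-store`, the directive that forbids storing a response at all.

```typescript
// Sketch: build Cache-Control values from a typed policy. All names are illustrative.
interface CachePolicy {
  scope: "public" | "private";
  maxAge?: number;               // seconds the browser may reuse the response
  sMaxAge?: number;              // seconds a shared cache (CDN) may reuse it
  staleWhileRevalidate?: number; // seconds a stale copy may be served during refresh
  immutable?: boolean;           // content never changes at this URL
  noStore?: boolean;             // forbid storing the response anywhere
}

function cacheControl(policy: CachePolicy): string {
  if (policy.noStore) return `${policy.scope}, no-store`;
  const parts: string[] = [policy.scope];
  if (policy.maxAge !== undefined) parts.push(`max-age=${policy.maxAge}`);
  if (policy.sMaxAge !== undefined) parts.push(`s-maxage=${policy.sMaxAge}`);
  if (policy.staleWhileRevalidate !== undefined) {
    parts.push(`stale-while-revalidate=${policy.staleWhileRevalidate}`);
  }
  if (policy.immutable) parts.push("immutable");
  return parts.join(", ");
}

// The four content classes discussed above
const POLICIES = {
  staticAsset: cacheControl({ scope: "public", maxAge: 31536000, immutable: true }),
  catalogApi: cacheControl({ scope: "public", maxAge: 60, staleWhileRevalidate: 300 }),
  userData: cacheControl({ scope: "private", noStore: true }),
  htmlPage: cacheControl({ scope: "public", maxAge: 0, sMaxAge: 60, staleWhileRevalidate: 300 }),
};
```

A handler would then call something like `res.setHeader("Cache-Control", POLICIES.catalogApi)`, and the policy table becomes the single place to review freshness rules.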

Layer 2: CDN Edge Caching

CloudFront behaviors configured per path:

/api/products/*     → Cache 5 min, vary by query string
/api/categories/*   → Cache 1 hour, vary by nothing
/api/search*        → Cache 2 min, vary by full query string
/api/cart/*         → No cache (user-specific)
/api/user/*         → No cache (user-specific)
/_next/static/*     → Cache 1 year (immutable)
/images/*           → Cache 1 year (immutable, transformed)
/*.html             → Cache 60s, stale-while-revalidate 5 min

CloudFront path patterns match the URL path only, so query-string variation is set in each behavior's cache key rather than in the pattern itself.

Impact: 50% of remaining requests served from CDN edge. Origin server load drops dramatically.
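
It can help to mirror the per-path table in application code so that unit tests catch a route that is accidentally cacheable. The following is a sketch only: it uses simple prefix matching rather than CloudFront's pattern syntax, omits the `/*.html` rule, and treats any unlisted path as uncacheable, which is an assumption made here for safety. `RouteRule`, `EDGE_RULES`, and `edgeTtlSeconds` are hypothetical names.

```typescript
// Sketch: a code-side mirror of the CDN behaviors, for tests and origin headers.
interface RouteRule {
  prefix: string;     // path prefix, checked in order
  ttlSeconds: number; // 0 means "do not cache at the edge"
}

const EDGE_RULES: RouteRule[] = [
  { prefix: "/api/products/", ttlSeconds: 300 },
  { prefix: "/api/categories/", ttlSeconds: 3600 },
  { prefix: "/api/search", ttlSeconds: 120 },
  { prefix: "/api/cart/", ttlSeconds: 0 },
  { prefix: "/api/user/", ttlSeconds: 0 },
  { prefix: "/_next/static/", ttlSeconds: 31536000 },
  { prefix: "/images/", ttlSeconds: 31536000 },
];

// First matching rule wins; unknown paths default to "no edge caching".
function edgeTtlSeconds(path: string): number {
  const rule = EDGE_RULES.find((r) => path.startsWith(r.prefix));
  return rule ? rule.ttlSeconds : 0;
}

console.log(edgeTtlSeconds("/api/products/42")); // 300
console.log(edgeTtlSeconds("/api/cart/items"));  // 0
```

Defaulting to zero means a newly added route is never cached at the edge until someone decides it is safe to do so.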

Layer 3: Application Cache (Redis)

This is where the biggest savings happen:

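// Generic caching wrapper with TTL-based expiry
// (assumes `redis` is an ioredis-style client and `db` is a Postgres query helper)
async function cached<T>(
  key: string,
  ttlSeconds: number,
  fetcher: () => Promise<T>
): Promise<T> {
  // Check Redis first; null means the key is missing or has expired
  const cachedValue = await redis.get(key);
  if (cachedValue !== null) return JSON.parse(cachedValue) as T;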
 
  // Cache miss — fetch from database
  const value = await fetcher();
 
  // Store in Redis with TTL
  await redis.setex(key, ttlSeconds, JSON.stringify(value));
 
  return value;
}
 
// Usage: Product catalog (changes rarely)
async function getProduct(id: string) {
  return cached(`product:${id}`, 3600, async () => {
    return db.query("SELECT * FROM products WHERE id = $1", [id]);
  });
}
 
// Usage: Category listing (changes daily)
async function getCategories() {
  return cached("categories:all", 1800, async () => {
    return db.query("SELECT * FROM categories ORDER BY sort_order");
  });
}
 
// Usage: Search results (changes frequently but can be stale for 60s)
async function searchProducts(query: string, page: number) {
  const cacheKey = `search:${query}:${page}`;
  return cached(cacheKey, 60, async () => {
    return db.query("SELECT * FROM products WHERE ...", [query]);
  });
}

Cache Invalidation

// When data changes, invalidate affected cache keys
async function updateProduct(id: string, data: ProductUpdate) {
  await db.query("UPDATE products SET ... WHERE id = $1", [id, ...]);
 
  // Invalidate specific product cache
  await redis.del(`product:${id}`);
 
  // Invalidate category listing (product might affect it)
  await redis.del("categories:all");
 
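  // Invalidate search cache (pattern delete).
  // SCAN walks the keyspace in batches; KEYS would block Redis while it scans every key.
  let cursor = "0";
  do {
    const [next, keys] = await redis.scan(cursor, "MATCH", "search:*", "COUNT", 100);
    if (keys.length > 0) await redis.del(...keys);
    cursor = next;
  } while (cursor !== "0");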
}

Impact: Database queries dropped from 12,000/sec to 2,400/sec. 80% reduction.
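
Pattern deletes touch every search key on each product update. One alternative, which is not what the client shipped and is offered only as a sketch, is to embed a version number in the key prefix and bump it on writes, so old entries simply age out through their TTL. The `VersionStore` interface is a minimal subset of a Redis client (`get` and `incr`), and `search:version`, `searchCacheKey`, `invalidateSearch`, and `MemoryStore` are hypothetical names.

```typescript
// Sketch: invalidate a whole key family by bumping a version counter.
interface VersionStore {
  get(key: string): Promise<string | null>;
  incr(key: string): Promise<number>;
}

const SEARCH_VERSION_KEY = "search:version"; // hypothetical key name

// Build the cache key for a search page under the current version.
async function searchCacheKey(
  store: VersionStore,
  query: string,
  page: number
): Promise<string> {
  const version = (await store.get(SEARCH_VERSION_KEY)) ?? "0";
  return `search:v${version}:${query}:${page}`;
}

// One O(1) write replaces the pattern delete; stale keys expire on their own.
async function invalidateSearch(store: VersionStore): Promise<void> {
  await store.incr(SEARCH_VERSION_KEY);
}

// In-memory stand-in so the sketch runs without a Redis server
class MemoryStore implements VersionStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async incr(key: string): Promise<number> {
    const next = Number(this.data.get(key) ?? "0") + 1;
    this.data.set(key, String(next));
    return next;
  }
}
```

The trade-off is one extra `GET` per search lookup in exchange for constant-time invalidation, which tends to matter once the number of cached search keys grows large.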

The Results

Monthly AWS breakdown (after):
  RDS (PostgreSQL):    $3,200  (-61%)  ← downsized instance
  EC2/ECS (compute):   $3,100  (-58%)  ← fewer instances needed
  CloudFront + S3:     $2,400  (-41%)  ← better cache hit ratio
  ElastiCache (Redis): $1,200  (new)   ← small Redis instance
  Data transfer:       $800    (-79%)  ← CDN serves most traffic
  Other (monitoring):  $300    (-93%)
  Total:               $11,000 (-61%)

  Monthly savings: $17,000
  Annual savings: $204,000
  Implementation cost: ~$15,000 (2 weeks of engineering)
  ROI: 13.6x in year one
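
The savings figures follow directly from the two monthly totals and the implementation cost. A small sketch of the arithmetic, where the `savingsSummary` function and its field names are illustrative and the payback period is derived here rather than quoted from the engagement:

```typescript
// Sketch: reproduce the savings and ROI figures from the breakdown above.
interface SavingsSummary {
  monthlySavings: number;
  annualSavings: number;
  percentReduction: number; // share of the original bill removed, in percent
  roiMultiple: number;      // first-year savings divided by implementation cost
  paybackMonths: number;    // months of savings needed to cover the cost
}

function savingsSummary(
  monthlyBefore: number,
  monthlyAfter: number,
  implementationCost: number
): SavingsSummary {
  const monthlySavings = monthlyBefore - monthlyAfter;
  const annualSavings = monthlySavings * 12;
  return {
    monthlySavings,
    annualSavings,
    percentReduction: (monthlySavings / monthlyBefore) * 100,
    roiMultiple: annualSavings / implementationCost,
    paybackMonths: implementationCost / monthlySavings,
  };
}

const summary = savingsSummary(28_000, 11_000, 15_000);
console.log(summary.monthlySavings);          // 17000
console.log(summary.annualSavings);           // 204000
console.log(summary.roiMultiple.toFixed(1));  // "13.6"
```

On these numbers the work pays for itself in a little under one month of savings.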

Performance Improvement (Bonus)

API response time:
  Before: 340ms (p50), 1,200ms (p99)
  After:  12ms (p50, cache hit), 180ms (p99, cache miss)

Page load time:
  Before: 2.8s
  After:  0.9s

Database CPU utilization:
  Before: 78% average (spikes to 95%)
  After:  22% average (spikes to 45%)

Common Caching Mistakes

❌ Caching everything with the same TTL
   → Different data needs different freshness guarantees

❌ No cache invalidation strategy
   → Stale data is worse than slow data

❌ Caching user-specific data in shared cache
   → User A sees User B's cart (security incident)

❌ Not monitoring cache hit rates
   → You don't know if caching is actually working

❌ Cache stampede on expiration
   → 1,000 requests hit the DB simultaneously when cache expires
   → Fix: Use stale-while-revalidate or cache locking
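
The cache-locking fix for stampedes can be sketched as request coalescing: concurrent misses for the same key share one in-flight fetch instead of each hitting the database. This version is process-local and illustrative only; `singleFlight` and `inFlight` are hypothetical names, and coordinating across several API instances would need a shared lock (for example in Redis), which is not shown.

```typescript
// Sketch: coalesce concurrent cache misses for the same key within one process.
const inFlight = new Map<string, Promise<unknown>>();

async function singleFlight<T>(
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  // If a fetch for this key is already running, wait for it instead of starting another
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  // Start the fetch and make sure the entry is removed whether it succeeds or fails
  const pending = fetcher().finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}
```

Wrapping the fetcher passed to a caching helper, for example `() => singleFlight(key, loadFromDb)` where `loadFromDb` is a placeholder, means an expired hot key triggers one database query per process rather than one per waiting request.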

Implementation Priority

Week 1: Browser cache headers (free, immediate impact)
Week 2: CDN configuration (CloudFront/Vercel behaviors)
Week 3: Redis for top 10 most-hit database queries
Week 4: Cache invalidation and monitoring dashboard

Total effort: 4 weeks of focused engineering
Expected savings: 40-60% of current infrastructure costs
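
For the Week 4 monitoring work, Redis itself reports `keyspace_hits` and `keyspace_misses` in the output of `INFO stats`. A minimal sketch of turning that text into a hit rate follows; `parseHitRate` is a hypothetical helper, and how the text is fetched (for instance via a client's `info("stats")` call) is assumed rather than shown.

```typescript
// Sketch: compute the cache hit rate from the text returned by Redis `INFO stats`.
function parseHitRate(infoStats: string): number | null {
  // Read one "field:number" line from the INFO output
  const read = (field: string): number | null => {
    const match = infoStats.match(new RegExp(`^${field}:(\\d+)`, "m"));
    return match ? Number(match[1]) : null;
  };

  const hits = read("keyspace_hits");
  const misses = read("keyspace_misses");
  if (hits === null || misses === null) return null;

  const total = hits + misses;
  return total === 0 ? null : hits / total;
}

// Example input shaped like real INFO output (values are made up for illustration)
const sample = "# Stats\r\nkeyspace_hits:800\r\nkeyspace_misses:200\r\n";
console.log(parseHitRate(sample)); // 0.8
```

These counters are cumulative since the server started or the stats were last reset, so a dashboard would normally plot the difference between successive samples rather than the raw ratio.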

Caching isn't glamorous. But it's the highest-ROI infrastructure investment most startups can make. Before you scale up your database, add more servers, or rewrite your application — add a caching layer. The math almost always works in your favor.
