CI/CD Pipelines That Actually Make You Faster
ScaledByDesign
Tags: ci-cd, devops, pipelines, engineering
The 45-Minute Pipeline
Your engineer pushes code. CI starts. Forty-five minutes later, the build fails because of a flaky test that has nothing to do with their change. They re-run it. Another 45 minutes. It passes. They merge. The whole cycle takes close to two hours for a 10-line change.
This isn't CI/CD. This is continuous waiting.
Why Pipelines Get Slow
Typical slow pipeline breakdown:
Install dependencies: 3 min (downloads the internet every time)
Lint + type check: 4 min (checks everything, not just changes)
Unit tests: 8 min (runs all 2,400 tests serially)
Integration tests: 12 min (spins up databases, waits for seeds)
E2E tests: 15 min (launches browser, flaky selectors)
Build: 5 min (no caching, rebuilds everything)
Deploy to staging: 3 min
Total: ~50 minutes
With flaky retry: ~95 minutes
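The total is a plain sum because every stage waits for the one before it. The arithmetic can be sketched in a few lines of Python, using the durations from the breakdown. The grouping in `PLAN` is a hypothetical illustration (it assumes lint, unit tests, and build do not depend on each other), not a measured pipeline.

```python
# Stage durations in minutes, copied from the breakdown above.
STAGES = {
    "install": 3,
    "lint_typecheck": 4,
    "unit": 8,
    "integration": 12,
    "e2e": 15,
    "build": 5,
    "deploy": 3,
}


def sequential_minutes(stages):
    """Wall-clock time when every stage runs one after another."""
    return sum(stages.values())


def wall_clock_minutes(plan, stages):
    """Wall-clock time when stages inside a group run concurrently.

    Each group costs as much as its slowest member.
    """
    return sum(max(stages[name] for name in group) for group in plan)


# Hypothetical grouping: only the three independent checks run side by side.
PLAN = [
    ["install"],
    ["lint_typecheck", "unit", "build"],
    ["integration"],
    ["e2e"],
    ["deploy"],
]

print(sequential_minutes(STAGES))        # 50
print(wall_clock_minutes(PLAN, STAGES))  # 3 + 8 + 12 + 15 + 3 = 41
```

Grouping alone only moves this example from 50 to 41 minutes. Most of the remaining time sits in the individual stages, so they have to get shorter as well.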
The Fast Pipeline Architecture
Principle 1: Parallelize Everything
# BEFORE: Sequential pipeline (50 minutes)
steps:
  - install
  - lint
  - type-check
  - unit-tests
  - integration-tests
  - e2e-tests
  - build
  - deploy
# AFTER: Parallel pipeline (12 minutes)
steps:
  - install (cached, 30 seconds)
  - parallel:
      - lint + type-check (3 min)
      - unit-tests (3 min, parallelized across 4 runners)
      - build (2 min, cached)
  - integration-tests (4 min, only affected services)
  - e2e-tests (5 min, only critical paths)
  - deploy (1 min, pre-built artifact)

Principle 2: Cache Aggressively
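# Cache node_modules based on lock file hash
- name: Cache dependencies
  uses: actions/cache@v4
  with:
    path: node_modules
    key: deps-${{ hashFiles('package-lock.json') }}
# Cache build output. restore-keys lets a run whose sources changed start
# from the most recent cache, which is what keeps the build incremental.
- name: Cache Next.js build
  uses: actions/cache@v4
  with:
    path: .next/cache
    key: nextjs-${{ hashFiles('package-lock.json') }}-${{ hashFiles('**/*.ts', '**/*.tsx') }}
    restore-keys: nextjs-${{ hashFiles('package-lock.json') }}-
# Cache Jest's transform cache (assumes cacheDirectory is set to .jest-cache).
# This speeds up test runs; it does not skip unchanged tests by itself.
- name: Cache Jest cache
  uses: actions/cache@v4
  with:
    path: .jest-cache
    key: tests-${{ hashFiles('src/**/*.test.ts') }}
    restore-keys: tests-
# Impact: Install goes from 3 min to 30 sec
#         Build goes from 5 min to 45 sec (incremental)

Principle 3: Only Test What Changed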
# Determine which files changed.
# Assumes a push event (github.event.before is empty on pull_request events)
# and a checkout with fetch-depth: 0 so the previous commit is available.
- name: Get changed files
  id: changes
  run: |
    # Join the file names with spaces: a multi-line value would break the
    # single-line name=value format of $GITHUB_OUTPUT.
    FILES=$(git diff --name-only "${{ github.event.before }}" HEAD | tr '\n' ' ')
    echo "changed=$FILES" >> "$GITHUB_OUTPUT"
# Only run backend tests if backend files changed
- name: Backend tests
  if: contains(steps.changes.outputs.changed, 'src/api/')
  run: npm run test:api
# Only run frontend tests if frontend files changed
- name: Frontend tests
  if: contains(steps.changes.outputs.changed, 'src/components/')
  run: npm run test:components
# Only run E2E if critical paths changed
- name: E2E tests
  if: |
    contains(steps.changes.outputs.changed, 'src/app/checkout') ||
    contains(steps.changes.outputs.changed, 'src/app/auth')
  run: npm run test:e2e

Principle 4: Kill Flaky Tests
A flaky test is worse than no test. It erodes trust in the
entire test suite. Engineers start ignoring failures.
Detection:
Track test pass/fail rate over 30 days.
Any test that fails > 2% of the time WITHOUT code changes
is flaky.
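One way to make "fails without code changes" measurable is to count a failure only when the same commit also produced a passing run of that test. The sketch below takes that approach. The `(test_name, commit_sha, passed)` record shape and the function name are assumptions for illustration; real CI systems expose this history in their own formats.

```python
from collections import defaultdict

FLAKY_THRESHOLD = 0.02  # the 2% cutoff from the detection rule


def find_flaky_tests(runs, threshold=FLAKY_THRESHOLD):
    """Return {test_name: flaky_rate} for tests above the threshold.

    runs: iterable of (test_name, commit_sha, passed) tuples covering the
    tracking window (30 days in the text). A failure counts as unexplained
    only if the same test also passed on the same commit.
    """
    by_test = defaultdict(lambda: defaultdict(list))
    for test_name, commit_sha, passed in runs:
        by_test[test_name][commit_sha].append(passed)

    flaky = {}
    for test_name, by_commit in by_test.items():
        total = sum(len(results) for results in by_commit.values())
        unexplained = sum(
            results.count(False)
            for results in by_commit.values()
            if any(results)  # at least one pass on this commit
        )
        rate = unexplained / total
        if rate > threshold:
            flaky[test_name] = rate
    return flaky


history = (
    [("checkout", "c1", False), ("checkout", "c1", True)]
    + [("checkout", "c2", True)] * 8
    + [("login", "c1", False), ("login", "c1", False), ("login", "c2", True)]
)
print(find_flaky_tests(history))  # {'checkout': 0.1}
```

In this example `login` fails consistently on commit `c1` and passes on `c2`, so it looks like a real regression that was fixed and is not flagged.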
Policy:
1. Flaky test detected → Quarantine immediately
2. Quarantined tests run separately (not blocking)
3. Owner has 1 week to fix or delete the test
4. If not fixed in 1 week, test is deleted
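The policy is simple enough to encode as a small decision function that a scheduled job could run over the quarantine list. This is a minimal sketch: the action names, the `fixed` flag, and the idea of restoring a fixed test to the blocking suite are assumptions, not part of the policy as stated.

```python
from datetime import date, timedelta

FIX_WINDOW = timedelta(weeks=1)  # the owner's deadline from step 3


def quarantine_action(quarantined_on, fixed, today):
    """Decide what to do with a quarantined test.

    Returns one of three hypothetical action names:
      "restore"          - the test was fixed, so it goes back to the blocking suite
      "delete"           - the fix window has passed without a fix
      "run-non-blocking" - still inside the window, so it keeps running separately
    """
    if fixed:
        return "restore"
    if today - quarantined_on > FIX_WINDOW:
        return "delete"
    return "run-non-blocking"


print(quarantine_action(date(2024, 3, 1), fixed=False, today=date(2024, 3, 5)))
# run-non-blocking
print(quarantine_action(date(2024, 3, 1), fixed=False, today=date(2024, 3, 9)))
# delete
```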
Dashboard:
Total tests: 2,400
Passing: 2,385 (99.4%)
Quarantined (flaky): 12 (0.5%)
Disabled: 3 (0.1%)
Flaky rate trend: ↓ (was 3.2% last month)
The Pipeline Stages
Stage 1: Fast Feedback (< 3 minutes)
Runs on EVERY push. Must be fast. Must be reliable.
✓ Linting (ESLint)
✓ Type checking (TypeScript)
✓ Unit tests (affected files only)
✓ Security scanning (dependency audit)
If this stage fails, the developer knows within 3 minutes.
They can fix it while the code is still in their head.
Stage 2: Thorough Testing (< 10 minutes)
Runs on PR creation and updates.
✓ Full unit test suite (parallelized)
✓ Integration tests (affected services)
✓ Build verification
✓ Bundle size check
✓ Performance benchmarks (compared to main)
This is the quality gate. PRs can't merge without green.
Stage 3: E2E and Deploy (< 10 minutes)
Runs after merge to main.
✓ Critical path E2E tests (login, checkout, core flows)
✓ Build production artifact
✓ Deploy to staging
✓ Smoke tests against staging
✓ Deploy to production (if smoke tests pass)
Not every code path needs E2E. Test the 5-10 critical
user journeys that generate revenue.
Measuring Pipeline Health
Weekly Pipeline Metrics:
Speed:
Median pipeline time: 8 min (target: < 10 min)
P95 pipeline time: 14 min (target: < 20 min)
Time to first feedback: 2 min (target: < 3 min)
Reliability:
Pipeline success rate: 94% (target: > 95%)
Flaky test rate: 0.8% (target: < 1%)
False positive rate (failures unrelated to the change): 0.3% (target: < 0.5%)
Throughput:
Deploys per day: 8 (target: > 5)
Lead time (commit → prod): 45 min (target: < 1 hour)
Rollback time: 3 min (target: < 5 min)
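The speed and reliability numbers can be computed from raw run records. The sketch below assumes each run is available as a `(duration_minutes, succeeded)` pair, which is a hypothetical shape, and uses the nearest-rank method for P95, one of several common percentile definitions.

```python
import math
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def pipeline_metrics(runs):
    """Summarize a week of pipeline runs.

    runs: list of (duration_minutes, succeeded) tuples.
    """
    durations = [duration for duration, _ in runs]
    successes = sum(1 for _, succeeded in runs if succeeded)
    return {
        "median_min": median(durations),
        "p95_min": percentile(durations, 95),
        "success_rate": successes / len(runs),
    }


# Twenty runs lasting 1..20 minutes, where only the slowest one failed.
sample = [(minutes, minutes != 20) for minutes in range(1, 21)]
print(pipeline_metrics(sample))
# {'median_min': 10.5, 'p95_min': 19, 'success_rate': 0.95}
```

Tracking P95 next to the median matters because a healthy median can hide the slow or retried runs that engineers actually remember.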
The ROI Math
Team of 8 engineers, 12 PRs per day:
Before (50-min pipeline):
Wait time per PR: 50 min × 1.3 avg runs (retries included) = 65 min
Daily wait time: 65 min × 12 PRs = 780 min (13 hours)
Context switching cost: ~30 min per wait = 6 hours/day
Total daily cost: 19 engineer-hours wasted
After (12-min pipeline):
Wait time per PR: 12 min × 1.05 avg runs (retries included) ≈ 13 min
Daily wait time: 13 min × 12 PRs = 156 min (2.6 hours)
Context switching cost: minimal (can stay focused)
Total daily cost: 2.6 engineer-hours
Savings: 16.4 engineer-hours per day
At $100/hour loaded cost: $1,640/day = $426,000/year
Investment to fix: 2-3 weeks of engineering time (~$40K)
ROI: 10x in year one
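The calculation above is easy to rerun with your own team's numbers. This sketch reproduces it under two assumptions that the text does not state explicitly: per-PR wait time is rounded to whole minutes, and a year has 260 working days (which is consistent with the annual figure quoted).

```python
WORKING_DAYS_PER_YEAR = 260  # assumption: 52 weeks x 5 days
HOURLY_COST = 100            # loaded cost in dollars, from the text


def daily_cost_hours(pipeline_min, avg_runs, prs_per_day, switch_min_per_pr=0):
    """Engineer-hours per day spent waiting on CI plus context switching."""
    wait_per_pr = round(pipeline_min * avg_runs)  # whole minutes, as in the text
    return (wait_per_pr + switch_min_per_pr) * prs_per_day / 60


before = daily_cost_hours(50, 1.3, 12, switch_min_per_pr=30)
after = daily_cost_hours(12, 1.05, 12)
saved_per_day = before - after
annual_savings = saved_per_day * HOURLY_COST * WORKING_DAYS_PER_YEAR

print(f"{before:.1f} h -> {after:.1f} h, saving {saved_per_day:.1f} h/day")
# 19.0 h -> 2.6 h, saving 16.4 h/day
print(f"${annual_savings:,.0f} per year")
# $426,400 per year
```

The result is most sensitive to PRs per day and to the context-switching estimate, so those are the two inputs worth measuring instead of guessing.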
A fast, reliable pipeline isn't a nice-to-have. It is one of the highest-leverage investments you can make in engineering productivity: every minute you shave off the pipeline compounds across every engineer, every PR, every day. Fix the pipeline, and everything else gets faster.