Testing Strategy & Quality Gates

A comprehensive testing strategy spanning unit, integration, contract, and E2E tests, built on a test pyramid with coverage enforcement and quality gates in CI.

Milestone: Foundation

foundational

CFR

Job to be done: When we ship code without a clear test strategy, I want a balanced test pyramid with ownership and metrics, so I can ship reliably without slow suites blocking PRs.

For engineers

Over three weeks, build a balanced portfolio of unit, integration, and E2E tests organized as a pyramid; set coverage thresholds in CI; quarantine flaky tests; and add contract tests for critical service boundaries.

What you’ll implement

These are the roadmap epic features, organized as a starter backlog.

Code Coverage Baseline

Integration Testing

End-to-End Testing

Test Data Management

Flaky Test Detection

Parallel Test Execution

Execution guide

Practical guidance aligned to the Execution Kit Definition of Done.

Outcome

Teams ship confidently with a balanced test portfolio, fast feedback loops, and quality gates that catch defects early without slowing delivery.

Before to After Transformation

BEFORE: Testing as afterthought

Few tests, slow suites, frequent flakes, regressions reaching production, and testing treated as 'someone else's job'.

# Current state:
- 12% code coverage
- Tests take 45 min to run
- 20+ flaky tests ignored
- 'Works on my machine'
- No one owns test quality
- E2E tests break every sprint

AFTER: Quality built-in

Balanced test pyramid, fast feedback, <2% flaky rate, confidence to deploy, shared ownership

# Target state:
- 70% unit (< 2 min)
- 20% integration (< 5 min)
- 10% e2e (< 10 min)
- Flaky rate < 2%
- Coverage > 80% on critical paths
- Contract tests for all APIs
- Testing guidelines published

# DORA improvements:
- Deployment frequency: weekly to daily
- Lead time: 7 days to 2 days

Symptoms

Frequent regressions and flaky tests blocking PRs

Slow test suites (> 10 min) delaying deployments

Unclear test ownership and pyramid violations

Prerequisites

CI pipeline running on PRs

Basic code review process

Teams have time allocated for quality work

Implementation steps

Week 1

Audit current test portfolio (count by type, coverage, run time)
Define test pyramid ratios for your context (e.g., 70/20/10)
Establish test naming conventions and folder structure
Set up coverage reporting in CI (with realistic thresholds)
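
The audit and ratio steps above can be sketched as a small script that compares counted tests against the target pyramid. This is a minimal illustration, not a prescribed tool: the function name `pyramid_report`, the 5-point tolerance, and the input shape are assumptions; only the 70/20/10 split comes from the steps above.

```python
# Hypothetical targets; 70/20/10 is the example split from the Week 1 steps.
TARGET_RATIOS = {"unit": 0.70, "integration": 0.20, "e2e": 0.10}


def pyramid_report(counts: dict[str, int], tolerance: float = 0.05) -> dict[str, dict]:
    """Compare the actual share of each test type against its target share.

    `counts` maps a test type to the number of tests found in the audit.
    `tolerance` is an assumed allowance (absolute share) around each target.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tests counted")
    report = {}
    for kind, target in TARGET_RATIOS.items():
        actual = counts.get(kind, 0) / total
        report[kind] = {
            "actual": round(actual, 3),
            "target": target,
            "within_tolerance": abs(actual - target) <= tolerance,
        }
    return report


if __name__ == "__main__":
    # Example audit of an inverted pyramid: as many E2E tests as unit tests.
    for kind, row in pyramid_report({"unit": 120, "integration": 60, "e2e": 120}).items():
        print(kind, row)
```

Running this against real audit numbers gives a quick view of which layer is over- or under-represented before any thresholds are enforced.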

Week 2

Identify and quarantine/fix top 10 flaky tests
Add contract tests for critical service boundaries
Create test data management strategy (fixtures, factories, seeding)
Implement parallel test execution
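
To make the contract-testing step concrete, here is a minimal sketch of a consumer-driven check written in plain Python. It is a stand-in for a dedicated tool such as Pact, not a use of its API; the `ORDER_CONTRACT` fields and the function name `contract_violations` are hypothetical.

```python
from typing import Any

# Hypothetical contract: the fields a consumer reads and the types it expects.
ORDER_CONTRACT: dict[str, type] = {"id": str, "total_cents": int, "status": str}


def contract_violations(response: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return a list of ways the provider response breaks the consumer contract.

    Extra fields in the response are allowed: the consumer only cares about
    the fields it actually reads.
    """
    problems = []
    for field_name, expected in contract.items():
        if field_name not in response:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(response[field_name], expected):
            actual = type(response[field_name]).__name__
            problems.append(f"{field_name}: expected {expected.__name__}, got {actual}")
    return problems


if __name__ == "__main__":
    provider_response = {"id": "o-1", "total_cents": "1999", "status": "paid"}
    print(contract_violations(provider_response, ORDER_CONTRACT))
```

A provider-side test would run this check against a real or recorded response for each consumer contract, so a breaking change fails in the provider's CI rather than in production.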

Week 3

Add smoke/sanity tests for post-deploy validation
Define mutation testing baseline for critical paths
Create testing guidelines doc (when to write what)
Set up test quality metrics dashboard

Definition of Done

Test pyramid documented with target ratios
Coverage thresholds enforced in CI
Flaky test rate < 2%
Unit and integration suites run in < 10 minutes
Critical paths have contract tests
Testing guidelines published
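
The measurable items in this checklist can be expressed as a CI gate. The sketch below assumes metrics have already been collected into a `SuiteMetrics` record (a hypothetical type); the flaky-rate and duration limits mirror the checklist, while the 80% coverage default is a placeholder to be replaced with whatever threshold the team agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteMetrics:
    coverage_pct: float    # line or branch coverage, in percent
    flaky_rate_pct: float  # share of tests that fail intermittently, in percent
    suite_minutes: float   # unit + integration wall-clock time


def gate_failures(
    metrics: SuiteMetrics,
    min_coverage_pct: float = 80.0,  # assumed placeholder threshold
    max_flaky_pct: float = 2.0,      # checklist: flaky test rate < 2%
    max_minutes: float = 10.0,       # checklist: suite runs in < 10 minutes
) -> list[str]:
    """Return one message per violated gate; an empty list means the gate passes."""
    failures = []
    if metrics.coverage_pct < min_coverage_pct:
        failures.append(f"coverage {metrics.coverage_pct}% is below {min_coverage_pct}%")
    if metrics.flaky_rate_pct >= max_flaky_pct:
        failures.append(f"flaky rate {metrics.flaky_rate_pct}% is not below {max_flaky_pct}%")
    if metrics.suite_minutes >= max_minutes:
        failures.append(f"suite took {metrics.suite_minutes} min, limit is {max_minutes} min")
    return failures


if __name__ == "__main__":
    problems = gate_failures(SuiteMetrics(coverage_pct=72.0, flaky_rate_pct=1.2, suite_minutes=8.5))
    for problem in problems:
        print("GATE FAILED:", problem)
    # A non-zero exit code makes the CI job fail when any gate is violated.
    raise SystemExit(1 if problems else 0)
```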

Metrics

Leading Indicators

Test coverage % (overall and per-component)
Flaky test rate (tests that fail intermittently)
Test suite duration (p50/p95)
Tests per PR (are tests being added?)
Mutation score for critical paths
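
Two of these indicators are straightforward to compute from CI history. The sketch below assumes each test execution is recorded as a `RunRecord` (a hypothetical shape) and uses one common working definition of flakiness: a test that both passed and failed on the same commit. Other definitions exist, so treat this as one option.

```python
from collections import defaultdict
from statistics import quantiles
from typing import NamedTuple


class RunRecord(NamedTuple):
    test_id: str
    commit: str
    passed: bool


def flaky_tests(runs: list[RunRecord]) -> set[str]:
    """Tests that both passed and failed on the same commit (same code, different outcome)."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_id, run.commit)].add(run.passed)
    return {test_id for (test_id, _commit), seen in outcomes.items() if len(seen) == 2}


def flaky_rate_pct(runs: list[RunRecord]) -> float:
    """Flaky tests as a percentage of all distinct tests seen in the run history."""
    all_tests = {run.test_id for run in runs}
    if not all_tests:
        return 0.0
    return 100.0 * len(flaky_tests(runs)) / len(all_tests)


def duration_percentiles(durations_sec: list[float]) -> tuple[float, float]:
    """Return (p50, p95) of suite durations; needs at least two data points."""
    cuts = quantiles(durations_sec, n=100, method="inclusive")
    return cuts[49], cuts[94]  # 50th and 95th of the 99 percentile cut points
```

A failure on one commit followed by a pass on a different commit is not counted as flaky here, since the code changed between the two runs.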

Lagging Indicators

Change failure rate
Defect escape rate (bugs found in prod)
Time to detect (how fast do tests catch issues?)
Rework rate (fixes to recently shipped code)

Failure modes

Inverted pyramid: too many E2E tests and too few unit tests, which slows feedback

Coverage theater: hitting numbers without testing behavior

Flaky tests ignored: everyone clicks 're-run' without fixing

Test data coupling: tests depend on shared state, break randomly

Missing contract tests: integration failures in production

No parallel execution: 45-minute test suites blocking PRs
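
One way to avoid the test data coupling described above is to have each test build its own data through a factory instead of reading shared fixtures. The sketch below is illustrative only: the `User` model, the `make_user` helper, and the `example.test` email domain are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

# Process-wide counter so every generated record gets a unique id.
_ids = itertools.count(1)


@dataclass
class User:
    id: int
    email: str
    roles: list[str] = field(default_factory=list)


def make_user(**overrides) -> User:
    """Build a fresh User with sensible defaults; tests override only what they care about."""
    uid = next(_ids)
    values = {"id": uid, "email": f"user{uid}@example.test", "roles": ["member"]}
    values.update(overrides)
    return User(**values)


def test_admin_sees_dashboard():
    # The test states the one attribute that matters and shares nothing with other tests.
    admin = make_user(roles=["admin"])
    assert "admin" in admin.roles
```

Because every call returns a new object with its own list of roles, a test that mutates its user cannot affect another test, which also makes parallel execution safer.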

Ownership

Teams/Engineers

Write tests at appropriate levels for new code
Maintain and fix tests they own
Follow testing guidelines

Tech Leads

Review test quality in PRs (not just coverage)
Ensure balanced test pyramid
Champion test refactoring when needed

Platform/DevOps

Optimize test infrastructure (parallelization, caching)
Provide test utilities and patterns
Track and report test health metrics
Define E2E test strategy and tooling
Own contract testing framework
Analyze defect patterns and identify gaps

What good looks like (by org scale)

Small Teams

Test pyramid documented (even if informal)
Unit tests for business logic
CI blocks merge on test failure
Coverage > 60% with meaningful tests

Medium Orgs

Integration tests for API boundaries
Contract tests for service dependencies
Flaky test tracking with a flaky rate < 5%
Test suite completes in < 15 minutes
Smoke tests post-deploy

Enterprise

Mutation testing for critical paths
Parallel execution across shards
Test quality dashboards
Automated visual regression
Performance testing in CI

References

Testing Trophy (Kent C. Dodds)

Test Pyramid (Martin Fowler)

Contract Testing with Pact

Testing Library Guiding Principles

Playwright Best Practices

Resources

Templates and related materials for this kit.

Templates

Copy/paste artifacts that support this kit.

Definition of Done (DoD)

A ready-to-use DoD checklist that bakes in quality, security, and operability.

Related capabilities

Capabilities tracked under this epic in the roadmap.

Code Coverage Baseline
>= 70% code coverage for unit tests with branch coverage tracked and enforced in CI.
Integration Testing
>= 60% of services have integration tests covering critical API endpoints and database interactions.
End-to-End Testing
>= 50% of critical user journeys covered by automated E2E tests (Playwright, Cypress).
Test Data Management
>= 70% of tests use factories or builders for test data (no hardcoded magic values).
Flaky Test Detection
>= 90% of flaky tests detected and fixed within 1 sprint. Flaky rate < 2%.
Parallel Test Execution
>= 70% of test suites run tests in parallel, reducing total test time by >= 50%.
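
For the parallel execution capability, how tests are split across runners largely determines the time saved. The sketch below uses a greedy longest-first assignment based on recorded durations; the function name `shard_tests` and the input format are assumptions, and many test runners ship their own sharding that may be preferable.

```python
import heapq


def shard_tests(durations: dict[str, float], shard_count: int) -> list[list[str]]:
    """Assign tests to shards, longest first, always onto the least-loaded shard."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    # Min-heap of (accumulated seconds, shard index).
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    # Sort by duration descending, then by name so the result is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda item: (-item[1], item[0])):
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return shards


if __name__ == "__main__":
    recorded = {"a": 60.0, "b": 30.0, "c": 30.0, "d": 20.0, "e": 20.0}
    for index, shard in enumerate(shard_tests(recorded, 2)):
        print(index, shard, sum(recorded[name] for name in shard))
```

In this example the serial run takes 160 seconds and each of the two shards carries 80 seconds, which is the 50% reduction the capability targets. Real suites rarely split this evenly, so measure the slowest shard rather than the average.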

Related kits

Other kits in the same milestone or with similar DORA impact.

Code Quality & Review Standards

Foundation

CFR

Backlog Quality & Planning Enablement

Foundation

CI/CD & Build Automation

Foundation

Observability & Monitoring Foundations

Foundation

MTTR

CFR
