Advanced Testing & Performance Validation

Contract testing, performance testing with gates, regression test automation, and runtime security testing (DAST).

Milestone: Acceleration

Level: Intermediate

DORA metric: Change failure rate (CFR)

Job to be done: When regressions escape to production and manual testing delays releases by days, I want automated smoke and UAT tests across environments, so I can ship frequently without regression surprises.

For engineers

Implement post-deploy smoke tests in CI and staging, automate UAT scenarios with environment-specific Playwright configs, and establish test health dashboards tracking pass rate and flake rate to prevent regressions.
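
As a minimal sketch of environment-specific test configuration, the snippet below maps an environment name to a base URL and the suites allowed to run there. The environment names, URLs, and the `allowWrites` flag are hypothetical placeholders, not values from this kit.

```typescript
// Sketch only: environment names, URLs, and fields are illustrative assumptions.
type TestEnvironment = "ci" | "staging" | "production";
type Suite = "smoke" | "uat" | "regression";

interface EnvironmentTestConfig {
  baseURL: string;      // where the tests point
  suites: Suite[];      // which suites may run in this environment
  retries: number;      // retry budget for failed tests
  allowWrites: boolean; // false = read-only checks (assumed safe default for production)
}

const ENVIRONMENT_CONFIGS: Record<TestEnvironment, EnvironmentTestConfig> = {
  ci: { baseURL: "http://localhost:3000", suites: ["smoke"], retries: 0, allowWrites: true },
  staging: {
    baseURL: "https://staging.example.com",
    suites: ["smoke", "uat", "regression"],
    retries: 1,
    allowWrites: true,
  },
  production: { baseURL: "https://www.example.com", suites: ["smoke"], retries: 1, allowWrites: false },
};

function isTestEnvironment(value: string): value is TestEnvironment {
  return value === "ci" || value === "staging" || value === "production";
}

// Resolve the config for an environment name (for example, read from an
// environment variable by the caller). Defaults to "ci" when unset and
// fails loudly on unknown names instead of silently testing the wrong target.
function resolveTestConfig(envName: string | undefined): EnvironmentTestConfig {
  const name = envName ?? "ci";
  if (!isTestEnvironment(name)) {
    throw new Error(`Unknown test environment: ${name}`);
  }
  return ENVIRONMENT_CONFIGS[name];
}
```

In a Playwright setup, the resolved `baseURL` and `retries` could feed the `use.baseURL` and `retries` options in `playwright.config.ts`; how the environment name reaches the config is left to the team.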

What you’ll implement

These are the roadmap epic features, organized as a starter backlog.

Contract Testing Between Services

Performance Testing in CI

Dynamic Application Security Testing

Mutation Testing for Critical Code

Visual Regression Testing

Execution guide

Practical guidance aligned to the Execution Kit Definition of Done.

Outcome

Teams prevent regressions and ship confidently through automated smoke tests in CI, UAT scenario coverage, and environment-based test execution.

Before to After Transformation

BEFORE: Manual testing bottleneck with frequent regressions

Tests run only locally or in CI; there is no post-deploy validation, and UAT is manual

# Before state:
- Tests: Unit tests only, no integration or E2E
- CI: 5-min unit tests, no smoke tests
- Deploy: Manual QA testing (2-3 days)
- Regressions: 2-3 per release (caught in production)

# Typical release flow:
1. Code merged to main
2. CI runs unit tests ✓
3. Deploy to staging (manual)
4. QA team manually tests (2-3 days)
5. Bugs found, devs fix, repeat
6. Deploy to production (Friday night)
7. Weekend hotfix for regression

# Metrics:
- Lead time: 7-10 days
- Change failure rate: 25%
- Deployment frequency: Weekly (Fridays only)

AFTER: Automated quality gates with confidence

Smoke tests in CI, post-deploy tests in staging, UAT automated, regressions prevented

# After state:
- Tests: Unit (5 min) + Smoke (3 min) + UAT (15 min) + Regression (30 min)
- CI: Smoke tests required before merge
- Deploy: Automated with post-deploy smoke tests
- Escaped regressions: 0-1 per quarter (most caught pre-production)

# Typical release flow:
1. PR opened
2. CI: unit + smoke tests (8 min) ✓
3. Merge to main
4. Auto-deploy to staging
5. Post-deploy smoke tests (3 min) ✓
6. UAT tests (15 min) ✓
7. Auto-promote to production
8. Production smoke tests ✓
9. Ship with confidence

# Metrics:
- Lead time: 1-2 days
- Change failure rate: 5%
- Deployment frequency: Daily (multiple times)
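
The gated release flow above can be sketched as a sequence of stages where the first failure stops promotion. This is an illustrative model, not a real CI configuration: the `Stage` shape and `runPipeline` are hypothetical, and the durations are the ones quoted in the flow.

```typescript
// Sketch only: models "each gate must pass before the next stage runs".
interface Stage {
  name: string;
  minutes: number;    // expected duration, taken from the flow above
  run: () => boolean; // returns true when the stage passes
}

interface PipelineResult {
  promoted: boolean;          // true only if every gate passed
  failedStage: string | null; // first failing stage, if any
  minutesSpent: number;       // time spent up to and including the failing stage
  completed: string[];        // stages that passed
}

function runPipeline(stages: Stage[]): PipelineResult {
  const completed: string[] = [];
  let minutesSpent = 0;
  for (const stage of stages) {
    minutesSpent += stage.minutes;
    if (!stage.run()) {
      // Stop immediately: later stages (including promotion) never run.
      return { promoted: false, failedStage: stage.name, minutesSpent, completed };
    }
    completed.push(stage.name);
  }
  return { promoted: true, failedStage: null, minutesSpent, completed };
}

// Example: the pre-production gates from the flow (8 + 3 + 15 = 26 minutes).
const exampleResult = runPipeline([
  { name: "ci-unit-and-smoke", minutes: 8, run: () => true },
  { name: "post-deploy-smoke", minutes: 3, run: () => true },
  { name: "uat", minutes: 15, run: () => true },
]);
// exampleResult.promoted is true and exampleResult.minutesSpent is 26
```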

Symptoms

Manual testing delays releases by days or weeks

Regressions escape to production regularly (features break unexpectedly)

UAT is ad-hoc 'click around and hope' instead of scenario coverage

Tests run only in CI, not in deployed environments (deploy surprises)

Prerequisites

A CI pipeline that runs on pull requests

At least one deployed environment (staging or pre-production)

A test framework in place (Jest, Playwright, pytest, etc.)

Implementation steps

Week 1

Define smoke test scenarios (critical user journeys, 5-10 tests, < 5 min runtime)
Add smoke tests to CI pipeline (must pass before merge)
Create UAT scenario catalog (Given/When/Then for major features)
Set up test health dashboard (pass rate, flake rate, runtime)
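
One way to keep the UAT scenario catalog reviewable is to store it as typed data. The sketch below assumes a hypothetical `UatScenario` shape with Given/When/Then fields and computes scenario coverage as automated scenarios divided by total scenarios; the sample scenarios are invented for illustration.

```typescript
// Sketch only: the scenario shape and sample entries are assumptions.
interface UatScenario {
  id: string;
  feature: string;
  given: string;
  when: string;
  then: string;
  automated: boolean;    // has an automated test been written?
  criticalPath: boolean; // candidate for a required CI check
}

const sampleCatalog: UatScenario[] = [
  {
    id: "UAT-001",
    feature: "Login",
    given: "a registered user on the login page",
    when: "they submit valid credentials",
    then: "they land on their dashboard",
    automated: true,
    criticalPath: true,
  },
  {
    id: "UAT-002",
    feature: "Checkout",
    given: "a cart with one item",
    when: "the user pays with a saved card",
    then: "an order confirmation is shown",
    automated: true,
    criticalPath: true,
  },
  {
    id: "UAT-003",
    feature: "Profile",
    given: "a logged-in user on the profile page",
    when: "they change their display name",
    then: "the new name appears in the header",
    automated: false,
    criticalPath: false,
  },
];

// Render one scenario in Given/When/Then form for the written catalog.
function formatScenario(s: UatScenario): string {
  return `${s.id} [${s.feature}]\n  Given ${s.given}\n  When ${s.when}\n  Then ${s.then}`;
}

// UAT scenario coverage: automated scenarios / total scenarios (0 for an empty catalog).
function automationCoverage(scenarios: UatScenario[]): number {
  if (scenarios.length === 0) return 0;
  return scenarios.filter((s) => s.automated).length / scenarios.length;
}
```

Keeping `criticalPath` on each scenario makes it straightforward to pick which automated scenarios should later become required checks.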

Week 2

Implement post-deploy smoke tests in staging (run after every deploy)
Automate UAT scenarios as Playwright/Selenium tests (target: 10-15 scenarios)
Add environment-based test config (CI vs staging vs production smoke tests)
Establish test quality gates (95% pass rate, <5% flake rate)
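
A post-deploy smoke run can be as small as a handful of HTTP checks against the freshly deployed environment. The sketch below is one possible shape under stated assumptions: the `SmokeCheck` type, the paths, and the injected `httpGet` function are hypothetical, and the HTTP client is passed in so the logic can be exercised without a network.

```typescript
// Sketch only: check names and paths are placeholders for real critical journeys.
type HttpGet = (url: string) => Promise<{ status: number }>;

interface SmokeCheck {
  name: string;
  path: string;
  expectedStatus: number;
}

interface SmokeReport {
  passed: boolean;
  failures: string[];
}

// Run every check against the deployed base URL and collect failures.
// A thrown error (DNS failure, timeout, refused connection) counts as a failure.
async function runSmokeChecks(
  baseURL: string,
  checks: SmokeCheck[],
  httpGet: HttpGet,
): Promise<SmokeReport> {
  const failures: string[] = [];
  for (const check of checks) {
    try {
      const response = await httpGet(baseURL + check.path);
      if (response.status !== check.expectedStatus) {
        failures.push(`${check.name}: expected ${check.expectedStatus}, got ${response.status}`);
      }
    } catch (error) {
      failures.push(`${check.name}: request failed (${String(error)})`);
    }
  }
  return { passed: failures.length === 0, failures };
}
```

In a deploy job, `httpGet` could wrap the runtime's built-in `fetch` (for example `(url) => fetch(url)`), and the job would fail the deploy when `passed` is false.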

Week 3

Run full regression suite on staging nightly or per release candidate
Add critical-path tests as required CI checks (cannot merge if failing)
Measure and optimize test runtime (parallelize, cache, prioritize fast tests)
Run retrospective on test effectiveness (false positives, missed bugs)
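
For the runtime optimization step, one common way to parallelize is to split test files across workers by recorded duration. The sketch below uses a greedy longest-first assignment, which is a heuristic and not an optimal partition; the `TestTiming` shape and file names are hypothetical.

```typescript
// Sketch only: greedy "longest test first, onto the lightest shard" balancing.
interface TestTiming {
  file: string;
  seconds: number; // duration from a previous run
}

function shardByDuration(tests: TestTiming[], shardCount: number): TestTiming[][] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error("shardCount must be a positive integer");
  }
  const shards: TestTiming[][] = [];
  const totals: number[] = [];
  for (let i = 0; i < shardCount; i++) {
    shards.push([]);
    totals.push(0);
  }
  // Place the slowest tests first so they spread across shards.
  const sorted = [...tests].sort((a, b) => b.seconds - a.seconds);
  for (const test of sorted) {
    let lightest = 0;
    for (let i = 1; i < shardCount; i++) {
      if (totals[i] < totals[lightest]) lightest = i;
    }
    shards[lightest].push(test);
    totals[lightest] += test.seconds;
  }
  return shards;
}

// Example: 240 seconds of tests split across two workers.
const exampleShards = shardByDuration(
  [
    { file: "checkout.spec.ts", seconds: 90 },
    { file: "login.spec.ts", seconds: 60 },
    { file: "search.spec.ts", seconds: 45 },
    { file: "profile.spec.ts", seconds: 30 },
    { file: "settings.spec.ts", seconds: 15 },
  ],
  2,
);
// Each shard ends up with 120 seconds of work.
```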

Definition of Done

Smoke tests run in CI and pass before merge
Post-deploy smoke tests run in staging on every deploy
UAT scenario catalog exists with 10+ automated tests
Test health dashboard tracks pass rate, flake rate, runtime
Critical-path tests are required CI checks

Metrics

Leading Indicators

Smoke test pass rate (target: 100%)
UAT scenario coverage (# automated / total scenarios)
Test flake rate (target: <5%)
CI test runtime (target: <10 min for the pre-merge unit + smoke run, <30 min for the full suite)
Post-deploy test execution frequency (every deploy)
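
The targets above can be turned into an automated gate. The sketch below is a minimal example that assumes specific definitions: pass rate is passed runs over total runs, and flake rate is the share of runs that only passed on retry. The thresholds mirror the numbers in this kit (95% pass rate, <5% flake rate, <30 min for the full suite), while the type and function names are hypothetical.

```typescript
// Sketch only: metric definitions are assumptions; adjust to how your dashboard counts.
interface SuiteHealth {
  totalRuns: number;
  passedRuns: number;
  flakyRuns: number;      // assumed: runs that failed, then passed on retry with no code change
  runtimeMinutes: number; // wall-clock time of the latest full run
}

interface GateThresholds {
  minPassRate: number;       // pass rate must be at least this
  maxFlakeRate: number;      // flake rate must be strictly below this
  maxRuntimeMinutes: number; // runtime must be strictly below this
}

interface GateResult {
  ok: boolean;
  violations: string[];
}

const DEFAULT_THRESHOLDS: GateThresholds = {
  minPassRate: 0.95,
  maxFlakeRate: 0.05,
  maxRuntimeMinutes: 30,
};

function evaluateQualityGate(
  health: SuiteHealth,
  thresholds: GateThresholds = DEFAULT_THRESHOLDS,
): GateResult {
  if (health.totalRuns <= 0) {
    // No data is treated as a failure rather than a silent pass.
    return { ok: false, violations: ["no test runs recorded"] };
  }
  const passRate = health.passedRuns / health.totalRuns;
  const flakeRate = health.flakyRuns / health.totalRuns;
  const violations: string[] = [];
  if (passRate < thresholds.minPassRate) {
    violations.push(`pass rate ${(passRate * 100).toFixed(1)}% is below the minimum`);
  }
  if (flakeRate >= thresholds.maxFlakeRate) {
    violations.push(`flake rate ${(flakeRate * 100).toFixed(1)}% is at or above the limit`);
  }
  if (health.runtimeMinutes >= thresholds.maxRuntimeMinutes) {
    violations.push(`runtime ${health.runtimeMinutes} min is at or above the budget`);
  }
  return { ok: violations.length === 0, violations };
}
```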

Lagging Indicators

Change failure rate (DORA)
Lead time for changes (DORA)
Defect escape rate (production bugs)
Mean time to detect (MTTD)
Release frequency (confidence from automation)
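
Two of the lagging indicators reduce to simple ratios. The sketch below assumes change failure rate is the share of deployments that caused a production failure and defect escape rate is the share of defects first found in production; the record shapes are hypothetical and would normally come from deploy and incident tooling.

```typescript
// Sketch only: record shapes are assumptions about what your tooling exports.
interface Deployment {
  id: string;
  causedFailure: boolean; // needed a rollback, hotfix, or other remediation
}

interface Defect {
  id: string;
  foundIn: "pre-production" | "production";
}

// Change failure rate: failed deployments / all deployments (0 when there are none).
function changeFailureRate(deployments: Deployment[]): number {
  if (deployments.length === 0) return 0;
  return deployments.filter((d) => d.causedFailure).length / deployments.length;
}

// Defect escape rate: defects found in production / all defects (0 when there are none).
function defectEscapeRate(defects: Defect[]): number {
  if (defects.length === 0) return 0;
  return defects.filter((d) => d.foundIn === "production").length / defects.length;
}
```

For example, 5 failed deployments out of 20 gives a change failure rate of 25%, the "before" figure quoted earlier, and 1 out of 20 gives 5%.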

Failure modes

Smoke tests cover only happy paths and miss error handling and edge cases

Tests are flaky (pass/fail inconsistently) and teams ignore failures

UAT scenarios exist but are never automated (manual testing persists)

Tests run only in CI, not in deployed environments (deploy surprises)

Test suite grows too large and slow, developers bypass or disable tests
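
To keep flaky tests from being ignored, they first have to be identified. The sketch below uses one possible definition, stated as an assumption: a test is flaky if it both passed and failed on the same commit. The `TestRun` shape is hypothetical and would be filled from CI run history.

```typescript
// Sketch only: "same commit, different outcomes" is one heuristic for flakiness.
interface TestRun {
  testName: string;
  commitSha: string;
  passed: boolean;
}

function findFlakyTests(runs: TestRun[]): string[] {
  // Track which outcomes were seen for each (test, commit) pair.
  const outcomes = new Map<string, { passed: boolean; failed: boolean }>();
  const flaky = new Set<string>();
  for (const run of runs) {
    const key = JSON.stringify([run.testName, run.commitSha]);
    const seen = outcomes.get(key) ?? { passed: false, failed: false };
    if (run.passed) {
      seen.passed = true;
    } else {
      seen.failed = true;
    }
    outcomes.set(key, seen);
    // Mixed outcomes on identical code point at the test, not the change.
    if (seen.passed && seen.failed) {
      flaky.add(run.testName);
    }
  }
  return Array.from(flaky).sort();
}
```

A test that fails on one commit and passes on a later one is not flagged, since the code changed in between; only mixed results on identical code count here.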

Ownership

Engineering Teams

Write and maintain smoke and UAT tests
Fix flaky tests immediately (prioritize test health)
Keep test runtime under budget (optimize slow tests)

Platform/DevOps

Provide test infrastructure and environment configs
Maintain test health dashboard and alerting
Enforce test quality gates in CI/CD

Product/QA

Define UAT scenario catalog with acceptance criteria
Validate test coverage against user journeys
Prioritize critical-path tests for CI gates

What good looks like (by org scale)

Small Teams

5-10 smoke tests in CI (< 5 min runtime)
Post-deploy smoke tests in staging
UAT scenario catalog documented (not all automated)

Medium Orgs

Smoke + UAT + regression test suite (10 + 15 + 50 tests)
Environment-specific test configs (CI, staging, production)
Test health dashboard with pass rate and flake tracking
Critical-path tests required for merge

Enterprise

Full test pyramid (unit, integration, E2E, performance, security)
Production smoke tests running continuously (synthetic monitoring)
Automated test generation from UAT scenarios (AI-assisted)
Test coverage mapped to business-critical user journeys

References

Testing in Production (Charity Majors)

Playwright Best Practices

Google Testing Blog

Test Pyramid (Martin Fowler)

Resources

Templates and related materials for this kit.

Related capabilities

Capabilities tracked under this epic in the roadmap.

Contract Testing Between Services
>= 70% of service-to-service integrations use contract tests (Pact) to prevent breaking changes.
Performance Testing in CI
>= 60% of critical APIs have automated performance tests with latency/throughput gates in CI pipeline.
Dynamic Application Security Testing
>= 70% of web apps are scanned weekly with DAST (OWASP ZAP, Burp Suite) in a staging environment, with findings tracked.
Mutation Testing for Critical Code
>= 50% of critical business logic code (payment, auth, data processing) uses mutation testing to validate test quality.
Visual Regression Testing
>= 60% of user-facing pages have automated visual regression tests (Percy, Chromatic) catching UI issues.
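
As an illustration of a latency gate for performance tests in CI, the sketch below computes a p95 from collected samples and compares it with a budget. The nearest-rank percentile method, the function names, and the 200 ms budget in the example are assumptions, not requirements from the roadmap.

```typescript
// Sketch only: nearest-rank percentile plus a pass/fail check against a budget.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

interface LatencyGateResult {
  p95Ms: number;
  ok: boolean;
}

// The gate passes when the p95 latency is within the budget.
function checkLatencyGate(samplesMs: number[], budgetMs: number): LatencyGateResult {
  const p95Ms = percentile(samplesMs, 95);
  return { p95Ms, ok: p95Ms <= budgetMs };
}

// Example: 20 samples of 10, 20, ..., 200 ms against a 200 ms budget.
const exampleSamples: number[] = [];
for (let i = 1; i <= 20; i++) {
  exampleSamples.push(i * 10);
}
const exampleGate = checkLatencyGate(exampleSamples, 200);
// exampleGate.p95Ms is 190 and exampleGate.ok is true
```

A throughput gate could follow the same pattern with requests per second and a minimum instead of a maximum.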

Related kits

Other kits in the same milestone or with similar DORA impact.

Secure & Performant Build Pipelines

Acceleration

Secure Code & Advanced Review

Acceleration

CFR

Advanced Release Coordination

Acceleration

Continuous Planning & Compliance Integration

Acceleration
