DevOps Symptoms - Find Your Starting Point

Start from a symptom

If you do not know where to start, pick what hurts today and get recommended Execution Kits and Templates matched to your symptom.

DORA metrics used in this catalog: lead time for changes (LT), deployment frequency (DF), mean time to restore (MTTR), and change failure rate (CFR).

The catalog lists 56 symptoms.
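
Each symptom below is tagged with the DORA metrics it tends to affect. To make those tags concrete, here is a minimal sketch of how the four metrics might be computed from deployment records. The `Deployment` record and `dora_metrics` function are hypothetical; in practice the data comes from your CI/CD system and incident tracker, and MTTR is approximated here as the time from the failing deployment to restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record; real data comes from CI/CD and incident tooling."""
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool                              # did it cause a production failure?
    restored_time: Optional[datetime] = None  # when service recovered, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Rough DORA metrics for the deployments in one reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need deployments and a positive period")
    lead_times = [d.deploy_time - d.commit_time for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Approximation: restore time is measured from the failing deploy.
    restores = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
    return {
        "LT": median(lead_times),             # median lead time for changes
        "DF": len(deploys) / period_days,     # deployments per day
        "CFR": len(failures) / len(deploys),  # change failure rate
        "MTTR": sum(restores, timedelta()) / len(restores) if restores else None,
    }
```

The median is used for lead time because a few long-lived branches would otherwise dominate an average.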

"It works on my machine" syndrome
works-on-my-machine

Environment inconsistencies cause bugs that only appear in production.

DORA impact

CFR

MTTR

Signals

Bugs that can't be reproduced locally
Manual environment setup taking hours/days
Production configurations differ from development

Recommended kits

CI/CD & Build Automation - Standardize build and deploy environments.

Infrastructure & Operations Baseline - Use containers for environment parity.

Recommended templates

Service README - Document environment setup and dependencies.

Access permissions have sprawled
access-sprawl

Too many people have too much access; no audit trail.

DORA impact

CFR

Signals

Everyone has admin/prod access
No access reviews conducted
Former employees still have access

Recommended kits

Secure Code & Advanced Review - Implement access controls and audit.

Alert fatigue
alert-fatigue

Too many alerts, most of them noise; real issues get missed.

DORA impact

MTTR

Signals

Hundreds of alerts per day
On-call ignores most alerts
Critical issues missed in the noise

Recommended kits

Observability & Monitoring Foundations - Implement alert hygiene.

SLO-Driven Observability & Error Budgets - Alert on SLO burn rate rather than on raw metric thresholds.
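
A sketch of burn-rate alerting, assuming request and error counts are available per time window. Burn rate is the observed error ratio divided by the error budget (1 - SLO target), so 1.0 means the budget is being spent exactly on schedule. The 14.4 threshold and the pairing of a long and a short window follow the multiwindow approach described in the Google SRE Workbook for a 99.9% SLO over 30 days; treat both numbers, and the function names, as starting assumptions to tune.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Error-budget burn rate: 1.0 means the budget lasts exactly the SLO period."""
    if requests == 0:
        return 0.0
    error_ratio = errors / requests
    error_budget = 1.0 - slo_target  # 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_page(long_window: tuple[int, int], short_window: tuple[int, int],
                slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only if both windows are burning fast.

    Each window is (errors, requests), e.g. the last 1 hour and last 5 minutes.
    The long window shows the problem is significant; the short window
    confirms it is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(*long_window, slo_target) >= threshold
            and burn_rate(*short_window, slo_target) >= threshold)
```

Because a single noisy minute cannot move the long window past the threshold, this style of alert tends to produce far fewer pages than per-metric thresholds.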

API versioning nightmare
api-versioning-nightmare

Breaking changes and version conflicts plague integrations.

DORA impact

CFR

Signals

Consumers break when APIs change
Multiple API versions in production
No deprecation policy

Recommended kits

Code Quality & Review Standards - Implement API versioning strategy.

Recommended templates

Architecture Decision Record (ADR) - Document versioning decisions.

Architecture is unknown
architecture-unknown

No one has a clear picture of how systems connect.

DORA impact

MTTR

CFR

Signals

No up-to-date architecture diagrams
Surprise dependencies discovered during incidents
Cannot answer "what depends on this?"

Recommended kits

Observability & Monitoring Foundations - Build service catalog and dependencies.

Recommended templates

Architecture Decision Record (ADR) - Document architecture decisions.

Audit and compliance are painful
audit-and-compliance-pain

Evidence collection is manual and late; controls are bolted on.

DORA impact

Signals

Evidence is collected at the end of a project
Approvals are email/meetings instead of automated gates
Security/compliance work is a "separate stream"

Recommended kits

Continuous Planning & Compliance Integration - Bring compliance into refinement and DoD.

Self-Optimizing Build & Policy Governance - Fail fast on policy/security with automated governance.

Progressive Delivery & Advanced Deployment - Policy-driven approvals for high-risk changes.

Recommended templates

Definition of Done (DoD) - Make controls explicit and repeatable in delivery.

Service README - Document ownership, runbooks, and dependencies for audits.

Change aversion
change-aversion

Fear of change leads to infrequent, risky releases.

DORA impact

CFR

Signals

Quarterly or less frequent releases
"If it ain't broke, don't fix it" culture
Resistance to trying new approaches

Recommended kits

Progressive Delivery & Advanced Deployment - Enable safe experimentation.

Testing Strategy & Quality Gates - Build confidence to change.

CI pipeline is too slow
slow-ci

Build and test times discourage frequent integration.

DORA impact

Signals

CI takes 30+ minutes
Developers batch changes to avoid CI waits
Feedback loop is too long

Recommended kits

CI/CD & Build Automation - Optimize CI for speed.

Testing Strategy & Quality Gates - Parallelize and optimize tests.
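
One common way to parallelize a slow suite is to split tests into shards of roughly equal runtime. The sketch below uses a greedy longest-first heuristic over per-test durations; `shard_tests` and its input format are hypothetical (many CI systems can export per-test timings from previous runs, for example in JUnit XML reports).

```python
import heapq


def shard_tests(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Split tests into `shards` groups with roughly equal total runtime.

    Greedy longest-processing-time-first: hand each test, longest first,
    to whichever shard currently has the smallest total.
    """
    heap = [(0.0, i) for i in range(shards)]  # (total_seconds, shard_index)
    heapq.heapify(heap)
    assignment: list[list[str]] = [[] for _ in range(shards)]
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)       # least-loaded shard
        assignment[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return assignment
```

Wall-clock CI time then approaches the largest shard total instead of the sum of all tests.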

Cloud costs are a surprise
cloud-cost-surprise

Unexpected bills; no visibility into what's driving costs.

Signals

Bill shocks at end of month
Orphaned resources accumulating
No cost allocation by team/service

Recommended kits

Observability & Monitoring Foundations - Implement cloud cost visibility.

Compliance evidence is manual
compliance-manual

Collecting audit evidence requires manual work before reviews.

DORA impact

Signals

Screenshot evidence collection
Manual spreadsheet tracking
Last-minute scramble before audits

Recommended kits

Continuous Planning & Compliance Integration - Automate evidence collection.

Self-Optimizing Build & Policy Governance - Build compliance into CI/CD.

Database migrations are scary
database-migrations-scary

Schema changes are risky and require downtime.

DORA impact

CFR

Signals

Database changes require maintenance windows
No rollback plan for migrations
Schema changes block releases

Recommended kits

CI/CD & Build Automation - Automate and test migrations.

Dependency vulnerabilities piling up
dependency-vulnerabilities

Known vulnerabilities in dependencies are not addressed.

DORA impact

CFR

Signals

Hundreds of unfixed CVEs
No process for updating dependencies
Vulnerabilities discovered in production

Recommended kits

Secure & Performant Build Pipelines - Implement dependency management.

Secure Code & Advanced Review - Review and remediate vulnerable dependencies.

Deployment queue / bottleneck
deployment-queue

Releases wait in queue because only certain people can deploy.

DORA impact

Signals

Only a few people know how to deploy
Deployments scheduled days/weeks in advance
Code sits waiting after approval

Recommended kits

CI/CD & Build Automation - Automate deployments so anyone can trigger them.

Deployment Automation Foundations - Build unified deployment automation.

Recommended templates

Definition of Done (DoD) - Clarify what "ready to deploy" means.

Deployments feel scary
deployments-feel-scary

Releases are high-stress, manual, and hard to roll back.

DORA impact

CFR

MTTR

Signals

Deploys require a "war room" or heroics
Rollback is painful or unreliable
You avoid shipping on Fridays / before holidays

Recommended kits

Release Management Foundations - Standardize release hygiene and rollback paths.

CI/CD & Build Automation - Create a predictable build + deploy baseline.

Testing Strategy & Quality Gates - Catch regressions before they ship.

Progressive Delivery & Advanced Deployment - Reduce blast radius with safe rollout patterns.

Recommended templates

Definition of Done (DoD) - Make quality and operability explicit before merge.

Incident Runbook - Standardize how you respond when things go wrong.
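
A minimal sketch of an automated canary gate, one of the safe rollout patterns mentioned above, assuming you can count requests and errors separately for the baseline and canary versions. The function name, the 50% relative tolerance, and the 500-request minimum are illustrative assumptions, not recommendations.

```python
def canary_verdict(baseline_errors: int, baseline_requests: int,
                   canary_errors: int, canary_requests: int,
                   tolerance: float = 0.5, min_requests: int = 500) -> str:
    """Return "wait", "promote", or "rollback" for a canary deployment."""
    if canary_requests < min_requests:
        return "wait"  # too little canary traffic to judge yet
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    canary_rate = canary_errors / canary_requests
    # Allow the canary to be at most `tolerance` (here 50%) worse than baseline.
    # Note: with a zero-error baseline, any canary error triggers a rollback.
    limit = baseline_rate * (1 + tolerance)
    return "promote" if canary_rate <= limit else "rollback"
```

Running a check like this on a schedule during rollout turns "watch the dashboards and hope" into an explicit, repeatable decision.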

Disaster recovery is untested
no-disaster-recovery

Backups exist but recovery has never been tested.

DORA impact

MTTR

Signals

No DR drills conducted
RTO/RPO not defined or not met
Backup restore never validated

Recommended kits

Resilient Operations & Chaos Engineering - Implement and test DR procedures.

Recommended templates

Incident Runbook - Document DR procedures.

Documentation is outdated
outdated-docs

Docs exist but don't reflect reality; developers don't trust them.

DORA impact

MTTR

Signals

Docs contradict current behavior
No one updates docs after changes
Developers bypass docs and ask colleagues directly

Recommended templates

Service README - Keep living documentation.

Architecture Decision Record (ADR) - Document decisions when made.

Drowning in operational toil
toil-overload

Manual, repetitive work consumes engineering time.

DORA impact

MTTR

Signals

Same manual tasks done weekly/daily
On-call is constant firefighting
No time for improvements; just keeping the lights on

Recommended kits

Observability & Monitoring Foundations - Identify and automate toil.

CI/CD & Build Automation - Automate repetitive deployment tasks.

Recommended templates

Incident Runbook - Standardize incident response.

End-to-end tests are painful
e2e-test-pain

E2E tests are slow, flaky, and hard to maintain.

DORA impact

CFR

Signals

E2E suite takes hours to run
Tests break with UI changes
Nobody wants to write or fix E2E tests

Recommended kits

Testing Strategy & Quality Gates - Implement testing pyramid strategy.

Environment drift
environment-drift

Dev, staging, and prod all differ, so prod holds surprises.

DORA impact

CFR

MTTR

Signals

Features work in staging, break in prod
Manual configuration changes in prod
No one knows exact prod configuration

Recommended kits

CI/CD & Build Automation - Implement infrastructure as code.

Infrastructure & Operations Baseline - Ensure environment parity.
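
Before infrastructure as code is fully in place, even a simple diff of configuration across environments can show where drift lives. This sketch assumes each environment's settings can be exported as flat key/value pairs; `config_drift` is a hypothetical helper.

```python
from typing import Optional


def config_drift(envs: dict[str, dict[str, str]]) -> dict[str, dict[str, Optional[str]]]:
    """Return every key whose value differs across environments.

    A value of None means the key is missing in that environment.
    """
    all_keys = set().union(*(cfg.keys() for cfg in envs.values()))
    drift = {}
    for key in sorted(all_keys):
        values = {env: cfg.get(key) for env, cfg in envs.items()}
        if len(set(values.values())) > 1:  # not identical everywhere
            drift[key] = values
    return drift
```

Some differences (pool sizes, hostnames) are intentional; the point is to make every difference visible and deliberate.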

Estimates are always wrong
estimation-inaccurate

Work takes 3x longer than estimated; planning is unreliable.

DORA impact

Signals

Commitments regularly missed
Surprise complexity discovered mid-work
No historical data for estimation

Recommended kits

Backlog Quality & Planning Enablement - Use flow metrics instead of estimates.
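
A minimal sketch of forecasting from flow metrics instead of estimates: resample historical weekly throughput (items finished per week) to simulate many possible futures, then report a percentile. The `forecast_weeks` name, the 85th-percentile default, and the assumption that future weeks resemble past ones are all illustrative.

```python
import random


def forecast_weeks(backlog_items: int, weekly_throughput: list[int],
                   trials: int = 10_000, percentile: float = 0.85,
                   seed: int = 42) -> int:
    """Weeks needed to finish `backlog_items`, at roughly the given confidence."""
    if not weekly_throughput or max(weekly_throughput) <= 0:
        raise ValueError("need history with at least one non-zero week")
    rng = random.Random(seed)  # fixed seed so forecasts are reproducible
    results = []
    for _ in range(trials):
        done, weeks = 0, 0
        while done < backlog_items:
            done += rng.choice(weekly_throughput)  # replay a random past week
            weeks += 1
        results.append(weeks)
    results.sort()
    return results[min(int(percentile * trials), trials - 1)]
```

The output reads as "85% of simulated futures finished within N weeks", which is a commitment the team can defend with data.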

Feature flags are out of control
feature-flags-chaos

Old flags accumulate; nobody knows which are active.

DORA impact

CFR

Signals

Hundreds of flags, most never cleaned up
No documentation on what flags do
Bugs caused by unexpected flag combinations

Recommended kits

Progressive Delivery & Advanced Deployment - Implement feature flag lifecycle management.

Recommended templates

Feature Flag Lifecycle - Define flag lifecycle and cleanup process.
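
One way to start flag lifecycle management is to surface cleanup candidates automatically: flags that are fully on or fully off and have not changed for a while. The `Flag` record and 30-day cutoff below are hypothetical; real data would come from your flag service's API or config repository.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Flag:
    """Hypothetical flag record exported from a flag service or config repo."""
    name: str
    rollout_percent: int  # 0 = fully off, 100 = fully on
    last_changed: date


def cleanup_candidates(flags: list[Flag], today: date, max_age_days: int = 30) -> list[str]:
    """Flags that are fully on or off and untouched for `max_age_days` or more."""
    cutoff = today - timedelta(days=max_age_days)
    return [f.name for f in flags
            if f.rollout_percent in (0, 100) and f.last_changed <= cutoff]
```

A fully-on flag that nobody has touched for a month is usually dead code behind a conditional; removing it also removes one source of unexpected flag combinations.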

Flaky tests erode trust
flaky-tests

Unreliable tests are ignored, defeating their purpose.

DORA impact

CFR

Signals

Tests fail randomly, require re-runs
Team ignores test failures ("it's just flaky")
CI builds take forever due to retries

Recommended kits

Testing Strategy & Quality Gates - Implement test reliability practices.
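
A simple, commonly used definition of a flaky test is one that both passes and fails on the same commit. This sketch assumes CI results can be exported as `(commit_sha, test_name, passed)` tuples; `find_flaky` is a hypothetical helper, and its output is a list to quarantine or fix, not a verdict on the code under test.

```python
from collections import defaultdict
from typing import Iterable


def find_flaky(runs: Iterable[tuple[str, str, bool]]) -> list[str]:
    """Tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for commit_sha, test_name, passed in runs:
        outcomes[(commit_sha, test_name)].add(passed)
    return sorted({test for (_, test), seen in outcomes.items()
                   if seen == {True, False}})
```

A test that fails consistently on a commit is excluded: that is a real failure, not flakiness.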

Flying blind in production
no-observability

Can't tell if things are working until users complain.

DORA impact

MTTR

CFR

Signals

Find out about issues from customer support
No dashboards or metrics for services
Log diving is the only way to debug

Recommended kits

Observability & Monitoring Foundations - Build observability foundation.

SLO-Driven Observability & Error Budgets - Define what "working" means with SLOs.

Recommended templates

Service README - Document how to monitor each service.

Handoff hell between teams
handoff-hell

Work passes through multiple teams, each adding delay.

DORA impact

Signals

Tickets move through 5+ queues
Each handoff adds days of wait time
Information lost between handoffs

Recommended kits

Backlog Quality & Planning Enablement - Map and optimize value stream.

CI/CD & Build Automation - Reduce dependencies with standardized paths.

Hero culture
hero-culture

Success depends on individual heroes, not sustainable systems.

DORA impact

MTTR

CFR

Signals

Same people always save the day
Hero stories celebrated, systemic fixes ignored
Burnout among top performers

Recommended kits

Observability & Monitoring Foundations - Build reliable systems, not hero dependencies.

Hotfixes bypass the process
hotfixes-bypass-process

Emergency fixes skip CI/CD, creating technical debt and risk.

DORA impact

CFR

Signals

Direct commits to main/production branches
Hotfixes not tested before deployment
Different process for emergencies vs. normal releases

Recommended kits

CI/CD & Build Automation - Fast CI makes the normal process fast enough for hotfixes.

Progressive Delivery & Advanced Deployment - Enable safe fast rollouts.

Recommended templates

Incident Runbook - Define a safe hotfix process.

Incidents keep happening
incidents-keep-happening

Reliability issues recur; the same failure modes repeat.

DORA impact

MTTR

CFR

Signals

Paging/alerts are noisy or low-signal
Runbooks are missing or outdated
Post-incident actions don't get executed

Recommended kits

Observability & Monitoring Foundations - Baseline observability, runbooks, and ownership.

Resilient Operations & Chaos Engineering - Mature incident response + DR drills and automation.

SLO-Driven Observability & Error Budgets - Add SLOs, error budgets, and DORA dashboards.

Recommended templates

Incident Runbook - Start with a consistent incident process.

Service README - Make ownership, deploy, and run links obvious.

Lead time is too long
lead-time-too-long

Work gets stuck in queues; integration and testing happen late.

DORA impact

Signals

PRs sit unreviewed or are massive
Integration happens at the end (big-bang merges)
Testing is mostly manual or happens after "done"

Recommended kits

Backlog Quality & Planning Enablement - Clarify priorities, acceptance criteria, and flow.

CI/CD & Build Automation - Automate the path from commit to deploy.

Testing Strategy & Quality Gates - Shift test left with fast automated feedback.

Recommended templates

Definition of Done (DoD) - Align on what "done" means across roles.

Service README - Reduce tribal knowledge that slows delivery.

Legacy system nobody understands
legacy-system-fear

A critical system with no documentation and none of its original authors left.

DORA impact

CFR

MTTR

Signals

Original developers left the company
No documentation or tests
Fear of touching the code

Recommended kits

Testing Strategy & Quality Gates - Add characterization tests.

Recommended templates

Architecture Decision Record (ADR) - Document discoveries and decisions.

Service README - Capture operational knowledge.

Low bus factor
bus-factor

If one person leaves, critical knowledge is lost.

DORA impact

MTTR

Signals

Single person who knows the system
No knowledge sharing practices
Key person risk identified

Recommended templates

Service README - Document critical knowledge.

Architecture Decision Record (ADR) - Capture decision context.
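
A rough way to surface bus-factor risk is to find files where one author made most of the commits. The sketch assumes you have already collected author names per file (for example from `git log --format=%an -- <path>`); the 80% threshold and the `knowledge_silos` name are illustrative assumptions, and commit counts are only a proxy for knowledge.

```python
from collections import Counter


def knowledge_silos(commits_by_file: dict[str, list[str]],
                    threshold: float = 0.8) -> dict[str, str]:
    """Map each file to its dominant author if that author made >= threshold of its commits."""
    silos = {}
    for path, authors in commits_by_file.items():
        if not authors:
            continue
        author, count = Counter(authors).most_common(1)[0]
        if count / len(authors) >= threshold:
            silos[path] = author
    return silos
```

The resulting list is a good agenda for pairing sessions and for which Service READMEs to write first.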

Metrics collected but not used
metrics-not-used

Dashboards exist but don't drive decisions.

Signals

Dashboards nobody looks at
Data not discussed in planning
Decisions made on gut feel, not data

Recommended kits

Observability & Monitoring Foundations - Make metrics actionable.

SLO-Driven Observability & Error Budgets - Use SLOs to drive decisions.

Microservices chaos
microservices-chaos

Too many services; hard to debug across boundaries.

DORA impact

MTTR

CFR

Signals

Requests fail and no one knows why
Tracing across services is impossible
Each service has different patterns

Recommended kits

Observability & Monitoring Foundations - Implement distributed tracing.

Code Quality & Review Standards - Standardize service patterns.

Recommended templates

Service README - Document service interactions.

Monolith is slowing us down
monolith-pain

Large codebase makes changes risky and slow.

DORA impact

CFR

Signals

Small changes require full regression testing
Deployments affect unrelated features
Multiple teams blocked by same codebase

Recommended kits

Code Quality & Review Standards - Improve code structure and boundaries.

Testing Strategy & Quality Gates - Build confidence for incremental changes.

Recommended templates

Architecture Decision Record (ADR) - Document decomposition decisions.

No confidence in changes
no-test-coverage

Low test coverage means changes feel risky.

DORA impact

CFR

Signals

Test coverage below 50%
No tests for critical paths
Afraid to refactor due to breakage risk

Recommended kits

Testing Strategy & Quality Gates - Build test coverage systematically.

Recommended templates

Definition of Done (DoD) - Require tests for new code.

No continuous improvement
no-retrospectives

Same problems persist; no learning from mistakes.

DORA impact

CFR

MTTR

Signals

No retrospectives or postmortems
Improvement items never actioned
Repeating the same mistakes

Recommended kits

Backlog Quality & Planning Enablement - Establish improvement cadence.

Resilient Operations & Chaos Engineering - Implement blameless postmortems.

Recommended templates

Incident Runbook - Include postmortem process.

No SLOs defined
no-slos

No agreed-upon reliability targets; everything is equally important.

DORA impact

CFR

MTTR

Signals

No error budgets
All incidents treated with same urgency
No reliability vs. velocity trade-off discussions

Recommended kits

SLO-Driven Observability & Error Budgets - Define and implement SLOs.

Recommended templates

SLO / SLI Template - Document SLO targets.

On-call burnout
on-call-burnout

On-call rotation is exhausting and unsustainable.

DORA impact

MTTR

Signals

Frequent pages during off-hours
Same people always on-call
On-call dreaded by the team

Recommended kits

Observability & Monitoring Foundations - Reduce toil and improve reliability.

SLO-Driven Observability & Error Budgets - Use error budgets to prioritize reliability work.

Onboarding takes too long
onboarding-slow

New team members need months to become productive.

DORA impact

Signals

No onboarding documentation
Setup takes days and depends on tribal knowledge
New hires shadow for weeks

Recommended templates

Service README - Document setup and context.

Post-mortems produce no action
post-mortems-ignored

Incident reviews happen but follow-up items are never completed.

DORA impact

CFR

MTTR

Signals

Same root causes appear repeatedly
Action items from reviews never done
Blameful culture around incidents

Recommended kits

Resilient Operations & Chaos Engineering - Implement blameless postmortem process.

Recommended templates

Incident Runbook - Include postmortem template.

PR reviews are a bottleneck
pr-review-bottleneck

Pull requests sit unreviewed for days or get rubber-stamped.

DORA impact

Signals

PRs waiting 2+ days for review
Reviews are perfunctory ("LGTM")
Large PRs that are hard to review

Recommended kits

Code Quality & Review Standards - Establish PR size and review guidelines.

Recommended templates

Definition of Done (DoD) - Define review expectations.

Regressions escape to production
regression-escapes

Old bugs keep coming back; fixes break other things.

DORA impact

CFR

Signals

Same bugs fixed multiple times
Fixes introduce new bugs
No regression test suite

Recommended kits

Testing Strategy & Quality Gates - Build regression test automation.

Release train chaos
release-train-chaos

Coordinating releases across teams is painful and error-prone.

DORA impact

Signals

Multiple teams must coordinate deployments
One team blocks another's release
Release calendar is overbooked

Recommended kits

Release Management Foundations - Enable independent deployments.

Progressive Delivery & Advanced Deployment - Decouple release from deployment.

Recommended templates

Release Checklist - Standardize release coordination.

Rollback has never been tested
rollback-untested

Rollback procedures exist but are never practiced.

DORA impact

MTTR

CFR

Signals

No rollback drills conducted
Rollback scripts are outdated
Database rollback is manual and scary

Recommended kits

Release Management Foundations - Implement and practice rollback procedures.

Resilient Operations & Chaos Engineering - Include rollback in DR drills.

Recommended templates

Incident Runbook - Document rollback procedures.

Scaling is manual
scaling-is-manual

Capacity planning and scaling require human intervention.

DORA impact

MTTR

Signals

Performance degrades during traffic spikes
Manual server provisioning
Over-provisioning to avoid scaling issues

Recommended kits

Infrastructure & Operations Baseline - Implement auto-scaling.
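
For reference, the Kubernetes Horizontal Pod Autoscaler documents its core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that formula with min/max clamping; it omits stabilization windows and other behavior the real controller applies, and the default bounds are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """HPA-style replica calculation (simplified; no stabilization windows)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: don't scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6.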

Secrets in code
secrets-in-code

Credentials, API keys, or tokens found in repositories.

DORA impact

CFR

Signals

Secrets committed to git history
Shared credentials in config files
No secret rotation process

Recommended kits

Secure Code & Advanced Review - Implement proper secrets management.

Secure & Performant Build Pipelines - Scan for secrets in CI.
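
A minimal sketch of the kind of check a CI secret scan performs: match each line against patterns for well-known credential formats. The three patterns are illustrative assumptions (the `AKIA` prefix of AWS access key IDs is a documented format). Dedicated scanners such as gitleaks or trufflehog ship much larger rule sets and also scan git history, which a line-by-line scan of current files does not.

```python
import re

# Illustrative patterns only; real scanners use far larger, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_secret": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

In a pipeline, a non-empty result would fail the build before the secret is merged; a secret already in git history still needs rotating.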

Security is an afterthought
security-as-afterthought

Security reviews happen late, blocking releases.

DORA impact

CFR

Signals

Security review at end of project
Vulnerabilities found after release
Security team is a bottleneck

Recommended kits

Self-Optimizing Build & Policy Governance - Enforce security policy automatically as the pipeline matures.

Secure Code & Advanced Review - Manage secrets securely from the start.

Secure & Performant Build Pipelines - Catch vulnerabilities early.

Recommended templates

Definition of Done (DoD) - Include security checks in DoD.

Silos between teams
silos-between-teams

Dev, Ops, and Security don't collaborate; an us-vs.-them mentality prevails.

DORA impact

MTTR

Signals

"Throwing over the wall" behavior
Blame games when things go wrong
Teams optimize locally, not globally

Recommended kits

Backlog Quality & Planning Enablement - Establish cross-functional collaboration.

Snowflake servers
snowflake-servers

Each server is uniquely configured; replacement is risky.

DORA impact

MTTR

CFR

Signals

Manual SSH changes to production
Documentation doesn't match reality
Fear of replacing servers

Recommended kits

CI/CD & Build Automation - Implement infrastructure as code.

Infrastructure & Operations Baseline - Standardize with containers.

Tech debt is crushing velocity
tech-debt-crushing

Accumulated shortcuts make every change harder.

DORA impact

CFR

Signals

Simple changes take weeks
"Do not touch that code" areas
Workarounds on top of workarounds

Recommended kits

Code Quality & Review Standards - Implement tech debt management.

Testing Strategy & Quality Gates - Build safety net for refactoring.

Recommended templates

Architecture Decision Record (ADR) - Document debt and remediation decisions.

Testing is a bottleneck
testing-bottleneck

Manual testing or QA handoffs slow down delivery.

DORA impact

CFR

Signals

Dedicated QA phase adds days/weeks
Most testing is manual
Bugs found late in the cycle

Recommended kits

Testing Strategy & Quality Gates - Automate testing and shift left.

CI/CD & Build Automation - Integrate tests into CI pipeline.

Recommended templates

Definition of Done (DoD) - Include test coverage in DoD.

Too many meetings
meeting-overload

Engineers spend more time in meetings than coding.

DORA impact

Signals

Calendar full of syncs and standups
Fragmented focus time
Important decisions made outside meetings anyway

Recommended kits

Backlog Quality & Planning Enablement - Optimize collaboration patterns.

Too much work in progress
work-in-progress-overload

Teams juggle many items; nothing gets finished.

DORA impact

Signals

Constant context switching
Items started but not completed for weeks
Everyone is "busy" but little ships

Recommended kits

Backlog Quality & Planning Enablement - Implement WIP limits and flow metrics.
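
Little's law ties WIP to speed: in a stable system, average cycle time = average WIP / average throughput. A team with 20 items in progress finishing 2 per day should expect about 10 days per item; at the same throughput, capping WIP at 6 brings that to about 3 days. The helpers below (hypothetical names) compute this and report board columns that exceed their WIP limits.

```python
def expected_cycle_time(avg_wip: float, throughput_per_day: float) -> float:
    """Little's law: average cycle time (days) = average WIP / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_day


def over_wip_limit(board: dict[str, int], limits: dict[str, int]) -> list[str]:
    """Board columns holding more items than their WIP limit (no limit = unlimited)."""
    return [column for column, count in board.items()
            if count > limits.get(column, float("inf"))]
```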

Tool sprawl / integration chaos
too-many-tools

Too many disconnected tools create friction.

DORA impact

Signals

Context switching between 10+ tools daily
Data lives in silos, manual copying between systems
No single source of truth for project status

Recommended kits

CI/CD & Build Automation - Consolidate around golden paths.

Tribal knowledge dependency
tribal-knowledge

Critical knowledge exists only in people's heads, not documented.

DORA impact

MTTR

Signals

Cannot deploy if "the person" is on vacation
New team members take months to ramp up
Same questions asked repeatedly

Recommended kits

Observability & Monitoring Foundations - Document runbooks and operational knowledge.

Recommended templates

Service README - Capture essential service knowledge.

Incident Runbook - Document how to respond to issues.

Architecture Decision Record (ADR) - Record architectural decisions and context.

Unclear priorities
unclear-priorities

Everything is "high priority," so nothing is.

DORA impact

Signals

Frequent priority changes mid-sprint
Multiple stakeholders with conflicting demands
No clear way to say "no" to requests

Recommended kits

Backlog Quality & Planning Enablement - Establish prioritization framework.

Unclear service ownership
unclear-ownership

Nobody knows who owns what, causing delays and finger-pointing.

DORA impact

MTTR

Signals

"Who owns this?" asked frequently
Incidents get bounced between teams
Services have no clear maintainer

Recommended kits

Observability & Monitoring Foundations - Establish ownership and accountability.

Recommended templates

Service README - Document ownership clearly.

Tip: Run the assessment first if you want a prioritized plan; use this page when you need a quick starting point.

Start from a symptom

If you do not know where to start, pick what hurts today and get recommended Execution Kits and Templates matched to your symptom.

Search symptoms

Search by symptom title, signals, or DORA metric (LT, DF, MTTR, CFR).

56 symptoms found

"It works on my machine" syndrome
works-on-my-machine

Environment inconsistencies cause bugs that only appear in production.

DORA impact

CFR

MTTR

Signals

Bugs that can't be reproduced locally
Manual environment setup taking hours/days
Production configurations differ from development

Recommended kits

CI/CD & Build Automation - Standardize build and deploy environments.

Infrastructure & Operations Baseline - Use containers for environment parity.

Recommended templates

Service README - Document environment setup and dependencies.

Access permissions are sprawled
access-sprawl

Too many people have too much access; no audit trail.

DORA impact

CFR

Signals

Everyone has admin/prod access
No access reviews conducted
Former employees still have access

Recommended kits

Secure Code & Advanced Review - Implement access controls and audit.

Alert fatigue
alert-fatigue

Too many alerts, most are noise, real issues get missed.

DORA impact

MTTR

Signals

Hundreds of alerts per day
On-call ignores most alerts
Critical issues missed in the noise

Recommended kits

Observability & Monitoring Foundations - Implement alert hygiene.

SLO-Driven Observability & Error Budgets - Alert on SLO burn rate, not symptoms.

API versioning nightmare
api-versioning-nightmare

Breaking changes and version conflicts plague integrations.

DORA impact

CFR

Signals

Consumers break when APIs change
Multiple API versions in production
No deprecation policy

Recommended kits

Code Quality & Review Standards - Implement API versioning strategy.

Recommended templates

Architecture Decision Record (ADR) - Document versioning decisions.

Architecture is unknown
architecture-unknown

No one has a clear picture of how systems connect.

DORA impact

MTTR

CFR

Signals

No up-to-date architecture diagrams
Surprise dependencies discovered during incidents
Cannot answer "what depends on this?"

Recommended kits

Observability & Monitoring Foundations - Build service catalog and dependencies.

Recommended templates

Architecture Decision Record (ADR) - Document architecture decisions.

Audit and compliance is painful
audit-and-compliance-pain

Evidence collection is manual and late; controls are bolted on.

DORA impact

Signals

Evidence is collected at the end of a project
Approvals are email/meetings instead of automated gates
Security/compliance work is a "separate stream"

Recommended kits

Continuous Planning & Compliance Integration - Bring compliance into refinement and DoD.

Self-Optimizing Build & Policy Governance - Fail fast on policy/security with automated governance.

Progressive Delivery & Advanced Deployment - Policy-driven approvals for high-risk changes.

Recommended templates

Definition of Done (DoD) - Make controls explicit and repeatable in delivery.

Service README - Document ownership, runbooks, and dependencies for audits.

Change aversion
change-aversion

Fear of change leads to infrequent, risky releases.

DORA impact

CFR

Signals

Quarterly or less frequent releases
"If it is not broke, do not fix it" culture
Resistance to trying new approaches

Recommended kits

Progressive Delivery & Advanced Deployment - Enable safe experimentation.

Testing Strategy & Quality Gates - Build confidence to change.

CI pipeline is too slow
slow-ci

Build and test times discourage frequent integration.

DORA impact

Signals

CI takes 30+ minutes
Developers batch changes to avoid CI waits
Feedback loop is too long

Recommended kits

CI/CD & Build Automation - Optimize CI for speed.

Testing Strategy & Quality Gates - Parallelize and optimize tests.

Cloud costs are a surprise
cloud-cost-surprise

Unexpected bills; no visibility into what's driving costs.

Signals

Bill shocks at end of month
Orphaned resources accumulating
No cost allocation by team/service

Recommended kits

Observability & Monitoring Foundations - Implement cloud cost visibility.

Compliance evidence is manual
compliance-manual

Collecting audit evidence requires manual work before reviews.

DORA impact

Signals

Screenshot evidence collection
Manual spreadsheet tracking
Last-minute scramble before audits

Recommended kits

Continuous Planning & Compliance Integration - Automate evidence collection.

Self-Optimizing Build & Policy Governance - Build compliance into CI/CD.

Database migrations are scary
database-migrations-scary

Schema changes are risky and require downtime.

DORA impact

CFR

Signals

Database changes require maintenance windows
No rollback plan for migrations
Schema changes block releases

Recommended kits

CI/CD & Build Automation - Automate and test migrations.

Dependency vulnerabilities piling up
dependency-vulnerabilities

Known vulnerabilities in dependencies are not addressed.

DORA impact

CFR

Signals

Hundreds of unfixed CVEs
No process for updating dependencies
Vulnerabilities discovered in production

Recommended kits

Secure & Performant Build Pipelines - Implement dependency management.

Secure Code & Advanced Review - Review and remediate vulnerable dependencies.

Deployment queue / bottleneck
deployment-queue

Releases wait in queue because only certain people can deploy.

DORA impact

Signals

Only a few people know how to deploy
Deployments scheduled days/weeks in advance
Code sits waiting after approval

Recommended kits

CI/CD & Build Automation - Automate deployments so anyone can trigger them.

Deployment Automation Foundations - Build unified deployment automation.

Recommended templates

Definition of Done (DoD) - Clarify what "ready to deploy" means.

Deployments feel scary
deployments-feel-scary

Releases are high-stress, manual, and hard to roll back.

DORA impact

CFR

MTTR

Signals

Deploys require a "war room" or heroics
Rollback is painful or unreliable
You avoid shipping on Fridays / before holidays

Recommended kits

Release Management Foundations - Standardize release hygiene and rollback paths.

CI/CD & Build Automation - Create a predictable build + deploy baseline.

Testing Strategy & Quality Gates - Catch regressions before they ship.

Progressive Delivery & Advanced Deployment - Reduce blast radius with safe rollout patterns.

Recommended templates

Definition of Done (DoD) - Make quality and operability explicit before merge.

Incident Runbook - Standardize how you respond when things go wrong.

Disaster recovery is untested
no-disaster-recovery

Backups exist but recovery has never been tested.

DORA impact

MTTR

Signals

No DR drills conducted
RTO/RPO not defined or not met
Backup restore never validated

Recommended kits

Resilient Operations & Chaos Engineering - Implement and test DR procedures.

Recommended templates

Incident Runbook - Document DR procedures.

Documentation is outdated
outdated-docs

Docs exist but don't reflect reality; developers don't trust them.

DORA impact

MTTR

Signals

Docs contradict current behavior
No one updates docs after changes
Developers bypass docs, ask directly

Recommended templates

Service README - Keep living documentation.

Architecture Decision Record (ADR) - Document decisions when made.

Drowning in operational toil
toil-overload

Manual, repetitive work consumes engineering time.

DORA impact

MTTR

Signals

Same manual tasks done weekly/daily
On-call is constant firefighting
No time for improvements, just keeping lights on

Recommended kits

Observability & Monitoring Foundations - Identify and automate toil.

CI/CD & Build Automation - Automate repetitive deployment tasks.

Recommended templates

Incident Runbook - Standardize incident response.

End-to-end tests are painful
e2e-test-pain

E2E tests are slow, flaky, and hard to maintain.

DORA impact

CFR

Signals

E2E suite takes hours to run
Tests break with UI changes
Nobody wants to write or fix E2E tests

Recommended kits

Testing Strategy & Quality Gates - Implement testing pyramid strategy.

Environment drift
environment-drift

Dev, staging, and prod are all different; surprises in prod.

DORA impact

CFR

MTTR

Signals

Features work in staging, break in prod
Manual configuration changes in prod
No one knows exact prod configuration

Recommended kits

CI/CD & Build Automation - Implement infrastructure as code.

Infrastructure & Operations Baseline - Ensure environment parity.
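Infrastructure as code prevents drift, and a drift check catches the manual changes that slip through anyway. Below is a minimal sketch that assumes each environment's configuration can be exported as flat key/value pairs. `config_drift` is a hypothetical helper, and the keys in the example are made up.

```python
def config_drift(expected: dict, actual: dict) -> dict:
    """Compare two flat key/value configs and report every difference."""
    missing = sorted(expected.keys() - actual.keys())      # defined but absent
    unexpected = sorted(actual.keys() - expected.keys())   # present but undefined
    changed = {
        key: (expected[key], actual[key])
        for key in sorted(expected.keys() & actual.keys())
        if expected[key] != actual[key]
    }
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


staging = {"DB_POOL": "20", "FEATURE_X": "on", "LOG_LEVEL": "info"}
prod = {"DB_POOL": "50", "LOG_LEVEL": "info", "DEBUG": "true"}
report = config_drift(staging, prod)
```

Running a check like this on a schedule, and failing when the report is non-empty, turns "no one knows exact prod configuration" into a visible diff.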

Estimates are always wrong
estimation-inaccurate

Work takes 3x longer than estimated; planning is unreliable.

DORA impact

Signals

Commitments regularly missed
Surprise complexity discovered mid-work
No historical data for estimation

Recommended kits

Backlog Quality & Planning Enablement - Use flow metrics instead of estimates.
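Forecasting from flow metrics usually means resampling historical throughput instead of estimating each item. Below is a minimal Monte Carlo sketch that assumes you have a count of items finished per week. `forecast_weeks` and its 85th-percentile default are illustrative choices, not something the kit prescribes.

```python
import math
import random
from typing import Optional, Sequence


def forecast_weeks(weekly_throughput: Sequence[int], backlog_items: int,
                   percentile: float = 0.85, trials: int = 10_000,
                   seed: Optional[int] = None) -> int:
    """Weeks needed to finish backlog_items at the given confidence level,
    found by repeatedly sampling past weekly throughput."""
    if not weekly_throughput or max(weekly_throughput) <= 0:
        raise ValueError("need at least one week with nonzero throughput")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        done, weeks = 0, 0
        while done < backlog_items:
            done += rng.choice(weekly_throughput)  # replay a random past week
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    # Smallest week count that the requested share of trials finished within.
    idx = min(len(outcomes) - 1, math.ceil(percentile * len(outcomes)) - 1)
    return outcomes[idx]


print(forecast_weeks([3, 5, 4, 6, 2, 5], backlog_items=30, seed=42))
```

Reporting a percentile ("85% of simulated runs finished within N weeks") communicates the uncertainty that a single-point estimate hides.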

Feature flags are out of control
feature-flags-chaos

Old flags accumulate; nobody knows which are active.

DORA impact

CFR

Signals

Hundreds of flags, most never cleaned up
No documentation on what flags do
Bugs caused by unexpected flag combinations

Recommended kits

Progressive Delivery & Advanced Deployment - Implement feature flag lifecycle management.

Recommended templates

Feature Flag Lifecycle - Define flag lifecycle and cleanup process.
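A flag lifecycle works best when every flag is registered with an owner, a purpose and a removal date, so cleanup can be checked automatically. Below is a minimal sketch of that idea. The `Flag` record, its fields and `stale_flags` are hypothetical, and the flag names are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flag:
    name: str
    owner: str          # team accountable for removing the flag
    description: str    # what the flag gates, in one line
    expires: date       # date by which the flag should be deleted


def stale_flags(flags: list[Flag], today: date) -> list[str]:
    """Names of flags past their expiry date, most overdue first."""
    overdue = [f for f in flags if f.expires < today]
    return [f.name for f in sorted(overdue, key=lambda f: f.expires)]


registry = [
    Flag("new-checkout", "payments", "Gate the new checkout flow", date(2024, 1, 31)),
    Flag("dark-mode", "web", "Dark theme rollout", date(2024, 6, 30)),
    Flag("legacy-search", "search", "Fallback to old search", date(2023, 11, 1)),
]
```

Failing a CI check or opening a ticket for each overdue flag keeps the count from drifting into the hundreds.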

Flaky tests erode trust
flaky-tests

Unreliable tests are ignored, defeating their purpose.

DORA impact

CFR

Signals

Tests fail randomly, require re-runs
Team ignores test failures ("it's just flaky")
CI builds take forever due to retries

Recommended kits

Testing Strategy & Quality Gates - Implement test reliability practices.
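A common way to identify flaky tests is to look for tests that both passed and failed on the same commit, since the code did not change between those runs. Below is a minimal sketch that assumes CI history can be exported as (commit, test name, passed) records. `find_flaky_tests` is a hypothetical helper, not part of any CI product.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """runs: iterable of (commit_sha, test_name, passed) tuples from CI history.

    A test is flagged as flaky if it both passed and failed on the same commit.
    """
    outcomes = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    return sorted({test for (_, test), seen in outcomes.items()
                   if seen == {True, False}})
```

A test that fails consistently is a real failure, not a flaky one, so it is deliberately excluded. Quarantining the flagged tests, and fixing them before they return, restores trust in a red build.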

Flying blind in production
no-observability

Can't tell if things are working until users complain.

DORA impact

MTTR

CFR

Signals

Find out about issues from customer support
No dashboards or metrics for services
Log diving is the only way to debug

Recommended kits

Observability & Monitoring Foundations - Build observability foundation.

SLO-Driven Observability & Error Budgets - Define what "working" means with SLOs.

Recommended templates

Service README - Document how to monitor each service.

Handoff hell between teams
handoff-hell

Work passes through multiple teams, each adding delay.

DORA impact

Signals

Tickets move through 5+ queues
Each handoff adds days of wait time
Information lost between handoffs

Recommended kits

Backlog Quality & Planning Enablement - Map and optimize value stream.

CI/CD & Build Automation - Reduce dependencies with standardized paths.

Hero culture
hero-culture

Success depends on individual heroes, not sustainable systems.

DORA impact

MTTR

CFR

Signals

Same people always save the day
Hero stories celebrated, systemic fixes ignored
Burnout among top performers

Recommended kits

Observability & Monitoring Foundations - Build reliable systems, not hero dependencies.

Hotfixes bypass the process
hotfixes-bypass-process

Emergency fixes skip CI/CD, creating technical debt and risk.

DORA impact

CFR

Signals

Direct commits to main/production branches
Hotfixes not tested before deployment
Different process for emergencies vs. normal releases

Recommended kits

CI/CD & Build Automation - Fast CI makes the normal process fast enough for hotfixes.

Progressive Delivery & Advanced Deployment - Enable safe, fast rollouts.

Recommended templates

Incident Runbook - Define a safe hotfix process.

Incidents keep happening
incidents-keep-happening

Reliability issues recur; the same failure modes repeat.

DORA impact

MTTR

CFR

Signals

Paging/alerts are noisy or low-signal
Runbooks are missing or outdated
Post-incident actions don't get executed

Recommended kits

Observability & Monitoring Foundations - Baseline observability, runbooks, and ownership.

Resilient Operations & Chaos Engineering - Mature incident response + DR drills and automation.

SLO-Driven Observability & Error Budgets - Add SLOs, error budgets, and DORA dashboards.

Recommended templates

Incident Runbook - Start with a consistent incident process.

Service README - Make ownership, deploy, and run links obvious.

Lead time is too long
lead-time-too-long

Work gets stuck in queues; integration and testing happen late.

DORA impact

Signals

PRs sit unreviewed or are massive
Integration happens at the end (big-bang merges)
Testing is mostly manual or happens after "done"

Recommended kits

Backlog Quality & Planning Enablement - Clarify priorities, acceptance criteria, and flow.

CI/CD & Build Automation - Automate the path from commit to deploy.

Testing Strategy & Quality Gates - Shift testing left with fast automated feedback.

Recommended templates

Definition of Done (DoD) - Align on what "done" means across roles.

Service README - Reduce tribal knowledge that slows delivery.
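DORA defines lead time for changes as the time from a commit to that code running successfully in production. Below is a minimal sketch of measuring it. It assumes you can pair each change's commit timestamp with its production deploy timestamp, and `lead_time_for_changes` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time_for_changes(changes):
    """changes: list of (committed_at, deployed_at) datetime pairs.

    Returns the median commit-to-production duration as a timedelta.
    """
    durations = [deployed - committed for committed, deployed in changes]
    return median(durations)


changes = [
    (datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 15)),   # 6 hours
    (datetime(2024, 5, 2, 10), datetime(2024, 5, 4, 10)),  # 48 hours
    (datetime(2024, 5, 3, 8), datetime(2024, 5, 3, 20)),   # 12 hours
]
```

The median is used instead of the mean because a few changes that sat in a queue for weeks would otherwise dominate the number.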

Legacy system nobody understands
legacy-system-fear

Critical system with no documentation and none of its original authors left.

DORA impact

CFR

MTTR

Signals

Original developers left the company
No documentation or tests
Fear of touching the code

Recommended kits

Testing Strategy & Quality Gates - Add characterization tests.

Recommended templates

Architecture Decision Record (ADR) - Document discoveries and decisions.

Service README - Capture operational knowledge.
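Characterization tests record what the code does today, right or wrong, so any later change in behavior is caught before it ships. Below is a minimal "golden master" sketch. `legacy_shipping_cost` is a hypothetical stand-in for undocumented legacy logic, and `record_golden` and `changed_behaviors` are hypothetical helpers.

```python
import json


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical stand-in for undocumented legacy logic; the quirks are the point.
    cost = 5 + weight_kg * 1.2
    if express:
        cost = cost * 2 - 1
    return round(cost, 2)


def record_golden(fn, inputs):
    """Run fn on each input tuple and serialize the current results as JSON.

    Commit the output so later test runs can compare against it.
    """
    records = [{"args": list(args), "result": fn(*args)} for args in inputs]
    return json.dumps(records, indent=2)


def changed_behaviors(fn, golden_json):
    """Argument lists whose result no longer matches the recorded baseline."""
    return [r["args"] for r in json.loads(golden_json)
            if fn(*r["args"]) != r["result"]]


INPUTS = [(0, False), (1.5, False), (10, True), (0.5, True)]
```

The baseline is not a statement that the current behavior is correct. It is a safety net that lets you refactor while showing exactly which outputs moved.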

Low bus factor
bus-factor

If one person leaves, critical knowledge is lost.

DORA impact

MTTR

Signals

Only one person knows the system
No knowledge sharing practices
Key person risk identified

Recommended templates

Service README - Document critical knowledge.

Architecture Decision Record (ADR) - Capture decision context.

Metrics collected but not used
metrics-not-used

Dashboards exist but don't drive decisions.

Signals

Dashboards nobody looks at
Data not discussed in planning
Decisions made on gut feel, not data

Recommended kits

Observability & Monitoring Foundations - Make metrics actionable.

SLO-Driven Observability & Error Budgets - Use SLOs to drive decisions.

Microservices chaos
microservices-chaos

Too many services; hard to debug across boundaries.

DORA impact

MTTR

CFR

Signals

Requests fail and no one knows why
Tracing across services is impossible
Each service has different patterns

Recommended kits

Observability & Monitoring Foundations - Implement distributed tracing.

Code Quality & Review Standards - Standardize service patterns.

Recommended templates

Service README - Document service interactions.

Monolith is slowing us down
monolith-pain

Large codebase makes changes risky and slow.

DORA impact

CFR

Signals

Small changes require full regression testing
Deployments affect unrelated features
Multiple teams blocked by same codebase

Recommended kits

Code Quality & Review Standards - Improve code structure and boundaries.

Testing Strategy & Quality Gates - Build confidence for incremental changes.

Recommended templates

Architecture Decision Record (ADR) - Document decomposition decisions.

No confidence in changes
no-test-coverage

Low test coverage means changes feel risky.

DORA impact

CFR

Signals

Test coverage below 50%
No tests for critical paths
Afraid to refactor due to breakage risk

Recommended kits

Testing Strategy & Quality Gates - Build test coverage systematically.

Recommended templates

Definition of Done (DoD) - Require tests for new code.

No continuous improvement
no-retrospectives

Same problems persist; no learning from mistakes.

DORA impact

CFR

MTTR

Signals

No retrospectives or postmortems
Improvement items never actioned
Repeating the same mistakes

Recommended kits

Backlog Quality & Planning Enablement - Establish improvement cadence.

Resilient Operations & Chaos Engineering - Implement blameless postmortems.

Recommended templates

Incident Runbook - Include postmortem process.

No SLOs defined
no-slos

No agreed-upon reliability targets; everything is equally important.

DORA impact

CFR

MTTR

Signals

No error budgets
All incidents treated with same urgency
No reliability vs. velocity trade-off discussions

Recommended kits

SLO-Driven Observability & Error Budgets - Define and implement SLOs.

Recommended templates

SLO / SLI Template - Document SLO targets.
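An error budget is the amount of unreliability an SLO allows. For example, a 99.9% availability target over 30 days permits 0.1% of 43,200 minutes, or about 43.2 minutes, of downtime. Below is a minimal sketch of the request-based equivalent. It assumes you count total and failed requests over the SLO window, and `error_budget_remaining` is a hypothetical helper.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLI.

    1.0 means untouched, 0.0 means exactly spent, negative means overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1.0 - failed_requests / allowed_failures
```

A single number like this gives the reliability vs. velocity discussion a shared reference point. When it is positive, teams can ship; when it goes negative, reliability work takes priority.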

On-call burnout
on-call-burnout

On-call rotation is exhausting and unsustainable.

DORA impact

MTTR

Signals

Frequent pages during off-hours
Same people always on-call
On-call dreaded by the team

Recommended kits

Observability & Monitoring Foundations - Reduce toil and improve reliability.

SLO-Driven Observability & Error Budgets - Use error budgets to prioritize reliability work.

Onboarding takes too long
onboarding-slow

New team members need months to become productive.

DORA impact

Signals

No onboarding documentation
Setup takes days of tribal knowledge
New hires shadow for weeks

Recommended templates

Service README - Document setup and context.

Post-mortems produce no action
post-mortems-ignored

Incident reviews happen but follow-up items are never completed.

DORA impact

CFR

MTTR

Signals

Same root causes appear repeatedly
Action items from reviews never done
Blameful culture around incidents

Recommended kits

Resilient Operations & Chaos Engineering - Implement blameless postmortem process.

Recommended templates

Incident Runbook - Include postmortem template.

PR reviews are a bottleneck
pr-review-bottleneck

Pull requests sit unreviewed for days or get rubber-stamped.

DORA impact

Signals

PRs waiting 2+ days for review
Reviews are perfunctory ("LGTM")
Large PRs that are hard to review

Recommended kits

Code Quality & Review Standards - Establish PR size and review guidelines.

Recommended templates

Definition of Done (DoD) - Define review expectations.
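Review expectations are easier to keep when slow reviews are visible. Below is a minimal sketch that flags pull requests whose first review took, or is still taking, longer than the 2-day signal above. The `PullRequest` record and `waiting_too_long` are hypothetical. In practice the timestamps would come from your code host's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    number: int
    opened_at: datetime
    first_review_at: Optional[datetime] = None  # None means no review yet


def waiting_too_long(prs, now, max_wait=timedelta(days=2)):
    """Numbers of PRs whose first review took, or is taking, longer than max_wait."""
    slow = []
    for pr in prs:
        # Unreviewed PRs are measured against the current time.
        reviewed_or_now = pr.first_review_at or now
        if reviewed_or_now - pr.opened_at > max_wait:
            slow.append(pr.number)
    return slow
```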

Regressions escape to production
regression-escapes

Old bugs keep coming back; fixes break other things.

DORA impact

CFR

Signals

Same bugs fixed multiple times
Fixes introduce new bugs
No regression test suite

Recommended kits

Testing Strategy & Quality Gates - Build regression test automation.

Release train chaos
release-train-chaos

Coordinating releases across teams is painful and error-prone.

DORA impact

Signals

Multiple teams must coordinate deployments
One team blocks another's release
Release calendar is overbooked

Recommended kits

Release Management Foundations - Enable independent deployments.

Progressive Delivery & Advanced Deployment - Decouple release from deployment.

Recommended templates

Release Checklist - Standardize release coordination.

Rollback has never been tested
rollback-untested

Rollback procedures exist but are never practiced.

DORA impact

MTTR

CFR

Signals

No rollback drills conducted
Rollback scripts are outdated
Database rollback is manual and scary

Recommended kits

Release Management Foundations - Implement and practice rollback procedures.

Resilient Operations & Chaos Engineering - Include rollback in DR drills.

Recommended templates

Incident Runbook - Document rollback procedures.

Scaling is manual
scaling-is-manual

Capacity planning and scaling require human intervention.

DORA impact

MTTR

Signals

Performance degrades during traffic spikes
Manual server provisioning
Over-provisioning to avoid scaling issues

Recommended kits

Infrastructure & Operations Baseline - Implement auto-scaling.
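Auto-scaling replaces manual provisioning with a control loop that adjusts capacity toward a target metric. Below is a minimal sketch modeled on the formula in the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current replicas × current metric / target metric). A tolerance band, documented as 0.1 by default, avoids churn near the target. The min/max bounds used here are illustrative assumptions, and `desired_replicas` is a hypothetical helper, not the HPA implementation.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 20, tolerance: float = 0.1) -> int:
    """Target-tracking replica count, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: don't scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, current_metric=90, target_metric=60))
```

The upper bound is what keeps automated scaling from replacing over-provisioning with an unbounded bill.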

Secrets in code
secrets-in-code

Credentials, API keys, or tokens found in repositories.

DORA impact

CFR

Signals

Secrets committed to git history
Shared credentials in config files
No secret rotation process

Recommended kits

Secure Code & Advanced Review - Implement proper secrets management.

Secure & Performant Build Pipelines - Scan for secrets in CI.
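Secret scanning in CI usually works by matching known credential shapes in changed files. Below is a minimal sketch of the idea. The three patterns are illustrative assumptions. Dedicated scanners ship much larger, tuned rule sets and can also scan git history. A secret that already reached history must be rotated, not just deleted.

```python
import re

# Illustrative patterns only (assumptions); real scanners use many more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_credential": re.compile(
        r"""(?i)\b(?:password|secret|api_key|token)\s*[:=]\s*['"][^'"]{8,}['"]"""),
}


def scan_text(text):
    """Return (line_number, rule_name) for each line matching a secret pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = (
    'db_host = "localhost"\n'
    'api_key = "abcd1234efgh5678"\n'
    "aws = AKIAABCDEFGHIJKLMNOP\n"
    "-----BEGIN RSA PRIVATE KEY-----\n"
)
```

Failing the build on any finding keeps new secrets out of the repository. The rotation process in the signals list handles the ones that are already there.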

Security is an afterthought
security-as-afterthought

Security reviews happen late, blocking releases.

DORA impact

CFR

Signals

Security review at end of project
Vulnerabilities found after release
Security team is a bottleneck

Recommended kits

Self-Optimizing Build & Policy Governance - Enforce security policy automatically as the pipeline matures.

Secure Code & Advanced Review - Manage secrets securely from the start.

Secure & Performant Build Pipelines - Catch vulnerabilities early.

Recommended templates

Definition of Done (DoD) - Include security checks in DoD.

Silos between teams
silos-between-teams

Dev, Ops, and Security don't collaborate; there is an us-vs.-them mentality.

DORA impact

MTTR

Signals

"Throwing over the wall" behavior
Blame games when things go wrong
Teams optimize locally, not globally

Recommended kits

Backlog Quality & Planning Enablement - Establish cross-functional collaboration.

Snowflake servers
snowflake-servers

Each server is uniquely configured; replacement is risky.

DORA impact

MTTR

CFR

Signals

Manual SSH changes to production
Documentation doesn't match reality
Fear of replacing servers

Recommended kits

CI/CD & Build Automation - Implement infrastructure as code.

Infrastructure & Operations Baseline - Standardize with containers.

Tech debt is crushing velocity
tech-debt-crushing

Accumulated shortcuts make every change harder.

DORA impact

CFR

Signals

Simple changes take weeks
"Do not touch that code" areas
Workarounds on top of workarounds

Recommended kits

Code Quality & Review Standards - Implement tech debt management.

Testing Strategy & Quality Gates - Build safety net for refactoring.

Recommended templates

Architecture Decision Record (ADR) - Document debt and remediation decisions.

Testing is a bottleneck
testing-bottleneck

Manual testing or QA handoffs slow down delivery.

DORA impact

CFR

Signals

Dedicated QA phase adds days/weeks
Most testing is manual
Bugs found late in the cycle

Recommended kits

Testing Strategy & Quality Gates - Automate testing and shift left.

CI/CD & Build Automation - Integrate tests into CI pipeline.

Recommended templates

Definition of Done (DoD) - Include test coverage in DoD.

Too many meetings
meeting-overload

Engineers spend more time in meetings than coding.

DORA impact

Signals

Calendar full of syncs and standups
Fragmented focus time
Important decisions made outside meetings anyway

Recommended kits

Backlog Quality & Planning Enablement - Optimize collaboration patterns.

Too much work in progress
work-in-progress-overload

Teams juggle many items; nothing gets finished.

DORA impact

Signals

Constant context switching
Items started but not completed for weeks
Everyone is "busy" but little ships

Recommended kits

Backlog Quality & Planning Enablement - Implement WIP limits and flow metrics.
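WIP limits work because of Little's Law. For a stable system, average cycle time equals average work in progress divided by average throughput. With 30 items in progress and 10 finished per week, items take about 3 weeks. At the same throughput, cutting WIP to 10 brings that down to about 1 week. Below is a minimal sketch of that relationship plus a per-column limit check. Both helpers are hypothetical, and the board columns are made up.

```python
def expected_cycle_time(avg_wip: float, avg_throughput_per_week: float) -> float:
    """Little's Law: average cycle time (weeks) = average WIP / average throughput.

    Holds for long-run averages in a stable system, not for individual items.
    """
    if avg_throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput_per_week


def over_wip_limit(in_progress_by_column: dict, limits: dict) -> list:
    """Board columns whose current item count exceeds their WIP limit."""
    return sorted(col for col, count in in_progress_by_column.items()
                  if count > limits.get(col, float("inf")))
```

Columns without a configured limit are treated as unlimited, so the check can be introduced one column at a time.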

Tool sprawl / integration chaos
too-many-tools

Too many disconnected tools create friction.

DORA impact

Signals

Context switching between 10+ tools daily
Data lives in silos, manual copying between systems
No single source of truth for project status

Recommended kits

CI/CD & Build Automation - Consolidate around golden paths.

Tribal knowledge dependency
tribal-knowledge

Critical knowledge exists only in people's heads, not documented.

DORA impact

MTTR

Signals

Cannot deploy if "the person" is on vacation
New team members take months to ramp up
Same questions asked repeatedly

Recommended kits

Observability & Monitoring Foundations - Document runbooks and operational knowledge.

Recommended templates

Service README - Capture essential service knowledge.

Incident Runbook - Document how to respond to issues.

Architecture Decision Record (ADR) - Record architectural decisions and context.

Unclear priorities
unclear-priorities

Everything is "high priority," so nothing is.

DORA impact

Signals

Frequent priority changes mid-sprint
Multiple stakeholders with conflicting demands
No clear way to say "no" to requests

Recommended kits

Backlog Quality & Planning Enablement - Establish prioritization framework.

Unclear service ownership
unclear-ownership

Nobody knows who owns what, causing delays and finger-pointing.

DORA impact

MTTR

Signals

"Who owns this?" asked frequently
Incidents get bounced between teams
Services have no clear maintainer

Recommended kits

Observability & Monitoring Foundations - Establish ownership and accountability.

Recommended templates

Service README - Document ownership clearly.

Tip: Run the assessment first if you want a prioritized plan; use this page when you need a quick starting point.
