    DevOps
    Way of Working

    Infrastructure & Operations Baseline

    Infrastructure as Code for all infrastructure, runbook standards, and operational readiness practices.

    Milestone: Foundation
    Tags: foundational, DF, MTTR

    Job to be done: When infrastructure is created manually via portal with no version control or audit trail, I want to define everything as code in git with automated provisioning, so I can rebuild environments consistently and recover from failures predictably.

    For engineers

    You will document your current infrastructure in code using Terraform or CloudFormation, set up an automated IaC pipeline with drift detection, implement tested backup and restore procedures, and establish disaster recovery runbooks that can rebuild your entire environment in minutes.

    What you’ll implement

    These are the roadmap epic features, organized as a starter backlog.

    1. Infrastructure as Code
    2. Operational Runbooks
    3. On-Call Rotation
    4. Autoscaling Configuration
    5. Backup and Recovery

    Execution guide

    Practical guidance aligned to the Execution Kit Definition of Done.

    Outcome

    Infrastructure is defined as code, version-controlled, and provisioned through automated pipelines with basic monitoring and backup strategies.

    Before to After Transformation

    BEFORE: ClickOps infrastructure

    Infrastructure is created manually via the portal with no version control, configuration drifts between environments, and disaster recovery relies on documentation.

    # Infrastructure management:
    - Create resources via Azure/AWS portal
    - Document steps in Confluence (maybe)
    - Configuration drift across environments
    - DR plan: "We think we know how to rebuild it"
    - Backup strategy: Manual snapshots
    
    Pain points:
    - Env parity violations
    - 3+ hours to provision new environment
    - No audit trail
    - Bus factor: 1-2 people

    AFTER: Infrastructure as Code

    All infrastructure is version-controlled in git, with automated provisioning via pipelines, drift detection, and automated backups.

    # IaC with Terraform:
    terraform plan                     # preview changes
    terraform apply                    # 5-10 minutes to provision
    git log                            # full audit trail
    terraform plan -detailed-exitcode  # drift detection in a daily scheduled job (exit code 2 = differences found)
    
    Benefits:
    - Env parity: 100% identical configs
    - Provisioning: 5-10 minutes (automated)
    - Disaster recovery: Tested quarterly
    - Compliance: Policy-as-code enforcement
    - Team knowledge: Codified, shareable
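
    The daily drift scan mentioned above can be sketched as a small wrapper around `terraform plan -detailed-exitcode`, which exits with 0 when nothing would change, 1 on error, and 2 when the plan contains changes. This is a minimal sketch, not this kit's pipeline: the names `DriftStatus`, `classify_plan_exit_code`, and `check_drift` are hypothetical, and a non-empty plan can also mean unapplied code changes rather than manual drift.

```python
import subprocess
from enum import Enum


class DriftStatus(Enum):
    IN_SYNC = "in_sync"  # plan succeeded and found nothing to change
    DRIFTED = "drifted"  # plan succeeded and found differences
    ERROR = "error"      # the plan itself failed


def classify_plan_exit_code(code: int) -> DriftStatus:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return DriftStatus.IN_SYNC
    if code == 2:
        return DriftStatus.DRIFTED
    return DriftStatus.ERROR


def check_drift(workdir: str) -> DriftStatus:
    """Run a plan in `workdir` (a hypothetical checkout path) and classify it."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

    A scheduled job could call `check_drift` once per environment and raise an alert on `DRIFTED` or `ERROR`.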

    Symptoms

    • Infrastructure provisioned manually through cloud console
    • Configuration changes not tracked or auditable
    • Inconsistent infrastructure across environments
    • No disaster recovery plan or tested backups
    • Infrastructure changes cause unexpected outages

    Prerequisites

    • Cloud account or infrastructure platform access
    • Version control system (Git)
    • Basic understanding of infrastructure requirements
    • CI/CD pipeline capability

    Implementation steps

    Week 1
    • Audit current infrastructure and document as code (Terraform, CloudFormation, Pulumi)
    • Set up IaC repository with version control
    • Implement basic infrastructure modules (network, compute, storage)
    • Create infrastructure CI/CD pipeline for validation
    Week 2
    • Apply IaC to non-production environment
    • Implement automated backup strategy for critical data
    • Add infrastructure monitoring (CPU, memory, disk)
    • Document disaster recovery procedures
    Week 3
    • Establish infrastructure change approval process
    • Test disaster recovery and backup restoration
    • Apply IaC to production with change management
    • Set up cost monitoring and optimization alerts
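
    The Week 1 validation pipeline could start with read-only checks like the ones below. This sketch assumes Terraform is the chosen tool; `VALIDATION_STEPS` and `run_validation` are hypothetical names, and the `runner` parameter exists only so the logic can be exercised without Terraform installed.

```python
import subprocess

# Read-only checks: none of these commands changes real infrastructure.
VALIDATION_STEPS = [
    ["terraform", "fmt", "-check", "-recursive"],             # formatting
    ["terraform", "init", "-backend=false", "-input=false"],  # providers and modules only, no remote state
    ["terraform", "validate", "-no-color"],                   # syntax and internal consistency
]


def run_validation(workdir, runner=subprocess.run):
    """Run each step in `workdir`; return the first failing command, or None."""
    for cmd in VALIDATION_STEPS:
        result = runner(cmd, cwd=workdir)
        if result.returncode != 0:
            return cmd
    return None
```

    A CI job would fail the build whenever `run_validation` returns a command instead of `None`.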

    Definition of Done

    • 90%+ of infrastructure defined in version-controlled IaC
    • Infrastructure changes deployed through CI/CD pipeline
    • Automated backups for all critical data with tested restoration
    • Infrastructure monitoring in place for all resources
    • Disaster recovery plan documented and tested
    • Infrastructure provisioning is repeatable and consistent
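
    One way to check the "90%+ of infrastructure defined in IaC" criterion is to compare an inventory of deployed resources against the resources tracked in IaC state. The sketch below assumes both inventories are already available as sets of comparable resource identifiers; how they are obtained and matched is left open, and `iac_coverage` and `unmanaged_resources` are hypothetical names.

```python
def iac_coverage(deployed: set, managed: set) -> float:
    """Percentage of deployed resources that are also tracked in IaC."""
    if not deployed:
        raise ValueError("deployed inventory is empty")
    return 100 * len(deployed & managed) / len(deployed)


def unmanaged_resources(deployed: set, managed: set) -> list:
    """Deployed resources missing from IaC, sorted for stable reporting."""
    return sorted(deployed - managed)
```

    For example, with five deployed resources of which four are in IaC, `iac_coverage` returns 80.0 and `unmanaged_resources` lists the one still to be imported.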

    Metrics

    Leading Indicators
    • Infrastructure change frequency
    • IaC coverage (%)
    • Backup success rate
    Lagging Indicators
    • Infrastructure-related incidents
    • Mean time to restore infrastructure
    • Configuration drift incidents
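
    Two of the metrics above reduce to simple arithmetic. The following is a minimal sketch, assuming backup job counts and outage timestamps are already collected somewhere; the `Outage` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Outage:
    started: datetime   # when the infrastructure failure began
    restored: datetime  # when service was restored


def backup_success_rate(succeeded: int, attempted: int) -> float:
    """Backup success rate as a percentage of attempted backup jobs."""
    if attempted <= 0:
        raise ValueError("no backup jobs were attempted")
    return 100 * succeeded / attempted


def mean_time_to_restore(outages: list) -> timedelta:
    """Average duration between failure and restoration."""
    if not outages:
        raise ValueError("no outages recorded")
    total = sum((o.restored - o.started for o in outages), timedelta())
    return total / len(outages)
```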

    Failure modes

    • IaC without state management (lost track of infrastructure)
    • No testing of disaster recovery (backup fails when needed)
    • Credentials hardcoded in IaC (security vulnerability)
    • Infrastructure changes bypass IaC (manual drift)
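
    The hardcoded-credentials failure mode can be caught early with a simple pre-merge check. The sketch below is a heuristic only, not a substitute for a dedicated secret scanner: it flags HCL-style attributes whose name suggests a secret and whose value is a quoted literal, and the keyword list and the name `find_hardcoded_secrets` are assumptions.

```python
import re

# Attribute names containing these words, assigned a non-empty quoted literal
# without interpolation, are treated as suspicious (heuristic assumption).
SECRET_ASSIGNMENT = re.compile(
    r'^\s*(\w*(?:password|secret|token|access_key)\w*)\s*=\s*"[^"$]+"',
    re.IGNORECASE,
)


def find_hardcoded_secrets(hcl_text: str) -> list:
    """Return (line_number, attribute_name) for each suspicious assignment."""
    findings = []
    for number, line in enumerate(hcl_text.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.match(line)
        if match:
            findings.append((number, match.group(1)))
    return findings
```

    References such as `password = var.db_password` are not flagged, because only quoted literal values match.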

    Ownership

    Platform/DevOps
    • Develop and maintain IaC codebase
    • Manage infrastructure CI/CD pipelines
    • Implement backup and monitoring strategies
    SRE/Operations
    • Define infrastructure requirements and standards
    • Test disaster recovery procedures
    • Monitor infrastructure health and costs
    Security
    • Review infrastructure security configurations
    • Manage secrets and credentials for IaC
    • Audit infrastructure changes for compliance

    What good looks like (by org scale)

    Small Teams
    • Basic Terraform/Bicep for core infrastructure
    • Version-controlled IaC in git
    • Manual terraform apply with peer review
    Medium Orgs
    • Automated IaC pipelines with drift detection
    • Modular IaC with reusable components
    • Automated backup/restore procedures
    Enterprise
    • Self-service infrastructure via IaC catalog
    • Policy-as-code enforcement (OPA/Sentinel)
    • Automated compliance scanning and remediation

    Resources

    Templates and related materials for this kit.

    Templates
    Copy/paste artifacts that support this kit.
    • Architecture Decision Record (ADR)
      A short ADR template for recording decisions and keeping architecture aligned over time.
    • Service Onboarding Checklist (Golden Path)
      A checklist for onboarding a new service into the platform: ownership, CI/CD, observability, and security.

    Related capabilities

    Capabilities tracked under this epic in the roadmap.

    • Infrastructure as Code
      >= 70% of infrastructure managed via IaC (Terraform, Pulumi, CloudFormation) in version control.
    • Operational Runbooks
      >= 80% of critical services have runbooks for deployment, incident response, and disaster recovery.
    • On-Call Rotation
      >= 90% of production services have a defined on-call rotation with an incident response SLA under 15 minutes.
    • Autoscaling Configuration
      >= 70% of stateless services have horizontal autoscaling based on CPU/memory or custom metrics.
    • Backup and Recovery
      >= 90% of stateful services (databases, volumes) have automated backups with tested recovery procedures.
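
    The capability targets above can be tracked with a small gap report. In this sketch the thresholds come from the list above, while the `capability_gaps` function and the idea of feeding it measured percentages are assumptions about how a team might use them.

```python
# Target percentages taken from the capability list above.
TARGETS = {
    "Infrastructure as Code": 70,
    "Operational Runbooks": 80,
    "On-Call Rotation": 90,
    "Autoscaling Configuration": 70,
    "Backup and Recovery": 90,
}


def capability_gaps(measured: dict) -> dict:
    """Return the percentage points still missing for each capability below target.

    A capability with no measurement is treated as 0%.
    """
    gaps = {}
    for name, target in TARGETS.items():
        shortfall = target - measured.get(name, 0)
        if shortfall > 0:
            gaps[name] = shortfall
    return gaps
```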

    Related kits

    Other kits in the same milestone or with similar DORA impact.

    • Deployment Automation Foundations (Foundation; DF, MTTR)
    • Backlog Quality & Planning Enablement (Foundation; LT, DF)
    • CI/CD & Build Automation (Foundation; DF, LT)
    • Observability & Monitoring Foundations (Foundation; MTTR, CFR)

    © 2019-2026 devopswow.com. Created by Burhan Öcüt
