DevOps Roadmap for Startups: From Zero to Production in 90 Days

Why Startups Need DevOps From Day One

Most startups delay DevOps investment until it becomes a crisis — a production outage, a security breach, or a deployment process so painful that engineers dread releasing code. By that point, the technical debt is enormous and fixing it costs far more than getting it right early.

The good news: you don't need a dedicated DevOps team or an enterprise budget to build a solid DevOps foundation. With the right prioritisation, a two-person startup can have professional-grade deployment practices within 90 days.

This roadmap tells you exactly what to build, in what order, and why.

The Core Principle: Automate the Path to Production

DevOps at its core is about making the path from "code written" to "code running safely in production" fast, reliable, and repeatable. Every practice in this roadmap serves that goal.

Month 1: The Foundation (Days 1–30)

Week 1: Version Control and Branch Strategy

If you don't already have a branch strategy, define one now. Recommended for startups:

GitHub Flow (simplest, works for most startups):

▹main branch is always deployable
▹Feature work happens in short-lived branches (feature/user-auth, fix/login-bug)
▹Branches merge to main via pull request
▹Every merge to main triggers a deployment

Rules to enforce immediately:

▹No direct pushes to main — all changes via PR
▹At least one reviewer required to merge
▹PRs must pass automated checks before merge
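
The three rules above map onto GitHub's branch protection settings. As a rough sketch, the function below builds the JSON body for the REST endpoint that updates branch protection. The helper name branch_protection_payload and the status check name "test" are assumptions for illustration; the check name has to match whatever your CI job is actually called.

```python
import json


def branch_protection_payload(required_checks, approvals=1):
    """Build a request body for GitHub's "update branch protection" endpoint.

    Hypothetical helper: it only assembles the payload, it does not call the API.
    """
    return {
        # PRs must pass these status checks before they can merge
        "required_status_checks": {"strict": True, "contexts": list(required_checks)},
        # Apply the rules to administrators as well
        "enforce_admins": True,
        # Requiring reviewed pull requests also rules out direct pushes
        "required_pull_request_reviews": {"required_approving_review_count": approvals},
        # None means no additional per-user or per-team push restrictions
        "restrictions": None,
    }


if __name__ == "__main__":
    # "test" is assumed to be the name of the CI job that must pass
    print(json.dumps(branch_protection_payload(["test"]), indent=2))
```

You would send this body with PUT /repos/{owner}/{repo}/branches/main/protection, or set the same options by hand under the repository's branch settings.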

Week 2: Your First CI Pipeline

CI (Continuous Integration) means every code change is automatically tested. Start simple.

GitHub Actions — basic CI for a Node.js app:

name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test
      - run: npm run lint

This gives you: automated tests on every PR, lint checks to catch code quality issues, and a clear signal when something is broken before it reaches main.

Week 3: Containerise Your Application

Containers (Docker) eliminate "works on my machine" problems and are the foundation for scalable deployments.

Dockerfile for a Node.js application:

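# Stage 1: install production dependencies only
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev

# Stage 2: runtime image
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
# Add a .dockerignore that excludes node_modules so this COPY does not overwrite the layer above
COPY . .
# The official node images ship an unprivileged "node" user
USER node
EXPOSE 3000
CMD ["node", "server.js"]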

Key practices:

▹Use multi-stage builds to keep image size small
▹Pin specific base image versions (not node:latest)
▹Run as a non-root user in production
▹Never bake secrets into the image
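
Some of these practices can be checked automatically. The sketch below is a small, heuristic Dockerfile linter, not a replacement for a real scanner: it flags unpinned base images, a missing non-root USER in the final stage, and ENV or ARG names that look like credentials. The function name lint_dockerfile and the list of suspicious name fragments are assumptions for illustration.

```python
import re

# Heuristic only: variable names that often hold credentials
SECRET_HINT = re.compile(r"SECRET|PASSWORD|TOKEN|API_KEY", re.IGNORECASE)


def lint_dockerfile(text):
    """Return warnings for three of the practices above (heuristic checks)."""
    warnings = []
    stage_names = set()   # aliases introduced by "FROM image AS alias"
    user = None           # last USER instruction seen in the current stage
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, *rest = line.split()
        keyword = keyword.upper()
        args = [a for a in rest if not a.startswith("--")]  # drop flags like --platform
        if keyword == "FROM" and args:
            image = args[0]
            user = None  # a new stage starts from the base image's default user
            if image not in stage_names and image != "scratch" and "@" not in image:
                _, _, tag = image.split("/")[-1].partition(":")
                if tag in ("", "latest"):
                    warnings.append(f"unpinned base image: {image}")
            if len(args) >= 3 and args[1].upper() == "AS":
                stage_names.add(args[2])
        elif keyword == "USER" and args:
            user = args[0].split(":")[0]
        elif keyword in ("ENV", "ARG") and args:
            name = args[0].split("=")[0]
            if SECRET_HINT.search(name):
                warnings.append(f"possible secret baked into image: {name}")
    # Assumes the base image defaults to root when no USER is set
    if user in (None, "root", "0"):
        warnings.append("final stage runs as root (no non-root USER set)")
    return warnings
```

Running it in CI against every Dockerfile in the repo gives an early warning before an image is built; a dedicated scanner is still worth adding later.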

Week 4: Environment Management

Define your environments and what each one is for:

Environment	Purpose	Deployed when
Development	Local dev	N/A (runs locally)
Staging	Pre-production testing	Every merge to main
Production	Live users	Manual approval or tag

Use environment variables (not hardcoded values) for all config. Keep credentials in a secrets manager (AWS Secrets Manager, HashiCorp Vault, or GitHub Secrets for CI) and never commit them to the repo.
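
One way to apply this in application code is to read every setting from the environment at startup and fail fast when something is missing. The sketch below assumes three hypothetical variables (APP_ENV, DATABASE_URL, PORT) and a default port of 3000; substitute whatever your service actually needs.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    environment: str
    database_url: str
    port: int


def load_settings(env=os.environ):
    """Read config from environment variables and fail fast if any are missing."""
    # Variable names are illustrative assumptions, not a convention from any tool
    required = ("APP_ENV", "DATABASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        environment=env["APP_ENV"],
        database_url=env["DATABASE_URL"],
        port=int(env.get("PORT", "3000")),
    )
```

Because the same code runs in development, staging, and production, the only thing that changes between environments is the set of variables injected at deploy time.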

Month 2: Deployment Automation (Days 31–60)

Week 5–6: Continuous Deployment Pipeline

Extend your CI pipeline to include automated deployment.

GitHub Actions — CI/CD pipeline with staging auto-deploy:

name: CI/CD

on:
  push:
    branches: [main]

jobs:
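  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pin the Node.js version so tests run on the same runtime as the CI workflow
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test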

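  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # lets GITHUB_TOKEN push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      # The push fails without an authenticated registry session
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          # Image names must be lowercase; adjust if your org or repo name has capitals
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}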

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
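    steps:
      # Assumes kubectl is already authenticated against the staging cluster
      # (for example via a kubeconfig stored as an environment secret); that step is not shown
      - name: Deploy to staging
        run: |
          kubectl set image deployment/app \
            app=ghcr.io/${{ github.repository }}:${{ github.sha }} \
            --namespace=staging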
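
The pipeline tags each image with the commit SHA, so every deployment points at exactly one build. If you generate image references in a deploy script instead of inline in the workflow, a helper along these lines keeps them consistent. The function name image_ref is hypothetical, and lowercasing the repository name is a precaution because container image names must be lowercase while GitHub repository names need not be.

```python
def image_ref(repository, sha, registry="ghcr.io"):
    """Return an immutable image reference tagged with the commit SHA."""
    if not repository or not sha:
        raise ValueError("repository and sha are required")
    # Container image names must be lowercase, but GitHub repo names may not be
    return f"{registry}/{repository.lower()}:{sha}"
```

For example, image_ref("Acme/Web-App", "abc123") yields ghcr.io/acme/web-app:abc123.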

Week 7: Infrastructure as Code

Never click through cloud consoles to provision infrastructure. Everything should be code.

Terraform — provision a basic AWS setup:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  backend "s3" {
    bucket = "my-startup-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_ecs_cluster" "main" {
  name = "startup-cluster"
}

resource "aws_ecs_service" "app" {
  name            = "app-service"
  cluster         = aws_ecs_cluster.main.id
  # aws_ecs_task_definition.app must be declared elsewhere in the configuration (not shown here)
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
}

Start with Terraform for AWS (or the equivalent for GCP or Azure). Store state in S3 with bucket versioning and state locking enabled, so concurrent runs cannot overwrite each other. Use modules to avoid repeating yourself. Every infrastructure change goes through PR review, with no manual console changes.
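
PR review of infrastructure changes is easier when destructive actions are called out explicitly. As a sketch, the script below reads the JSON form of a saved plan (the output of terraform show -json) and lists every resource the plan would delete or replace. The function names are hypothetical, and how you surface the result (a PR comment, a failed check) is left open.

```python
import json


def destructive_changes(plan):
    """Return addresses of resources the plan would delete or replace."""
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "delete" in actions:  # a replacement shows up as delete plus create
            flagged.append(change["address"])
    return flagged


def destructive_changes_from_json(text):
    """Same check, starting from the raw JSON text of the plan."""
    return destructive_changes(json.loads(text))
```

In CI you could save the plan with terraform plan -out=plan.out, convert it with terraform show -json plan.out, and require an extra approval whenever the returned list is not empty.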

Week 8: Kubernetes Basics (If You Need It)

Kubernetes isn't right for every startup. It is usually worth the operational overhead when you run multiple services, need horizontal scaling, or spend more than $2k/month on compute and want to optimise.

If you do need Kubernetes, start with a managed cluster: AWS EKS, GCP GKE, or DigitalOcean Kubernetes. Don't self-manage the control plane.

Minimum viable Kubernetes setup:

▹One cluster with staging and production namespaces
▹Deployments (not raw pods) for all services
▹Resource requests and limits on every container
▹Horizontal Pod Autoscaler for traffic-sensitive services
▹Ingress controller (NGINX or Traefik) for HTTP routing
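
The "requests and limits on every container" rule is easy to forget, so it is worth checking mechanically. The sketch below assumes a Deployment manifest that has already been loaded into a Python dict (for example by a YAML parser) and returns the names of containers that lack either field. The function name is hypothetical.

```python
def containers_missing_resources(deployment):
    """List containers in a Deployment manifest lacking requests or limits."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    missing = []
    for container in containers:
        resources = container.get("resources", {})
        # Both keys must be present and non-empty
        if not resources.get("requests") or not resources.get("limits"):
            missing.append(container["name"])
    return missing
```

A check like this can run in CI against every manifest in the repo and fail the build when the returned list is not empty.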

Month 3: Observability and Security (Days 61–90)

Week 9: Logging

You cannot debug production issues without logs. Set up centralised logging on day one of month three.

Minimum viable logging stack:

▹Application logs: Use structured JSON logging (not plain text). Every log line should have: timestamp, level, message, request ID, user ID (if applicable).
▹Log aggregation: Ship logs to a centralised store — Datadog, AWS CloudWatch, or the ELK stack (Elasticsearch + Logstash + Kibana).
▹Retention: Keep 30 days of logs in hot storage, 90 days in cold storage.
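
To make the structured-logging point concrete, here is a minimal sketch using Python's standard logging module. It emits one JSON object per line with the fields listed above. The class and function names are illustrative, and request_id and user_id are assumed to be passed by the caller through the extra argument; a real service would more likely attach them in middleware.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            # request_id and user_id arrive via the `extra` argument, if supplied
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(entry)


def build_logger(name="app", stream=None):
    """Return a logger that writes JSON lines (to stderr when stream is None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate plain-text output from the root logger
    return logger
```

A call such as build_logger().info("login ok", extra={"request_id": "req-1", "user_id": 42}) then produces a line your log aggregator can index by field instead of parsing free text.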

Week 10: Metrics and Alerting

Logs tell you what happened. Metrics tell you the health of your system in real time.

The four golden signals (monitor these first):

▹Latency: How long do requests take? Alert if p99 > 2 seconds.
▹Traffic: Requests per second. Alert on sudden drops (could indicate an outage).
▹Error rate: Percentage of 5xx responses. Alert if > 1%.
▹Saturation: CPU, memory, disk usage. Alert at 80% sustained.
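
In practice these rules live in your monitoring tool, but the logic behind them is simple enough to sketch. The function below evaluates one window of data against the thresholds above. Two things are assumptions for illustration: a "sudden drop" in traffic is taken to mean less than half of a baseline request count, and the "sustained" part of the saturation rule is not modelled.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_alerts(latencies_s, requests, errors_5xx, saturation_pct, baseline_requests=None):
    """Return the names of the golden signals that breach their thresholds."""
    alerts = []
    if latencies_s and percentile(latencies_s, 99) > 2.0:      # p99 above 2 seconds
        alerts.append("latency")
    if baseline_requests and requests < 0.5 * baseline_requests:  # assumed drop threshold
        alerts.append("traffic")
    if requests and errors_5xx / requests > 0.01:              # more than 1% 5xx responses
        alerts.append("error_rate")
    if saturation_pct >= 80:                                    # CPU, memory, or disk at 80%
        alerts.append("saturation")
    return alerts
```

Keeping the thresholds in one place like this also makes them easy to review and adjust as you learn what normal looks like for your system.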

Use Prometheus + Grafana (open source) or Datadog (SaaS, easier to get started). Set up on-call rotation with PagerDuty or Opsgenie.

Week 11: Security Foundations

Security is not optional, even for startups. The basics:

Code security:

▹Enable Dependabot or Renovate for automated dependency updates
▹Add a SAST scanner (Semgrep, CodeQL) to your CI pipeline
▹Scan Docker images for vulnerabilities (Trivy, Snyk)

Access security:

▹Enforce MFA on GitHub, AWS, and all cloud accounts
▹Use IAM roles, not access keys, for CI/CD pipelines
▹Follow least-privilege: services get only the permissions they need
▹Rotate secrets on a schedule (use a secrets manager)
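
Rotation on a schedule only works if something tells you when a secret is overdue. Managed secret stores can usually handle this for you; where they cannot, a check like the sketch below is enough to start with. The 90-day default and the secret names in the example are assumptions, not recommendations from any particular tool.

```python
from datetime import date, timedelta


def secrets_due_for_rotation(last_rotated, today, max_age_days=90):
    """Return names of secrets whose last rotation is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, rotated in last_rotated.items() if rotated < cutoff)
```

For example, with today set to 30 June 2024, a secret last rotated on 1 January 2024 is reported as due, while one rotated on 1 June 2024 is not.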

Network security:

▹No resources with public IPs except your load balancer and bastion host
▹All internal services communicate within a private VPC
▹WAF (Web Application Firewall) in front of your public endpoints

Week 12: Runbooks and Incident Response

Before you need them in a 3am outage, write runbooks for your most common incident scenarios.

Every runbook should cover: symptoms, diagnosis steps, resolution steps, and escalation path.

Also define your incident severity levels:

Severity	Definition	Response time
P1	Total outage, all users affected	15 minutes
P2	Major feature broken, >20% users affected	1 hour
P3	Minor feature degraded, workaround exists	4 hours
P4	Cosmetic or low-impact issue	Next sprint
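
Encoding the table in code keeps paging rules and dashboards consistent with it. The sketch below maps each severity to its response target and applies a simple classification. The table does not say how to rank every case (for instance, a broken feature affecting fewer than 20% of users with no workaround), so the fallthrough order here is an assumption you should adapt.

```python
from datetime import datetime, timedelta

# Response targets from the severity table; P4 has no clock ("next sprint")
RESPONSE_TARGETS = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": None,
}


def classify(total_outage, users_affected_pct, degraded):
    """Pick a severity level; the ordering of the checks is an assumption."""
    if total_outage:
        return "P1"
    if users_affected_pct > 20:
        return "P2"
    if degraded:
        return "P3"
    return "P4"


def respond_by(severity, opened_at):
    """Return the response deadline, or None when there is no fixed target."""
    target = RESPONSE_TARGETS[severity]
    return opened_at + target if target is not None else None
```

An incident opened at 03:00 and classified as P1 would therefore need a response by 03:15.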

90-Day Checklist

▹[ ] Branch protection rules on main
▹[ ] CI pipeline running on every PR
▹[ ] All services containerised (Docker)
▹[ ] Staging environment with auto-deploy
▹[ ] Production deploy requires manual approval
▹[ ] Infrastructure defined in Terraform
▹[ ] No secrets in the codebase (secrets manager in use)
▹[ ] Centralised logging with 30-day retention
▹[ ] Alerts on the four golden signals
▹[ ] MFA enforced on all cloud accounts
▹[ ] At least one runbook per critical service
▹[ ] On-call rotation defined

How Long Does This Actually Take?

With two dedicated engineers and no legacy systems to untangle, this roadmap typically takes 10–12 weeks. With one part-time engineer and existing technical debt, allow 16–20 weeks.

The fastest path is to bring in a DevOps consultant for the first 30 days to build the foundation, then hand it over to your team with documentation. This approach compresses months of learning into weeks of implementation.

Need Help Building Your DevOps Foundation?

We implement DevOps foundations for startups and scale-ups — CI/CD pipelines, Kubernetes clusters, Terraform infrastructure, and observability stacks. Most engagements start with a 2-week foundation sprint.

Book a free DevOps discovery call →
