Kubernetes Cost Optimisation: How to Cut Your Cloud Bill by 40%

The Kubernetes Cost Problem

Kubernetes makes it easy to deploy workloads. It also makes it easy to waste money. Teams provision clusters "just in case", set resource requests based on guesswork, and leave idle namespaces running for months.

The average Kubernetes deployment is 40–60% over-provisioned according to benchmarks from cloud providers. For a team spending $20,000/month on cloud infrastructure, that's $8,000–$12,000 in waste every month.

This guide covers the optimisations that consistently deliver the biggest savings — in order of impact.

1. Right-Size Your Resource Requests and Limits

This is where most of the money is hidden. Every Pod in Kubernetes has two resource settings:

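▹ Requests: the amount of CPU/memory the scheduler reserves for the Pod on a node
▹ Limits: the maximum the Pod's containers can consume; CPU is throttled at the limit, and a container that exceeds its memory limit is OOM-killed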

When requests are set too high, nodes appear "full" before they're actually full. The scheduler can't place new Pods, so you scale up and pay for nodes that are mostly idle.
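To make the effect concrete, here is a small Python sketch of the arithmetic. The node size and per-Pod figures are illustrative assumptions, not measurements; the only fact it relies on is that the scheduler packs nodes by requests rather than by actual usage.

```python
def pods_that_fit(node_allocatable_m: int, per_pod_m: int) -> int:
    """How many Pods fit on one node for a given per-Pod CPU figure (millicores)."""
    return node_allocatable_m // per_pod_m


# All three values below are assumptions chosen for illustration.
node_cpu_m = 4000       # allocatable CPU on the node
requested_m = 1000      # CPU request per Pod
actually_used_m = 200   # CPU each Pod really consumes

# The scheduler stops placing Pods once requests add up to the node's capacity.
by_request = pods_that_fit(node_cpu_m, requested_m)

# Real utilisation of the node once it is "full" by requests.
node_utilisation = by_request * actually_used_m / node_cpu_m

# Same node if requests were right-sized to 250m (usage plus some headroom).
right_sized = pods_that_fit(node_cpu_m, 250)

print(by_request, node_utilisation, right_sized)
```

With these assumed numbers the node counts as full at four Pods while running at 20% CPU; a 250m request would let the same node hold sixteen.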

How to Find Over-Provisioned Workloads

Use kubectl top pods (which needs metrics-server running in the cluster) to see actual consumption:

kubectl top pods --all-namespaces --sort-by=cpu

Compare actual usage to requested resources. If a pod requests 2 CPU but consistently uses 0.2 CPU, it's 10x over-provisioned.
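The comparison is easy to script once you have requests and usage side by side. The sketch below is one possible way to do it in Python; the workload names and figures are hypothetical, and the 2x threshold is an arbitrary assumption rather than a recommended cut-off.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "2", "0.5" or "200m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def overprovision_factor(requested: str, used: str) -> float:
    """Ratio of requested CPU to CPU actually used (1.0 means perfectly sized)."""
    return cpu_to_millicores(requested) / cpu_to_millicores(used)


# Hypothetical rows: (workload, CPU request, observed CPU usage).
workloads = [
    ("api", "2", "200m"),
    ("worker", "500m", "400m"),
]

# Flag anything requesting at least twice what it uses (assumed threshold).
flagged = [name for name, req, used in workloads
           if overprovision_factor(req, used) >= 2]

print(overprovision_factor("2", "0.2"), flagged)
```

A single kubectl top sample is a snapshot, so in practice you would feed this with usage observed over days rather than one reading.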

Use the Vertical Pod Autoscaler in Recommendation Mode

VPA can recommend right-sized values without actually changing anything:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"

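After a week, check the recommendations with kubectl describe vpa my-app-vpa. For each container, VPA reports a lower bound, a target, an uncapped target, and an upper bound. Use the target as the starting point for requests and treat the bounds as the range VPA considers reasonable.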

Typical saving: 20–35% just from right-sizing.

2. Enable Cluster Autoscaler

Without Cluster Autoscaler, your node count is static. You're either over-provisioned (wasting money) or under-provisioned (causing failures). Cluster Autoscaler adds nodes when Pods can't be scheduled and removes nodes that have been underutilised for 10+ minutes, provided their Pods can be rescheduled elsewhere.

containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --expander=least-waste
      - --scale-down-delay-after-add=5m
      - --scale-down-unneeded-time=10m

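The --expander=least-waste flag tells the autoscaler to pick the node group that would leave the least CPU (and, on a tie, memory) unused after scaling up. Use the Cluster Autoscaler release that matches your cluster's Kubernetes minor version.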
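As a rough illustration of the least-waste idea, the Python sketch below picks the node group that leaves the smallest share of new CPU idle. It is a simplification under stated assumptions: it looks at CPU only, treats pending Pods as one lump of demand instead of bin-packing them, and the node group names and sizes are hypothetical. It is not the autoscaler's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    cpu_m: int  # allocatable CPU per node, in millicores


def wasted_fraction(group: NodeGroup, pending_cpu_m: int) -> float:
    """Share of newly provisioned CPU left unused after covering the pending demand."""
    nodes = math.ceil(pending_cpu_m / group.cpu_m)
    provisioned = nodes * group.cpu_m
    return (provisioned - pending_cpu_m) / provisioned


def least_waste(groups: list[NodeGroup], pending_cpu_m: int) -> NodeGroup:
    """Choose the group with the lowest wasted fraction."""
    return min(groups, key=lambda g: wasted_fraction(g, pending_cpu_m))


# Hypothetical node groups and pending demand.
groups = [NodeGroup("4-vcpu", 4000), NodeGroup("16-vcpu", 16000)]
choice = least_waste(groups, pending_cpu_m=5000)

print(choice.name)
```

Here two 4-vCPU nodes leave 37.5% of the new capacity idle, while one 16-vCPU node would leave almost 69% idle, so the smaller group wins.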

Typical saving: 15–25% by eliminating idle nodes overnight and on weekends.

3. Use Spot/Preemptible Instances for Non-Critical Workloads

Spot instances (AWS) or Preemptible VMs (GCP) are spare capacity sold at a 60–90% discount. The trade-off: they can be reclaimed at short notice, two minutes on AWS and about 30 seconds on GCP.

This makes them ideal for CI/CD build jobs, batch processing, dev and staging environments, and stateless microservices.

Mixed Node Groups on EKS

managedNodeGroups:
  - name: on-demand-critical
    instanceType: m5.xlarge
    desiredCapacity: 2
    minSize: 2
    maxSize: 4
    labels:
      workload-type: critical

  - name: spot-general
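    # Keep the types the same size (4 vCPU / 16 GiB here) so Cluster Autoscaler can predict node capacity
    instanceTypes: ['m5.xlarge', 'm5a.xlarge', 'm4.xlarge']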
    spot: true
    minSize: 0
    maxSize: 20
    labels:
      workload-type: batch

Use node selectors to route workloads to the right pool. Listing multiple instance types in the spot group improves availability when one instance type is unavailable.

Typical saving: 50–70% on compute for workloads that can tolerate interruption.
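Note that this saving applies only to the share of compute you actually move to spot. The Python sketch below works through the blended figure; the monthly spend, the share moved, and the discount are all assumed values for illustration.

```python
def blended_monthly_cost(on_demand_cost: float, spot_share: float,
                         spot_discount: float) -> float:
    """Monthly compute cost after moving `spot_share` of it to spot at `spot_discount` off."""
    stays_on_demand = on_demand_cost * (1 - spot_share)
    moved_to_spot = on_demand_cost * spot_share * (1 - spot_discount)
    return stays_on_demand + moved_to_spot


# Assumed inputs: $10,000/month of compute, half of it interruption-tolerant,
# and a 60% spot discount (the low end of the range quoted above).
before = 10_000.0
after = blended_monthly_cost(before, spot_share=0.5, spot_discount=0.6)
overall_saving = 1 - after / before

print(after, overall_saving)
```

Under these assumptions the eligible half gets 60% cheaper, which works out to a 30% cut in total compute spend.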

4. Implement Namespace-Level Resource Quotas

Without quotas, any team can deploy unlimited resources. One misconfigured deployment can fill a cluster.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-frontend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

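A quota on CPU and memory also changes admission: Pods that omit requests or limits for a quota'd resource are rejected. Add a LimitRange so containers without resource specs get sensible defaults instead: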

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-frontend
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container
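It is worth checking how the quota and the defaults interact. The Python sketch below takes the numbers from the two manifests above and works out which quota entry is hit first. It assumes single-container Pods that set no resources of their own, so every Pod gets exactly the LimitRange defaults.

```python
# Quota from the ResourceQuota above (CPU in millicores, memory in Mi).
quota = {
    "requests.cpu": 10_000,
    "limits.cpu": 20_000,
    "requests.memory": 20 * 1024,
    "limits.memory": 40 * 1024,
    "pods": 50,
}

# What one defaulted single-container Pod consumes, from the LimitRange above.
per_pod = {
    "requests.cpu": 100,
    "limits.cpu": 500,
    "requests.memory": 128,
    "limits.memory": 512,
    "pods": 1,
}

# How many such Pods each quota entry would allow on its own.
ceilings = {key: quota[key] // per_pod[key] for key in quota}

# The entry that runs out first is the effective cap for the namespace.
binding = min(ceilings, key=ceilings.get)

print(ceilings, binding)
```

With these values limits.cpu is exhausted at 40 Pods, before the explicit cap of 50, so the default CPU limit (not the pod count) is what the team will run into first.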

5. Scale Down Dev Environments at Night

Most dev and staging environments are idle 70% of the time — nights and weekends. Scale them to zero during off-hours.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: development
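spec:
  schedule: "0 20 * * 1-5"  # 20:00 Mon-Fri in the controller's time zone (usually UTC)
  jobTemplate:
    spec:
      template:
        spec:
          # Assumed: a ServiceAccount allowed to scale Deployments in this namespace
          serviceAccountName: dev-scaler
          restartPolicy: OnFailure  # required: Job Pods cannot use the default (Always)
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - kubectl scale deployment --all --replicas=0 -n development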

Typical saving: 60–70% on dev/staging compute costs.
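The figure follows from the schedule. The Python sketch below computes the share of the week an environment is switched off; the 08:00 weekday scale-up is an assumption, since the CronJob above only defines the 20:00 scale-down. The saving is only realised if the emptied nodes are then removed, for example by Cluster Autoscaler.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_off_share(up_hour: int, down_hour: int, workdays: int = 5) -> float:
    """Fraction of the week the environment is scaled to zero."""
    running_hours = (down_hour - up_hour) * workdays
    return 1 - running_hours / HOURS_PER_WEEK


# Assumed schedule: up at 08:00, down at 20:00, Monday to Friday.
off_share = weekly_off_share(up_hour=8, down_hour=20)

print(f"{off_share:.1%}")
```

Twelve running hours on five days is 60 of 168 hours, so the environment is off for roughly 64% of the week.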

6. Use Horizontal Pod Autoscaler Correctly

HPA scales your Pods based on CPU or custom metrics. Most teams set it up but forget to tune the scale-down behaviour, which leads to HPA thrashing:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
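metadata:
  name: my-app-hpa  # hypothetical name
spec:
  scaleTargetRef:  # example target: the my-app Deployment used in the VPA manifest
    apiVersion: apps/v1
    kind: Deployment
    name: my-app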
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0

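The stabilizationWindowSeconds: 300 on scale-down makes HPA act on the highest recommendation from the last five minutes, so brief traffic dips don't remove pods. The Pods policy then caps removal at one pod per minute.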
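For intuition, here is a Python sketch of the replica calculation HPA documents (current replicas times the ratio of observed to target utilisation, rounded up, with a default 10% tolerance) plus the one-pod-per-period cap from the manifest. It is a simplified model: it leaves out the stabilisation window and the minReplicas/maxReplicas clamp, and the input numbers are made up.

```python
import math


def desired_replicas(current: int, observed_util: float, target_util: float,
                     tolerance: float = 0.1) -> int:
    """Replica count HPA aims for, before behaviour policies are applied."""
    ratio = observed_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no change
    return math.ceil(current * ratio)


def cap_scale_down(current: int, desired: int, max_pods_per_period: int = 1) -> int:
    """Apply a 'Pods' scale-down policy: remove at most N pods per period."""
    if desired >= current:
        return desired
    return max(desired, current - max_pods_per_period)


# Made-up example: 10 replicas at 35% CPU against the 70% target above.
wanted = desired_replicas(current=10, observed_util=35, target_util=70)
next_step = cap_scale_down(current=10, desired=wanted)

print(wanted, next_step)
```

HPA would like five replicas here, but with the policy in place it steps down to nine first and continues one pod per minute.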

7. Get Cost Visibility with Kubecost

You can't optimise what you can't see. Kubecost (free tier available) gives you cost breakdown by namespace, deployment, label, and team.

helm repo add cost-analyzer https://kubecost.github.io/cost-analyzer
helm install kubecost cost-analyzer/cost-analyzer \
  --namespace kubecost \
  --create-namespace

Once running, you'll see which teams and workloads are driving costs — making it easy to have data-driven conversations about resource usage.

Prioritised Cost Optimisation Checklist

Action	Effort	Typical Saving
Right-size resource requests	Medium	20–35%
Enable Cluster Autoscaler	Low	15–25%
Spot instances for non-critical workloads	Medium	50–70% on eligible workloads
Scale down dev environments overnight	Low	60–70% on dev costs
Clean up unused PVCs and PVs	Low	5–15%
Implement resource quotas	Low	Prevents future waste
Deploy Kubecost for visibility	Low	Enables all future optimisation

Start at the top of the list. Right-sizing and Cluster Autoscaler together typically deliver 30–40% savings for about two days of work.

Need Help Cutting Your Kubernetes Costs?

We conduct Kubernetes cost reviews and implement optimisation strategies for engineering teams. Most clients see 30–50% reduction in their cloud bill within the first month.

Book a free Kubernetes cost review →
