The Kubernetes Cost Problem
Kubernetes makes it easy to deploy workloads. It also makes it easy to waste money. Teams provision clusters "just in case", set resource requests based on guesswork, and leave idle namespaces running for months.
The average Kubernetes deployment is 40–60% over-provisioned according to benchmarks from cloud providers. For a team spending $20,000/month on cloud infrastructure, that's $8,000–$12,000 in waste every month.
This guide covers the optimisations that consistently deliver the biggest savings — in order of impact.
1. Right-Size Your Resource Requests and Limits
This is where most of the money is hidden. Each container in a Pod can set two resource values:
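- Requests: the CPU and memory the scheduler reserves for the container when placing its Pod
- Limits: the maximum the container may consume (CPU above the limit is throttled; a container that exceeds its memory limit is OOM-killed)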
When requests are set too high, nodes appear "full" before they're actually full. The scheduler can't place new Pods, so you scale up and pay for nodes that are mostly idle.
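A rough way to see the effect is to count nodes the way the scheduler does, by requests rather than by usage. The bash sketch below does that for CPU only; nodes_needed is a hypothetical helper, the 3920m allocatable figure is an assumed node size, and memory, DaemonSets and other Pods on the node are ignored.

```bash
#!/usr/bin/env bash
# Rough node count for N identical replicas, judged by CPU requests only.
# The scheduler packs Pods by their *requests*, so each node holds
# floor(allocatable / request) replicas regardless of actual usage.
# Arguments: replicas, CPU request per replica (millicores),
#            allocatable CPU per node (millicores).
nodes_needed() {
  local replicas="$1" request_m="$2" allocatable_m="$3" per_node
  per_node=$(( allocatable_m / request_m ))
  if [ "$per_node" -eq 0 ]; then
    echo "request exceeds node allocatable CPU" >&2
    return 1
  fi
  # Ceiling division: a partially filled node is still a whole node.
  echo $(( (replicas + per_node - 1) / per_node ))
}

# Hypothetical: 40 replicas on nodes with 3920m allocatable CPU.
nodes_needed 40 1000 3920   # requests of 1 CPU     → 14
nodes_needed 40 250 3920    # requests of 250m      → 3
```

The same 40 replicas need 14 nodes at a 1 CPU request and 3 at 250m, even if their real usage is identical.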
How to Find Over-Provisioned Workloads
Use kubectl top pods (it requires metrics-server) to see actual consumption:
kubectl top pods --all-namespaces --sort-by=cpu
Compare actual usage to requested resources. If a pod requests 2 CPU but consistently uses 0.2 CPU, it's 10x over-provisioned.
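kubectl top reports CPU in millicores (for example 200m), while requests are often written in whole or fractional cores (2 or 0.5). The bash sketch below normalises both and computes the request-to-usage ratio described above; to_millicores and overprovision_factor are hypothetical helpers, not part of kubectl.

```bash
#!/usr/bin/env bash
# Convert a Kubernetes CPU quantity ("2", "0.5", "250m") to millicores.
# Millicore values are passed through; core values are multiplied by 1000
# in awk, which handles decimals and rounds to the nearest millicore.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

# Ratio of requested CPU to observed CPU; 10.0 means 10x over-provisioned.
overprovision_factor() {
  local req use
  req=$(to_millicores "$1")
  use=$(to_millicores "$2")
  awk -v r="$req" -v u="$use" 'BEGIN { printf "%.1f\n", r / u }'
}

overprovision_factor 2 200m   # → 10.0
```

To see requests next to Pod names, kubectl get pods -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu prints one row per Pod, with comma-separated values for multi-container Pods.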
Use the Vertical Pod Autoscaler in Recommendation Mode
VPA (installed separately; it is not part of a default cluster) can recommend right-sized values without changing anything:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
After a week, check the recommendations with kubectl describe vpa my-app-vpa. For each container, VPA reports a lower bound, a target and an upper bound; by default these are derived from roughly the 50th, 90th and 95th percentiles of observed usage, plus a safety margin. Use the target to set accurate requests.
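VPA memory targets often appear in bytes or decimal kilobytes (for example 262144k) rather than the Mi units most manifests use. The bash sketch below converts the common suffixes to whole MiB, rounding up so a new request never lands below the recommendation; to_mebibytes is a hypothetical helper, and rarer suffixes such as Ti or E are deliberately not handled.

```bash
#!/usr/bin/env bash
# Convert a Kubernetes memory quantity ("262144k", "512Mi", "1Gi", "134217728")
# to whole MiB, rounding up. Only the suffixes listed below are handled.
to_mebibytes() {
  local q="$1" num unit mult
  num="${q%%[!0-9.]*}"   # leading numeric part, e.g. 262144
  unit="${q#"$num"}"     # suffix, possibly empty (plain bytes)
  case "$unit" in
    "") mult=1 ;;
    k)  mult=1000 ;;
    M)  mult=1000000 ;;
    G)  mult=1000000000 ;;
    Ki) mult=1024 ;;
    Mi) mult=1048576 ;;
    Gi) mult=1073741824 ;;
    *)  echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  awk -v n="$num" -v m="$mult" 'BEGIN {
    mib = n * m / 1048576
    r = int(mib)
    if (r < mib) r++   # ceiling
    print r
  }'
}

to_mebibytes 262144k   # → 250
```

It can read the target straight from the VPA object, for example to_mebibytes "$(kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target.memory}')"; the [0] index picks the first container, so adjust it for multi-container Pods.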
Typical saving: 20–35% just from right-sizing.
2. Enable Cluster Autoscaler
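Without Cluster Autoscaler, your node count is static: you're either over-provisioned (wasting money) or under-provisioned (causing failures). Cluster Autoscaler adds nodes when Pods can't be scheduled, and removes nodes that have been underutilised for 10+ minutes (by default, when Pod requests use less than half of the node's allocatable CPU and memory) and whose Pods can run elsewhere. The relevant part of its Deployment on AWS: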
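containers:
  - name: cluster-autoscaler
    # Keep the autoscaler's minor version in step with your cluster's Kubernetes version.
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Find node groups by Auto Scaling group tags (the ASGs must carry them); replace <cluster-name>.
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
      - --expander=least-waste
      - --scale-down-delay-after-add=5m
      - --scale-down-unneeded-time=10m
The --expander=least-waste flag tells the autoscaler to choose the node group that leaves the least CPU and memory idle when scaling up. --scale-down-unneeded-time=10m is how long a node must stay underutilised before it is removed, and --scale-down-delay-after-add=5m makes scale-down evaluation wait five minutes after each scale-up.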
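The least-waste idea can be sketched in a few lines. This bash/awk version is a simplified model, not the autoscaler's own code: it considers a single new node per candidate, scores each node type by the unused share of CPU plus the unused share of memory after the pending requests are placed, and picks the lowest score. The least_waste function and the node sizes in the example are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified "least-waste" choice between candidate node types.
# stdin: one line per candidate, "<name> <cpu_millicores> <memory_mib>".
# Arguments: total pending CPU (millicores) and memory (MiB) to place.
least_waste() {
  awk -v c="$1" -v m="$2" '
    BEGIN { c += 0; m += 0 }            # force numeric comparisons
    $2 + 0 >= c && $3 + 0 >= m {
      # Unused share of CPU plus unused share of memory on the new node.
      score = ($2 - c) / $2 + ($3 - m) / $3
      if (best == "" || score < best_score) { best = $1; best_score = score }
    }
    END {
      if (best == "") exit 1            # no candidate fits the pending Pods
      print best
    }'
}

# Hypothetical candidates with nominal sizes (real allocatable is lower).
printf '%s\n' 'm5.xlarge 4000 16384' 'm5.2xlarge 8000 32768' | least_waste 3000 12000
# → m5.xlarge
```

The real expander also weighs how many nodes each option needs; this sketch only shows why a tightly fitting node type wins over a larger one.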
Typical saving: 15–25% by eliminating idle nodes overnight and on weekends.
3. Use Spot/Preemptible Instances for Non-Critical Workloads
Spot Instances (AWS) or Preemptible VMs (GCP) are spare capacity sold at a 60–90% discount. The trade-off: the provider can reclaim them at short notice (two minutes on AWS, 30 seconds on GCP).
This makes them ideal for CI/CD build jobs, batch processing, dev and staging environments, and stateless microservices.
Mixed Node Groups on EKS
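apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster      # hypothetical cluster name
  region: us-east-1     # hypothetical region
managedNodeGroups:
  - name: on-demand-critical
    instanceType: m5.xlarge
    desiredCapacity: 2
    minSize: 2
    maxSize: 4
    labels:
      workload-type: critical
  - name: spot-general
    # All 4 vCPU / 16 GiB, so Cluster Autoscaler's view of a node stays accurate.
    instanceTypes: ['m5.xlarge', 'm5a.xlarge', 'm4.xlarge']
    spot: true
    minSize: 0
    maxSize: 20
    labels:
      workload-type: batch
Use node selectors to route workloads to the right pool. Several instance types in the spot group improve availability when spot capacity for one type runs out; keep them the same size, because Cluster Autoscaler treats every node in a group as identical when deciding how many to add.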
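The node selector side of that routing could look like the bash sketch below, which prints a Deployment pinned to the spot group's workload-type: batch label, ready to pipe into kubectl apply -f -. The spot_deployment function and the workload name and image in the example are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Print a Deployment that only schedules onto nodes labelled
# workload-type=batch (the spot node group above).
# Arguments: name, image, optional replica count (default 1).
spot_deployment() {
  local name="$1" image="$2" replicas="${3:-1}"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      # Schedule only onto nodes that carry the spot node group's label.
      nodeSelector:
        workload-type: batch
      containers:
        - name: ${name}
          image: ${image}
EOF
}

# Usage (hypothetical workload):
# spot_deployment report-worker registry.example.com/report-worker:1.0 3 | kubectl apply -f -
```

A nodeSelector only pulls this workload onto spot nodes; it does not keep other Pods off them. To reserve the spot group for interruptible work, taint it and give workloads like this one a matching toleration.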
Typical saving: 50–70% on compute for workloads that can tolerate interruption.
4. Implement Namespace-Level Resource Quotas
Without quotas, any team can deploy unlimited resources. One misconfigured deployment can fill a cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-frontend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
Also set a LimitRange so Pods without resource specs get sensible defaults rather than running unbounded (or, once a quota covers CPU and memory, being rejected outright):
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-frontend
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container
5. Scale Down Dev Environments at Night
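Most dev and staging environments are idle 70% of the time: nights and weekends. A 12-hour weekday working window, for example, leaves an environment unused for 108 of the week's 168 hours (about 64%) before counting quiet periods during the day. Scale them to zero during off-hours.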
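apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: development
spec:
  schedule: "0 20 * * 1-5"   # 20:00, Monday to Friday
  jobTemplate:
    spec:
      template:
        spec:
          # Hypothetical ServiceAccount; it needs RBAC permission to list and scale Deployments.
          serviceAccountName: dev-scaler
          # Job Pods must use OnFailure or Never.
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - kubectl scale deployment --all --replicas=0 -n development
Cron schedules are evaluated in the kube-controller-manager's time zone (usually UTC) unless you set spec.timeZone. A matching CronJob that scales the Deployments back up before the working day completes the cycle.
Typical saving: 60–70% on dev/staging compute costs.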
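The CronJob's Pod needs an identity that is allowed to scale Deployments; with no permissions, the kubectl call inside the Job is refused by the API server. The bash sketch below prints a ServiceAccount, Role and RoleBinding for that purpose, ready to pipe into kubectl apply -f -. The dev_scaler_rbac function and the dev-scaler name are hypothetical; set serviceAccountName in the CronJob's Pod spec to whichever name you use.

```bash
#!/usr/bin/env bash
# Print RBAC objects that let a ServiceAccount list Deployments and change
# their replica count via the scale subresource, in one namespace only.
# Arguments: namespace (default development), ServiceAccount name (default dev-scaler).
dev_scaler_rbac() {
  local ns="${1:-development}" sa="${2:-dev-scaler}"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa}
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${sa}
  namespace: ${ns}
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments/scale"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${sa}
  namespace: ${ns}
subjects:
  - kind: ServiceAccount
    name: ${sa}
    namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${sa}
EOF
}

# Usage:
# dev_scaler_rbac development dev-scaler | kubectl apply -f -
```

Granting only Deployments and their scale subresource keeps the Job from touching anything else in the namespace.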
6. Use Horizontal Pod Autoscaler Correctly
HPA scales your Pods based on CPU or custom metrics. Most teams set it up but forget to tune the scale-down behaviour, which leads to HPA thrashing:
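apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
The stabilizationWindowSeconds: 300 on scale-down makes HPA act on the highest replica recommendation from the last five minutes, so brief traffic dips don't remove Pods too aggressively. 300 seconds is also the default, so the Pods policy is the part that changes behaviour: it limits scale-down to one Pod every 60 seconds. scaleUp.stabilizationWindowSeconds: 0 lets HPA add Pods as soon as load rises.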
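The replica count HPA aims for follows the formula in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with CPU utilization measured against each Pod's CPU request. The bash sketch below applies that formula with the default 0.1 tolerance and the minReplicas/maxReplicas bounds. It is a simplified model that ignores behavior policies, stabilization windows and unready Pods, and hpa_desired_replicas is a hypothetical name.

```bash
#!/usr/bin/env bash
# Simplified HPA target: ceil(current * metric / target), skipped when the
# ratio is within the default 0.1 tolerance, then clamped to min/max.
# Arguments: current replicas, current average utilization (%),
#            target utilization (%), minReplicas, maxReplicas.
hpa_desired_replicas() {
  awk -v cur="$1" -v metric="$2" -v target="$3" -v lo="$4" -v hi="$5" 'BEGIN {
    cur += 0; metric += 0; target += 0; lo += 0; hi += 0   # force numeric
    ratio = metric / target
    if (ratio >= 0.9 && ratio <= 1.1) {
      desired = cur                     # within tolerance: no scaling
    } else {
      raw = cur * ratio
      desired = int(raw)
      if (desired < raw) desired++      # ceiling
    }
    if (desired < lo) desired = lo
    if (desired > hi) desired = hi
    print desired
  }'
}

hpa_desired_replicas 4 140 70 2 20   # → 8
```

Because utilization is relative to requests, right-sizing requests also shifts when HPA scales. With the Pods policy above, a drop computed here from 10 replicas to 5 would be carried out at most one Pod per minute, once the lower recommendation has held for the five-minute window.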
7. Get Cost Visibility with Kubecost
You can't optimise what you can't see. Kubecost (free tier available) gives you cost breakdown by namespace, deployment, label, and team.
helm repo add cost-analyzer https://kubecost.github.io/cost-analyzer
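helm install kubecost cost-analyzer/cost-analyzer --namespace kubecost --create-namespace
Once it's running, kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090 makes the dashboard available at http://localhost:9090. There you'll see which teams and workloads are driving costs, which makes it easy to have data-driven conversations about resource usage.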
Prioritised Cost Optimisation Checklist
| Action | Effort | Typical Saving |
|---|---|---|
| Right-size resource requests | Medium | 20–35% |
| Enable Cluster Autoscaler | Low | 15–25% |
| Spot instances for non-critical workloads | Medium | 50–70% on eligible workloads |
| Scale down dev environments overnight | Low | 60–70% on dev costs |
| Clean up unused PVCs and PVs | Low | 5–15% |
| Implement resource quotas | Low | Prevents future waste |
| Deploy Kubecost for visibility | Low | Enables all future optimisation |
Start at the top of the list. Right-sizing + Cluster Autoscaler alone typically delivers 30–40% savings with two days of work.
Need Help Cutting Your Kubernetes Costs?
We conduct Kubernetes cost reviews and implement optimisation strategies for engineering teams. Most clients see 30–50% reduction in their cloud bill within the first month.
Need hands-on help?
We're a specialist DevOps & Atlassian consulting firm. Book a free call to talk through your specific situation.