Cost optimization is easier when treated as a continuous engineering workflow, not a quarterly cleanup project. This playbook focuses on practical changes that lower spend while preserving reliability and performance.
1. Right-size requests and limits
- Audit CPU and memory requests against actual usage over 14-30 days.
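- Trim oversized requests first: the scheduler and cluster autoscaler reserve capacity based on requests, not limits, so inflated requests trigger expensive node scale-up even when actual usage is low.
- Use Vertical Pod Autoscaler (VPA) recommendations as a baseline, then tune against production traffic.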
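The audit step above can be sketched as a small calculation. This is a minimal illustration, not a replacement for VPA: it assumes usage samples (for example CPU millicores collected over the audit window) are already exported as a list of numbers. The function names, the 95th percentile, and the 15% headroom are all assumptions chosen for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs avoid float rounding drift.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(usage_samples, pct=95, headroom=0.15):
    """Suggest a request: the chosen usage percentile plus a safety margin.

    pct and headroom are illustrative defaults, not recommended values.
    """
    return percentile(usage_samples, pct) * (1 + headroom)


def oversize_ratio(current_request, usage_samples, pct=95, headroom=0.15):
    """How many times larger the current request is than the suggestion."""
    return current_request / recommend_request(usage_samples, pct, headroom)
```

A ratio well above 1.0 marks a workload as a candidate for a smaller request. Spiky workloads may need a higher percentile or more headroom than the defaults used here.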
2. Tune autoscaling
- Scale the Horizontal Pod Autoscaler (HPA) on signals that track real load, such as request rate or queue depth, not CPU utilization alone.
- Set minimum replicas by service criticality and traffic profile.
- Use the Cluster Autoscaler priority expander so cheaper node pools scale up first.
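To reason about how a chosen signal and minimum replica count interact, it helps to model the scaling decision. The sketch below follows the replica formula described in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric), with a tolerance band around the target). The function name and default bounds are assumptions, and real HPA behavior also includes stabilization windows and pod readiness handling that are not modeled here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds, as minReplicas/maxReplicas would.
    return max(min_replicas, min(max_replicas, desired))
```

For example, `desired_replicas(4, 200, 100)` returns 8: the metric is at twice its target, so the replica count doubles. Running a workload's real traffic profile through a model like this shows how often the minimum replica floor, rather than load, determines cost.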
3. Improve scheduling efficiency
- Apply taints, tolerations, and node affinity for workload segregation.
- Use pod anti-affinity only where availability requirements justify the cost.
- Run batch jobs on spot or preemptible nodes with retry-safe design.
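One way to make a batch job retry-safe on nodes that can disappear mid-run is to checkpoint completed work and skip it on restart. The sketch below is an illustration only: it assumes item IDs are strings, the handler is idempotent (a crash between the handler and the checkpoint write replays that one item), and the checkpoint lives on storage that survives the pod, such as a persistent volume. All names are hypothetical.

```python
import json
import os
import tempfile


def load_done(checkpoint_path):
    """Return the set of item IDs already completed, or an empty set."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path) as f:
        return set(json.load(f))


def save_done(checkpoint_path, done):
    """Write the checkpoint atomically so a preemption cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, checkpoint_path)


def run_batch(item_ids, handler, checkpoint_path):
    """Process items not yet checkpointed; return the IDs handled this run."""
    done = load_done(checkpoint_path)
    processed = []
    for item_id in item_ids:
        if item_id in done:
            continue  # finished in an earlier attempt
        handler(item_id)
        done.add(item_id)
        save_done(checkpoint_path, done)
        processed.append(item_id)
    return processed
```

If the node is reclaimed partway through, rerunning the same job with the same checkpoint path resumes from the first unfinished item instead of repeating paid compute.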
4. Build cost visibility into operations
- Label namespaces and workloads consistently by owning team so costs can be attributed for chargeback.
- Review top cost drivers weekly as part of operations review.
- Track cost-per-service and cost-per-request trends over time.
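The attribution and unit-cost metrics above reduce to simple aggregation once cost data is exported. The sketch below assumes each record is a dictionary with hypothetical `team` and `cost_usd` keys, as might come from a billing export or a cost allocation tool. The field names are illustrative and do not follow any particular tool's schema.

```python
from collections import defaultdict


def cost_by_team(records):
    """Sum cost per team; records without a team land in 'unlabeled'."""
    totals = defaultdict(float)
    for record in records:
        totals[record.get("team", "unlabeled")] += record["cost_usd"]
    return dict(totals)


def cost_per_1k_requests(cost_usd, requests):
    """Unit cost per 1,000 requests, or None when there was no traffic."""
    if requests <= 0:
        return None
    return cost_usd * 1000 / requests
```

The size of the `unlabeled` bucket is itself a useful weekly metric: it shows how much spend still cannot be attributed to an owner.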
Quick win: start by reducing resource requests for non-critical workloads and monitor p95 latency for one week before wider rollout.
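The one-week check can be made explicit as a pass/fail gate. This is a sketch under assumptions: latency samples in milliseconds are available for a baseline week and for the week after the change, and a 5% p95 regression budget is acceptable. The budget and function names are illustrative.

```python
import math


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def rollout_is_safe(baseline_ms, candidate_ms, max_regression=0.05):
    """True if candidate p95 stays within the allowed regression budget."""
    return p95(candidate_ms) <= p95(baseline_ms) * (1 + max_regression)
```

Only widen the rollout when the gate passes. A failure suggests the reduced requests are causing throttling or contention for that workload.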