Kubernetes Requests and HPA – When Numbers Don’t Add Up

TL;DR: If your Kubernetes deployments set the CPU request (resources.requests.cpu) to a value other than 1.0 and use Horizontal Pod Autoscaling, you may find that the CPU percentages reported by your pods are at odds with the CPU percentages reported by HPA.

Background

Kubernetes defines Quality of Service for pods in terms of resource requests and limits. According to the resource-qos design proposal:

Requests are ‘the amount of that resource that the system will guarantee to the container’

and

Limits are ‘the maximum amount that the system will allow the container to use’.

Kubernetes also provides Horizontal Pod Autoscaling (HPA) to scale the number of pods based on resource (primarily CPU) utilization.

HPA + Requests

Typically, CPU requests are expressed in cores (e.g., 1.0) or millicores (e.g., 1000m), where 1 core equals 1000 millicores.

According to the Kubernetes HPA Docs, HPA ‘calculates the utilization value as a percentage of the equivalent resource request on the containers in each pod’.

In addition, the HPA Autoscaling Algorithm states that ‘CPU utilization is the recent CPU usage of a pod (average across the last 1 minute) divided by the CPU requested by the pod’.
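The quoted definition can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the actual HPA implementation; the function name hpa_cpu_utilization and its parameters are hypothetical, and it assumes every pod has the same CPU request.

```python
def hpa_cpu_utilization(usage_millicores, request_millicores):
    """Average CPU utilization across pods, as a percentage of the request.

    usage_millicores: recent CPU usage of each pod, in millicores (assumed input).
    request_millicores: CPU request shared by all pods, in millicores (assumed).
    """
    if request_millicores <= 0:
        raise ValueError("request must be positive")
    if not usage_millicores:
        raise ValueError("at least one pod is required")
    # Per-pod utilization is usage divided by request; the result is the average.
    ratios = [usage / request_millicores for usage in usage_millicores]
    return 100.0 * sum(ratios) / len(ratios)


# Two pods, each using a quarter of a core (250m):
print(hpa_cpu_utilization([250, 250], 1000))  # request = 1.0 core
print(hpa_cpu_utilization([250, 250], 500))   # request = 0.5 core
```

The same 250m of usage yields a different percentage depending only on the request, which is the effect discussed in the rest of this post.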

Thus, the calculation is straightforward when the request equals 1.0. In that case, the CPU usage reported by HPA equals (or is very close to) the actual average CPU usage of all pods.

However, it becomes a little tricky when requests do not equal 1.0!

Hey, the HPA CPU usage is off!

It is very easy to forget about the relationship between HPA and requests, which can lead to some confusing observations.

If you set the CPU request to 0.5 (500 millicores), you may find that the CPU utilization reported by HPA is higher than the actual average CPU utilization reported by the pods. To be precise, in this case, the CPU usage reported by HPA is twice as high! So which one is correct: the average pod CPU usage reported by cAdvisor or the one reported by HPA?

It turns out that both are correct! You just have to remember the relationship between HPA and requests, and it all comes together. When the CPU request is 0.5, HPA reports the CPU usage as twice as high because it measures usage against half of a CPU, i.e., 25% of 1 CPU = 50% of 0.5 CPU. cAdvisor reports the actual CPU usage (25%), while HPA reports the CPU usage relative to the request (50%).
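As a quick sanity check, the conversion between the two views is a single division. The sketch below assumes the pod-level figure is expressed as a percentage of one full core; the function name hpa_percent_from_core_percent is made up for illustration.

```python
def hpa_percent_from_core_percent(core_percent, request_cores):
    """Convert usage as a percentage of one core into a percentage of the request.

    core_percent: usage relative to one full CPU core (assumed cAdvisor-style figure).
    request_cores: the pod's CPU request in cores, e.g. 0.5.
    """
    if request_cores <= 0:
        raise ValueError("request must be positive")
    return core_percent / request_cores


# 25% of one core is 50% of a 0.5-core request.
print(hpa_percent_from_core_percent(25.0, 0.5))
```

With a request of exactly 1.0 the division changes nothing, which is why the two numbers only agree in that case.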

The same logic applies when the request is greater than 1.0, except that the effect is reversed: the CPU usage reported by HPA will be lower than the actual average usage reported by cAdvisor. For example, when the request is 2.0 (2000 millicores), HPA will report CPU usage at 25% when cAdvisor reports the average CPU usage at 50%.
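Going the other way, an HPA percentage can be translated back into a percentage of one core by multiplying by the request. This is a minimal sketch for reconciling the two numbers by hand; core_percent_from_hpa_percent is a hypothetical helper, not part of any Kubernetes tooling.

```python
def core_percent_from_hpa_percent(hpa_percent, request_cores):
    """Convert an HPA utilization percentage back into a percentage of one core.

    hpa_percent: utilization relative to the CPU request, as HPA reports it.
    request_cores: the pod's CPU request in cores, e.g. 2.0.
    """
    if request_cores <= 0:
        raise ValueError("request must be positive")
    return hpa_percent * request_cores


# The same HPA reading of 25% means different real usage per request size.
for request in (0.5, 1.0, 2.0):
    print(request, core_percent_from_hpa_percent(25.0, request))
```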

Illya's Blog