I am planning to use Descheduler in my AKS deployment to balance memory consumption across the AKS nodes. The current output of kubectl top nodes is:
NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-nodepool1-53884836-vmss000000   198m         10%    12317Mi         97%
aks-nodepool1-53884836-vmss000001   189m         9%     12952Mi         102%
aks-nodepool1-53884836-vmss000002   213m         11%    12747Mi         101%
aks-nodepool1-53884836-vmss000003   135m         7%     5970Mi          47%
However, when I tried different scenarios in Descheduler, I got the following output:
I0612 13:51:55.145678 1 nodeutilization.go:204] "Node is underutilized" node="aks-nodepool1-53884836-vmss000003" usage={"cpu":"810m","memory":"476Mi","pods":"27"} usagePercentage={"cpu":42.63,"memory":3.78,"pods":10.8}
I0612 13:51:55.145712 1 nodeutilization.go:204] "Node is underutilized" node="aks-nodepool1-53884836-vmss000000" usage={"cpu":"582m","memory":"501Mi","pods":"54"} usagePercentage={"cpu":30.63,"memory":3.98,"pods":21.6}
I0612 13:51:55.145725 1 nodeutilization.go:204] "Node is underutilized" node="aks-nodepool1-53884836-vmss000001" usage={"cpu":"950m","memory":"596Mi","pods":"61"} usagePercentage={"cpu":50,"memory":4.74,"pods":24.4}
I0612 13:51:55.145743 1 nodeutilization.go:210] "Node is appropriately utilized" node="aks-nodepool1-53884836-vmss000002" usage={"cpu":"962m","memory":"647Mi","pods":"56"} usagePercentage={"cpu":50.63,"memory":5.14,"pods":22.4}
As you can see, the utilization seen by Descheduler is drastically different from what top is reporting, especially memory, which Descheduler puts at roughly 5% or less on every node while top reports 47% and above.
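For reference, the scenarios were all variations of a LowNodeUtilization policy of roughly this shape (the threshold values below are illustrative, not my exact ones):

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:
          "cpu": 50
          "memory": 50
          "pods": 50

My understanding is that these thresholds are interpreted as percentages of each node's allocatable resources, which is also what the usagePercentage values in the log lines above refer to.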
When I describe the node, I see the utilization of all of the custom pods shown as 0:
Namespace   Name                                         CPU Requests   CPU Limits   Memory Requests   Memory Limits   Age
---------   ----                                         ------------   ----------   ---------------   -------------   ---
default     alerts-667b7bc-88djq                         0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     alerts-ag-5544b98c45-xjnss                   0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     api-6db9645d8b-p6jqm                         0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     authentication-766496cf6b-js77v              0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     authentication-ag-585fdf767nsp6              0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     authentication-validator-76b444457c-6x66x    0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     authorization-checker-5789b576ff-wssl2       0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     authorization-78f759f849-xmk2p               0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     backups-agent-68f47f764c-vpmlh               0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
default     backups-f7d6c765d-qsl8v                      0 (0%)         0 (0%)       0 (0%)            0 (0%)          32d
....
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource            Requests     Limits
--------            --------     ------
cpu                 582m (30%)   4647m (244%)
memory              501Mi (3%)   3657Mi (29%)
ephemeral-storage   0 (0%)       0 (0%)
hugepages-1Gi       0 (0%)       0 (0%)
hugepages-2Mi       0 (0%)       0 (0%)
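(For what it's worth, the zero figures match the pod specs: the containers in these deployments don't declare any requests or limits, which can be checked with something like

kubectl get pod alerts-667b7bc-88djq -n default -o jsonpath='{.spec.containers[*].resources}'

and which, consistent with the table above, comes back empty in my case.)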
The allocated resources from describe node seem to conform with what Descheduler is seeing, given that the utilization of all the custom pods is shown as 0%. However, running kubectl top pods for one of those pods, e.g. kubectl top pods alerts-667b7bc-88djq, gives:
NAME                   CPU(cores)   MEMORY(bytes)
alerts-667b7bc-88djq   2m           108Mi
PodMetrics agrees with this; kubectl describe PodMetrics alerts-667b7bc-88djq shows:
API Version:  metrics.k8s.io/v1beta1
Containers:
  Name:  alerts
  Usage:
    Cpu:     1268872n
    Memory:  111272Ki
Kind:         PodMetrics
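If it helps, the same numbers can be pulled straight from the metrics API (served by metrics-server), which is where kubectl top and the PodMetrics objects come from:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/alerts-667b7bc-88djq"

Both agree with the top output above, as expected, since top reads from this API.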
Any help understanding what's going on here would be appreciated. Why is describe node failing to register any resource utilization (with Descheduler subsequently reporting the same), while top nodes presents a totally different picture?