I am running into a problem that may have a simple answer; if anyone knows it, please comment.
I have created an EKS cluster using the following manifest.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: test-cluster
  region: us-west-2
  version: "1.29"

vpc:
  subnets:
    public:
      us-west-2a: { id: subnet-094d01de2dd2148c0 }
      us-west-2b: { id: subnet-04429e132a1f42826 }
      us-west-2c: { id: subnet-028a738bdafc344c6 }

nodeGroups:
  - name: ng-spot
    instanceType: t3.medium
    labels: { role: builders }
    desiredCapacity: 2
    minSize: 2
    maxSize: 4
    volumeSize: 30
    ssh:
      allow: true
      publicKeyName: techies
    tags:
      Name: ng-spot
    maxPodsPerNode: 110
This cluster is for testing purposes, so I am using t3.medium instances with the maximum pod limit set to 110.
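One thing worth noting (an assumption on my part, based on AWS's published per-instance-type ENI limits): with the default VPC CNI mode (secondary IPs, no prefix delegation), the number of pod IPs an instance can actually hold is bounded by its ENI capacity, regardless of what `maxPodsPerNode` says. A rough sketch for t3.medium:

```python
# Back-of-envelope pod-IP ceiling for a t3.medium under the default
# VPC CNI mode (secondary IPs, no prefix delegation).
# ENI/IP figures come from AWS's published per-instance-type limits.
enis_per_instance = 3   # t3.medium: max 3 ENIs
ipv4_per_eni = 6        # t3.medium: 6 IPv4 addresses per ENI

# One address per ENI is the ENI's own primary IP and is not handed
# to pods; the +2 covers host-network pods (aws-node, kube-proxy).
max_pods = enis_per_instance * (ipv4_per_eni - 1) + 2
print(max_pods)  # 17
```

So if those limits apply, the node can hand out far fewer pod IPs than the 110 I configured.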
arun@ArunLAL555:~$ k get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-37-0.us-west-2.compute.internal Ready <none> 26m v1.29.0-eks-5e0fdde
ip-192-168-86-42.us-west-2.compute.internal Ready <none> 26m v1.29.0-eks-5e0fdde
arun@ArunLAL555:~$ kubectl get nodes -o jsonpath='{.items[*].status.allocatable.pods}{"\n"}'
110 110
This shows the allocatable pod count is 110 on each node, so I expected to be able to schedule up to 110 pods per node.
arun@ArunLAL555:~$ k create deployment test-deploy --image nginx --replicas 50
deployment.apps/test-deploy created
arun@ArunLAL555:~$ k get po
NAME READY STATUS RESTARTS AGE
test-deploy-859f95ffcc-2c5k6 0/1 ContainerCreating 0 19s
test-deploy-859f95ffcc-2p9rh 1/1 Running 0 19s
test-deploy-859f95ffcc-468wm 0/1 ContainerCreating 0 18s
.
.
test-deploy-859f95ffcc-xxm7z 0/1 ContainerCreating 0 18s
test-deploy-859f95ffcc-z88x6 1/1 Running 0 19s
Here, the remaining pods are not getting IPs:
arun@ArunLAL555:~$ k events po test-deploy-859f95ffcc-xxm7z
1s (x5 over 55s) Warning FailedCreatePodSandBox Pod/test-deploy-859f95ffcc-m7t62 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "528eaad224c5578435db12a57a8fa7063a03423b28d57c681bab742cc8389a1a": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
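If the t3.medium ENI limits mentioned above are what applies here (again, my assumption, and I'm estimating roughly two cluster pods such as the CoreDNS replicas also consuming pod IPs), the arithmetic would line up with what I'm seeing, with roughly 20 of the 50 replicas unable to get an address:

```python
# Rough check: requested replicas vs. available pod IPs across 2 nodes,
# assuming t3.medium limits (3 ENIs x 6 IPv4) and the default VPC CNI.
nodes = 2
secondary_ips_per_node = 3 * (6 - 1)   # 15 pod-assignable IPs per node
system_pods_needing_ips = 2            # e.g. the two CoreDNS replicas

ip_capacity = nodes * secondary_ips_per_node - system_pods_needing_ips
requested = 50
print(ip_capacity, requested - ip_capacity)  # 28 pods fit, 22 stuck
```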
The following are the subnets and their available IP counts, so the subnets themselves are not running out of addresses:
arun@ArunLAL555:~$ aws eks describe-cluster --name test-cluster --query "cluster.resourcesVpcConfig.subnetIds"
[
"subnet-094d01de2dd2148c0",
"subnet-04429e132a1f42826",
"subnet-028a738bdafc344c6"
]
arun@ArunLAL555:~$ aws ec2 describe-subnets --subnet-ids subnet-094d01de2dd2148c0 subnet-04429e132a1f42826 subnet-028a738bdafc344c6 --query 'Subnets[*].[SubnetId,AvailableIpAddressCount]' --output text
subnet-028a738bdafc344c6 8167
subnet-094d01de2dd2148c0 8185
subnet-04429e132a1f42826 8168
I then updated the VPC CNI add-on. The currently installed version:
arun@ArunLAL555:~$ kubectl describe daemonset aws-node --namespace kube-system | grep amazon-k8s-cni: | cut -d : -f 3
v1.16.0-eksbuild.1
arun@ArunLAL555:~$ aws eks create-addon --cluster-name test-cluster --addon-name vpc-cni --addon-version v1.17.1-eksbuild.1 \
    --service-account-role-arn arn:aws:iam::111122223333:role/AmazonEKSVPCCNIRole
{
"addon": {
"addonName": "vpc-cni",
"clusterName": "test-cluster",
"status": "CREATING",
"addonVersion": "v1.17.1-eksbuild.1",
"health": {
"issues": []
},
"addonArn": "arn:aws:eks:us-west-2:111122223333:addon/test-cluster/vpc-cni/fec7333d-c1fc-c2fc-1287-c14beaa883f8",
"createdAt": "2024-03-22T19:35:54.685000+05:30",
"modifiedAt": "2024-03-22T19:35:54.703000+05:30",
"serviceAccountRoleArn": "arn:aws:iam::111122223333:role/AmazonEKSVPCCNIRole",
"tags": {}
}
}
arun@ArunLAL555:~$ aws eks describe-addon --cluster-name test-cluster --addon-name vpc-cni --query addon.addonVersion --output text
v1.17.1-eksbuild.1
After that, I terminated the existing instances so that the replacement nodes would pick up the updated add-on, but the new nodes are not getting Ready.
arun@ArunLAL555:~$ k get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-40-177.us-west-2.compute.internal NotReady <none> 86s v1.29.0-eks-5e0fdde
ip-192-168-83-11.us-west-2.compute.internal NotReady <none> 3m29s v1.29.0-eks-5e0fdde
arun@ArunLAL555:~$ k describe nodes ip-192-168-40-177.us-west-2.compute.internal
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 22 Mar 2024 19:45:20 +0530 Fri, 22 Mar 2024 19:44:49 +0530 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 22 Mar 2024 19:45:20 +0530 Fri, 22 Mar 2024 19:44:49 +0530 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 22 Mar 2024 19:45:20 +0530 Fri, 22 Mar 2024 19:44:49 +0530 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Fri, 22 Mar 2024 19:45:20 +0530 Fri, 22 Mar 2024 19:44:49 +0530 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
I would like to know why this is happening; if someone knows the answer, please comment.
- First, why did the pods not get IPs even though the pod limit was set to the maximum (110)?
- Second, why are the nodes not Ready after updating the VPC CNI plugin?