Draining and deleting a bad node

How to drain out the bad node and delete it

A node in Kubernetes is the workhorse of the cluster: it runs containerized workloads and resources such as pods, deployments, services, stateful sets, and so on. Depending on your use case and cluster configuration you can have multiple nodes, and they can be scaled up or down. Nodes provide the underlying infrastructure, such as CPU, memory, and storage, necessary to run the resources in a cluster. So you can imagine what happens when one of them powers down: the node status switches from "Ready" to "NotReady", and resources tied to that node become inactive and stuck.
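Before draining anything, it can help to confirm why the node went "NotReady" by inspecting its conditions. A quick sketch (the node name is a placeholder):

```shell
# Inspect the node's conditions (Ready, MemoryPressure, DiskPressure, PIDPressure)
# and recent events to understand why it went NotReady.
kubectl describe node <node-name>

# Or pull just the conditions as JSON:
kubectl get node <node-name> -o jsonpath='{.status.conditions}'
```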

The plan is to drain out the bad node and delete it so the scheduler can automatically assign the pods to a new node.

However, this can be tricky, as some resources remain tied to the bad node even after deletion.

To do this effectively with minimal downtime, follow the steps below.

  • identify the name of the bad node

kubectl get nodes

NAME                    STATUS     AGE
ip-10-0-0-1.us-east-2   Ready      196d
ip-10-0-0-4.us-east-2   NotReady   147d

  • run the drain command

kubectl drain ip-10-0-0-4.us-east-2 --ignore-daemonsets --delete-emptydir-data

this might take a while to complete, but the goal here is to cordon the node, so that no new pods are scheduled onto it, and then evict the pods already running on it
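To verify that the drain finished, you can list any pods still bound to the node using a field selector:

```shell
# List pods across all namespaces still scheduled on the bad node;
# after a successful drain, only DaemonSet-managed pods should remain.
kubectl get pods --all-namespaces --field-selector spec.nodeName=ip-10-0-0-4.us-east-2
```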

  • scale down all deployments and stateful sets (you can choose to be specific and scale down only the resources tied to the bad node)

kubectl scale --replicas=0 deployment --all

kubectl scale --replicas=0 statefulsets --all
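One caveat with scaling everything to 0 and later back to 1 is that the original replica counts are lost. A minimal sketch that records the counts first so they can be restored exactly (the file name replicas.txt is an assumption):

```shell
# Record each deployment's current replica count before scaling down.
# replicas.txt is a hypothetical scratch file.
kubectl get deployments -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{"\n"}{end}' > replicas.txt

# Later, after the bad node is deleted, restore the saved counts:
while read -r name replicas; do
  kubectl scale deployment "$name" --replicas="$replicas"
done < replicas.txt
```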

  • finally, delete the node

kubectl delete node ip-10-0-0-4.us-east-2

Once done, you can scale the resources back up (adjust the replica count if your deployments originally ran more than one replica)

kubectl scale --replicas=1 deployment --all
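As a final check, you can confirm that the rescheduled pods landed on the remaining healthy node(s):

```shell
# -o wide adds a NODE column, showing where each pod was scheduled.
kubectl get pods -o wide
```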