Draining and deleting a bad node
How to drain a bad node and delete it
A node in Kubernetes is the powerhouse of the cluster: it runs containerized workloads such as pods, deployments, services, and stateful sets. Depending on your use case and cluster configuration, you can have multiple nodes, and they can be scaled up or down. Nodes provide the underlying infrastructure, such as CPU, memory, and storage, necessary to run the resources in a cluster. So you can imagine what happens when one of them decides to power down: the node's status switches from "Ready" to "NotReady", and the resources tied to that node become inactive and stuck.
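Before doing anything drastic, it helps to confirm why the node is unhealthy. One quick check, using the node name from the example later in this post:

kubectl describe node ip-10-0-0-4.us-east-2

The Conditions section of the output (Ready, MemoryPressure, DiskPressure, and so on) usually points at the underlying problem.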
The plan is to drain the bad node and delete it so the scheduler can automatically assign the pods to a healthy node.
However, this can be tricky, as some resources remain tied to the bad node even after deletion.
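As a quick sanity check (not a required step), you can list exactly what is still scheduled on the bad node with a field selector:

kubectl get pods --all-namespaces --field-selector spec.nodeName=ip-10-0-0-4.us-east-2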
To do this effectively with minimal downtime, follow the steps below.
- identify the name of the bad node
kubectl get node
NAME                    STATUS     AGE
ip-10-0-0-1.us-east-2   Ready      196d
ip-10-0-0-4.us-east-2   NotReady   147d
- run the drain command
kubectl drain ip-10-0-0-4.us-east-2 --ignore-daemonsets --delete-emptydir-data
this might take a while to complete. Drain first cordons the node, marking it unschedulable so that no new resource is assigned to it, and then evicts the pods already running there
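If the drain hangs, it is usually because of pods that are not managed by a controller, or because a PodDisruptionBudget is blocking eviction. A variant worth knowing (use --force carefully, since standalone pods it evicts are not recreated anywhere):

kubectl drain ip-10-0-0-4.us-east-2 --ignore-daemonsets --delete-emptydir-data --force --grace-period=60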
- scale down all deployments and stateful sets (you can choose to be specific and scale down only the resources tied to the bad node; see the namespace-scoped variant after the commands below)
kubectl scale --replicas=0 deployment --all
kubectl scale --replicas=0 statefulsets --all
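Note that --all only targets the current namespace. A more surgical variant, assuming your workloads live in a namespace called production (a hypothetical name here), would be:

kubectl scale --replicas=0 deployment --all -n production
kubectl scale --replicas=0 statefulsets --all -n production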
- finally, delete the node
kubectl delete node ip-10-0-0-4.us-east-2
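To confirm the node is gone and nothing still references it, a quick check (the grep filter is just one way to do this):

kubectl get nodes
kubectl get pods --all-namespaces -o wide | grep ip-10-0-0-4.us-east-2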
Once done, you can scale the resources back up. Keep in mind that the command below sets every deployment to a single replica, which may not match your original counts, and remember to do the same for your stateful sets.
kubectl scale --replicas=1 deployment --all
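If your deployments run at different replica counts, a safer approach is to record the counts before scaling down and restore them afterwards. A minimal sketch, assuming jq is installed and replicas.txt is an arbitrary file name; the same pattern works for stateful sets:

# before scaling down: save namespace, name, and replica count of every deployment
kubectl get deployments --all-namespaces -o json \
  | jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name) \(.spec.replicas)"' > replicas.txt

# after the node is deleted: restore each deployment to its recorded count
while read -r ns name replicas; do
  kubectl scale deployment "$name" -n "$ns" --replicas="$replicas"
done < replicas.txt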