Pixiake, Felix
Published on: 2022-03-17

Restart a Kubernetes Cluster in a Practical Way

As cloud-native technologies continue to gain momentum, developers are focusing more on transforming conventional applications into cloud-native ones, hoping to take advantage of the flexibility and scalability that technologies like Kubernetes offer.

Powerful as Kubernetes is, it can still pose difficulties in practice. For example, restarting an entire Kubernetes cluster safely is not as straightforward as it might seem: nodes need to be drained and shut down in the right order. In this article, we'll look into how to restart a Kubernetes cluster in a practical way.

What is a Kubernetes Cluster

A Kubernetes cluster is a set of nodes that run containerized applications. These nodes can be virtual machines if the cluster is deployed in a cloud environment, or physical machines if the cluster runs on-premises. A Kubernetes cluster includes at least one control plane and a number of worker nodes. The control plane exposes the Kubernetes API, through which the worker nodes communicate with it.

While the control plane oversees the state of the cluster, worker nodes handle the tasks it assigns and actually run the containerized applications in pods. Moreover, pods are not tied to any specific worker node: Kubernetes schedules them across the cluster according to declarative YAML manifests to improve stability and efficiency. To learn more about the concept of a Kubernetes cluster, see Cluster Architecture.
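
You can observe this placement with kubectl; the NODE column in the second command's output shows which node each pod was scheduled onto (the names will of course differ in your cluster):

    # List the nodes that make up the cluster and their roles.
    kubectl get nodes -o wide

    # List all pods along with the node each one was scheduled onto.
    kubectl get pods --all-namespaces -o wide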

Restart a Kubernetes Cluster

This article assumes that you set up your Kubernetes cluster through kubeadm or KubeKey.

Before restarting your cluster, make sure you have at least backed up etcd; this protects you from losing critical cluster data if anything goes wrong.
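
For example, with etcdctl you can save a snapshot of etcd before shutting anything down. The endpoint and certificate paths below are common kubeadm-style defaults and may differ in your environment:

    # Save a snapshot of the current etcd state to a local file.
    ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key

    # Verify that the snapshot file is readable.
    ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db

With the backup in place, let's go into the details of restarting a Kubernetes cluster.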

Shut down worker nodes

  1. Connect to a worker node through SSH.

  2. Run the following commands to stop pod scheduling and drain existing pods on the node.

    kubectl cordon <worker node name>
    kubectl drain <worker node name> --ignore-daemonsets --delete-emptydir-data
    
  3. Run the following command to stop kubelet.

    sudo systemctl stop kubelet
    
  4. Run the following command to stop Docker.

    sudo systemctl stop docker
    
  5. Run the following command to shut down the worker node.

    sudo shutdown now
    
  6. Perform the same operations on other worker nodes (if any) to shut them down. If you manage many workers, see the scripted sketch after this list.
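
For clusters with more than a handful of workers, steps 2 to 5 can be batched. Below is a minimal sketch, assuming kubectl access from your workstation and passwordless SSH to each node; the node names are hypothetical:

    # Hypothetical worker node names; replace them with your own.
    WORKERS="worker-1 worker-2 worker-3"

    for node in $WORKERS; do
      # Stop scheduling new pods on the node and evict the existing ones.
      kubectl cordon "$node"
      kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data

      # Stop kubelet and Docker on the node, then power it off.
      ssh "$node" "sudo systemctl stop kubelet && sudo systemctl stop docker && sudo shutdown now"
    done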

Shut down control planes

  1. Connect to a control plane through SSH.

  2. Run the following commands to stop pod scheduling and drain existing pods on the node.

    kubectl cordon <control plane name>
    kubectl drain <control plane name> --ignore-daemonsets --delete-emptydir-data
    
  3. Run the following command to stop kubelet.

    sudo systemctl stop kubelet
    
  4. Run the following command to stop Docker.

    sudo systemctl stop docker
    
  5. (Optional) If etcd is deployed as a systemd service on the control plane, run the following command to stop it. If etcd runs as a pod in your Kubernetes cluster, you can skip this step.

    sudo systemctl stop etcd
    
  6. Run the following command to shut down the control plane.

    sudo shutdown now
    
  7. Perform the same operations on other control planes (if any) to shut them down.

(Optional) Shut down etcd nodes

For a Kubernetes cluster deployed by kubeadm, etcd runs as a pod in the cluster and you can skip this step. If you set up your Kubernetes cluster through other methods, you may need to perform the following steps.
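
If you are not sure which case applies to your cluster, a quick check on an etcd node can tell (the manifest path below is the kubeadm-style default):

    # If this file exists, etcd runs as a static pod managed by kubelet.
    ls /etc/kubernetes/manifests/etcd.yaml

    # If this prints "active", etcd runs as a systemd service instead.
    systemctl is-active etcd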

  1. Connect to an etcd node through SSH.

  2. Run the following command to stop kubelet.

    sudo systemctl stop kubelet
    
  3. Run the following command to stop etcd.

    sudo systemctl stop etcd
    
  4. Run the following command to stop Docker.

    sudo systemctl stop docker
    
  5. Run the following command to shut down the etcd node.

    sudo shutdown now
    
  6. Perform the same operations on other etcd nodes (if any) to shut them down.

Shut down storage

When all the worker nodes and control planes are shut down, you can shut down your persistent storage devices (if any).

Restart the Kubernetes cluster

  1. Power on the persistent storage devices (if any).
  2. Power on the instances for the etcd nodes. You can log in to the etcd nodes and run docker ps to ensure that etcd is up and running.
  3. Power on the instances for the control planes. You can log in to the control planes and run docker ps to ensure that kube-apiserver, kube-controller-manager, and kube-scheduler are up and running.
  4. Power on the instances for the worker nodes. You can log in to the worker nodes and run docker ps to ensure that kube-proxy is up and running, and run systemctl status kubelet to check that kubelet is active.
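
Note that cordoning a node is recorded in the cluster state and persists across reboots, so once everything is back up, remember to mark the nodes as schedulable again and verify cluster health (the node name below is a placeholder):

    # Confirm that all nodes have rejoined the cluster and report Ready.
    kubectl get nodes

    # Re-enable scheduling on every node that was cordoned before the shutdown.
    kubectl uncordon <node name>

    # Check that the system pods are running again.
    kubectl get pods -n kube-system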

Conclusion

We hope this article has given you a practical idea of how to restart a Kubernetes cluster. Nevertheless, restarting a cluster requires caution: applications may experience downtime during the process, especially if they run with a single replica. Always consider these risks, and prepare accordingly, before restarting any Kubernetes cluster.
