This page describes how I use kured (Kubernetes Reboot Daemon) on my homelab Kubernetes cluster. I won't go into the details of kured itself, though. For more information about kured, please refer to the links provided in the references section.
Keeping your servers up to date via the package management system should be a de facto SOP for people running homelabs. For my homelab servers, most of which are running Ubuntu, that means using the apt package manager to find and apply package updates on a regular basis. I use an Ansible playbook to apply automated package updates daily to all of my local and remote homelab servers.
An issue arises for Kubernetes cluster nodes, though, when kernel or core system updates have been applied and the server requires a reboot to complete the update. The normal way of doing node maintenance of any sort with Kubernetes is to first cordon and drain the node (kubectl drain node_name --ignore-daemonsets, which also cordons it). Once all of the workloads have been moved to other nodes, the cordoned node can be rebooted. After the reboot, it is made available to accept workloads again by uncordoning it (kubectl uncordon node_name).
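As a rough illustration of the manual steps above, here is a small shell sketch. The helper names (run, drain_node, uncordon_node) and the DRY_RUN switch are my own inventions for this example and are not part of kubectl or kured; only the two kubectl commands come from the text above.

```shell
# Hypothetical helpers wrapping the manual maintenance steps.
# DRY_RUN defaults to 1 here (an assumption for safety), which only
# prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Cordon the node and evict its workloads (drain cordons implicitly).
drain_node() {
    run kubectl drain "$1" --ignore-daemonsets
}

# Make the node schedulable again once it is back up.
uncordon_node() {
    run kubectl uncordon "$1"
}
```

With DRY_RUN=0, calling drain_node node_name, rebooting the node by hand, and then calling uncordon_node node_name would walk through the same sequence on a real cluster.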
Having to do this manually, whether for a large production cluster or a more modest homelab cluster, can be time-consuming, error-prone, and easily forgotten or missed if you are not paying close attention to the updates that have been applied.
This is where kured comes in. The kured GitHub page describes it well:
Kured (KUbernetes REboot Daemon) is a Kubernetes daemonset that performs safe automatic node reboots when the need to do so is indicated by the package management system of the underlying OS.
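On Ubuntu, the "need to reboot" signal from the package management system is, as far as I understand it, the presence of the file /var/run/reboot-required, which is also what kured looks for by default. The snippet below is only a sketch of how to check for it by hand on a node; the needs_reboot function name is made up for this example.

```shell
# Hypothetical helper: succeeds if the reboot sentinel file exists.
# The default path is the one Ubuntu creates after updates that need a reboot.
needs_reboot() {
    sentinel="${1:-/var/run/reboot-required}"
    [ -f "$sentinel" ]
}

if needs_reboot; then
    echo "reboot required"
    # Ubuntu lists the packages that requested the reboot in this file
    cat /var/run/reboot-required.pkgs 2>/dev/null || true
else
    echo "no reboot required"
fi
```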
There is one issue, though, that will affect those running Raspberry Pi based Kubernetes clusters. Currently, weaveworks only provides images built for the amd64 architecture. As I mentioned in the multi-architecture post, one solution would be to manually build an arm64 image for kured.
Luckily, this has already been done for us at the raspbernetes multi-arch-images GitHub page. They track a number of images, including kured, and provide up-to-date multi-architecture images for each. The raspbernetes image built for kured (and likely all the others; I've only used their kured image so far) is a drop-in replacement for the weaveworks image, so using it is a simple case of substituting it in the weaveworks kured manifest file.
Starting with the installation instructions on the weaveworks/kured GitHub page, look up the latest release tag with curl and download the matching manifest file with wget:
$ latest=$(curl -s https://api.github.com/repos/weaveworks/kured/releases | jq -r '.[0].tag_name')
$ wget https://github.com/weaveworks/kured/releases/download/$latest/kured-$latest-dockerhub.yaml
Next, substitute the kured multi-architecture image from raspbernetes:
$ diff kured-1.8.0-dockerhub.yaml-dist kured-1.8.0-dockerhub_raspbernetes.yaml
95c95,96
< image: docker.io/weaveworks/kured:1.8.0
---
> #image: docker.io/weaveworks/kured:1.8.0
> image: raspbernetes/kured:1.8.0
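If you would rather script the substitution than edit the file by hand, something along these lines should work. This is a minimal sketch: the swap_kured_image function is hypothetical, and unlike the diff above it replaces the image line outright instead of keeping the original as a comment.

```shell
# Hypothetical helper: write a copy of the manifest ($1) to $2 with the
# weaveworks kured image swapped for the raspbernetes one.
# The image tag after the colon is left untouched.
swap_kured_image() {
    src="$1"
    dst="$2"
    sed 's|image: docker\.io/weaveworks/kured:|image: raspbernetes/kured:|' "$src" > "$dst"
}
```

For the files shown above, that would be called as swap_kured_image kured-1.8.0-dockerhub.yaml-dist kured-1.8.0-dockerhub_raspbernetes.yaml, and running the same diff afterwards is an easy way to confirm that only the image line changed.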
Once the manifest file has been modified, kured is installed on the cluster in the standard manner using kubectl:
$ kubectl apply -f kured-1.8.0-dockerhub_raspbernetes.yaml
This will start a kured pod on each node, which will then manage automated node reboots as required.
$ kubectl get pods -l name=kured -n kube-system -o wide
NAME          READY   STATUS    RESTARTS   AGE    IP           NODE          NOMINATED NODE   READINESS GATES
kured-b6xr9   1/1     Running   0          108m   10.42.0.16   node-1-rpi4   <none>           <none>
kured-brwrn   1/1     Running   0          109m   10.42.1.10   node-2-lxc    <none>           <none>
kured-n5xsl   1/1     Running   0          111m   10.42.2.14   node-3-lxc    <none>           <none>
kured-8ql4r   1/1     Running   0          109m   10.42.3.12   node-4-lxc    <none>           <none>
kured-jbvm7   1/1     Running   0          109m   10.42.5.13   node-5-rpi4   <none>           <none>
kured-zfcj8   1/1     Running   0          110m   10.42.4.14   node-6-rpi4   <none>           <none>
kured-6jzmz   1/1     Running   0          110m   10.42.6.14   node-7-rpi4   <none>           <none>
I've been using kured for a few months now (at the time of writing this post) and it has performed flawlessly for me. Usually the only way I know it has been running is when I happen to notice the uptime change on my kubernetes nodes.
It is pretty cool to watch it at work, though. Seeing a node cordon and drain itself, reboot, and then uncordon itself, all autonomously, is more than a little surreal.
(created: 2021-10-12, last modified: 2021-10-12 at 09:03:00)