Managing containers (@home)
As part of my home infra renovations I've been moving more and more functions into containers running on top of Raspberry Pis. Even though the specifics are extremely hacky and home-use only, there have been quite a few lessons learned which link into enterprise-grade ways of working as well.
This journey mainly started off with Hypriot, which provides great automation for setting up Docker basics on a Pi. So getting to the initial state of delivering everything in one infrastructure-as-code model was straightforward.
Hypriot provides support for cloud-init, which makes automating deployment very easy.
The essential flow was / is like this:
- Deployment image = Hypriot (Git)
- Deployment configuration = cloud-init user-data
- Persistent storage = NFS
- (Re-)Deployment model = flash SD card with image & config
The whole magic is in the cloud-init user-data, which actually has a lot of failure points in this model, but we'll get to that later.
Activities that I placed in the user-data:
- SSH host + user keys
- updates + installation of needed extra packages
- NFS mount
- creation and start of docker instances
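Put together, the user-data covering those activities looks roughly like this. A minimal sketch only: the hostname, key, NFS export and container name are placeholders, not my actual config.

```yaml
#cloud-config
hostname: pi-node-1                            # placeholder
ssh_authorized_keys:
  - ssh-ed25519 AAAA... user@workstation       # placeholder key
package_update: true
package_upgrade: true
packages:
  - nfs-common
mounts:
  # server/path is a placeholder for the actual NFS export
  - [ "nas.local:/export/containers", "/mnt/persist", "nfs", "defaults,nofail", "0", "0" ]
runcmd:
  # start the containers with their persistent data on the NFS mount
  - docker run -d --restart unless-stopped -v /mnt/persist/app:/data example/app
```

All the heavy lifting is stock cloud-init modules (`packages`, `mounts`, `runcmd`), which is exactly why the Hypriot route felt so clean to begin with.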
Now this of course means that everything comes up ready and rocking when the image boots on the Pi, which is good. The bad part is that the actual deployment model requires physical actions with the SD card. The lifecycle process is straightforward: no updates, no changes, just re-deploy with the latest image and you're good to go. The image carries the version updates and the data is retained on NFS.
The obvious issue with this lifecycle process is that it can't itself be automated, since it includes removing the SD card, reflashing it, and placing it back. A good start, but room for more.
The reason the title is a bit ambiguous is that there were plenty of steps which didn't actually go anywhere. Or rather, they went into the archive of things tried and deemed non-feasible.
The first thing I tried as the next evolution was Kubernetes. I did manage to get a fully automated process for setting up a full Kubernetes cluster, which went like this:
Set up master node -> store keys to persistent storage -> deploy workers which pick up the keys
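In user-data terms, that handoff boils down to two `runcmd` fragments, sketched here with kubeadm as the example; the NFS path is an assumption. The master initialises the cluster and drops the join command onto shared storage:

```yaml
#cloud-config
runcmd:
  # master node: initialise the cluster, then publish the join command on NFS
  - kubeadm init --pod-network-cidr=10.244.0.0/16
  - kubeadm token create --print-join-command > /mnt/persist/k8s/join.sh
```

while each worker simply polls for the file and runs it:

```yaml
#cloud-config
runcmd:
  # worker node: wait for the join command to appear, then execute it
  - until [ -f /mnt/persist/k8s/join.sh ]; do sleep 10; done
  - sh /mnt/persist/k8s/join.sh
```

The NFS share doubles as the coordination channel, so no node needs to know about any other node up front.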
I even converted all my Docker configs into Kubernetes manifests and managed to get everything running on that side, but then I failed to get the ingress controller working. Traefik "should" work and has worked, but at least at the specific time I was trying, the stories didn't come together. That led me to try k3s, which was better, but then I ran into challenges with deployment automation.
At this point I started to think that even if I got Kubernetes or k3s running with this process, I'd still have the same problem I originally had: manual intervention at (re-)deployment.
So I started down another track: PXE/NFS boot. This took the deployment to a whole other level. PoE-powered Raspberry Pis, PXE booting onto an NFS root. The whole deployment can be done and re-done entirely through automation. But there's a catch, as there usually is.
An NFS-root Pi is really great for disposable deployments, especially when PoE powered, but it doesn't actually work for Docker. As the Docker runtime does a lot of bind mounts across locations, the setup fails if the /var/lib/docker filesystem is on NFS.
As a next step, I started thinking about a hybrid deployment: NFS root with just the Docker overlay on the SD card. This works on Raspberry Pi 4 and up, where PXE boot can be set as primary (older versions don't PXE boot if an SD card is inserted).
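The hybrid setup itself is just two config fragments, sketched here with assumed server addresses and partitions: the kernel command line pointing the root filesystem at NFS, and an fstab entry keeping Docker's data directory on the local SD card.

```
# /boot/cmdline.txt -- root filesystem over NFS
# (in reality this must all be on a single line)
console=serial0,115200 console=tty1 root=/dev/nfs
  nfsroot=192.168.1.10:/export/pi-root,vers=3 rw ip=dhcp rootwait

# /etc/fstab -- keep Docker's storage on the SD card's second partition
/dev/mmcblk0p2  /var/lib/docker  ext4  defaults,noatime  0  2
```

Everything disposable lives on NFS, and only the piece Docker can't tolerate over the network stays local.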
But it starts to get fairly complex to use ready-made configs when the deployment model itself is quite custom.
After all this, I'm not 100% sure Kubernetes would actually bring value to the flow, since there's not really that much lifecycle management done at the container level. Due to capacity limits and such, container instances are fairly bound to specific hardware. If the hardware fails, the flow can deploy the containers on any other node, so the monitoring metric is more hardware-bound than content-bound.
Now, after all the back and forth, I have a fairly solid plan for what the setup should look like.
From the container point of view, if I do go with orchestration it'll be k3s. But at this point, given the everything-fails and simplicity-is-key approach, I don't think I'll add it here. In this deployment it's also quite important to understand what breaks if I unplug a specific physical node, which makes container automation something of a disadvantage. This links to the fact that, at the moment, all the services running on the nodes are solutions which can't be made node-resilient.
Now the key part: removing physical work from the deployment flow. PXE booting is the only approach that actually achieves this, but in order to retain simplicity in the runtime, the deployment must end up on the SD card. Kind of a having-your-cake-and-eating-it-too situation, except it's not.
I previously configured my VMware setup so that hosts always PXE boot, but if no re-install is needed, the PXE boot simply directs to booting from local disk. The intent is to do the same here: always PXE boot, and create a mini PXE-boot OS that just re-images the SD card from NFS with the appropriate user-data.
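On the serving side, Raspberry Pi network boot needs little more than a proxy-DHCP answer and a TFTP root. A dnsmasq sketch, with addresses and paths as assumptions:

```
# /etc/dnsmasq.conf -- proxy DHCP + TFTP for Raspberry Pi network boot
# (the existing router keeps handing out addresses; dnsmasq only adds boot info)
dhcp-range=192.168.1.0,proxy
pxe-service=0,"Raspberry Pi Boot"
enable-tftp
tftp-root=/srv/tftpboot        # would hold the mini re-imaging OS per node
log-dhcp
```

Swapping the contents of the TFTP root then decides per boot whether a node gets the re-imaging OS or just falls through to its SD card.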
This way, full deployments can be automated end to end by an external flow. Which brings us to the other key part of the target state: getting a pipeline to run the whole show.
Bits and pieces are already in Git, but once this part gets implemented I can start adding GitHub Actions to the solution, so that the Raspberry Pis get fully redeployed, including the defined containers, any time a new commit is pushed to the code.
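The workflow side could be as simple as this sketch; the runner label and the trigger script are assumptions, with the actual redeploy logic living behind them:

```yaml
# .github/workflows/redeploy.yml
name: redeploy-pis
on:
  push:
    branches: [ main ]
jobs:
  redeploy:
    # a self-hosted runner inside the home network, able to reach the PXE/NFS side
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: trigger re-image of the nodes
        run: ./deploy/redeploy-all.sh   # hypothetical helper script
```

A self-hosted runner sidesteps the problem of GitHub-hosted runners having no route into the home network.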
It needs a few more weekends to get done, but the plan is clear.