Passing node labels to pods in Kubernetes
In my current project we faced the challenge of deploying a Cassandra cluster in Kubernetes. We don’t use any cloud provider for hosting either Cassandra or Kubernetes. From the beginning, there were almost no problems with spinning up a Cassandra cluster. Recently, however, because of our hardware setup, we faced the issue of making Cassandra rack-aware on a Kubernetes cluster.
The setup is(n’t) straightforward. We have 6 VMs for Cassandra, grouped into 3 racks, 2 VMs per rack. All of the Cassandra VMs are labeled in k8s, so that affinity rules can guarantee that only Cassandra instances will be deployed there. Additionally, the VMs are labeled with rack information, for example rack-3. This is precisely the information I needed to push down through Kubernetes to Cassandra itself.
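To make the setup concrete, here is a minimal sketch of what such labels and an affinity rule could look like. The label keys (dedicated, rack), the node name, and the image tag are assumptions made for illustration; they are not taken from our actual cluster.

```yaml
# Fragment of a Node object: labels as they might look on one Cassandra VM.
apiVersion: v1
kind: Node
metadata:
  name: cassandra-vm-1          # hypothetical node name
  labels:
    dedicated: cassandra        # hypothetical label marking the Cassandra VMs
    rack: rack-3                # hypothetical key carrying the rack information
---
# Pod that may only be scheduled onto nodes carrying the Cassandra label.
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-0
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: dedicated
                operator: In
                values:
                  - cassandra
  containers:
    - name: cassandra
      image: cassandra:3.11     # assumed image tag
```

Note that node affinity only pulls Cassandra pods onto these nodes; the rack label itself is plain node metadata at this point and is not visible inside the pod.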
Kubernetes and DownwardAPI
After some quick investigation I found the Kubernetes DownwardAPI. Without looking too closely, I was sure that I could take any label specified on a node and put it into a container environment variable:
Someone should have seen my face when I found out that the DownwardAPI can only reference a restricted set of metadata, and node labels are not part of it. There are even a couple of issues and feature requests open about passing a node label through to the pod.
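For comparison, this is a small sketch of what the DownwardAPI does expose as environment variables: pod-level fields such as the pod name, the pod's own labels, and the name of the node it runs on. The pod name, the app label, and the busybox image are assumptions for illustration only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-api-demo       # hypothetical pod name
  labels:
    app: cassandra              # a pod label, not a node label
spec:
  containers:
    - name: main
      image: busybox:1.36       # assumed image, only used to print the environment
      command: ["sh", "-c", "env; sleep 3600"]
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName             # name of the node, but none of its labels
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: APP_LABEL
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']    # resolves against the pod's labels only
```

The node name is available, so the node can at least be identified from inside the pod; its labels have to be fetched some other way.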
So, ok, it’s not that easy, but it’s not something that cannot be done, right? A moment later I thought about using an initContainer to read the label from the node the pod is scheduled on, and then add that label to the pod itself. Shouldn’t be that hard, right?
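A minimal sketch of that idea follows. It assumes an image that ships kubectl (the image name below is a placeholder), a service account with RBAC permissions to get nodes and patch pods, and a node label with the hypothetical key rack.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-0
spec:
  serviceAccountName: node-label-copier      # hypothetical; must be allowed to get nodes and patch pods
  initContainers:
    - name: copy-rack-label
      image: example.com/kubectl:latest      # placeholder for any image that contains kubectl
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      command:
        - sh
        - -c
        - |
          # Read the rack label from the node, then copy it onto this pod.
          RACK=$(kubectl get node "$NODE_NAME" -o jsonpath='{.metadata.labels.rack}')
          kubectl label pod "$POD_NAME" -n "$POD_NAMESPACE" rack="$RACK" --overwrite
  containers:
    - name: cassandra
      image: cassandra:3.11                  # assumed image tag
      env:
        - name: RACK
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['rack']   # hoped to pick up the label set above
```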
Well. Almost. Quite. But not what I’d expect. Though the pod was labeled, the environment variable was empty inside the container. That’s due to the fact that the resolution of env vars with the DownwardAPI happens during pod scheduling and not execution, so a label added later by the initContainer is not picked up. D’oh. So, another brain-teaser. But fortunately, with a little help from a teammate of mine, I finally made it with the following approach.
Just as a reminder, the original idea was to pass a node label to the container running Cassandra, so that it can use that information to configure the Cassandra node with its rack. It’s also important to note that Cassandra is configured through multiple files, and one of them is cassandra-rackdc.properties, which is where the rack information should finally end up. The solution is not that simple, so a picture describes it best, but in steps:
- configMap is used to store a generic cassandra-rackdc.properties, which should be updated during deployment
- initContainer takes this (immutable) configMap and copies it onto a volume that is shared with the Cassandra container
- container mounts the shared volume and uses subPath for mounting just one of the files; we don’t want to overwrite other files
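As an illustration of the first step, a generic file stored in a configMap could look like the sketch below. The configMap name, the dc value, and the RACK_PLACEHOLDER token are assumptions made for this example; dc and rack are the keys Cassandra reads from cassandra-rackdc.properties.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cassandra-rackdc               # hypothetical name
data:
  cassandra-rackdc.properties: |
    dc=dc1
    rack=RACK_PLACEHOLDER
```

A configMap mounted into a pod is read-only, which is why the file has to be copied to a writable volume before the rack value can be filled in.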
The full-blown YAML
For the purpose of readability, much of the configuration was removed.
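A minimal sketch of such a manifest, following the steps above, could look like this. It is an illustration under stated assumptions, not our exact configuration: the configMap name, the RACK_PLACEHOLDER token in the generic file, the node label key rack, the service account, the kubectl image, and the /etc/cassandra target path are all hypothetical or assumed. It also assumes that the initContainer is the place where the node label is looked up and written into the copied file.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-0
spec:
  serviceAccountName: node-reader            # hypothetical; must be allowed to get nodes
  volumes:
    - name: rackdc-template
      configMap:
        name: cassandra-rackdc               # hypothetical configMap holding the generic file
    - name: rackdc-rendered
      emptyDir: {}                           # writable volume shared with the Cassandra container
  initContainers:
    - name: render-rackdc
      image: example.com/kubectl:latest      # placeholder for any image that contains kubectl
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName       # the node name is available through the DownwardAPI
      command:
        - sh
        - -c
        - |
          # Look up the rack label of the node this pod was scheduled on.
          RACK=$(kubectl get node "$NODE_NAME" -o jsonpath='{.metadata.labels.rack}')
          # Copy the generic file to the shared volume, filling in the rack.
          sed "s/RACK_PLACEHOLDER/$RACK/" /template/cassandra-rackdc.properties \
            > /rendered/cassandra-rackdc.properties
      volumeMounts:
        - name: rackdc-template
          mountPath: /template
        - name: rackdc-rendered
          mountPath: /rendered
  containers:
    - name: cassandra
      image: cassandra:3.11                  # assumed image tag
      volumeMounts:
        - name: rackdc-rendered
          # subPath mounts only this one file and leaves the rest of the directory intact.
          mountPath: /etc/cassandra/cassandra-rackdc.properties
          subPath: cassandra-rackdc.properties
```

Because the rack value is written into a file before the Cassandra container starts, nothing depends on DownwardAPI env vars being re-resolved later.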
Uff and yay! The following is the proof that 4 of the nodes were up with proper rack settings:
Another job done!