<h1 id="akka-cluster-giter8-template">Akka Cluster Giter8 template</h1>
<p>A couple of times already I’ve found myself in a situation where I wanted to prototype something based on Akka Cluster. To do that, I was almost always reviewing the <a href="https://doc.akka.io/docs/akka/2.5.5/scala/cluster-usage.html#a-simple-cluster-example">Akka Cluster Usage Sample</a> and downloading the sample, or reusing one of my previously created examples. The problem with that approach is that all of the samples already have code and configuration in them. Mostly, with new prototypes, I don’t need package names, configurations and snippets of code remaining from other prototypes.</p>
<p>So this time, when I wanted to prototype an actor based PubSub dispatching mechanism on Akka Cluster, I decided to finally extract a template for that, so it doesn’t take half an hour to get a cluster running locally.</p>
<h2 id="giter8-templates">Giter8 templates</h2>
<p>There’s a nice project called <a href="http://www.foundweekends.org/giter8/">Giter8</a>, created exactly for such cases. I stumbled upon it some years ago, but never had the chance/need to actually create a template myself. So.. I started :).</p>
<h2 id="akka-cluster-base-project">Akka Cluster base project</h2>
<p>For my new scaffold/template project I had minimal requirements:</p>
<ul>
<li>project has to be akka cluster based</li>
<li>akka cluster should be configured right from the start, so that no code change is required to setup a working cluster</li>
<li>it has to start the cluster locally, thus enabling prototyping really fast without the need for extra configuration</li>
</ul>
<h3 id="enabling-akka-cluster">Enabling Akka Cluster</h3>
<p>This is quite easy to achieve; the proper library dependencies just have to be added to the project:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">libraryDependencies</span> <span class="o">++=</span> <span class="nc">Seq</span><span class="o">(</span>
<span class="s">"com.typesafe.akka"</span> <span class="o">%%</span> <span class="s">"akka-actor"</span> <span class="o">%</span> <span class="n">akkaV</span><span class="o">,</span>
<span class="s">"com.typesafe.akka"</span> <span class="o">%%</span> <span class="s">"akka-slf4j"</span> <span class="o">%</span> <span class="n">akkaV</span><span class="o">,</span>
<span class="s">"com.typesafe.akka"</span> <span class="o">%%</span> <span class="s">"akka-cluster"</span> <span class="o">%</span> <span class="n">akkaV</span><span class="o">,</span>
<span class="s">"ch.qos.logback"</span> <span class="o">%</span> <span class="s">"logback-classic"</span> <span class="o">%</span> <span class="n">logbackV</span><span class="o">,</span>
<span class="o">)</span></code></pre></figure>
<p>Additionally, I’ve added the logging components right away, as they come in quite handy with the logback configuration I use almost everywhere.</p>
<h3 id="making-sure-akka-cluster-is-configured">Making sure Akka Cluster is configured</h3>
<p>That’s also mostly quite easy, as the <a href="https://doc.akka.io/docs/akka/2.5.12/cluster-usage.html?language=scala">Akka Cluster Documentation</a> describes it quite well. Nevertheless, some configuration is needed:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">akka</span> <span class="o">{</span>
<span class="n">loggers</span> <span class="k">=</span> <span class="o">[</span><span class="err">"</span><span class="kt">akka.event.slf4j.Slf4jLogger</span><span class="err">"</span><span class="o">]</span>
<span class="n">loglevel</span> <span class="k">=</span> <span class="s">"INFO"</span>
<span class="n">logging</span><span class="o">-</span><span class="n">filter</span> <span class="k">=</span> <span class="s">"akka.event.slf4j.Slf4jLoggingFilter"</span>
<span class="n">actor</span> <span class="o">{</span>
<span class="n">debug</span> <span class="o">{</span>
<span class="n">lifecycle</span> <span class="k">=</span> <span class="n">off</span>
<span class="n">receive</span> <span class="k">=</span> <span class="n">off</span>
<span class="n">autoreceive</span> <span class="k">=</span> <span class="n">off</span>
<span class="o">}</span>
<span class="n">provider</span> <span class="k">=</span> <span class="s">"cluster"</span>
<span class="o">}</span>
<span class="n">remote</span> <span class="o">{</span>
<span class="n">log</span><span class="o">-</span><span class="n">remote</span><span class="o">-</span><span class="n">lifecycle</span><span class="o">-</span><span class="n">events</span> <span class="k">=</span> <span class="n">off</span>
<span class="nv">netty</span><span class="o">.</span><span class="py">tcp</span> <span class="o">{</span>
<span class="n">hostname</span> <span class="k">=</span> <span class="s">"127.0.0.1"</span>
<span class="n">port</span> <span class="k">=</span> <span class="mi">0</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">cluster</span> <span class="o">{</span>
<span class="n">seed</span><span class="o">-</span><span class="n">nodes</span> <span class="k">=</span> <span class="o">[</span>
<span class="err">"</span><span class="kt">akka.tcp://actorSystem@</span><span class="err">127</span><span class="kt">.</span><span class="err">0</span><span class="kt">.</span><span class="err">0</span><span class="kt">.</span><span class="err">1</span><span class="kt">:</span><span class="err">2552"</span>,
<span class="err">"</span><span class="kt">akka.tcp://actorSystem@</span><span class="err">127</span><span class="kt">.</span><span class="err">0</span><span class="kt">.</span><span class="err">0</span><span class="kt">.</span><span class="err">1</span><span class="kt">:</span><span class="err">2553"</span><span class="o">]</span>
<span class="n">auto</span><span class="o">-</span><span class="n">down</span><span class="o">-</span><span class="n">unreachable</span><span class="o">-</span><span class="n">after</span> <span class="k">=</span> <span class="mi">10</span><span class="n">s</span>
<span class="nv">jmx</span><span class="o">.</span><span class="py">multi</span><span class="o">-</span><span class="n">mbeans</span><span class="o">-</span><span class="n">in</span><span class="o">-</span><span class="n">same</span><span class="o">-</span><span class="n">jvm</span> <span class="k">=</span> <span class="n">on</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<h3 id="starting-the-cluster-locally">Starting the cluster locally</h3>
<p>Starting multiple actor systems locally that form a cluster is also trivial (it’s also shown somewhere in the Akka Cluster Example). How is it achieved? Well, in one JVM (one <code class="language-plaintext highlighter-rouge">main</code> method) multiple akka nodes are started, which join the cluster (due to the shared configuration pointing to the local seed nodes). The important main parts of such a project:</p>
<hr />
<p><code class="language-plaintext highlighter-rouge">ExampleClusterApp.scala</code> - is the wrapper main App starting multiple applications:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">package</span> <span class="nn">com.gmaslowski.example</span>
<span class="k">object</span> <span class="nc">ExampleClusterApp</span>
<span class="k">extends</span> <span class="nc">App</span> <span class="o">{</span>
<span class="nv">ExampleApp</span><span class="o">.</span><span class="py">main</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span><span class="s">"2552"</span><span class="o">).</span><span class="py">toArray</span><span class="o">)</span>
<span class="nv">ExampleApp</span><span class="o">.</span><span class="py">main</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span><span class="s">"2553"</span><span class="o">).</span><span class="py">toArray</span><span class="o">)</span>
<span class="nv">ExampleApp</span><span class="o">.</span><span class="py">main</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span><span class="s">"2554"</span><span class="o">).</span><span class="py">toArray</span><span class="o">)</span>
<span class="o">}</span></code></pre></figure>
<hr />
<p><code class="language-plaintext highlighter-rouge">ExampleApp.scala</code> - is the actuall main App starting the Akka Cluster:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">package</span> <span class="nn">com.gmaslowski.example</span>
<span class="k">import</span> <span class="nn">akka.actor.ActorSystem</span>
<span class="k">import</span> <span class="nn">akka.cluster.Cluster</span>
<span class="k">import</span> <span class="nn">com.typesafe.config.ConfigFactory</span>
<span class="k">object</span> <span class="nc">ExampleApp</span>
<span class="k">extends</span> <span class="nc">App</span>
<span class="k">with</span> <span class="nv">AkkaComponents</span><span class="o">.</span><span class="py">Default</span> <span class="c1">// provides general configuration traits
</span> <span class="k">with</span> <span class="nv">AkkaClusterComponents</span><span class="o">.</span><span class="py">Default</span> <span class="c1">// provides general configuration traits for Akka Cluster
</span> <span class="k">with</span> <span class="nv">ExampleComponents</span><span class="o">.</span><span class="py">Default</span> <span class="o">{</span> <span class="c1">// provides custom components
</span>
<span class="k">val</span> <span class="nv">port</span> <span class="k">=</span> <span class="nf">if</span> <span class="o">(</span><span class="nv">args</span><span class="o">.</span><span class="py">isEmpty</span><span class="o">)</span> <span class="s">"0"</span> <span class="k">else</span> <span class="nf">args</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">config</span> <span class="k">=</span> <span class="nc">ConfigFactory</span>
<span class="o">.</span><span class="py">parseString</span><span class="o">(</span><span class="n">s</span><span class="s">"akka.remote.netty.tcp.port=$port"</span><span class="o">)</span>
<span class="o">.</span><span class="py">withFallback</span><span class="o">(</span><span class="nv">ConfigFactory</span><span class="o">.</span><span class="py">load</span><span class="o">())</span>
<span class="k">override</span> <span class="k">val</span> <span class="nv">actorSystem</span> <span class="k">=</span> <span class="nc">ActorSystem</span><span class="o">(</span><span class="s">"actorSystem"</span><span class="o">,</span> <span class="n">config</span><span class="o">)</span>
<span class="k">override</span> <span class="k">val</span> <span class="nv">cluster</span> <span class="k">=</span> <span class="nc">Cluster</span><span class="o">(</span><span class="n">actorSystem</span><span class="o">)</span>
<span class="n">initExampleComponents</span>
<span class="o">}</span></code></pre></figure>
<ul>
<li><code class="language-plaintext highlighter-rouge">.parseString(s"akka.remote.netty.tcp.port=$port")</code> - substitutes the Akka Remoting port with provided value</li>
</ul>
<hr />
<p><code class="language-plaintext highlighter-rouge">build.sbt</code> - making sure that <code class="language-plaintext highlighter-rouge">run</code> invokes the wrapper App:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">mainClass</span> <span class="nf">in</span><span class="o">(</span><span class="nc">Compile</span><span class="o">,</span> <span class="n">run</span><span class="o">)</span> <span class="o">:=</span> <span class="nc">Some</span><span class="o">(</span><span class="s">"com.gmaslowski.example.ExampleClusterApp"</span><span class="o">)</span></code></pre></figure>
<hr />
<p><code class="language-plaintext highlighter-rouge">application.conf</code> - couple of helping settings:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">akka.remote.netty.tcp.port = 0</code> - would be random by default, but gets overriden anyway</li>
<li><code class="language-plaintext highlighter-rouge">akka.cluster.jmx.multi-mbeans-in-same-jvm</code> - informs Akka, that there are multiple clusters in one JVM</li>
</ul>
<hr />
<p><code class="language-plaintext highlighter-rouge">ClusterListener.scala</code> - also taken from the Akka Cluster Example to see/notify about the cluster events:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">package</span> <span class="nn">com.gmaslowski.example</span>
<span class="k">import</span> <span class="nn">akka.actor.</span><span class="o">{</span><span class="nc">Actor</span><span class="o">,</span> <span class="nc">ActorLogging</span><span class="o">,</span> <span class="nc">Props</span><span class="o">}</span>
<span class="k">import</span> <span class="nn">akka.cluster.Cluster</span>
<span class="k">import</span> <span class="nn">akka.cluster.ClusterEvent._</span>
<span class="k">object</span> <span class="nc">ClusterListener</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">props</span> <span class="k">=</span> <span class="nc">Props</span><span class="o">(</span><span class="n">classOf</span><span class="o">[</span><span class="kt">ClusterListener</span><span class="o">])</span>
<span class="o">}</span>
<span class="k">class</span> <span class="nc">ClusterListener</span>
<span class="k">extends</span> <span class="nc">Actor</span>
<span class="k">with</span> <span class="nc">ActorLogging</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">cluster</span> <span class="k">=</span> <span class="nc">Cluster</span><span class="o">(</span><span class="nv">context</span><span class="o">.</span><span class="py">system</span><span class="o">)</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">preStart</span><span class="o">()</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="o">{</span>
<span class="nv">cluster</span><span class="o">.</span><span class="py">subscribe</span><span class="o">(</span><span class="n">self</span><span class="o">,</span> <span class="n">initialStateMode</span> <span class="k">=</span> <span class="nc">InitialStateAsEvents</span><span class="o">,</span> <span class="n">classOf</span><span class="o">[</span><span class="kt">MemberEvent</span><span class="o">],</span> <span class="n">classOf</span><span class="o">[</span><span class="kt">UnreachableMember</span><span class="o">])</span>
<span class="o">}</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">postStop</span><span class="o">()</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="nv">cluster</span><span class="o">.</span><span class="py">unsubscribe</span><span class="o">(</span><span class="n">self</span><span class="o">)</span>
<span class="k">def</span> <span class="nf">receive</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">MemberUp</span><span class="o">(</span><span class="n">member</span><span class="o">)</span> <span class="k">=></span>
<span class="nv">log</span><span class="o">.</span><span class="py">info</span><span class="o">(</span><span class="s">"Member is Up: {}"</span><span class="o">,</span> <span class="nv">member</span><span class="o">.</span><span class="py">address</span><span class="o">)</span>
<span class="k">case</span> <span class="nc">UnreachableMember</span><span class="o">(</span><span class="n">member</span><span class="o">)</span> <span class="k">=></span>
<span class="nv">log</span><span class="o">.</span><span class="py">info</span><span class="o">(</span><span class="s">"Member detected as unreachable: {}"</span><span class="o">,</span> <span class="n">member</span><span class="o">)</span>
<span class="k">case</span> <span class="nc">MemberRemoved</span><span class="o">(</span><span class="n">member</span><span class="o">,</span> <span class="n">previousStatus</span><span class="o">)</span> <span class="k">=></span>
<span class="nv">log</span><span class="o">.</span><span class="py">info</span><span class="o">(</span><span class="s">"Member is Removed: {} after {}"</span><span class="o">,</span> <span class="nv">member</span><span class="o">.</span><span class="py">address</span><span class="o">,</span> <span class="n">previousStatus</span><span class="o">)</span>
<span class="k">case</span> <span class="k">_:</span> <span class="kt">MemberEvent</span> <span class="o">=></span> <span class="c1">// ignore
</span> <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>And that’s all there is to the simple Akka Cluster App. Now it needs to be made available as a Giter8 template.</p>
<h2 id="gitering-the-solution">Gitering the solution</h2>
<p>The documentation available at <a href="http://www.foundweekends.org/giter8/template.html">http://www.foundweekends.org/giter8/template.html</a> describes quite well how such templates are supposed to be created. In this case I’ll just limit myself to showing the g8 project structure and listing some gotchas I’ve come across.</p>
<hr />
<p>Giter8 <code class="language-plaintext highlighter-rouge">default.properties</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">name</span><span class="o">=</span>Example
<span class="nv">package</span><span class="o">=</span>com.example
<span class="nv">description</span><span class="o">=</span>Example Akka Cluster App.
<span class="nv">systemname</span><span class="o">=</span><span class="nv">$name</span><span class="p">;</span><span class="nv">format</span><span class="o">=</span><span class="s2">"normalize"</span><span class="err">$</span>
<span class="nv">classname</span><span class="o">=</span><span class="nv">$name</span><span class="p">;</span><span class="nv">format</span><span class="o">=</span><span class="s2">"Camel"</span><span class="err">$</span>
verbatim <span class="o">=</span> <span class="k">*</span>.xml</code></pre></figure>
<p>This file is used as configuration by Giter8 during the scaffolding/generation process. The variables are then used for substitution in the project template.</p>
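<p>To illustrate the <code class="language-plaintext highlighter-rouge">format</code> modifiers (as I understand them from the Giter8 docs - treat the exact output as an assumption): given a provided name, the derived values would look roughly like this:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">name=My Cluster
systemname=my-cluster   # format="normalize": lowercased, spaces replaced with dashes
classname=MyCluster     # format="Camel": upper camel case</code></pre></figure>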
<hr />
<p>Project directory structure:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">rock-solid λ ~/devenv/private/akka-cluster.g8/ master tree
<span class="nb">.</span>
├── README.md
└── src
└── main
└── g8
├── build.sbt
├── default.properties
├── project
│ └── plugins.sbt
└── src
└── main
├── resources
│ ├── application.conf
│ └── logback.xml
└── scala
└── <span class="nv">$package</span><span class="err">$</span>
├── <span class="nv">$classname$App</span>.scala
├── <span class="nv">$classname$ClusterApp</span>.scala
├── <span class="nv">$classname$Components</span>.scala
├── AkkaClusterComponents.scala
├── AkkaComponents.scala
└── ClusterListener.scala
9 directories, 12 files
rock-solid λ ~/devenv/private/akka-cluster.g8/ master </code></pre></figure>
<ul>
<li>all of the template project files, by convention, should be placed either in the <code class="language-plaintext highlighter-rouge">src/main/g8</code> or the <code class="language-plaintext highlighter-rouge">./</code> directory.</li>
<li>the <code class="language-plaintext highlighter-rouge">default.properties</code> file contains variables which will be substituted during project scaffolding/generation;
<ul>
<li><code class="language-plaintext highlighter-rouge">$package$</code> - will be expanded into directories defined by the package name</li>
<li><code class="language-plaintext highlighter-rouge">$classname$</code> - name of classes, coming directly from the project name - that is sufficient for the fast prototyping needs</li>
</ul>
</li>
<li>the <code class="language-plaintext highlighter-rouge">default.properties</code> file also holds the default values, so it’s not necessary to provide them at all</li>
</ul>
<hr />
<p>One of the source files:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">package</span> <span class="nn">$package$</span>
<span class="k">import</span> <span class="nn">akka.actor.ActorSystem</span>
<span class="k">import</span> <span class="nn">akka.cluster.Cluster</span>
<span class="k">import</span> <span class="nn">com.typesafe.config.ConfigFactory</span>
<span class="k">object</span> <span class="nc">$classname$App</span>
<span class="k">extends</span> <span class="nc">App</span>
<span class="k">with</span> <span class="nv">AkkaComponents</span><span class="o">.</span><span class="py">Default</span>
<span class="k">with</span> <span class="nv">AkkaClusterComponents</span><span class="o">.</span><span class="py">Default</span>
<span class="k">with</span> <span class="nv">$classname$Components</span><span class="o">.</span><span class="py">Default</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">port</span> <span class="k">=</span> <span class="nf">if</span> <span class="o">(</span><span class="nv">args</span><span class="o">.</span><span class="py">isEmpty</span><span class="o">)</span> <span class="s">"0"</span> <span class="k">else</span> <span class="nf">args</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">config</span> <span class="k">=</span> <span class="nc">ConfigFactory</span>
<span class="o">.</span><span class="py">parseString</span><span class="o">(</span><span class="s">"akka.remote.netty.tcp.port="</span> <span class="o">+</span> <span class="n">port</span><span class="o">)</span>
<span class="o">.</span><span class="py">withFallback</span><span class="o">(</span><span class="nv">ConfigFactory</span><span class="o">.</span><span class="py">load</span><span class="o">())</span>
<span class="k">override</span> <span class="k">val</span> <span class="nv">actorSystem</span> <span class="k">=</span> <span class="nc">ActorSystem</span><span class="o">(</span><span class="s">"$systemname$"</span><span class="o">,</span> <span class="n">config</span><span class="o">)</span>
<span class="k">override</span> <span class="k">val</span> <span class="nv">cluster</span> <span class="k">=</span> <span class="nc">Cluster</span><span class="o">(</span><span class="n">actorSystem</span><span class="o">)</span>
<span class="n">init$classname$Components</span>
<span class="o">}</span></code></pre></figure>
<h3 id="gotchas">Gotchas:</h3>
<ul>
<li>I had issues while trying to use Scala string interpolation like this <code class="language-plaintext highlighter-rouge">s"$variable"</code> - the Giter8 processor (because of the <code class="language-plaintext highlighter-rouge">$</code>) tried to use <code class="language-plaintext highlighter-rouge">$variable</code> as something to replace; I fixed the issue by switching to the <code class="language-plaintext highlighter-rouge">"" + variable</code> notation; not really sophisticated, but it does the job ;); I did not search for a proper solution though</li>
<li><code class="language-plaintext highlighter-rouge">verbatim=*.xml</code> was added becasue the <code class="language-plaintext highlighter-rouge">logback.xml</code> file contains a <code class="language-plaintext highlighter-rouge">${CONSOLE_LOG_PATTERN}</code> entry, which again is not parsed by Giter8 properly</li>
</ul>
<h3 id="example-run">Example run</h3>
<p>Now it is really easy to scaffold a basic, working Akka Cluster application by simply executing <code class="language-plaintext highlighter-rouge">sbt</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sbt new gmaslowski/akka-cluster.g8</code></pre></figure>
<p>Since a picture is worth more than a thousand words:</p>
<script src="https://asciinema.org/a/250761.js" id="asciicast-250761" async="" data-autoplay="false" data-size="medium" data-speed="2.5" data-theme="solarized-dark"></script>
<h2 id="enhancements">Enhancements</h2>
<p>I already have some ideas of what I could change and benefit from it:</p>
<ul>
<li>adding Revolver Plugin for sbt, so that sbt doesn’t get blocked by the running nodes</li>
<li>adding Docker/Docker Swarm template for deploying locally</li>
<li>adding Kubernetes template, for deploying to a locally running Minikube cluster</li>
</ul>
<h2 id="links">Links</h2>
<ul>
<li><a href="http://www.foundweekends.org/giter8/template.html">Giter8</a></li>
<li><a href="https://doc.akka.io/docs/akka/2.5.12/cluster-usage.html?language=scala">Akka Cluster Usage</a></li>
<li><a href="https://github.com/gmaslowski/akka-cluster.g8">https://github.com/gmaslowski/akka-cluster.g8</a></li>
<li><a href="https://asciinema.org/a/250761?speed=2.5&theme=solarized-dark&size=medium">https://asciinema.org/a/250761?speed=2.5&theme=solarized-dark&size=medium</a></li>
<li><a href="https://asciinema.org/">Asciinema</a></li>
</ul>
<h1 id="not-really-headless-browser-test-execution">(Not really) headless browser test execution</h1>
<p>Do you participate in a project which has automated UI tests? Do you get annoyed by the fact that your CI environment is flaky because of UI tests? Have you had enough of maintaining multiple slave nodes of your <a href="https://jenkins.io/">Jenkins</a> installation just for the sake of OS-specific browser behaviour? If you find any of these questions familiar, please read this article, where I try to explain what drove me to an obvious solution for the configuration madness I was going through in every project.</p>
<h2 id="how-do-you-run-your-ui-based-tests">How do you run your UI based tests?</h2>
<p>Many, many times in previous projects I questioned myself why we keep a VM (or a couple of them) together with a Firefox/Chrome installation just for the sake of running UI tests. Discussions with QA never really resulted in any different approach. Until a point, sometime in 2017, when I was helping another team migrate from Jenkins to <a href="https://about.gitlab.com/product/continuous-integration/">GitlabCI</a>. They had their own test automation framework (also UI based) which in the previous configuration was using a jenkins-slave-node with Chrome or Firefox inside it, running inside an X session. The team wanted to move to GitlabCI for their CI pipelines, but they didn’t know how to tackle the UI based tests.</p>
<p>The approach I’m describing is neither new nor really innovative. I was even surprised myself that I hadn’t thought of it years before.</p>
<h2 id="xvfb-to-the-rescue">xvfb to the rescue</h2>
<p>After a quick research, I found a solution: use the headless option to run the UI tests. Unfortunately, the team was bound to a Firefox version which did not have a headless option yet. It turned out that another approach was to use <a href="https://en.wikipedia.org/wiki/Xvfb">xvfb</a>. If I recall correctly, I was following this blog article <a href="http://elementalselenium.com/tips/38-headless">http://elementalselenium.com/tips/38-headless</a> in order to get the tests running. With <code class="language-plaintext highlighter-rouge">xvfb</code> at my disposal, the UI testing job could still run a browser within an X session, but without the need to actually display it. Furthermore, I understood that I could containerize <code class="language-plaintext highlighter-rouge">xvfb</code> to leverage the <a href="https://docs.gitlab.com/runner/executors/docker.html">gitlab runner docker executor</a>, which spawns docker containers for every executed job. Well, that’s great; sounds like job done.</p>
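<p>Under the hood, <code class="language-plaintext highlighter-rouge">xvfb-run</code> does little more than starting an in-memory X server and pointing the command at it. A rough hand-rolled equivalent (display number and screen geometry picked arbitrarily):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">Xvfb :99 -screen 0 1280x1024x24 &   # X server rendering to memory only
DISPLAY=:99 firefox &               # the browser happily uses the virtual display</code></pre></figure>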
<h2 id="applied-solution">Applied solution</h2>
<p>Unfortunately, I don’t have the source code I created two years ago to show the real solution. That’s why I decided (for the sake of completeness of this post) to fork the <a href="https://selenide.org/">Selenide</a> project (which has UI tests inside) and show how one could run <em>‘headless’</em> tests as I configured them. But before showing the code, let me repeat that the automated framework was bound to Firefox version 63.0.3. For this version the real <em>headless</em> option was not available yet - which forced me to create the image in the first place. Additionally, to show how to use headless browsers, I’ve added those options to the project as well.</p>
<p><code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code>:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">stages</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">test</span>
<span class="s">.job_template</span><span class="pi">:</span> <span class="nl">&test_job</span>
<span class="na">only</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">master</span>
<span class="na">stage</span><span class="pi">:</span> <span class="s">test</span>
<span class="na">variables</span><span class="pi">:</span>
<span class="na">LANG</span><span class="pi">:</span> <span class="s">C.UTF-8</span>
<span class="na">LC_ALL</span><span class="pi">:</span> <span class="s">C.UTF-8</span>
<span class="s1">'</span><span class="s">Firefox</span><span class="nv"> </span><span class="s">xvfb'</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">registry.gitlab.com/gmaslowski-blog/headless-docker/images/xvfb-firefox:latest</span>
<span class="s"><<</span><span class="pi">:</span> <span class="nv">*test_job</span>
<span class="na">script</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">xvfb-run -a ./gradlew firefox</span>
<span class="s1">'</span><span class="s">Firefox</span><span class="nv"> </span><span class="s">Headless'</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">registry.gitlab.com/gmaslowski-blog/headless-docker/images/standalone-firefox:latest</span>
<span class="s"><<</span><span class="pi">:</span> <span class="nv">*test_job</span>
<span class="na">script</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">./gradlew firefox_headless</span>
<span class="s1">'</span><span class="s">Chrome</span><span class="nv"> </span><span class="s">Headless'</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">registry.gitlab.com/gmaslowski-blog/headless-docker/images/standalone-chrome:latest</span>
<span class="s"><<</span><span class="pi">:</span> <span class="nv">*test_job</span>
<span class="na">script</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">./gradlew chrome_headless</span></code></pre></figure>
<p>In this snippet I run the tests with three different base images, each with different options:</p>
<ul>
<li>using a custom-built image and xvfb (with an older Firefox)</li>
<li>using a <a href="https://github.com/SeleniumHQ/docker-selenium/tree/master/StandaloneChrome">Selenium Standalone Chrome</a> image (with a JDK added)</li>
<li>using a <a href="https://github.com/SeleniumHQ/docker-selenium/tree/master/StandaloneFirefox">Selenium Standalone Firefox</a> image (with a JDK added)</li>
</ul>
<blockquote>
<p>For presentation purposes I needed to build the <code class="language-plaintext highlighter-rouge">standalone-firefox:latest</code> and <code class="language-plaintext highlighter-rouge">standalone-chrome:latest</code> images myself, as Selenide uses <em>Gradle</em> as a test runner, which in turn needs at least a JDK available. The presented image descriptions can be found at <a href="https://gitlab.com/gmaslowski-blog/headless-docker/images">https://gitlab.com/gmaslowski-blog/headless-docker/images</a></p>
</blockquote>
<blockquote>
<p>The <code class="language-plaintext highlighter-rouge">xvfb-firefox:latest</code> is an image with <code class="language-plaintext highlighter-rouge">xvfb</code> and Firefox in the specified version. To make firefox in this version work, I needed to install aditional libraries and the gecko driver.</p>
</blockquote>
<p>What’s visible inside the <code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code> file is that there’s a stage in which 3 sets of tests are executed, each with a different option:</p>
<ul>
<li>headless firefox</li>
<li>headless chrome</li>
<li>xvfb firefox</li>
</ul>
<p>That can also be seen in the pipeline <a href="https://gitlab.com/gmaslowski-blog/headless-docker/selenide/pipelines/58815400">https://gitlab.com/gmaslowski-blog/headless-docker/selenide/pipelines/58815400</a>. You can see that the Firefox and Chrome Headless tests are failing. I did not focus on them too much, as the failures appear <strong>only for 2 of 474 tests</strong>, and for the sake of this post I did not investigate. Coming back to the <code class="language-plaintext highlighter-rouge">xvfb</code> based approach: by wrapping our command into <code class="language-plaintext highlighter-rouge">xvfb-run -a <command></code> we actually run an in-memory X display, which in turn has firefox opened inside it. Quite interesting, right? :) Hence, our tests can run inside a container.</p>
<p>As a side note, any artifacts produced by the tests (like a screenshot of a failure) could be stored inside Gitlab using the <a href="https://docs.gitlab.com/ee/user/project/pipelines/job_artifacts.html">Job Artifacts</a> feature. Though I haven’t shown that in my example, it is pretty straightforward to use. The same goes for the test reports.</p>
<h2 id="alternative-approaches">Alternative approaches</h2>
<p>I’ve tackled the problem in the aforementioned way. But was it the best one? For the time being I thought so. However, I’d like to point out that there are other ways to solve the same problem.</p>
<h3 id="headless-browsers">Headless browsers</h3>
<p>Starting from version 59, Chrome offers a <em>headless</em> mode which does not require any X session to be available. It runs purely in memory - that’s a nice alternative to the <code class="language-plaintext highlighter-rouge">xvfb</code> solution. In my case, as stated previously, the tests were bound to a specific browser version which didn’t have the headless option yet. I didn’t really try to adjust the tests to run in a newer browser - I estimated that this would be the more time-consuming approach.</p>
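<p>For reference, the headless invocations themselves are one-liners (flags as available in recent Chrome and Firefox versions - verify them against the versions you use):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">google-chrome --headless --disable-gpu --screenshot https://example.com
firefox -headless -screenshot https://example.com</code></pre></figure>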
<h3 id="selenium-images-and-dockerized-selenium-grid">Selenium Images and Dockerized Selenium Grid</h3>
<p>Instead of building your own image with the browser that’s needed, one could go for an already predefined image from <a href="https://www.seleniumhq.org/">Selenium</a>, as I’ve shown for the <strong>not-xvfb</strong> options.</p>
<p>Maintaining <a href="https://www.seleniumhq.org/projects/grid/">Selenium Grid</a> yourself would eventually end up in lots of work around the hub and the nodes, so there’s an easier solution: just dockerize Selenium Grid and enjoy the possibility of having it configured and deployed anywhere you like. In this article <a href="http://www.testautomationguru.com/selenium-grid-setup-using-docker/">http://www.testautomationguru.com/selenium-grid-setup-using-docker/</a> the author shows how to set up a dockerized Selenium Grid with minimal effort. What’s more, one could use Docker Swarm or Kubernetes as the orchestration platform for Selenium Grid. One thing I haven’t really thought through when it comes to this approach is a multi-OS configuration - on one hand, I think it shouldn’t be impossible, knowing all the tools; on the other hand, I think that an <strong>xvfb-based-headless</strong> approach for older browsers might be impossible to achieve in a Windows container.</p>
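<p>As a sketch of how little such a dockerized grid needs - the image tags and the <code class="language-plaintext highlighter-rouge">HUB_HOST</code> variable are the ones I recall from the Selenium 3.x images, so treat them as assumptions:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker network create grid
docker run -d --net grid --name selenium-hub -p 4444:4444 selenium/hub:3.141.59
docker run -d --net grid -e HUB_HOST=selenium-hub selenium/node-firefox:3.141.59</code></pre></figure>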
<h3 id="cloud-solutions">Cloud solutions</h3>
<p>If you are lucky enough to have the freedom to choose cloud-based solutions in your company (believe me that some companies still restrain from it in 2019), there are some options for you, like:</p>
<ul>
<li><a href="https://www.gridlastic.com/">https://www.gridlastic.com/</a></li>
<li><a href="https://testingbot.com/">https://testingbot.com/</a></li>
<li><a href="https://saucelabs.com/resources/automated-testing/selenium-grid">https://saucelabs.com/resources/automated-testing/selenium-grid</a></li>
</ul>
<p>Those are just a couple of solutions I found on the internet. I haven’t tested any of them, but they’re there if you need them. All of them claim to have Selenium Grid underneath; what would be interesting for me is whether I really need to base my UI tests on Selenium to actually leverage them.</p>
<h2 id="conclusion">Conclusion</h2>
<p>What have I learnt in general about approaches to UI tests, besides the obvious technical possibilities? I think that questioning the <em>status quo</em> is something really needed in the projects we’re building. It happens many times that we, in IT projects, tend to follow principles and solutions which were chosen years ago. Technology, solutions and approaches evolve almost constantly. I’m not saying we should follow them blindly, but we should at least keep track of what’s going on and choose what’s appropriate for the project we’re working on. Under the circumstances of normal development, this is quite a big effort which needs to be taken.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://about.gitlab.com/product/continuous-integration/">GitlabCI</a></li>
<li><a href="https://www.seleniumhq.org/">Selenium</a></li>
<li><a href="https://selenide.org/">Selenide</a></li>
<li><a href="https://www.seleniumhq.org/projects/grid/">Selenium Grid</a></li>
<li><a href="https://gitlab.com/gmaslowski-blog/headless-docker">https://gitlab.com/gmaslowski-blog/headless-docker</a></li>
<li><a href="http://elementalselenium.com/tips/38-headless">http://elementalselenium.com/tips/38-headless</a></li>
<li><a href="http://www.testautomationguru.com/selenium-grid-setup-using-docker/">http://www.testautomationguru.com/selenium-grid-setup-using-docker/</a></li>
</ul>
<h1 id="kubernetes-init-containers-promises">Kubernetes: Init Containers promises</h1>
<p>Some time ago, I wrote about my current project and how we tackled the issue of passing node labels to pods in a <a href="https://kubernetes.io">Kubernetes</a> context. The solution worked (and still does), but there was a caveat to it, which I’d like to share in this short article.</p>
<h2 id="brief-introduction-to-the-problem">Brief introduction to the problem</h2>
<p>In the article <a href="https://gmaslowski.com/kubernetes-node-label-to-pod/">https://gmaslowski.com/kubernetes-node-label-to-pod/</a> I described how we passed k8s node labels to the deployed pods. Not to repeat myself: I used <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/">Init Containers</a> with a volume mount shared between the init and app containers. If you’d like more details, I advise you to read the aforementioned article.</p>
<p>But what’s worth mentioning for this story is that the <em>init container</em> was able to control our k8s cluster (to obtain the node label), and with this script:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">cp /cassandra/cassandra-rackdc.properties /shared/cassandra-rackdc.properties &&</span>
<span class="s">sed -i.bak s/RACK/$(kubectl get no -Lvm/rack | grep ${NODE_NAME} | awk '{print $6}')/g /shared/cassandra-rackdc.properties</span></code></pre></figure>
<p>a template (coming from a <code class="language-plaintext highlighter-rouge">configMap</code> volume) was filled in and copied to a <em>shared</em> volume, which was then used by the app containers. Everything was fine and working because of the <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#understanding-init-containers"><em>promises</em></a> which init containers bring:</p>
<ul>
<li>started before app containers are</li>
<li>always run to completion</li>
<li>on failure, restart the whole pod (according to its <em>restartPolicy</em>)</li>
</ul>
<h3 id="symptoms">Symptoms</h3>
<p>So far, so good. With this configuration and the promises in mind, our setup was expected to always have an <em>/etc/cassandra/cassandra-rackdc.properties</em> file with the following content:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">dc=customDataCenter</span>
<span class="s">rack=customServer-<X></span></code></pre></figure>
<p>built upon a template of the following form:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">dc=customDataCenter</span>
<span class="s">rack=RACK</span></code></pre></figure>
<p>Everything was great until one time we saw that one of the <a href="http://cassandra.apache.org/">Cassandra</a> nodes could not start, because the persisted data referred to a different rack (unfortunately I did not preserve the actual log) with the value <code class="language-plaintext highlighter-rouge">RACK</code>.</p>
<h3 id="cause">Cause</h3>
<p>We quickly found out that the node with the failing cassandra pod had undergone some maintenance work, including a <a href="https://docker.io">Docker</a> upgrade, which forced a docker daemon restart. Bum. Got you. Deleting the failing pod solved the issue (by recreating it). We tried restarting the docker daemon on another node with cassandra - same issue. At least we were able to recreate the problem ;).</p>
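<p>The reproduction thus boiled down to something like this, on a node hosting a cassandra pod:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">systemctl restart docker     # restarts all containers on the node, init containers included
kubectl get po -o wide -w    # watch the cassandra pod come back up - and fail</code></pre></figure>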
<h3 id="we-need-to-go-deeper-we-always-do-always">We need to go deeper (we always do… always)</h3>
<p>Further investigation revealed the following:</p>
<ul>
<li>with every docker daemon restart, the init containers were restarted as well</li>
<li>it takes ~5 seconds to complete the <code class="language-plaintext highlighter-rouge">kubectl get no -Lvm/rack | grep ${NODE_NAME} | awk '{print $6}'</code> command</li>
<li>the <code class="language-plaintext highlighter-rouge">volume mount</code> on init container side contains <strong>proper</strong> <code class="language-plaintext highlighter-rouge">rack</code> value in cassandra config file</li>
<li>the mounted file into app container (via <code class="language-plaintext highlighter-rouge">subpath</code>) contains <strong>wrong</strong> <code class="language-plaintext highlighter-rouge">rack</code> value in cassandra config file</li>
</ul>
<blockquote>
<p>this requires some comment, as we run a <a href="https://kubernetes.io/docs/setup/independent/high-availability/">Stacked High Available Kubeadm Cluster</a> with 3 masters and an external LB routing traffic to the api-servers, which makes the <code class="language-plaintext highlighter-rouge">kubectl</code> command call slower:
<img src="https://gmaslowski.com/assets/k8s-topology-stacked.jpg" alt="Stacked Topology with etcd" /></p>
</blockquote>
<p>And now, bummer. How can all of this be? Why is the init container being (re)started? Where do the inconsistencies between the volume mount inside the init container and the app container come from?</p>
<h2 id="quick-go-through-the-documentation">Quick? go through the documentation</h2>
<p>So let us review the documentation to gather more information. From <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior">https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior</a></p>
<blockquote>
<p>Because Init Containers can be restarted, retried, or re-executed, Init Container code should be idempotent. In particular, code that writes to files on EmptyDirs should be prepared for the possibility that an output file already exists.</p>
</blockquote>
<p>Ok, that doesn’t explain a lot, but at least it shows a direction. Our script is not idempotent at all! There’s a time interval - roughly 5 seconds long - during which the value is still the <em>to-be-substituted</em> one. Remember the script?</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">cp /cassandra/cassandra-rackdc.properties /shared/cassandra-rackdc.properties &&</span>
<span class="s">sed -i.bak s/RACK/$(kubectl get no -Lvm/rack | grep ${NODE_NAME} | awk '{print $6}')/g /shared/cassandra-rackdc.properties</span></code></pre></figure>
<p>First we copy, then we substitute. Apparently, after the template file was copied, it was picked up by the app container (which in our docker-daemon-restart case starts at the same time as the init container). What’s more, the init container script gets executed every time the init container runs, regardless of the fact that it had already computed the proper <code class="language-plaintext highlighter-rouge">rack</code> value. Can the solution be that simple?</p>
<h2 id="a-really-simple-solution">A really simple solution</h2>
<p>So.. it turns out that the solution to this couple-of-mindfuck-hours-long issue might be really simple. I changed the script to the following:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">test -f /shared/cassandra-rackdc.properties && \</span>
<span class="s">echo 'File exists. Not overwriting.' ||</span>
<span class="s">(cp /cassandra/cassandra-rackdc.properties /shared/cassandra-rackdc.properties && \</span>
<span class="s">sed -i.bak s/RACK/$(kubectl get no -Lrvm/rack | grep ${NODE_NAME} | awk '{print $6}')/g /shared/cassandra-rackdc.properties)"</span></code></pre></figure>
<p>Retried the failing scenario and… a success! Another job done.</p>
<h2 id="mystery">Mystery?</h2>
<p>But one thing still bothers me, and I haven’t understood it so far. Why the hell did the cassandra app container keep failing on every restart? I mean, in the init container the mounted volume eventually got the file with the right value. So why didn’t the app container? It’s a shared volume between those two. Is it because of the <code class="language-plaintext highlighter-rouge">subpath volume mount</code>? I did not go through the k8s code to find out. If you have an answer, just let me know!</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-initialization/">https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-initialization/</a></li>
<li><a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior">https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior</a></li>
<li><a href="https://kubernetes.io/docs/setup/independent/high-availability/">https://kubernetes.io/docs/setup/independent/high-availability/</a></li>
</ul>
<h1 id="docker-shell-vs-exec-form">Docker shell vs. exec form</h1>
<p>I containerize… Why do I containerize, you might ask? Because of reasons:</p>
<ul>
<li>CI/CD - having repeatable builds and tests, while almost totally ignoring underlying OS configuration</li>
<li>automatically running software in clustered and replicated scenarios</li>
<li>manage software in a common and straightforward way, by using container orchestrators like <a href="https://docs.docker.com/engine/swarm/">Docker Swarm</a> or <a href="https://kubernetes.io/">Kubernetes</a></li>
</ul>
<p>As is quite common in the modern world, containerizing comes with a price to pay. And the price is called: <em>abstraction</em>. I remember someone said that</p>
<blockquote>
<p>Every abstraction layer solves one problem, by introducing ten different ones.</p>
</blockquote>
<p>Or something quite close to that. Of course, that is an oversimplification (while still being true), but I wouldn’t trade the docker abstraction away for now, due to the benefits it gives us. This post, however, is not about the benefits but about a specific issue which I was not aware of for quite a long time. It’s related to running apps and commands inside containers - and more specifically - to stopping them.</p>
<h2 id="how-to-specify-run-commands-in-docker">How to specify run commands in docker?</h2>
<p>I have created a sample project for this post, which can be found at <a href="https://github.com/gmaslowski/docker-shell-vs-exec">https://github.com/gmaslowski/docker-shell-vs-exec</a>. This project contains a simple Spring based app and some Docker descriptor files for building images and setting up the containers with <em>docker-compose</em> (please note that the described issues apply to any form of starting a docker container).</p>
<p>The simple snippet project focuses on two ways of executing commands inside a docker container:</p>
<ul>
<li><em>shell form</em> - example: <code class="language-plaintext highlighter-rouge">ENTRYPOINT java -XX:+ExitOnOutOfMemoryError -Djava.security.egd=file:/dev/./urandom -jar /app.jar</code></li>
<li><em>exec form</em> - example: <code class="language-plaintext highlighter-rouge">ENTRYPOINT ["java", "-XX:+ExitOnOutOfMemoryError", "-Djava.security.egd=file:/dev/./urandom", "-jar", "/app.jar"]</code></li>
</ul>
<p>Both will have the same effect, at least when it comes to running containers on top of those images. If we build the application and the docker images as specified in the <code class="language-plaintext highlighter-rouge">README</code>, like this:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">./gradlew clean build
<span class="nb">cp </span>build/libs/docker-shell-vs-exec-0.0.1-SNAPSHOT.jar docker-shell/app.jar
<span class="nb">cp </span>build/libs/docker-shell-vs-exec-0.0.1-SNAPSHOT.jar docker-exec/app.jar
docker build <span class="nt">--build-arg</span> <span class="nv">JAR_FILE</span><span class="o">=</span>build/libs/docker-shell-vs-exec-0.0.1-SNAPSHOT.jar docker-exec <span class="nt">-t</span> dsve:exec
docker build <span class="nt">--build-arg</span> <span class="nv">JAR_FILE</span><span class="o">=</span>build/libs/docker-shell-vs-exec-0.0.1-SNAPSHOT.jar docker-shell <span class="nt">-t</span> dsve:shell</code></pre></figure>
<p>We would create two images:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">dsve:shell</code> - with command executing the Spring app in <em>shell form</em></li>
<li><code class="language-plaintext highlighter-rouge">dsve:exec</code> - with command executing the Spring app in <em>exec form</em></li>
</ul>
<p>By running the following script, we would deploy and run our containers with <em>docker-compose</em>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker-compose <span class="nt">-f</span> deployment/docker-compose.yml up <span class="nt">-d</span></code></pre></figure>
<p>The actual docker process runtime should look similar to this:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">/c/dev_env/projects/private/docker-shell-vs-exec <span class="o">(</span>master<span class="o">)</span>
<span class="nv">$ </span>docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bd9e3f85a7b0 dsve:shell <span class="s2">"/bin/sh -c 'java -X…"</span> About a minute ago Up 56 seconds deployment_dsve-shell_1
f0ae8ce0cbc8 dsve:exec <span class="s2">"java -XX:+ExitOnOut…"</span> About a minute ago Up 56 seconds deployment_dsve-exec_1</code></pre></figure>
<p>In the <code class="language-plaintext highlighter-rouge">COMMAND</code> section one can already see both ways (<em>shell</em> and <em>exec</em>) of executing the java app inside the container. Let us have a quick look into the containers to list the processes.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">/c/dev_env/projects/private/docker-shell-vs-exec <span class="o">(</span>master<span class="o">)</span>
<span class="nv">$ </span>docker <span class="nb">exec</span> <span class="nt">-it</span> bd9 sh
/ <span class="c"># ps uxa</span>
PID USER TIME COMMAND
1 root 0:00 /bin/sh <span class="nt">-c</span> java <span class="nt">-XX</span>:+ExitOnOutOfMemoryError <span class="nt">-Djava</span>.securi
5 root 0:08 java <span class="nt">-XX</span>:+ExitOnOutOfMemoryError <span class="nt">-Djava</span>.security.egd<span class="o">=</span>file
33 root 0:00 sh
37 root 0:00 ps uxa
/c/dev_env/projects/private/docker-shell-vs-exec <span class="o">(</span>master<span class="o">)</span>
<span class="nv">$ </span>docker <span class="nb">exec</span> <span class="nt">-it</span> f0a sh
/ <span class="c"># ps uxa</span>
PID USER TIME COMMAND
1 root 0:08 java <span class="nt">-XX</span>:+ExitOnOutOfMemoryError <span class="nt">-Djava</span>.security.egd<span class="o">=</span>file
27 root 0:00 sh
32 root 0:00 ps uxa</code></pre></figure>
<p>The difference is easy to spot: the same <em>java</em> command, one started via <code class="language-plaintext highlighter-rouge">/bin/sh</code> and the other without it.</p>
<h2 id="what-docker-documentation-says-about-those-two-forms">What Docker documentation says about those two forms?</h2>
<p>Well, it says many things, and it also describes the difference between the <em>shell</em> and <em>exec</em> forms. In my opinion, such <em>“details”</em> often live in places which are easy to overlook, and if you’re as impatient and careless :) as I am - you will probably overlook them as well. Careful reading of the docker documentation is strongly advised - <a href="https://docs.docker.com/engine/reference/builder/#entrypoint">https://docs.docker.com/engine/reference/builder/#entrypoint</a>.</p>
<h2 id="yes-ok-but-what-are-those-forms-implying">Yes, ok, but what are those forms implying?</h2>
<h3 id="evironment-variables-substitution">Evironment variables substitution</h3>
<p>In the <em>shell form</em>, all environment variables will be evaluated, as the provided command is run within a shell by prepending <code class="language-plaintext highlighter-rouge">/bin/sh -c</code> to it - which can also be observed in the snippet from the previous section. In the <em>exec form</em>, however, there is no shell processing involved and the executable is called directly. So please make sure that your env vars are substituted beforehand, or that the executable you invoke does it itself.</p>
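<p>The difference is easy to demonstrate with a throwaway container - a quick sketch (any image with a shell will do):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># shell form behaviour: /bin/sh expands $HOME before echo sees it
docker run --rm --entrypoint /bin/sh alpine -c 'echo $HOME'   # prints: /root
# exec form behaviour: no shell involved, the argument stays literal
docker run --rm --entrypoint echo alpine '$HOME'              # prints: $HOME</code></pre></figure>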
<h3 id="run-entrypoint-and-cmd">RUN, ENTRYPOINT and CMD</h3>
<p>I don’t want to focus on explaining the differences in much detail.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">RUN</code> is being used when building the image,</li>
<li><code class="language-plaintext highlighter-rouge">ENTRYPOINT</code> and <code class="language-plaintext highlighter-rouge">CMD</code> serve the purpose of starting the actuall container and parameterize it when needed.</li>
</ul>
<p>In this article <a href="http://goinbigdata.com/docker-run-vs-cmd-vs-entrypoint/">http://goinbigdata.com/docker-run-vs-cmd-vs-entrypoint/</a> you can find a really great explanation of the difference and it really wouldn’t make sense to duplicate the content. Additionally, the difference has also been explained quite well in the Docker documentation in the section <a href="https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact">understand-how-cmd-and-entrypoint-interact</a>.</p>
<h3 id="gracefully-stopping-a-container">Gracefully stopping a container</h3>
<p>But here we can get into trouble. If we try to stop a container with the <em>shell form</em></p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">/c/dev_env/projects/private/docker-shell-vs-exec <span class="o">(</span>master<span class="o">)</span>
<span class="nv">$ </span>docker stop bd9</code></pre></figure>
<p>there’s a significant time, which we might notice before the container stops. That’s because we extended the <code class="language-plaintext highlighter-rouge">stop_grace_period</code> from the default 10s to 30s - mainly for the presentation purposes. But if you look closely into the logs, you won’t find any information from the Spring application notifying that the system sent a <code class="language-plaintext highlighter-rouge">SIGTERM</code> signal. That’s due to the fact that this signal was send actually to the shell, which doesn’t pass any signals to the process it started. It is described in <a href="https://docs.docker.com/engine/reference/builder/#entrypoint">Docker documentation</a>, however it is quite easy to miss that - I know I was myself not aware of those implications for a long time. And hence, after the <code class="language-plaintext highlighter-rouge">stop_grace_period</code> passes, docker daemon sends a <code class="language-plaintext highlighter-rouge">SIGKILL</code> signal causing the container to stop, forcefully.</p>
<p>On the other hand, a container using the <em>exec form</em> stops almost immediately</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">/c/dev_env/projects/private/docker-shell-vs-exec <span class="o">(</span>master<span class="o">)</span>
<span class="nv">$ </span>docker stop f0a</code></pre></figure>
<p>and in the logs we can spot that the Spring-based application handled the <code class="language-plaintext highlighter-rouge">SIGTERM</code> signal, allowing it to close all obtained resources:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">/c/dev_env/projects/private/docker-shell-vs-exec <span class="o">(</span>master<span class="o">)</span>
<span class="nv">$ </span>docker logs f0a <span class="nt">--tail</span><span class="o">=</span>10
2019-01-21 17:44:39.089 INFO 1 <span class="nt">---</span> <span class="o">[</span> main] o.s.web.servlet.DispatcherServlet : FrameworkServlet <span class="s1">'dispatcherServlet'</span>: initialization completed <span class="k">in </span>28 ms
2019-01-21 17:44:39.163 INFO 1 <span class="nt">---</span> <span class="o">[</span> main] o.e.jetty.server.AbstractConnector : Started ServerConnector@7a3d45bd<span class="o">{</span>HTTP/1.1,[http/1.1]<span class="o">}{</span>0.0.0.0:8080<span class="o">}</span>
2019-01-21 17:44:39.164 INFO 1 <span class="nt">---</span> <span class="o">[</span> main] .s.b.c.e.j.JettyEmbeddedServletContainer : Jetty started on port<span class="o">(</span>s<span class="o">)</span> 8080 <span class="o">(</span>http/1.1<span class="o">)</span>
2019-01-21 17:44:39.174 INFO 1 <span class="nt">---</span> <span class="o">[</span> main] com.gmaslowski.dsve.SampleApplication : Started SampleApplication <span class="k">in </span>6.492 seconds <span class="o">(</span>JVM running <span class="k">for </span>7.867<span class="o">)</span>
2019-01-21 18:21:42.328 INFO 1 <span class="nt">---</span> <span class="o">[</span> Thread-11] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@2401f4c3: startup <span class="nb">date</span> <span class="o">[</span>Mon Jan 21 17:44:33 GMT 2019]<span class="p">;</span> root of context hierarchy
2019-01-21 18:21:42.341 INFO 1 <span class="nt">---</span> <span class="o">[</span> Thread-11] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
2019-01-21 18:21:42.394 INFO 1 <span class="nt">---</span> <span class="o">[</span> Thread-11] o.e.jetty.server.AbstractConnector : Stopped ServerConnector@7a3d45bd<span class="o">{</span>HTTP/1.1,[http/1.1]<span class="o">}{</span>0.0.0.0:8080<span class="o">}</span>
2019-01-21 18:21:42.395 INFO 1 <span class="nt">---</span> <span class="o">[</span> Thread-11] org.eclipse.jetty.server.session : Stopped scavenging
2019-01-21 18:21:42.412 INFO 1 <span class="nt">---</span> <span class="o">[</span> Thread-11] o.e.j.s.h.ContextHandler.application : Destroying Spring FrameworkServlet <span class="s1">'dispatcherServlet'</span>
2019-01-21 18:21:42.425 INFO 1 <span class="nt">---</span> <span class="o">[</span> Thread-11] o.e.jetty.server.handler.ContextHandler : Stopped o.s.b.c.e.j.JettyEmbeddedWebAppContext@50d0686<span class="o">{</span>/,[file:///tmp/jetty-docbase.3963833647300409511.8080/],UNAVAILABLE<span class="o">}</span></code></pre></figure>
<p>And that’s the crucial part. In the best-case scenario, the problem only causes a longer wait for the container to stop. But in the worst-case scenario, if the application doesn’t free the resources it uses (like database connections, locks etc.)… yeah, you can imagine the consequences.</p>
<p>I spotted similar issues while working with k8s as the container orchestrator. And this should be fully understandable: the container in the pod gets the chance to handle the <code class="language-plaintext highlighter-rouge">SIGTERM</code> signal, and if it doesn’t, the orchestrator will <code class="language-plaintext highlighter-rouge">SIGKILL</code> it.</p>
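<p>If you do need shell processing (for env var substitution, say) but still want signals delivered, a common workaround - sketched below, not taken from the demo project - is to <code class="language-plaintext highlighter-rouge">exec</code> the final process from an entrypoint script, so it replaces the shell as PID 1:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">#!/bin/sh
# entrypoint.sh - the shell evaluates env vars here, but 'exec' replaces
# the shell with the java process, which becomes PID 1 and receives SIGTERM
exec java -XX:+ExitOnOutOfMemoryError -jar /app/app.jar

# referenced from the Dockerfile in exec form:
# ENTRYPOINT ["/entrypoint.sh"]</code></pre></figure>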
<h3 id="extra">Extra</h3>
<p>In my current project we use, amongst others, <a href="https://www.scala-sbt.org/">Sbt</a>. It has its own plugin for creating docker images - <a href="https://www.scala-sbt.org/sbt-native-packager/formats/docker.html">sbt-native-packager</a> - so please be careful when choosing <code class="language-plaintext highlighter-rouge">Cmd</code> over <code class="language-plaintext highlighter-rouge">ExecCmd</code> :D.</p>
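<p>A rough example of what I mean, assuming the <code class="language-plaintext highlighter-rouge">dockerCommands</code> API of sbt-native-packager:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">import com.typesafe.sbt.packager.docker.{Cmd, ExecCmd}

// ExecCmd renders as the exec form: ENTRYPOINT ["bin/app"]
dockerCommands += ExecCmd("ENTRYPOINT", "bin/app")

// Cmd renders its arguments verbatim, which ends up as the shell form:
// dockerCommands += Cmd("ENTRYPOINT", "bin/app")</code></pre></figure>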
<p>I’m curious: what other things are commonly overlooked while using Docker? If you have an example, just comment or send an email.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://docker.com">Docker</a></li>
<li><a href="https://stackoverflow.com/questions/42805750/dockerfile-cmd-shell-versus-exec-form">https://stackoverflow.com/questions/42805750/dockerfile-cmd-shell-versus-exec-form</a></li>
<li><a href="https://stackoverflow.com/questions/47904974/what-are-shell-form-and-exec-form?rq=1">https://stackoverflow.com/questions/47904974/what-are-shell-form-and-exec-form?rq=1</a></li>
<li><a href="http://goinbigdata.com/docker-run-vs-cmd-vs-entrypoint/">http://goinbigdata.com/docker-run-vs-cmd-vs-entrypoint/</a></li>
</ul>My Domoticz deployment setup2019-01-03T00:00:00+00:002019-01-03T00:00:00+00:00https://gmaslowski.com/My-Domoticz-deployment-setup<blockquote>
<p>Disclaimer. I am using home automation software and hardware at home. But that’s not the only way for me to control my appliances. I always make sure that in the case of any home automation failure I am still able to manually control them.</p>
</blockquote>
<p>Everyone interested in home automation projects has surely stumbled upon <a href="https://www.domoticz.com">Domoticz</a> at some point. There are of course other solutions as well, but I won’t describe them here.</p>
<p>I’ve been using <em>Domoticz</em> for more than two years now, and in this post I’d like to share <em>how</em> I currently manage the hardware and software behind my simple home automation. I’ll also try to explain <em>why</em> my configuration is set up the way it is. But before going into solutions, let me explain my requirements and the reasons behind them.</p>
<h2 id="what-are-my-requirements">What are my requirements?</h2>
<ul>
<li>
<p><em><strong>deployment automation</strong></em> - as a developer, I like to have my deployments automated, so that deploying newer versions of the software I run requires <em>from-minimal-to-no-effort</em> on my side. SSH-ing into a remote Linux server to deploy newer versions is too time consuming and basically not an option.</p>
</li>
<li>
<p><em><strong>RPI hardware</strong></em> - in recent years I have gathered 5 Raspberry Pis (B+; 2B; 2 x 3B; Zero) which I was (and sometimes still am) using for various purposes like a home media center, hackathons, servers, routers, learning etc. They’re placed together in a rack case, along with an NFS server, in the attic. One could argue that I could place Domoticz on the server, but it already runs other software.
<img src="https://gmaslowski.com/assets/rpis-rack.jpg" alt="RasperryPi Rack" /></p>
</li>
<li>
<p><em><strong>backup and high availability</strong></em> - during my 2 years with Domoticz I tackled various problems. There were times when I needed to configure a fresh Domoticz (why? later on), and I would like to minimize the risk of that. There were also times when, because of a hardware failure, I was not able to control appliances at home. It’s not a real issue, but an inconvenience, so minimizing that risk is a goal as well. Furthermore, I want to be able to just remove one or two RPIs from time to time, because of the other activities I mentioned earlier.</p>
</li>
</ul>
<h2 id="deployment-automation">Deployment automation</h2>
<p>There were times when I was deploying (or rather, managing the deployment) manually. From the beginning, Domoticz at my home was running on an RPI, which made the process look more or less like this:</p>
<ul>
<li>ssh to the rpi</li>
<li>backup the database (Domoticz uses <a href="https://www.sqlite.org/index.html">SQLite</a>)</li>
<li>run <code class="language-plaintext highlighter-rouge">./update</code> scripts in domoticz package</li>
<li>tackle issues (if needed)</li>
<li>check if things are working</li>
</ul>
<p>This process was maybe not that hard, however it made me remember a couple of things like IP addresses, scripts to run and backups to create. In order to automate it I decided to use <a href="https://docker.com">Docker</a> images and <a href="https://docs.docker.com/engine/swarm/">Docker Swarm</a> to ease up the process.</p>
<h3 id="docker-abstraction">Docker abstraction</h3>
<p>I started by searching for an already existing image for <em>arm</em> architectures. I cannot remember if I found something useful, but I had some criteria, like cyclic/periodic updates, and I’m sure I haven’t found any image which had that in place already. So I decided to use my doubtful skills to automatically create and publish a docker image for every newly published Domoticz version. With some help of <a href="https://travis-ci.org/">Travis CI</a> I set up a simple <a href="https://github.com">GitHub</a> repository - <a href="https://github.com/gmaslowski/rpi-domoticz">gmaslowski/rpi-domoticz</a> (<em>a post about building docker images with Travis CI can be found <a href="https://gmaslowski.com/automating-your-docker-images-build-and-deploy-with-travis-ci/">here</a></em>). The artifact produced from it (<em>daily, always with the newest Domoticz beta version, which is basically the current development version and, I think, the only reasonable one to use</em>) is published to the <a href="https://hub.docker.com">DockerHub</a> registry - <a href="https://hub.docker.com/r/gmaslowski/rpi-domoticz/">gmaslowski/rpi-domoticz</a>. That makes it really convenient to use. By default, with a little help from Docker manifests, the image targets the <em>arm</em> and <em>amd64</em> architectures. The image is the core prerequisite for the dockerized version of Domoticz, which can be run manually with <code class="language-plaintext highlighter-rouge">docker run</code>, <code class="language-plaintext highlighter-rouge">docker-compose</code>, or deployed into Docker Swarm.</p>
<h3 id="container-orchestration">Container orchestration</h3>
<p>Having dockerized Domoticz, the time came to deploy it. At home I run <em>Docker Swarm</em>. For some it might be overhead, for some it might be weird to have a cluster at home… But I find it quite convenient for deploying software which I don’t want to have in the cloud. For example, file storage like <a href="https://www.dropbox.com">Dropbox</a> is too expensive (at least for me) and I try to avoid, as much as possible, storing private data, images and videos on the net. This is why I run <a href="https://nextcloud.com">Nextcloud</a>. Another good reason for running Swarm is that I work in IT - I like to, and have to, stay somehow up to date with technology. At work, in my current project, we have already switched from Docker Swarm to <a href="https://kubernetes.io/">Kubernetes</a>, so I find it convenient to still run Docker Swarm at home.</p>
<p>Having a Docker Swarm cluster in place, and with some help of <a href="https://about.gitlab.com/">GitLab</a> the configuration of the deployment looks like this:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">version</span><span class="pi">:</span> <span class="s1">'</span><span class="s">3.7'</span>
<span class="na">services</span><span class="pi">:</span>
<span class="na">domoticz</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">gmaslowski/rpi-domoticz:4.1030</span>
<span class="na">command</span><span class="pi">:</span> <span class="pi">[</span>
<span class="s2">"</span><span class="s">/domoticz/domoticz"</span><span class="pi">,</span>
<span class="s2">"</span><span class="s">-www"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">8080"</span><span class="pi">,</span>
<span class="s2">"</span><span class="s">-dbase"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/domoticzdb/domoticz.db"</span>
<span class="pi">]</span>
<span class="na">volumes</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">/etc/timezone:/etc/timezone:ro</span>
<span class="pi">-</span> <span class="s">/etc/localtime:/etc/localtime:ro</span>
<span class="na">deploy</span><span class="pi">:</span>
<span class="na">replicas</span><span class="pi">:</span> <span class="s">1</span></code></pre></figure>
<p>With <code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code>:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">stages</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">software</span>
<span class="s">.job_template</span><span class="pi">:</span> <span class="nl">&home-job</span>
<span class="na">tags</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">deployer</span>
<span class="na">only</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">master</span>
<span class="na">when</span><span class="pi">:</span> <span class="s">always</span>
<span class="na">software</span><span class="pi">:</span>
<span class="s"><<</span><span class="pi">:</span> <span class="nv">*home-job</span>
<span class="na">stage</span><span class="pi">:</span> <span class="s">software</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">docker:18.09</span>
<span class="na">script</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">docker stack deploy -c software/automation/automation.yml home</span></code></pre></figure>
<p>Here it can be seen that updating my Domoticz deployment boils down to changing the line <code class="language-plaintext highlighter-rouge">image: gmaslowski/rpi-domoticz:4.1030</code> to the chosen Docker image version and pushing the change to master. GitLab CI and the GitLab Runner will make sure that the new version gets deployed.</p>
<p>And that covers the topic of deployment automation.</p>
<h2 id="hardware-placement-and-restrictions">Hardware placement and restrictions</h2>
<p>As stated previously, I have some RPIs which I’d like to use to run Domoticz. So what do I need? The RPIs I already have; what’s missing is making sure that the configuration (docker installed, user management etc.) is the same (or really similar) on every RPI and every other piece of hardware in the cluster. Additionally, I’d like to avoid, as much as possible, any manual actions which require me to remember configurations. A fairly good example would be my router and AP configurations: I really messed up when, after a factory upgrade, my AP needed replacement. With no configuration stored anywhere but on the AP itself, I needed to recreate many settings (WiFi, accesses, QoS) from my head. The obvious improvement is to automate that configuration and make applying it idempotent - I’m working on that, and I think it can be a good topic for another post. Let’s get back to the main topic.</p>
<h3 id="attic">Attic</h3>
<p>The hardware runs in the attic, and I tend to keep my travelling ;) up there as limited as possible. I want to be able to (to some extent) manage my hardware remotely. Of course, it’s not possible to fix a physical failure remotely, but removing the need for physical access to the hardware is the goal.</p>
<h3 id="ansible-configuration">Ansible configuration</h3>
<blockquote>
<p>Before any more description, I’d like to make one thing clear, even for myself: it’s not about using Ansible - it’s about having an <em>infrastructure/configuration-as-code</em> approach.</p>
</blockquote>
<p>My tool of choice for the time being is <a href="https://ansible.com">Ansible</a>. Why? Again, I’m in IT :D and:</p>
<ul>
<li>Ansible is one of the standards used in industry</li>
<li>Ansible doesn’t require any process running on the RPIs - everything is done over SSH</li>
<li>it allows me to setup my RPIs and servers to a required stage really fast</li>
<li>and of course automation and learning are one of my key goals</li>
</ul>
<p>I will not put all of my Ansible configurations here, because their number is growing every day - so much fun :). I will focus instead on what I configure with it:</p>
<ul>
<li>users, groups and accesses - I make sure that every piece of hardware I run at home has a preconfigured user with SSH keys, so I don’t need to remember passwords (with the exception of a specific user for regular keyboard/monitor access in case of weird troubleshooting - my hardware is not a bunch of VMs)</li>
<li>basic software, e.g. vim, nettools and docker. Did you spot the problem already? I’ll elaborate more on that in the <em>what’s still missing</em> section.</li>
<li>NFS configuration - more on that in <em>HA</em> section</li>
<li>Timezones - to be consistent when going through logs</li>
<li>Hostnames - to have consistent naming across my home LAN</li>
</ul>
<h2 id="high-availability">High Availability</h2>
<p>It happened to me a couple of times that my home automation software was not fully working. The reasons varied. Let me enumerate them:</p>
<ul>
<li>one of the APs gone - most of the devices I control are connected to WiFi, and as soon as the AP was gone they were unable to connect. I cannot replace an AP in a second, but I can make sure that the process is as painless as possible. It’s not just replacing the AP device - it’s recreating its configuration that takes long.</li>
<li>RPI freezes or doesn’t start after power outage - it’s not that common, but happened 2-3 times during the last year.</li>
<li>SD card failures - causing RPI not to start, so the issue is similar - happened to me once</li>
<li>corrupted Domoticz SQLite database file - happened to me once</li>
</ul>
<h3 id="rpis">RPIs</h3>
<p>In my experience, relying on a single RPI for running Domoticz is not the way to go. Not only because of occasional failures, but also because sometimes I’d like to detach an RPI from the cluster and use it for something else. So I now have 4 of them able to run (amongst others) Domoticz. The only thing I need to ensure is that Docker Swarm places the <code class="language-plaintext highlighter-rouge">domoticz service</code> only on them. That’s as easy as applying a label onto each RPI Docker node, i.e. <code class="language-plaintext highlighter-rouge">rpi=true</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">greg@pc001:~|⇒ docker node <span class="nb">ls</span> <span class="nt">-q</span> | xargs docker node inspect <span class="nt">-f</span> <span class="s1">' []: '</span>
ljvzdp36900i75poqgxww5bic <span class="o">[</span>rpi001]: map[rpi:true]
qtrmc85j1dvw9w6jh7d1mlyn1 <span class="o">[</span>rpi002]: map[rpi:true]
quu4g00qb669bxe3ols5jtuo5 <span class="o">[</span>rpi003]: map[rpi:true]
al2orpsz28n0wnky2xxaa4tps <span class="o">[</span>rpi004]: map[rpi:true]</code></pre></figure>
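<p>For completeness - the labels themselves were applied with <code class="language-plaintext highlighter-rouge">docker node update</code>, more or less like this:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ docker node update --label-add rpi=true rpi001</code></pre></figure>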
<p>And then configure the placement in the <code class="language-plaintext highlighter-rouge">domoticz service</code> docker descriptor file:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">domoticz</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">gmaslowski/rpi-domoticz:4.1030</span>
<span class="na">command</span><span class="pi">:</span> <span class="pi">[</span>
<span class="s2">"</span><span class="s">/domoticz/domoticz"</span><span class="pi">,</span>
<span class="s2">"</span><span class="s">-www"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">8080"</span><span class="pi">,</span>
<span class="s2">"</span><span class="s">-dbase"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/domoticzdb/domoticz.db"</span>
<span class="pi">]</span>
<span class="na">volumes</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">/etc/timezone:/etc/timezone:ro</span>
<span class="pi">-</span> <span class="s">/etc/localtime:/etc/localtime:ro</span>
<span class="na">deploy</span><span class="pi">:</span>
<span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
<span class="na">placement</span><span class="pi">:</span>
<span class="na">constraints</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">node.labels.rpi == </span><span class="no">true</span></code></pre></figure>
<h3 id="storage">Storage</h3>
<p>Remember the NFS that Ansible configured? That’s good :). Docker has the option to mount a volume from a defined NFS storage, and that solves my storage availability problem. I have a server with mirrored disks for storing my private data, and additionally, for the purpose of running Domoticz, I set up an NFS export on it (with Ansible, of course). Now I don’t need to worry about which of the RPIs Domoticz starts on - it will always use the same storage, hence the same database. No need to synchronize or copy data. How cool is that?</p>
<p>Here is the full snippet of my Domoticz deployment descriptor with the NFS-attached volumes:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">version</span><span class="pi">:</span> <span class="s1">'</span><span class="s">3.7'</span>
<span class="na">services</span><span class="pi">:</span>
<span class="na">domoticz</span><span class="pi">:</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">gmaslowski/rpi-domoticz:4.1030</span>
<span class="na">command</span><span class="pi">:</span> <span class="pi">[</span>
<span class="s2">"</span><span class="s">/domoticz/domoticz"</span><span class="pi">,</span>
<span class="s2">"</span><span class="s">-www"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">8080"</span><span class="pi">,</span>
<span class="s2">"</span><span class="s">-dbase"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/domoticzdb/domoticz.db"</span>
<span class="pi">]</span>
<span class="na">volumes</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">nfsdb:/domoticzdb/</span>
<span class="pi">-</span> <span class="s">nfsscripts:/domoticz/scripts/</span>
<span class="pi">-</span> <span class="s">/etc/timezone:/etc/timezone:ro</span>
<span class="pi">-</span> <span class="s">/etc/localtime:/etc/localtime:ro</span>
<span class="na">deploy</span><span class="pi">:</span>
<span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
<span class="na">placement</span><span class="pi">:</span>
<span class="na">constraints</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">node.labels.rpi == </span><span class="no">true</span>
<span class="na">volumes</span><span class="pi">:</span>
<span class="na">nfsdb</span><span class="pi">:</span>
<span class="na">driver</span><span class="pi">:</span> <span class="s">local</span>
<span class="na">driver_opts</span><span class="pi">:</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">nfs</span>
<span class="na">o</span><span class="pi">:</span> <span class="s">addr=<nfs.ip>,nolock,rw</span>
<span class="na">device</span><span class="pi">:</span> <span class="s2">"</span><span class="s">:/path/to/domoticzdb"</span>
<span class="na">nfsscripts</span><span class="pi">:</span>
<span class="na">driver</span><span class="pi">:</span> <span class="s">local</span>
<span class="na">driver_opts</span><span class="pi">:</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">nfs</span>
<span class="na">o</span><span class="pi">:</span> <span class="s">addr=<nfs.ip>,nolock,rw</span>
<span class="na">device</span><span class="pi">:</span> <span class="s2">"</span><span class="s">:/path/to/domoticz/scripts"</span></code></pre></figure>
<p>Besides, I can remove (for other purposes) up to 3 RPIs from the cluster and still be sure that my home automation works. Yay!</p>
<h2 id="recap">Recap</h2>
<p>Let’s try to summarize this post in one sentence.</p>
<blockquote>
<blockquote>
<p>I automatically deploy Domoticz in an easily recoverable RPI cluster with external RAID-1 storage.</p>
</blockquote>
</blockquote>
<p>Wow. That’s a really short TL;DR version. To visualize a little bit of what I was describing please have a look at this picture:
<img src="https://gmaslowski.com/assets/rpi-domoticz.jpg" alt="Deployment Setup" /></p>
<h2 id="whats-still-missing">What’s still missing?</h2>
<ul>
<li><strong>How to tackle NFS failures?</strong> - My NFS also runs in the attic, on a server. A backup is in place thanks to the RAID-1 configured disks, but if the server itself fails, I won’t be able to use Domoticz. So I’m wondering how to tackle this problem - or whether I really need to. I can live with a couple of days without home automation, enough to replace the broken part and recreate the server setup. Another option would be to use <a href="https://github.com/ClusterHQ/flocker">Flocker</a>.</li>
<li><strong>How to bind USB devices to Swarm Services?</strong> - I have a couple of devices on 433MHz and 868MHz radio, and to make them work I have USB-connected transceivers. And that’s the problem - actually two of them, which I still haven’t worked out. The first is that the transceivers are physically connected to one RPI, and Domoticz uses them over USB. So I’d need to figure something out to be able to deploy Domoticz on another RPI and still use them - or just live with limited functionality whenever this particular RPI is down for some reason. I’m starting to get paranoid :D with this whole HA idea :P. The second problem is that a Swarm Service still cannot use USB devices. There are workarounds though, more on that -> <a href="https://github.com/docker/swarmkit/issues/1244">https://github.com/docker/swarmkit/issues/1244</a></li>
<li><strong>Database backup</strong> - one could say that I already have a backup with the mirrored disks. Not really: if the database file itself gets corrupted, the corruption is mirrored too, and I basically still need to reconfigure the DB. However, not all is lost. Currently I’m working on a Docker CRON solution to back up the Domoticz database every day or so - with Docker that should be quite easy (a rough sketch of the idea follows after this list).</li>
<li><strong>Egg and Chicken -> Docker vs Ansible</strong> - I ran into this problem and there’s still no clear solution. Let me explain: I run my idempotent Ansible tasks with the GitLab CI Docker Runner, so obviously, to run the Ansible scripts I need a Docker runtime :). It therefore doesn’t make sense to install Docker with Ansible - at least not on the node on which the Runner runs. There’s still a little manual action needed for the whole configuration, but this one I think I’ll leave. That doesn’t imply that I cannot install docker on the other devices from Ansible, so partially it still makes sense. There are solutions like <a href="https://www.ansible.com/products/tower">Ansible Tower</a>, but I would need some time to grasp those as well.</li>
<li><strong>Static IP addresses maintained by Ansible</strong> - I already mentioned the issue I had with replacing the AP. My router and AP both run <a href="https://dd-wrt.com/">DD-WRT</a> firmware. I recently found out that with the <code class="language-plaintext highlighter-rouge">nvram</code> command line tool it is possible to change any DD-WRT device setting - the same ones you can change in the GUI. So an obvious next step for me would be to put the configuration - static IP leases, WiFi settings etc. - into code. That way I could sleep better, knowing that even a router or AP failure would be easily resolvable. I could of course use DD-WRT’s configuration backup option, but to my knowledge those backups only work with the same DD-WRT version on the same hardware. With my Ansible approach I would at least have the configuration in code - which is easily understood, in contrast to binary files.</li>
<li><strong>Docker Swarm masters</strong> - Not really sure if that’s an issue, but it is at least something I need to keep in mind and verify. I have 5 devices in the Docker Swarm cluster. I wanted to make sure that if one of the devices fails I can still manage the cluster, so I needed to make 3 of them masters. Why 3 and not 2? Well, Docker Swarm requires a majority of masters [(n+1)/2] to be present in order to work. So 2 of my RPIs became masters, and I need to remember never to detach those two :).</li>
</ul>
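<p>As for the database backup idea mentioned above, a rough sketch (the paths and schedule are made up) could be a CRON entry using SQLite’s online backup command - remember that <code class="language-plaintext highlighter-rouge">%</code> has to be escaped in crontabs:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># /etc/cron.d/domoticz-backup - daily online backup of the Domoticz SQLite db
0 3 * * * root sqlite3 /path/to/domoticzdb/domoticz.db ".backup /path/to/backups/domoticz-$(date +\%F).db"</code></pre></figure>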
<p>I’m curious how people generally deal with deploying Domoticz, or any home automation solution, so that it keeps working constantly. Do people care about HA, or just tackle failures whenever they appear?</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://domoticz.com">Domoticz</a></li>
<li><a href="https://docker.com">Docker Swarm</a></li>
<li><a href="https://ansible.com">Ansible</a></li>
<li><a href="https://github.com/docker/swarmkit/issues/1244">https://github.com/docker/swarmkit/issues/1244</a></li>
</ul>Creating RAID-1 on Linux without data loss2018-12-14T00:00:00+00:002018-12-14T00:00:00+00:00https://gmaslowski.com/creating-raid-1-on-linux-without-data-loss<p>At home I run a server with <a href="https://nextcloud.com/">Nextcloud</a> for keeping my family pictures, movies, documents and other important stuff. Not only does it serve as private storage, it’s also important to me that I don’t lose the memories gathered over time. The whole server was already running on RAID-1 mirrored disks, but some time ago one of them failed - and I needed a replacement.</p>
<p>So I bought a new non-SSD drive with 5400rpm - fully enough for my needs regarding storage access, and the price was right: I got this 500GB drive for ~40$.</p>
<h2 id="initial-setup">Initial setup</h2>
<p>The server configuration before the disk failure was rather simple: two disks, one old 650GB WD connected externally over USB and a 750GB Seagate connected to SATA, with Linux software RAID-1 set up on all 3 partitions: OS, swap, data. My oh my, nothing fancy, but it gets the job done. Unfortunately, the SATA drive was the one to die :(, so I was left with the USB drive only. The only change I needed to keep the system running after the failure was to make sure that the server boots from the USB drive. Fair enough.</p>
<h2 id="how-i-approached-raid">How I approached RAID</h2>
<p>So I bought a new drive. My WD held around 200GB of data gathered over the last 6 years. A quick calculation, and I got stingy, buying a 500GB drive as the replacement. I also salvaged a 160GB drive from some laptop I don’t use anymore. The reason was that I wanted to stop mirroring the system, the swap space and GRUB - maintaining that configuration, and remembering that with every bootloader update both disks need MBR changes, was something I wanted to get rid of.</p>
<h3 id="os">OS</h3>
<p>And this is how I ended up installing the newest Ubuntu onto the 160GB drive. I moved all of my settings from the old drive and started to feel happy about the end of the crisis. I needed to install and configure the Docker Swarm cluster again, but since all of my deployed apps (Nextcloud included) are dockerized and idempotent (infrastructure as code) I got everything running smoothly in (manageable) no time. So what I ended up with was 3 disks:</p>
<ul>
<li>160GB - OS - can fail and is easy to recover from in my setup</li>
<li>650GB - USB connected WD</li>
<li>500GB - SATA connected drive</li>
</ul>
<p>What’s next? Looks like just recovering RAID with <code class="language-plaintext highlighter-rouge">mdadm --add</code>.</p>
<h3 id="first-surprises">First surprise(s)</h3>
<p>And here it comes: a new operating system means there’s no RAID-1 configured. Oh ho, ok, I need to create the RAID-1 again. I thought quickly and said: that’s fine, let’s do it. Unfortunately, it wouldn’t work to create the RAID-1 from the existing partition, since the <code class="language-plaintext highlighter-rouge">/dev/sdb5</code> partition (the one with the data to be mirrored) was already bigger than the full size of the new disk. Damn - spend the extra 10$ next time. I could already see the ~200GB of data I’d need to copy during the new RAID setup. Damn it.</p>
<p>So first of all, create a new partition on the new disk with fdisk:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$:</span> fdisk /dev/sdc
n <span class="c"># - for creating new primary partition, go with defaults</span>
t <span class="c"># - change the type to Linux raid autodetect, fd</span>
w <span class="c"># - write changes</span></code></pre></figure>
<p>Now, with the partition created, the hard work started - and only because I made a simple mistake: I started copying the files to this drive instead of first creating RAID-1 on top of it. It was the most stupid and least obvious mistake I could make. I mounted the new partition and copied all of the 200GB of data onto it. Then with fdisk I deleted the old partitions on the WD drive and created a partition matching the new one:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$:</span> fdisk /dev/sdb
d <span class="c"># - for partition deletion; deleted all of them</span>
w <span class="c"># - write changes</span>
<span class="nv">$:</span> fdisk <span class="nt">-d</span> /dev/sdc | fdisk /dev/sdb</code></pre></figure>
<h3 id="creating-raid-without-data-loss">Creating RAID without data loss</h3>
<p>I think this was the moment I realized I had doubled my work. There was nothing left for me to do but to create the RAID volume/device and copy the data once again onto the new partition on the old drive - since that was the one currently without the data - so I could create the RAID on top of it and choose the filesystem type.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$:</span> mdadm <span class="nt">--create</span> /dev/md0 <span class="nt">--level</span> 1 <span class="nt">--raid-devices</span><span class="o">=</span>2 missing /dev/sdc1
<span class="nv">$:</span> mkfs.ext4 /dev/md0</code></pre></figure>
<ul>
<li><code class="language-plaintext highlighter-rouge">missing</code> - means that we’ll add the second drive (which for now is the one with the data) later on</li>
</ul>
<p>Now… mounting and copying the ~200GB back, but this time onto the (not yet fully set up) RAID array. This was the moment I was a little bit sad that my drives aren’t SSDs :D. After the data was copied, adding the old partition to the RAID device was as easy as:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$:</span> mdadm <span class="nt">--add</span> /dev/md0 /dev/sdb1</code></pre></figure>
<p>In order to check the RAID status in Ubuntu, just invoke:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$:</span> mdadm <span class="nt">-D</span> /dev/md0</code></pre></figure>
<h2 id="self-takeaways">Self takeaways</h2>
<p>I probably should have executed the steps with more thinking attached :). Had I remembered that recreating RAID without data loss requires manual data copying, I’d just have bought a 1TB drive (~50$) and left the system and swap partitions untouched. Not really sure of that, but I’d have thought twice, because my private time is limited. The second, and huge, mistake was copying the data without creating the RAID in the first place. Not really sure what I was thinking - I believe I was stuck on the idea that I needed to use the “old drive” as the RAID basis, because it was part of the RAID in the old configuration. Echhh…</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://www.digitalocean.com/community/tutorials/how-to-manage-raid-arrays-with-mdadm-on-ubuntu-16-04">How To Manage RAID Arrays with mdadm on Ubuntu 16.04</a></li>
<li><a href="https://nextcloud.com/">Nextcloud</a></li>
<li><a href="https://debian-administration.org/article/238/Migrating_To_RAID1_Mirror_on_Sarge">Migrating To RAID1 Mirror on Sarge</a></li>
</ul>At home I run a server with Nextcloud for keeping my family pictures, movies, documents and important stuff. Not only it serves as private storage, but also it’s important to me that I don’t loose the gathered over time memories. The whole server was running already on RAID-1 mirrored disks, but some time ago, one of them failed - and I needed a replacement.Passing node labels to pods in Kubernetes2018-09-20T00:00:00+00:002018-09-20T00:00:00+00:00https://gmaslowski.com/kubernetes-node-label-to-pod<p>In my current project we faced the challenge of deploying <a href="http://cassandra.apache.org/">Cassandra</a> cluster in <a href="https://kubernetes.io/">Kubernetes</a>. We don’t use any of the cloud providers for hosting Cassandra nor Kubernetes. Since the beginning, there were almost
no problem with spinning a Cassandra cluster. Recently, however, because of our hardware setup, we faced the issue of making Cassandra rack aware on Kubernetes cluster.</p>
<h2 id="infrastructure">Infrastructure</h2>
<p>The setup is(n’t) straightforward. We have 6 VMs for Cassandra, which are grouped into 3 racks - 2 VMs per rack. All of the VMs for Cassandra are labeled in k8s, so that we can guarantee with affinity rules that only Cassandra instances will be deployed there. Additionally, the VMs are labeled with rack information: <code class="language-plaintext highlighter-rouge">rack-1</code>, <code class="language-plaintext highlighter-rouge">rack-2</code>, <code class="language-plaintext highlighter-rouge">rack-3</code>. This is precisely the information I needed to push down through Kubernetes to Cassandra itself.</p>
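<p>For reference, such node labels can be applied and inspected with <code class="language-plaintext highlighter-rouge">kubectl</code> - a sketch with made-up node names:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ kubectl label node cassandra-vm-1 vm/rack=rack-1
$ kubectl label node cassandra-vm-2 vm/rack=rack-1
$ kubectl get nodes -L vm/rack</code></pre></figure>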
<h2 id="kubernetes-and-downwardapi">Kubernetes and DownwardAPI</h2>
<p>After some quick investigation I found the <a href="https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#the-downward-api">Kubernetes DownwardAPI</a>. Without looking too deeply, I was sure that I could take any label specified on the node and put it into a container environment variable:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">env</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">VM_LABEL</span>
<span class="na">valueFrom</span><span class="pi">:</span>
<span class="na">fieldRef</span><span class="pi">:</span>
<span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.label[label/vm]</span></code></pre></figure>
<p>Someone should have seen my face when I found out that you can only reference a restricted set of metadata with the DownwardAPI - and node labels aren’t part of it. There are even a couple of issues and feature requests open on how to pass a node label through into the pod:</p>
<ul>
<li><a href="https://stackoverflow.com/questions/36690446/inject-node-labels-into-kubernetes-pod">https://stackoverflow.com/questions/36690446/inject-node-labels-into-kubernetes-pod</a></li>
<li><a href="https://github.com/kubernetes/kubernetes/issues/62078">https://github.com/kubernetes/kubernetes/issues/62078</a></li>
</ul>
<h2 id="tryout-solution">Tryout solution</h2>
<p>So, ok, it’s not that easy, but it’s not something that cannot be done, right? After a moment I thought about using an <code class="language-plaintext highlighter-rouge">initContainer</code> to get the label of the node on which the pod is scheduled, and then add that label to the pod itself. Shouldn’t be that hard, right:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1beta1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">node2pod</span>
<span class="na">spec</span><span class="pi">:</span>
<span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
<span class="na">template</span><span class="pi">:</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">labels</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">node2pod</span>
<span class="na">app</span><span class="pi">:</span> <span class="s">node2pod</span>
<span class="na">spec</span><span class="pi">:</span>
<span class="na">initContainers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">node2pod</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">k8s-cluster-image</span> <span class="c1"># that's tricky; for deployment via Gitlab Runner we created an image for controlling our k8s cluster from </span>
<span class="c1"># outside; exactly this image is used here</span>
<span class="na">command</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">sh"</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">-c"</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">kubectl</span><span class="nv"> </span><span class="s">-n</span><span class="nv"> </span><span class="s">${NAMESPACE}</span><span class="nv"> </span><span class="s">label</span><span class="nv"> </span><span class="s">pods</span><span class="nv"> </span><span class="s">${POD_NAME}</span><span class="nv"> </span><span class="s">vm/rack=$(kubectl</span><span class="nv"> </span><span class="s">get</span><span class="nv"> </span><span class="s">no</span><span class="nv"> </span><span class="s">-Lvm/rack</span><span class="nv"> </span><span class="s">|</span><span class="nv"> </span><span class="s">grep</span><span class="nv"> </span><span class="s">${NODE_NAME}</span><span class="nv"> </span><span class="s">|</span><span class="nv"> </span><span class="s">awk</span><span class="nv"> </span><span class="s">'{print</span><span class="nv"> </span><span class="s">$6}')"</span>
<span class="na">env</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">NODE_NAME</span>
<span class="na">valueFrom</span><span class="pi">:</span>
<span class="na">fieldRef</span><span class="pi">:</span>
<span class="na">fieldPath</span><span class="pi">:</span> <span class="s">spec.nodeName</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">NAMESPACE</span>
<span class="na">valueFrom</span><span class="pi">:</span>
<span class="na">fieldRef</span><span class="pi">:</span>
<span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.namespace</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_NAME</span>
<span class="na">valueFrom</span><span class="pi">:</span>
<span class="na">fieldRef</span><span class="pi">:</span>
<span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.name</span>
<span class="na">containers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">nginx</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">nginx</span> <span class="c1"># for the purpose of the presenting the solution the image doesn't matter</span>
<span class="na">env</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">RACK</span>
<span class="na">valueFrom</span><span class="pi">:</span>
<span class="na">fieldRef</span><span class="pi">:</span>
<span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.labels['vm/rack']</span></code></pre></figure>
<p>Well. Almost. Quite. But not what I’d expect. Though the pod was labeled:</p>
<figure class="highlight"><pre><code class="language-console" data-lang="console"><span class="gp">kubernetes@node1:~#</span><span class="w"> </span>kubectl describe pod node2pod-557fb46b67-6qrgf
<span class="go">Namespace: default
</span><span class="gp">Node: node7/<IP></span><span class="w">
</span><span class="go">Start Time: Wed, 19 Sep 2018 10:20:33 +0200
Labels: app=node2pod
name=node2pod
pod-template-hash=1139602623
vm/rack=rack-2 </span></code></pre></figure>
<p>the environment variable was empty inside the container. That’s due to the fact that the resolution of env vars with the DownwardAPI happens during pod scheduling, not during execution. Dohhh. So, another brainer. But fortunately, with a little help from a teammate of mine, I finally made it with the following approach.</p>
<h2 id="solution">Solution</h2>
<p>Just as a reminder, the original idea was to pass a node label to the container with Cassandra inside, so it can use that information to configure the Cassandra node with rack information. It’s also important to note that Cassandra is configured with multiple files, and one of them is <code class="language-plaintext highlighter-rouge">cassandra-rackdc.properties</code> - the place where the rack information should finally be stored. The solution is not that simple, so a picture describes it best, but in steps:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">configMap</code> is used to store generic <code class="language-plaintext highlighter-rouge">cassandra-rackdc.properties</code> which should be updated during deployment</li>
<li><code class="language-plaintext highlighter-rouge">initContainer</code> takes this (immutable) <code class="language-plaintext highlighter-rouge">configMap</code> and copies it onto a <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-initialization/#create-a-pod-that-has-an-init-container">shared volume</a>, which is shared with the Cassandra container</li>
<li>container mounts the shared volume and uses <code class="language-plaintext highlighter-rouge">subPath</code> for mounting just one of the files; we don’t want to overwrite other files</li>
</ul>
<h3 id="drawing">Drawing</h3>
<p><img src="https://gmaslowski.com/assets/k8s-node2pod.png" alt="Solution" /></p>
<h3 id="the-full-blown-yaml">The full blown yaml</h3>
<p>For the purpose of readability, much of the configuration has been removed.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">cassandra-rackdc</span>
<span class="na">data</span><span class="pi">:</span>
<span class="s">cassandra-rackdc.properties</span><span class="pi">:</span> <span class="pi">|</span>
<span class="s">dc= datacenter</span>
<span class="s">rack= RACK</span>
<span class="s">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">StatefulSet</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">cassandra</span>
<span class="na">labels</span><span class="pi">:</span>
<span class="na">app</span><span class="pi">:</span> <span class="s">cassandra</span>
<span class="na">spec</span><span class="pi">:</span>
<span class="na">podManagementPolicy</span><span class="pi">:</span> <span class="s">OrderedReady</span>
<span class="na">replicas</span><span class="pi">:</span> <span class="m">6</span>
<span class="na">selector</span><span class="pi">:</span>
<span class="na">matchLabels</span><span class="pi">:</span>
<span class="na">app</span><span class="pi">:</span> <span class="s">cassandra</span>
<span class="na">template</span><span class="pi">:</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">labels</span><span class="pi">:</span>
<span class="na">app</span><span class="pi">:</span> <span class="s">cassandra</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">cassandra</span>
<span class="na">spec</span><span class="pi">:</span>
<span class="na">initContainers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">cassandra-rack-awareness</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">k8s-cluster-image</span>
<span class="na">command</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">sh"</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">-c"</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">cp</span><span class="nv"> </span><span class="s">/cassandra/cassandra-rackdc.properties</span><span class="nv"> </span><span class="s">/shared/cassandra-rackdc.properties</span><span class="nv"> </span><span class="s">&&</span><span class="nv"> </span><span class="se">\
</span> <span class="s">sed</span><span class="nv"> </span><span class="s">-i.bak</span><span class="nv"> </span><span class="s">s/RACK/$(kubectl</span><span class="nv"> </span><span class="s">get</span><span class="nv"> </span><span class="s">no</span><span class="nv"> </span><span class="s">-Lvm/rack</span><span class="nv"> </span><span class="s">|</span><span class="nv"> </span><span class="s">grep</span><span class="nv"> </span><span class="s">${NODE_NAME}</span><span class="nv"> </span><span class="s">|</span><span class="nv"> </span><span class="s">awk</span><span class="nv"> </span><span class="s">'{print</span><span class="nv"> </span><span class="s">$6}')/g</span><span class="nv"> </span><span class="s">/shared/cassandra-rackdc.properties"</span>
<span class="na">env</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">NODE_NAME</span>
<span class="na">valueFrom</span><span class="pi">:</span>
<span class="na">fieldRef</span><span class="pi">:</span>
<span class="na">fieldPath</span><span class="pi">:</span> <span class="s">spec.nodeName</span>
<span class="na">volumeMounts</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">cassandra-rackdc</span>
<span class="na">mountPath</span><span class="pi">:</span> <span class="s">/cassandra/</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">shared</span>
<span class="na">mountPath</span><span class="pi">:</span> <span class="s">/shared/</span>
<span class="na">containers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">cassandra</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">own-cassandra-image</span>
<span class="na">env</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">CASSANDRA_SEEDS</span>
<span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">cassandra-0.cassandra.default.svc.cluster.local,cassandra-1.cassandra.default.svc.cluster.local,cassandra-2.cassandra.default.svc.cluster.local"</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">CASSANDRA_ENDPOINT_SNITCH</span>
<span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">GossipingPropertyFileSnitch"</span>
<span class="na">volumeMounts</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">shared</span>
<span class="na">mountPath</span><span class="pi">:</span> <span class="s">/etc/cassandra/cassandra-rackdc.properties</span>
<span class="na">subPath</span><span class="pi">:</span> <span class="s">cassandra-rackdc.properties</span>
<span class="na">volumes</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">cassandra-rackdc</span>
<span class="na">configMap</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">cassandra-rackdc</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">shared</span>
<span class="na">emptyDir</span><span class="pi">:</span> <span class="pi">{}</span></code></pre></figure>
<p>Uff and yay! The following is proof that 4 of the nodes were up with proper rack settings:</p>
<figure class="highlight"><pre><code class="language-console" data-lang="console"><span class="gp">root@cassandra-0:/#</span><span class="w"> </span>nodetool status
<span class="go">Datacenter: datacenter
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.33.128.4 108.63 KiB 256 51.4% 6f535d65-4076-469e-953b-f4676ed6b54a rack-1
UN 10.35.128.4 103.64 KiB 256 49.3% a7849387-2a55-448b-893c-b6d219a065f6 rack-2
UN 10.44.0.6 108.62 KiB 256 50.5% 2d1741cb-adff-4486-b1a8-b3b0fba410d2 rack-1
UN 10.43.64.4 69.94 KiB 256 48.9% 131d4fc5-60ec-4944-aa29-sfbbfb23a706 rack2</span></code></pre></figure>
<p>Another job done!</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://kubernetes.io/">Kubernetes</a></li>
<li><a href="http://cassandra.apache.org/">Cassandra</a></li>
<li><a href="https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#the-downward-api">DownwardAPI</a></li>
<li><a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-initialization/#create-a-pod-that-has-an-init-container">https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-initialization/#create-a-pod-that-has-an-init-container</a></li>
<li><a href="https://github.com/kubernetes/kubernetes/issues/62078">https://github.com/kubernetes/kubernetes/issues/62078</a></li>
<li><a href="https://stackoverflow.com/questions/36690446/inject-node-labels-into-kubernetes-pod">https://stackoverflow.com/questions/36690446/inject-node-labels-into-kubernetes-pod</a></li>
<li><a href="https://gist.github.com/gmaslowski/117f3535173d733e007d0c6c83564888">https://gist.github.com/gmaslowski/117f3535173d733e007d0c6c83564888</a></li>
</ul>rpi-domoticz2017-12-01T07:59:00+00:002017-12-01T07:59:00+00:00https://gmaslowski.com/domoticz-rpiTelemetry Rocks!2017-11-01T07:59:00+00:002017-11-01T07:59:00+00:00https://gmaslowski.com/telemetry-rocks