Daniel Hoelbling-Inzko talks about programming

Upping Apache PoolingHttpClientConnectionManager pool limits

Imagine configuring a HTTP Connection pool and setting setMaxTotal to 50. Reasonable assumption would be that henceforth 50 concurrent connections will be made by the HttpClient upstream.

Well not in Java-land - here you'll get exactly 2 connections going out - apparently regardless of what you set as maximum total connections.

Turns out there is a second setting on the PoolingHttpClientConnectionManager that's called maxPerRoute and that controls how many connections you can make to the same host/url combination. Since in our current setup we mostly query one endpoint over and over again the maxTotal setting is pretty useless and the limiting factor will be the maxPerRoute.

Thankfully there is a setDefaultMaxPerRoute which can be tweaked, or there is the ability to specify individual limits per upstream route with setMaxPerRoute

The final code in question is:

PoolingHttpClientConnectionManager poolingConnectionManager = new PoolingHttpClientConnectionManager();

To debug the issue of slow responding upstream clients I also wrote a little go webserver called blackhole that does exactly what the name implies: It accepts any HTTP connection and swallows it for 100 seconds. This makes it easy to test your code against slow responding HTTP servers (like when under duress or if the system becomes unresponsive).

Filed under java, apache

Convert a millisecond precision unix timestamp to Time in go

It's no real secret that I do love the programming language Go. So I was really delighted to see that Go apparently does all the right things when it comes to their time package that handles time zones etc correctly by default as opposed to be something bolted on after the fact like most other languages.

But for some unknown reason it's just way too complex to convert a millisecond resolution Unix timestamp to time.Time. The built-in time.Unix() function only supports second and nanosecond precision.

This means that you either have to multiply the millis to nanoseconds or split them into seconds and nanoseconds. So obviously my naive implementation was:

time.Unix(0, timestamp * int64(1000000))

But that code looked ugly to me - especially if you have to do this a few times around the codebase - so I wrote a function.

But for some reason I also decided to benchmark my function as I am working on a performance critical piece of code right now. And it turns out that the simple multiplication to turn millis into nanos is 2x slower than dividing the millis into seconds and then turning the remainder into nanos.

time.Unix(ms/int64(millisInSecond), (ms%int64(millisInSecond))*int64(nsInSecond))


goos: darwin
goarch: amd64
pkg: github.com/tigraine/go-timemillis
BenchmarkMult-8         2000000000               0.50 ns/op
BenchmarkDiv-8          2000000000               0.25 ns/op

So I packaged my findings into a library which is now available on GitHub: go-timemilli

Filed under golang, time

Golang hidden gems: testing.T.Log

One thing I love about Go is it's build chain and overall ease of use. Some things take time to get used to, but the lightning fast builds and the convention-based testing Go offers are addicting right from the start.

Today I found another hidden Gem I think is just genius: testing.T.Log(). Ok I admit, not the most sexy method to get excited about - but bear with me for a moment. Imagine the following code.

func TestSomething(t *testing.T) {
  t.Log("Hello World")

What's the output? If you'd expect Hello World you are mistaken. The output is exactly nothing :)

testing.T.Log() only prints something if a testing.T.Error or testing.T.Fatal occurred. Brilliant! Nothing is more annoying than chatty test suites where your actual problem is buried in 2-3 megabytes of meaningless debug statements! And this solves the problem really elegantly. You can log as much debug info as you want and it will only surface if the test actually failed.

Filed under golang, go, testing

Golang: int is not a type

Today I ran into a very interesting compiler error in my go program: int is not a type. Although the same lines of code that didn't compile worked a few minutes earlier.

It took me 20 minutes to figure it out. I was already halfway through a forum post on the golangbridge forums and trying to put together a minimal example when I noticed the problem. Apparently I had a typo in mit func init() and had inadvertently called it func int()!

The go compiler didn't even complain with me overriding one of it's core types.

Filed under golang

Generating synthetic CPU load on Linux

While working on some alerting and metric collection about our infrastructure at Bitmovin I wanted to test out if the alerts I configured are actually triggered when a server experiences high CPU load.

I came across this beautiful Stackoverflow Answer that did exactly what I needed:

seq 3 | xargs -P0 -n1 md5sum /dev/zero

This command will saturate 3 cores with 100% user load until you cancel the command with CRTL+C.

Filed under linux, devops, server

Compiling vim8 with python support on Ubuntu

Today I took a day off from work so as always when I try some new stuff I end up spending 2 hours on my Vim configuration before actually getting something done. So todays two hours where spent on getting Vim compiled with python3 support.

First off - do use Vim8 - it's awesome and do compile it from source. It's rather simple and saves you from outdated packages on Ubuntu :).

Now my issue today was that I tried enabling python2 and python3 support at the same time. For no apparent reason the following configuration did always result in a vim binary that thought it had python support - but didn't.

./configure --with-features=huge \
            --enable-multibyte \
            --enable-rubyinterp=yes \
            --enable-pythoninterp=yes \
            --with-python-config-dir=/usr/lib/python2.7/config-x86_64-linux-gnu \
            --enable-python3interp=yes \
            --with-python3-config-dir=/usr/lib/python3.5/config-3.5m-x86_64-linux-gnu \
            --enable-perlinterp=yes \
            --enable-luainterp=yes \
            --enable-cscope --prefix=/usr \

Running vim --version resulted in +python/dyn and +python3/dyn so I thought - cool it's working.. Until I started vim and was greeted by:

Sorry, this command is disabled, the Python library could not be loaded.

To make things more interesting :echo has('python') did return 0 too - although the Vim was built with python support (and --enable-fail-if-missing is supposed to fail if python can't be linked).

So after trying around a bit and not getting anywhere I decided to just remove the python3 support from the configure line and voila - python is statically linked and working.. Yay!

./configure --with-features=huge \
            --enable-multibyte \
            --enable-rubyinterp=yes \
            --enable-pythoninterp=yes \
            --with-python-config-dir=/usr/lib/python2.7/config-x86_64-linux-gnu \
            --enable-perlinterp=yes \
            --enable-luainterp=yes \
            --enable-cscope --prefix=/usr \
Filed under vim, python, tools

Configuring Kong health-checks in Kubernetes

The first rule of cloud computing should be: Always have a health check!

Why? Well - without them your cluster will not know if the application is actually up or still starting/terminating or anywhere inbetween. As long as there are livenessProbes and readinessProbes Kubernetes can make sure no traffic gets routed to your app before it is really ready. And even more important: It will restart services and reschedule them once your health checks start going sideways.

But here is another insight into health checks: Do performance testing on them.

During the last couple of days I've had Kubernetes kill and restart perfectly healthy Kong Api-Gateway pods because apparently the /status route in Kong does some pretty expensive queries on the backend. Kong apparently thinks it's cool to do a SELECT COUNT(*) on most of it's tables to tell you how many consumers it has registered, how many oauth_tokens there are etc.. All totally irrelevant information for a health check - but it's still the only endpoint I was able to hit that would actually terminate on kong itself (anything else would also kill Kong if the upstream service is having a problem). And /status sounded like a reasonable endpoint for health-checking.

Now with Postgres that kind of queries would not really be a terrible problem (still not good), but for Cassandra it's pretty catastrophic since it's not really meant to do aggregation queries without a partition key. Looking at the code reveals the problem - and so once there was some moderate pressure, the slow queries would time out and Kubernetes would think the Kong pod was dead (although it was still serving requests) and killed it. Yay!

So the solution here was to move away from a httpGet liveness & readinessProbe to a exec probe. Exec probes are a one of my favorite feature Kubernetes - instead of doing Network calls to check if something is up it will just do a docker exec and determine based on the return code of the program executed if the pod is healthy or not.

And coincidentally Kong comes with a commandline utility called kong health that does exactly what it's named for - and is lightning fast with no database involved :).

Here is the relevant yaml configuration:

       - kong                                                                                                                                                                                               
       - health 
Filed under kubernetes, devops

Enable code coverage reports in create-react-app projects

create-react-app is a nice and easy way to bootstrap a new React.js project with some sane defaults and most of the tedious configuration required to enable Webpack building of Babeljs etc..

One thing I was missing from the generated configs though is how to output code coverage. Turns out it's rather simple - locate your package json and add the following line under scripts:

    "coverage": "node scripts/test.js --env=jsdom --coverage"

This way you can run yarn coverage or npm coverage and get a nicely formatted output with your coverage data. You can read more about the jest cli options in the docs

Filed under reactjs, testing, tools, javascript

Naming tests is more important than what they do

Why are we writing tests? There are numerous reasons, but to me the primary one is that I can go into a codebase even after I have forgotten everything about it and make changes without fear of breaking 20 things at once.

One of the major antipatterns I see regularly is the dreaded testMethodWorks() testcase:

public void testCreateUser() throws Exception {
  User user = userService.createUser("foo", "bar");
  assertEquals(user.getUsername(), "foo");
  assertEquals(user.getPassword(), "bar");

  User invalidUser = userService.createUser("bla", "");

  User someOtherTest = userService
  ..(goes on for another 20 cases)...

The example is somewhat contrived, but you get the idea. A testcase that checks 30 different (marginally related) things and will potentially fail for even more reasons. Of course that one testcase validates that the createUser() method works - and especially when a lot of setup is involved in your testcase it's convenient to just use the stuff that's already there.

But by doing so you are sacrificing a major benefit of tests: Readability through naming. If every testcase is simply named after the method it's testing, you end up with a completely useless test class that has exactly the same informative value as the class under test. Why would I bother reading the test if I could just look at the code that's doing stuff? It's probably shorter than the test case!

Imagine you come into a new codebase and whenever something breaks you first have to read through the test code. Looking at each jUnit stacktrace to figure out which assertion blew up - just so you can figure out what the test was actually doing and why that's a bad thing. Yikes.

Now I won't advocate the "one assertion per test" mantra - that's going overboard and usually leads to unmaintainable tests. But at the very least group your tests not by method but by use case. If a test fails it should be for one reason and that reason damn well ought to be in the test name. Not because nobody likes to read code - but because the first thing each testrunner will report is the name of the test that failed.

It's much easier to figure out what is going on if you get a


failure rather than a simple testCreateUser().

Seriously - I didn't even have to explain to you what my use case was - but if this test blows up you will immediately know it's a ACL issue and that it's manifesting itself by not returning a 403 StatusCode. If there was a second testcase called testCreateUserWithoutAdminCredentialsDoesNotInsertUserIntoDatabase you'll also not look at all the different corners of my repository why we got a record too many in some assertThat(repository.getAll().size(), equals(0)); but rather just ignore that failure as it's clearly an ACL issue not a database related thing. By splitting things into multiple testcases we also get the added benefit of predictable states. A test that did not correctly clean up some shared resource (in-memory-db etc..) will not create false positive in line 100 of your testMethodWorks() case but should be contained by your Transactional Testrunner or your setup/teardown methods.

So I propose three simple things that should always be in the testname - regardless of how the test is written or what you are testing:

  • Method under test (createUser)
  • Context the test was run (WithValidAdminCredentials)
  • Expected outcome of the test (ReturnsUserAsJson)

And you end up with createUserWIthValidAdminCredentialsReturnsUserAsJson and alongside you'd naturally get a second testcase called createUserWithValidAdminCredentialsInsertsUserIntoDatabase.

Keep that in mind and you'll make your life much easier for yourself when you have to update something in your codebase a few months down the road - once you have forgotten everything that was going through your head right now :)

Filed under code, style, testing

Git isn't magic

One thing I am seeing over the years is that Git has become too commonplace - too ubiquituous so that people don't care how it actually works or why it works that way. It's just this magical thing that keeps your sourcecode safe - and as long as push and pull work they just don't care why or how it works. Until git pull messes up your history, or you accidentally merged into the wrong branch or just a case of something went wrong™ and then they hope their Git client can bail them out.

So here is a very short guide on why Git is the way it is and why you should not be afraid of anything when it comes to working with git.

First thing to understand is that Git stores all of your committed files in a blob storage living inside your .git folder. Each and every file you ever committed got slashed into tiny files and placed in the .git folder. This is important because Git does not think in deltas or changes like other SCM systems but it thinks in snapshots. Each commit you do will commit the whole file to the blob storage. To save space git will internally map your files to multiple smaller files and have a blob object reference these smaller chunks (everything is addressed as a hash in git). So whenever you change something in that file git will not save the whole file a second time, but rather create a new blob object that references all the old chunks, alongside the new one you just changed.

Now that we know that Git has a storage that can return you every file ever committed in your repository we can talk about the commit and the tree. Files in itself are not very useful as we usually have a lot of these in our repository, so there is a object above the blobs that describes which versions of each blob belong together into a directory tree. Aptly named tree this is just a file that is more or less a mapping from the filename to the underlying blob hash. This is also important because if we change a file in a huge directory tree, a new version of that tree will just change one of these references to a new blob object (leaving all the other ones pointing to the old objects) - and that blob object will possible reference older chunks alongside the few changes we did in that file.

Why is this important? This also explains why git is so lightning fast. For every checkout operation git has to just look at the tree object, go into it's blob storage and store each file at the name it was referenced in the tree. Unlike other delta based SCMs where you had to get the last snapshot and then replay all changes since that snapshot to get to the current version. For git this is a constant operation that always takes the same amount of time - no matter which commit you check out.

The last concept to understand with git is the commit. A commit is basically a very small file that contains a few important pieces of information. It contains the SHA1 hash of the tree it is representing, the name of the author, the committer and the commit message. The git commit hash is exactly what the name implies: A shas1sum over this commit file. Thus a commit hash uniquely represents a commit, which unambiguously and cryptographically securely references a tree which represents (using the same sha1 concept) all files. If I tell you to check out version **d0ce065** of the project there is no doubt about what you are getting. Either your commit sha1 matches it's commit files contents or not. There is no way to corrupt that unit of your source code. Everything attached to a commit is securely identified by that one commit (since blobs, trees and chunks are all also in turn identified by their SHA1 sums).

So we end with a diagram like this (graphic by Scott Shacon):


What is missing from this picture is the history. Where are the other commits? This is no SCM yet. Well I left out one very important detail in my previous description. The commit file also contains the sha1 of it's parent commit. So if you remember that everything about a git commit is a sha1 hash, one commit also cryptographically secures the whole git history before it and makes it impossible to miss a commit or add ones without all hashes upstream to change. Again this is best described with a diagram:


So this now also explains why in every git presentation ever there where arrows between the commits that always went backwards. Git is a acyclic directed graph. This also means that git commits only know about their parents - never about their children. So if you did something you are not proud of, just check out the parent commit and continue working there - your child commit will linger on in the git object storage for some time, but eventually git will forget about it and it's gone.

No branches?

If you think about all of the above for a second you will realize branches are there just for human comprehension. They are not needed in git. Everything git needs to know or reason about (and you) is stored in single commit sha1 hashes. The git datastructure has no concept what master or develop means, it can only reason about 9b09404973ca743d0f5e11367034250dff219637.

Branches are just cosmetic to make us stupid humans not have to remember sha1sums when we work with git. A branch is nothing more than a file that lives inside .git/refs/heads that contains exactly the shasum of the commit your branch is currently pointing to. Branches are pointers - they serve no (real) purpose and can be deleted at no cost without anything happening to the commit tree. Try this:

$ git clone https://github.com/dotless/dotless.git
$ cd dotless/.git/refs/heads/
$ cat master
=> 9b09404973ca743d0f5e11367034250dff219637 <= This is the commit sha1.

Go ahead and do a git checkout 9b09404973ca743d0f5e11367034250dff219637 and you will get exactly the same working directory as if you had done git checkout master. This is so important because most of the time when stuff goes wrong with git people don't realize that having a commit on a branch where it does not belong is no big deal - just delete the branch and re-create it at the point where it actually should be.

Coincidentally remote branches are also just the same thing: pointers. Locally you can merge with your origin/master exactly because when you do a git pull or git fetch git updates the files inside .git/remotes/origin to the sha1sums that are on the server. These things are just local operations since all changes of the server are already living inside your .git directory.

So if you ever accidentally deleted a branch and forgot the commit sha1 it was at - that branch is not lost. It's still right there in your .git folder - you just have no clue how to ask git for it. That's the point where you can just run git reflog and you will see all the recent pointer/branch changes you did recently along with their sha1 sum for you to check out and recover.

$ git reflog
858b6db HEAD@{0}: merge laedit/Adding-support-for-'unit'-function: Merge made by recursive.
715db85 HEAD@{1}: checkout: moving from laedit-Adding-support-for-unit-function to master
715db85 HEAD@{2}: checkout: moving from master to laedit-Adding-support-for-unit-function
715db85 HEAD@{3}: merge nevett: Fast-forward
2a933ed HEAD@{4}: checkout: moving from nevett to master
715db85 HEAD@{5}: pull https://github.com/Nevett/dotless.git fix-mixin-important-recursive: Fast-forward
2a933ed HEAD@{6}: checkout: moving from master to nevett
2a933ed HEAD@{7}: checkout: moving from d1d2a822561fcd2c52a54b6bb8799000e4efecf9 to master
d1d2a82 HEAD@{8}: checkout: moving from MarkOSIndustries-master to d1d2a822561fcd2c52a54b6bb8799000e4efecf9

As you can see, each operation I did (even when I did not commit anything) is noted here and I can find the missing sha1sum to recover.

Filed under git

My Photography business


dynamic css for .NET