Tigraine

Daniel Hoelbling-Inzko talks about programming

Why I love Go error handling

One thing that Go almost forces you to do is to explicitly handle each and every error that any random part of the system might create. This has one very obvious side effect of making even simple code quite long and peppered with if err != nil statements:

func writeFile() error {
    fileName := "test.txt"
    // os.Create opens the file for writing (os.Open is read-only)
    f, err := os.Create(fileName)
    if err != nil {
        return fmt.Errorf("failed to open file %s: %w", fileName, err)
    }
    defer f.Close()

    d1 := []byte("hello\ngo\n")
    _, err = f.Write(d1)
    if err != nil {
        return fmt.Errorf("unable to write to file %s: %w", fileName, err)
    }
    return nil
}

As you can see, a simple open-file-and-write requires two error checks that add almost 50% to the line count of this rather simple function. Because of this, most Java/C#/C++ people you show Go code to react with aversion and distaste and never give the language another chance (although I think this is now changing gradually).

But I actually think this is Go's biggest strength and a boon to developers. Because errors are so "in your face", you have to do something about them. In languages where exceptions travel up the call stack, it's all too easy to just expect someone up the food chain to catch your exception. All too often that doesn't happen, or if it does, it's a very generic "catch-everything" block that can only log the problem without any chance of actually recovering from it.

Go, in contrast, makes you think about every error in detail and how it affects the current control flow. A classic example is a for loop that calls some method. All too often I've seen bugs caused by a missing try/catch inside the loop: the first problem that arises (most likely a very rare one) stops execution of the whole loop, and you are left wondering why half your data is missing. If you have to think hard about each error, you're much more likely to also think about how it affects the code you're currently writing, so in Go I find myself writing continue and break a lot more frequently than I usually do in Java/Kotlin.
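To make the loop case concrete, here is a minimal sketch of that pattern. The function parseAll and its input data are made up for illustration; the point is only that a bad record is collected and skipped with continue instead of aborting the whole batch.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAll is a hypothetical batch job: it converts each input to an int.
// A record that fails to parse is reported and skipped with continue,
// so one bad record does not stop the rest of the loop.
func parseAll(inputs []string) (values []int, failed []error) {
	for _, in := range inputs {
		n, err := strconv.Atoi(in)
		if err != nil {
			failed = append(failed, fmt.Errorf("skipping record %q: %w", in, err))
			continue
		}
		values = append(values, n)
	}
	return values, failed
}

func main() {
	values, failed := parseAll([]string{"1", "2", "oops", "4"})
	fmt.Println(values)      // [1 2 4]
	fmt.Println(len(failed)) // 1
}
```

Whether to continue, break, or return on the first error is a per-loop decision; the explicit check just makes sure the decision gets made.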

Another case where conscious error handling is very handy, in my opinion, is when making an application resilient to failures downstream (see my recent post on Bulkheads). Only if you have useful error handling in place on all levels of the application can you start building logic that responds to these errors (without having to go on an archaeological excavation of the whole call stack).

Obviously you need some discipline in your errors; just doing return err won't do you any favours here. But I find the way Go requires error handling also tends to promote more deliberate creation of errors that carry actually useful information up the call stack (because you need that info to handle them up there). Once that's in place, you can also make much better decisions on how to treat these errors in failure scenarios, for example when deciding whether a CircuitBreaker should trip or not.
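As a rough sketch of what "useful information up the call stack" can look like: the error is wrapped with %w at the point of failure, and the caller asks errors.Is whether it is the kind of failure it cares about. ErrUnavailable, fetchUser and shouldTrip are hypothetical names invented for this example, not part of any library.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable is a hypothetical sentinel meaning "the downstream
// service did not respond".
var ErrUnavailable = errors.New("downstream unavailable")

// fetchUser is a stand-in for a call to another service. Negative ids
// are rejected locally; every other call pretends the downstream is down.
func fetchUser(id int) error {
	if id < 0 {
		return fmt.Errorf("fetch user %d: invalid id", id)
	}
	// %w keeps ErrUnavailable reachable for errors.Is further up.
	return fmt.Errorf("fetch user %d: %w", id, ErrUnavailable)
}

// shouldTrip reports whether an error should count against a circuit
// breaker: only downstream failures do, caller mistakes do not.
func shouldTrip(err error) bool {
	return errors.Is(err, ErrUnavailable)
}

func main() {
	fmt.Println(shouldTrip(fetchUser(42))) // true
	fmt.Println(shouldTrip(fetchUser(-1))) // false
}
```

With a bare return err and no wrapping convention, the code at the top has nothing to base that decision on except the error string.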

Filed under golang, errors

A bulkhead in Go is really just a semaphore

When looking to build a software system that's resilient in the face of failure there are a bunch of useful concepts and components that all need to work together to achieve that goal.

One of these tools is bulkheading.

Bulkheads in traditional shipbuilding are a means to keep water that's entering the vessel in one compartment from flooding the whole ship and sinking it. Translated to software it's pretty similar: You try to compartmentalise the application so failures in one part don't adversely affect the rest of the application.

A classic example of why this is important would be a database that's acting up and starts responding slowly to queries.

By itself that would not be a problem: a slow request would run into a timeout and the application would gracefully handle that down the line. It does become a problem though if the clients continue hammering that service with more and more queries while the database is slow. The slow responses end up blocking resources in the application, and given high enough timeouts and enough incoming requests there is a real risk of the application running out of resources and crashing.

The other issue in such a scenario is that once the database starts becoming unstable/slow, adding more queries just equates to kicking someone that's already down. There is a high chance that the added queries will just make matters worse and cause a struggling database to shut down completely.

The solution to this is to introduce a maximum number of concurrent requests that the application is allowed to send to the database. Once the DB starts getting slow the incoming requests are not immediately submitted to the DB but actually have to wait until another active request is done. By putting a maximum wait time on this you can essentially limit the number of in-flight requests to a known quantity that will prevent your service from consuming all available resources and crashing. And you get to degrade the service gracefully.

Why not use a normal timeout? Timeouts are a static upper bound while latency is rarely uniform. Putting a timeout on an operation that responds in anywhere between 5ms and 10s during normal operation will usually call for a timeout of 15-20 seconds, depending on how generous you are. With a 20 second timeout and a quite moderate 300 operations per second you end up at a respectable 300 × 20 = 6,000 in-flight requests that tie up resources in your application. In Java-Land that would already spell doom for your application's threadpools. So in addition to maximum duration timeouts we need something more - and that something is a Bulkhead.

After having used the excellent Resilience4J library in Java to "failure-proof" a service that was having spotty collaborators we then moved on to some Go services to do the same. We expected to find a lot of libraries providing Bulkheading, but we couldn't really find one that's maintained and confidence inspiring.

So we looked at alternatives. Remembering that a Bulkhead isn't anything super fancy, we looked at what the Go project itself provides and struck gold in the golang.org/x/sync/semaphore package (one of the golang.org/x modules maintained by the Go project, rather than part of the standard library proper). Specifically the Weighted semaphore implementation is essentially all you need for a Bulkhead. A bulkhead in Go is simply a semaphore, with all the relevant timeout features being enabled by the clever use of the context package. It doesn't come with monitoring out of the box like maybe Resilience4J does - but that's easy to layer on top and the API ends up being very simple:

sem := semaphore.NewWeighted(5) // allow 5 concurrent calls
go func() {
    // Wait at most 1 second for a free slot in the bulkhead
    ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
    defer cancel() // release the timer when we are done

    // Acquire the semaphore
    if err := sem.Acquire(ctx, 1); err != nil {
        // bulkhead is full and we timed out
        return
    }
    defer sem.Release(1)

    // do work
}()

As you can see, since semaphore supports context, we can very easily add our maximum waiting time for the bulkhead via context.WithTimeout. We've essentially implemented a Bulkhead with nothing but a package maintained by the Go project and quite straightforward, idiomatic Go syntax.
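The snippet above relies on golang.org/x/sync/semaphore. As a sketch of how little machinery a bulkhead really needs, here is a variant that swaps the semaphore package for a plain buffered channel, so it runs with the standard library alone. Bulkhead, NewBulkhead, Do, ErrBulkheadFull and demo are made-up names for this example, and the 50ms wait is an arbitrary value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrBulkheadFull is a hypothetical error returned when no slot became
// free within the maximum wait time.
var ErrBulkheadFull = errors.New("bulkhead full")

// Bulkhead limits concurrent calls. The capacity of the buffered
// channel is the maximum number of calls in flight.
type Bulkhead struct {
	slots   chan struct{}
	maxWait time.Duration
}

func NewBulkhead(maxConcurrent int, maxWait time.Duration) *Bulkhead {
	return &Bulkhead{slots: make(chan struct{}, maxConcurrent), maxWait: maxWait}
}

// Do runs fn if a slot frees up within maxWait; otherwise it returns an
// error wrapping ErrBulkheadFull without running fn.
func (b *Bulkhead) Do(ctx context.Context, fn func() error) error {
	ctx, cancel := context.WithTimeout(ctx, b.maxWait)
	defer cancel()

	select {
	case b.slots <- struct{}{}: // acquired a slot
		defer func() { <-b.slots }() // give it back when fn returns
		return fn()
	case <-ctx.Done(): // waited too long, or the caller cancelled
		return fmt.Errorf("%w: %v", ErrBulkheadFull, ctx.Err())
	}
}

// demo occupies the only slot, shows that a second call is rejected,
// then frees the slot and shows that the next call goes through.
func demo() (rejected, accepted bool) {
	b := NewBulkhead(1, 50*time.Millisecond)
	started, release, done := make(chan struct{}), make(chan struct{}), make(chan struct{})

	go func() {
		defer close(done)
		_ = b.Do(context.Background(), func() error {
			close(started)
			<-release // hold the slot until told to let go
			return nil
		})
	}()
	<-started

	err := b.Do(context.Background(), func() error { return nil })
	rejected = errors.Is(err, ErrBulkheadFull)

	close(release)
	<-done
	accepted = b.Do(context.Background(), func() error { return nil }) == nil
	return rejected, accepted
}

func main() {
	fmt.Println(demo()) // true true
}
```

The channel version only handles a weight of one per call; if calls should consume different amounts of capacity, the Weighted semaphore shown earlier is the better fit.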

Filed under go, resilience
