Everyday programming in Haskell

This is a collection of personal notes on day-to-day programming in Haskell. It is one of those clichéd "X in production" articles, which I'd normally avoid, but since Haskell is the primary language I use professionally, and since Haskell-related articles, blogs, and communities tend to focus on entry-level material and on the related theory, writing these more practical notes down may be useful.

Why Haskell

In this particular setting, reliability was a concern: I was rather tired of hunting down both actual bugs in programs written in C, Perl, Python, and PHP, and bugs that were reported as noticed somewhere in the system as a whole (which has many unreliable external components). Given sufficient time and fixed requirements, it is viable to write reliable software in virtually any common language (assuming that the compiler and libraries aren't too buggy), but usually there isn't enough time, and the requirements keep changing. A nice architecture helps to avoid breaking the system when features are added, and keeps it simple enough to maintain and to refactor quickly without breakage, but a nice type system and simple semantics help with that too.

There are dependently typed languages suitable for verification, which I poked at as a hobby for a couple of years before switching to Haskell, but unfortunately they are not nearly as mature. Then there are languages with more arbitrary typing and semantics (mostly imperative ones), which tend to get in the way rather than help much. And there are languages even less mainstream than Haskell, with fewer readily available libraries (which are needed when unexpected new features have to be added quickly). Speaking of libraries, I often have to implement uncommon network protocols and other things involving parsing, and Haskell parsing libraries are among the best parsing tools I'm aware of. So Haskell seemed (and still seems) like a sensible compromise, being pretty good in every category I care about for these programs.

Maintainability

Haskell code is relatively easy to refactor and maintain in general, and hard to break by accident. But it's also hard to edit if one isn't familiar with Haskell, and since it is relatively uncommon, it may be challenging to find a Haskell programmer; that's a major obstacle for adoption (which in turn keeps it relatively uncommon).

In part to mitigate that, and in part to get a decent system regardless of the language, I find it useful to follow the Unix philosophy: make individual, well-defined components close to what one could reasonably expect to find in standard system repositories if such tools already existed (and were more commonly needed), that is, separate programs that do their job without any particular system in mind and communicate via text streams. That way, in the worst case it would still be viable to rewrite individual components in another language without touching the rest, as well as to interact with those components from other languages. That's in contrast to common in-house or enterprise software practice, where custom programs are special snowflakes that don't follow standards and conventions, and may just be coupled into a single monolith.
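To illustrate the kind of component meant here, below is a minimal sketch of a hypothetical filter in the spirit of grep -v: it reads text on standard input, drops blank lines and lines starting with '#', and writes the rest to standard output. The program and its comment convention are assumptions made for illustration, not one of my actual tools.

```haskell
-- A hypothetical stdin-to-stdout filter: drops blank lines and '#'
-- comments, so it can be combined with other tools in a shell pipeline.
import Data.Char (isSpace)

-- | True for lines that are neither blank nor comments starting with '#'
-- (after optional leading whitespace).
significant :: String -> Bool
significant line = case dropWhile isSpace line of
  "" -> False
  ('#' : _) -> False
  _ -> True

main :: IO ()
main = interact (unlines . filter significant . lines)
```

Since it only relies on text streams, such a component could later be replaced by a grep invocation or rewritten in another language without touching the rest of the pipeline.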

General good practices also apply: comprehensive documentation, clean and simple code, and minimal dependencies would make it less of a headache to maintain both for oneself and for the potential future maintainers.

Code simplicity

I think it won't be very controversial to say that Haskell-specific "code simplicity" means that there are no complicated type tricks, not much abstract algebra, no Template Haskell, no GHC Generics, no DSLs, -Wall is used, and maybe just a few common language extensions are enabled. It's not just that a novice programmer may not be familiar with them, but also that after not touching them for a while, it may be challenging to debug or edit non-trivial uses of them on your own. As Brian Kernighan's quote goes, "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

Safe Haskell also checks for some of that (and is nice in general), but unfortunately not all the common libraries are "safe" in that sense.
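As a minimal sketch of how Safe Haskell is switched on: the Safe pragma makes GHC accept only imports of modules that are themselves marked Safe or Trustworthy (much of base, including Data.List, is marked Trustworthy), so an import such as System.IO.Unsafe gets rejected at compile time. The function sortByLength is a made-up stand-in for ordinary, pure code.

```haskell
{-# LANGUAGE Safe #-}
-- Under Safe Haskell, GHC only accepts imports of modules that are
-- themselves Safe or Trustworthy.
module Main (main) where

import Data.List (sortOn)
-- import System.IO.Unsafe (unsafePerformIO)
--   ^ would be rejected: System.IO.Unsafe is marked Unsafe

-- | Sort words by their length (hypothetical example of plain code).
sortByLength :: [String] -> [String]
sortByLength = sortOn length

main :: IO ()
main = mapM_ putStrLn (sortByLength (words "parsing is rather fun"))
```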

Minimal dependencies

Haskell makes it easy to abstract things, as well as to grab and compose different pieces of code. That's nice, but it leads to huge dependency hierarchies; dependencies come with bugs and bloat, and have to be maintained too. By default, Cabal and GHC also link Haskell libraries statically, which can easily lead to a huge amount of code that doesn't get updated with the rest of the system.

Sometimes I use in-place FFI (which is nice and handy with GHC and Cabal), in both hobby and work projects: a polished C library is often more complete and reliable than a Haskell reimplementation (or even bindings), has fewer dependencies, and if you only need a few functions, it's not much of additional code. Given all the text conversions, imports, and kinds of errors one has to deal with while using Haskell libraries, it may even require less code to use a C library.
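A minimal sketch of what such an in-place FFI import can look like, with strlen(3) from the C standard library used purely as a stand-in for a more substantial C library. The names c_strlen and cLength are made up for this example; no bindings package is involved, only the FFI that is part of Haskell 2010.

```haskell
-- Calling C's strlen(3) directly via the FFI, without third-party bindings.
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- "unsafe" is fine here: strlen is short, doesn't block, and doesn't
-- call back into Haskell.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- | Length in bytes of the string as encoded by withCString
-- (which uses the current locale's encoding).
cLength :: String -> IO Int
cLength s = withCString s (fmap fromIntegral . c_strlen)

main :: IO ()
main = cLength "hello" >>= print
```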

Sticking to lower-level Haskell libraries may be preferable too: higher-level ones tend to add bugs, restrict what one can do, and pull in more dependencies (well, same as in any other language). Another obvious trick is to implement small functions yourself even if they are available in libraries, which is also a more common practice in C. It is good to reuse code, but perhaps not to the extreme where a program is just a lot of libraries stitched together.
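For instance, splitting a string on a single character takes a few lines of local code, so one doesn't have to depend on the split package just for its splitOn. A minimal sketch (the name splitOnChar is hypothetical):

```haskell
-- | Split a string on a separator character. A few lines of local code
-- instead of a dependency (e.g., on the split package's splitOn).
splitOnChar :: Char -> String -> [String]
splitOnChar sep s = case break (== sep) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar sep rest

main :: IO ()
main = print (splitOnChar ',' "a,b,,c")
```

Empty fields are preserved, so "a,b,,c" yields four chunks, the third one empty.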

As an example, I've used those approaches in pgxhtml, after prototyping it with high-level libraries.

Tools and infrastructure

By default, Cabal pulls dependencies from Hackage (possibly into a sandbox) and links them statically, while Stack also pulls GHC itself and uses Stackage. I think it's awkward, but perhaps useful for fighting dependency hell while using cutting-edge software.

But GHC supports shared Haskell libraries, and a program can be built to use them with cabal install --enable-executable-dynamic; Debian repositories also include a lot of Haskell libraries (as well as the regular ghc and cabal-install packages). So one can use the system package manager and repositories to both install and update everything. I'm in the slow process of switching to that.

I'm packaging software into Debian package archives with dpkg-deb(1), listing dependencies from the system repositories in the control file. Perhaps Cabal is unnecessary in such a setting, but it's still handy as a backup and for building on different systems.

Emacs haskell-mode is nice and sufficient for active programming with a REPL, though there are other packages with additional features. Haddock (a documentation tool) is not bad, but unfortunately the generated documentation isn't very readable in lightweight browsers without CSS (and possibly JS) support. Profiling and debugging aren't as nice as with C, of course, but they are usable. Testing libraries are handy (though I don't use them often). "State of the Haskell ecosystem" is a nice summary of tools and libraries.

String types

There are CString, String, Data.ByteString (lazy or strict, Char8 or Word8), Data.Text, and awkward but common conversions between them, because different libraries use different types. That's yet another reason to avoid dependencies.

Data.ByteString (strict, Word8) is the closest of the commonly used types to CString, which is what C and Unix (i.e., the outside world) use, so I think it makes sense to view it as the default. String is in the base library and can be used for Unicode text manipulation, while Data.Text is there for efficient Unicode string storage and manipulation.
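A minimal sketch of that arrangement, assuming the outside world speaks UTF-8: bytes are kept as a strict ByteString and decoded into Text (or String) only where text processing is actually needed. The helper names are hypothetical; decodeUtf8' and encodeUtf8 come from the text package's Data.Text.Encoding.

```haskell
-- Conversions at the boundary: bytes from the outside world stay a strict
-- ByteString and are decoded to Text (or String) only when needed.
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- | Decode UTF-8 bytes, returning Left on invalid input instead of
-- throwing an exception (unlike TE.decodeUtf8).
bytesToText :: BS.ByteString -> Either UnicodeException T.Text
bytesToText = TE.decodeUtf8'

-- | Encode text back into UTF-8 bytes.
textToBytes :: T.Text -> BS.ByteString
textToBytes = TE.encodeUtf8

-- | Decode into a String, e.g. for functions from base that expect one.
bytesToString :: BS.ByteString -> Either UnicodeException String
bytesToString = fmap T.unpack . bytesToText

main :: IO ()
main = do
  print (bytesToString (BS.pack [0x68, 0xC3, 0xA9]))  -- "hé" in UTF-8
  print (either (const "invalid") T.unpack (bytesToText (BS.pack [0xFF])))
  print (textToBytes (T.pack "hi"))
```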

Error handling

The situation is outlined in the "control flow" note: there are multiple ways to handle errors, and different libraries use different ones. The "outside world" usually uses return codes, but in Haskell, unchecked built-in exceptions are always in play and can be thrown unexpectedly. There are also asynchronous exceptions, coming from outside the current thread (e.g., from other threads). So one has to handle exceptions anyway, and might as well throw them too. I find it unnecessarily messy, but there it is.
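A minimal sketch of converting exceptions into values at a boundary (readFileEither is a hypothetical name): try with a specific exception type, here IOException, catches only that type, so asynchronous exceptions such as ThreadKilled still propagate. Catching SomeException instead would intercept those too, which is usually not what one wants. The strict ByteString readFile is used so that read errors occur inside try rather than later, during lazy consumption.

```haskell
-- Turning I/O exceptions into values, while leaving asynchronous
-- exceptions (ThreadKilled, UserInterrupt, etc.) alone.
import Control.Exception (IOException, try)
import qualified Data.ByteString as BS

-- | Read a whole file strictly; I/O errors (missing file, permissions, ...)
-- become Left instead of propagating as exceptions.
readFileEither :: FilePath -> IO (Either IOException BS.ByteString)
readFileEither = try . BS.readFile

main :: IO ()
main = do
  r <- readFileEither "/nonexistent/file"
  case r of
    Left err -> putStrLn ("failed: " ++ show err)
    Right bytes -> print (BS.length bytes)
```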

GHC RTS, concurrency, FFI, and POSIX

GHC alone has single-threaded and multi-threaded runtime systems, "safe" and "unsafe" foreign calls (as well as other modifiers one may need to have in mind, such as "interruptible"), bound and unbound threads. It's potentially nice, but quite a bit more complicated (and in some cases has more overhead) than just calling functions from C, even concurrently.

The most straightforward combination (that is, one resembling system threads and not requiring explicit interaction with GHC's event manager) is perhaps the multi-threaded RTS with "safe" calls (particularly for blocking functions).
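A minimal sketch of that combination, assuming a POSIX system (it imports sleep(3) from unistd.h, chosen only as an easy-to-observe blocking call) and a build with ghc -threaded. With the multi-threaded RTS, each thread blocked in a "safe" call ties up only its own OS thread, so the three one-second sleeps overlap; with the single-threaded RTS, a safe call blocks all Haskell threads until it returns, so they run one after another.

```haskell
-- Build with: ghc -threaded Main.hs (POSIX only, because of unistd.h)
import Control.Concurrent (forkIO, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Foreign.C.Types (CUInt (..))

-- sleep(3) blocks the calling OS thread; "safe" lets the RTS keep
-- scheduling other Haskell threads meanwhile (with -threaded).
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  -- Start three threads, each blocking in C for one second.
  dones <- forM [1 .. 3 :: Int] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO (c_sleep 1 >> putMVar done i)
    return done
  -- Wait for all of them: about 1 second in total with -threaded,
  -- about 3 seconds with the single-threaded RTS.
  results <- mapM takeMVar dones
  print results
```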

As of 2019, there are still no complete and consistent bindings to POSIX functions, though there are attempts to make such bindings.

Summary

There are plenty of warts, awkwardness, and imperfections, but that applies to virtually any non-trivial technology. The language semantics are nice relative to other somewhat common languages, GHC is a good compiler, and good integration with the system (POSIX and GNU/Linux distributions) is achievable.

I guess Rust may be the next best option for projects with similar requirements these days, though even my hobby experience with it is very limited.