UP | HOME

Logging

Being unable to find any useful "best practices" or guidelines for logging, here I've tried to put something like that together.

1 Technologies

Syslog and journald are the primary ways to write logs on unixy systems nowadays. Using custom databases, manually managing logfiles, and otherwise reinventing the wheel leads to maintenance hell for both programming and administration.

There is plenty of fancy logging libraries, though I prefer simple ones: libsystemd-journal combined with formatting, abstracted into a function that also passes some system-specific environment variables into journald fields:

logM :: MonadIO m => Priority -> Format (m ()) a -> a

Pretty much like syslog function in C, but safe. Type-safe, that is; journald is more buggy than common syslog daemons.

2 Log contents

It's tricky to decide what to write apart from error messages: audit trails are useful, but it would be impractical to log a message before and/or after each action that could fail1. One way is to let them evolve: if a debug or info message was not helpful for a while, it should be removed (or limited to debug mode if it's present).

There are log scopes, stack traces, and memory dumps to write more information (particularly context) just on error; they are more complicated, possibly less reliable, and sometimes less useful than simple audit trails, though may be useful in addition to those.

It's also pretty hard to make the logs readable by users/administrators: there are libraries that produce error messages for programmers on one side, and users who can't read even carefully prepared messages on the other.

3 Structure and analysis

To analyze logs reliably (i.e., without relying on NLP too much), they should be structured. There are standard fields (timestamp, priority, process, etc), possibly custom fields, and a non-structured textual message; while the fields are useful, most of the program-specific information is still in text. Some of that text comes from libraries or external sources, and can't be structured with reasonable effort, but we still can try to keep a structure for what we compose.

A single software structure — unless it describes a whole language — would be language-specific and project-specific, inflexible. Message identifiers would be cumbersome to maintain and not syslog-friendly. It may be useful to store variable parameters in custom fields, though it would require to define a large set of standard fields to make it generic, would be tricky to write from a shell script (i.e., without special support), and neither it is syslog-friendly.

A language that is both computer-readable and human-readable, and capable of knowledge representation, would be handy here, as well as for various other purposes. But that's hard, and writing textual logs in a language other than English would be awkward, potentially leading to a system only maintainable by its author.

3.1 Quotation

One easy step towards structured logging is to enquote variables, for instance:

Received `42` bytes from `1.2.3.4:56`
User `foo` has logged in

In Haskell with the formatting library, I'm using custom formatters for that (they also escape those quotes with backslashes), though it can be done even in a shell script rather easily. If there are multiline variables, they could be moved out of the first line in order to simplify analysis, and all message types could be extracted with a basic sed script then:

#!/bin/sed -rf

# Remove date (journalctl -o short-iso), server name, PID
s/^[^ ]* [^ ]* ([^[]*)\[[0-9]*\]:/\1/

# Skip if it's not the first (date) line
T skip

# Hide the variables
s/([^\\]|^)`([^`\\]|\\.)*`/\1_/g

# End
p
: skip
d

Or s/^[^ ]* [^ ]* [^[]*\[[0-9]*\]: // to remove program names as well. And a shell script to retrieve logs may look like this:

#!/bin/sh
journalctl -a -o short-iso --since $1 \
           _UID=$(id -u user1) + _UID=$(id -u user2)

So that get-logs.sh -6h | hide-variables.sed | sort | uniq would extract a nice list of message templates, e.g.:

Received _ bytes from _
User _ has logged in

And uniq -c | sort -gbr would summarize those logs, e.g.:

12 Received _ bytes from _
 3 User _ has logged in

Now we can translate logs into a different language or format, annotate, filter with grep or logcheck, make it more accessible for those who normally don't read logs, or even perform some statistical analysis using those parameters, and/or message frequences and times.

Most of those operations would still be project-specific (though not affecting programs apart from log analyzers), manually built on top of the extracted message templates, and requiring maintenance/updates whenever the messages get updated.

3.2 Sentence structure

In order to assist filtering, the textual messages should be unified in their structure. Using the main verb in its base form would help to grep easily, what suggests the "(program) managed/failed/planning to <verb> <tail>" form. That's also helpful to ensure that the messages are more or less meaningful to others: compare "time: <time>" with "managed to receive time: <time>". Yet those "managed to" prefixes are rather verbose, so a better option might be to use "[+]", "[-]", and "[ ]" to indicate success, failure, and intention. Success and failure could be inferred from priorities, but it's tricky to separate successful actions from plans using standard priorities alone, though one could pretend that planning is an action, or reserve "debug" for plans and "info" for actions.

Then the above example may be rewritten as:

[+] receive _ bytes from _
[+] authenticate user _

Footnotes:

1

Along those lines, there is event sourcing. And for some time I used to write all the transmitted packets into a database (Cassandra), but then switched to writing last network session dumps into per-peer files; both were handy occasionally. Perhaps log splitting (maybe in combination with additional scripts) could be used to achieve something similar by the means of standard loggers, but without polluting regular logs.