Time series data storage

There are multiple lists of time series databases around; awesome-time-series-database is a curated and relatively complete one. I spent some time choosing a DBMS for such a task, and summarising the things I've learned here.

The primary issue with a relatively large amount of time series data (dozens of billions of rows) and relatively limited resources (no dozens of Tios of RAM, not even that amount of SSD storage) is reading from a disk: the queries are usually simple, but plain reading is the bottleneck. With wider rows (dozens of columns, or some columns storing arrays), a column-oriented DMBS looks like a good idea, if only some of the columns are needed for typical queries. There's a few more things specific to time series data, e.g., a lot of querying by range and insertions, but rare removal or updates. Specialized DBMSes tend to acknowledge that, supposedly optimizing for it, and even providing handy facilities for building reports based on TS data.

Some types of DBMSes that at least don't waste much of one's time:

Easy to filter out.
For some reason they stay in the lists, but relying on an abandoned DBMS to store your data is unwise, unless you have enough time/resources to continue development.
When there's just a paper saying how great it is, and a small short-living repository suggesting to try their Docker image. Might be fun, of course, but not for production.
Even NSA and GCHQ publish open-source analytical DBMSes (it must be somewhat amusing to store data in those), as well as all kinds of advertisement companies do, and I guess there are less known projects led by enthusiasts. It is possible that some of those are good, but it would take a while to find and try out enough to find anything good, and even then there would be a risk of those getting abandoned.
Simply was not suitable in my case, since there's too much data for memory (and I probably wouldn't look for a fast one if the amounts of data were that small).

The remaining ones tend to have smaller communities than general-purpose DBMSes, and generally be worse in everything except for supposed performance for time series data. I've tried just a few, since there's not much to try after filtering them, and it takes quite some time even to fill them with test data:

Looks too entreprise-y, but is in Debian repositories, so it's easy to try. Or so I thought: insertion turned out to be too slow, inserting about 100 minutes of test data in an hour. Maybe the issue was in the client library, and that's after decreasing batches to 1000 records at a time, since otherwise it was failing with timeouts. It looked handy, but I gave up on benchmarking it and moved on to others. There also is an issue with clustering (or, rather, a lack of it in the libre version), which is scary when the data is close to not fitting on a single machine.
Not a specialized TSDB, so didn't hope for it to outperform the specialized ones, but at that point I've decided to actually try it, to compare others to it: it is a good RDBMS, and was good for many years, in many aspects. Initially I was going to use JSON fields, but columns (including array columns) turned out to consume about twice less disk space. It was rather slow for large selections with just a primary key, but considerably faster with index-only scans.
KairosDB + Cassandra
I expected this one to be fast, but it turned out to be a bit slower than Postgres with just a primary key. Actually it was a relief, since there are a few things I didn't quite like about those (mostly the documentation and Java, perhaps). Apparently a built-in web UI is among its prominent features.
ClickHouse (with MergeTree)
Made by Yandex. The build process, the documentation, the functionality – pretty much everything is rather poor, but the performance is indeed fine: like Postgres with index-only scans (sometimes a little faster, sometimes a little slower, but roughly that). Takes less disk space than Postgres does. Maybe it's better for larger workloads, but I haven't noticed anything outstanding in its performance, while it's generally less flexible and less feature-rich than Postgres.

I've planned to use a specialized DBMS instead of Postgres if the former would outperform the latter considerably, but that didn't happen: while the bottleneck is disk I/O, there's not much to improve for a DBMS. Aggregation alone would increase performance with any DBMS by more than a thousand times in this case, and then there are RAID 0, sharding, possibly hardware updates.

Some approaches that I found useful with PostgreSQL: