Time series data storage

There are multiple lists of time series databases around; awesome-time-series-database is a curated and relatively complete one. I've spent some time choosing a DBMS, so I'll outline the picture I've got here.

The primary issue with a relatively large amount of time series data (tens of billions of rows) and relatively limited resources (no dozens of TiB of RAM, not even that much SSD storage) is reading from disk: the queries are usually simple, but plain reading is the bottleneck. With wider rows (dozens of columns, or some columns storing arrays), a column-oriented DBMS looks like a good idea if typical queries need only some of the columns. There are a few more things specific to time series data, e.g. a lot of range queries and insertions, but rare deletions or updates. Specialized DBMSes tend to acknowledge that, supposedly optimizing for it, and even provide handy facilities for building reports based on time series data.
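To put rough numbers on the column-oriented argument, here is a back-of-the-envelope sketch. All the figures in it (row count, column count, value size, disk throughput, number of columns a query needs) are made-up assumptions for illustration, not measurements, and it ignores compression, indexes and caching:

```python
def scan_seconds(rows, bytes_per_row, throughput_bytes_per_s):
    """Time to sequentially read rows * bytes_per_row bytes from disk."""
    return rows * bytes_per_row / throughput_bytes_per_s

# Hypothetical numbers, chosen only to illustrate the ratio.
ROWS = 30_000_000_000        # tens of billions of rows
COLUMNS = 24                 # a "wide" row
BYTES_PER_VALUE = 8          # e.g. a 64-bit number per column
NEEDED_COLUMNS = 3           # columns a typical query touches
THROUGHPUT = 500 * 10**6     # 500 MB/s sequential read

# A row store has to read whole rows; a column store reads only needed columns.
row_store = scan_seconds(ROWS, COLUMNS * BYTES_PER_VALUE, THROUGHPUT)
column_store = scan_seconds(ROWS, NEEDED_COLUMNS * BYTES_PER_VALUE, THROUGHPUT)

print(f"row store:    {row_store / 3600:.1f} h")    # 3.2 h
print(f"column store: {column_store / 60:.0f} min")  # 24 min
```

Under these assumptions the gain is simply the ratio of total columns to needed columns (8 times here), so it disappears if the queries need most of the columns.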

Some types of DBMSes can be filtered out right away, so they at least don't waste much of one's time:

The remaining ones tend to have smaller communities than general-purpose DBMSes, and are generally worse at everything except for supposed performance on time series data. I've tried just a few, since there's not much left to try after filtering, and it takes quite some time even to fill them with test data:

I had planned to use a specialized DBMS instead of Postgres if the former outperformed the latter considerably, which didn't happen: while the bottleneck is disk I/O, there's not much for a DBMS to improve. Aggregation alone would increase performance with any DBMS by more than a thousand times in this case, and then there are RAID 0, sharding, and possibly hardware upgrades.
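A minimal sketch of why aggregation helps so much when reading is the bottleneck: averaging raw samples into fixed-width time buckets shrinks the number of rows to read by the ratio of bucket width to sampling interval. The per-second sampling, the hourly buckets and the `aggregate` helper are assumptions made for illustration; the actual ratio depends on the data and on the queries:

```python
from collections import defaultdict

def aggregate(samples, bucket_seconds):
    """Average (timestamp, value) samples per fixed-width time bucket."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for ts, value in samples:
        bucket = ts - ts % bucket_seconds  # round down to the bucket start
        acc = sums[bucket]
        acc[0] += value
        acc[1] += 1
    return {b: total / count for b, (total, count) in sorted(sums.items())}

# One day of hypothetical per-second samples.
samples = [(ts, float(ts % 60)) for ts in range(86_400)]

hourly = aggregate(samples, 3600)
reduction = len(samples) / len(hourly)

print(len(samples), "rows ->", len(hourly), "rows")  # 86400 rows -> 24 rows
print(f"{reduction:.0f} times fewer rows to read")   # 3600 times fewer rows to read
```

In a DBMS the same thing would normally be done once, into a separate table or a materialized view, so that typical queries read the small aggregated table rather than the raw data.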

Though this is just the initial benchmarking; I'll update this note later, once everything is set up.

Footnotes:

1

Not that I needed it for a hobby project, but those heavily marketing-oriented websites usually suggest a single commercial company standing behind the development (well, it is clear in this case), which often leads to other unpleasant issues, from poor quality to being abandoned or otherwise becoming unusable.