================
 An okay Sunday
================

Today is going pretty well for me so far, ignoring the awful background things (there's still a war going on, Internet access is being crippled more and more, and apparently TSPU recently messed up IPsec around here for a while, possibly in an attempt to block more VPNs).

I didn't sleep well, but that was because I was rather excited about possibly, finally, setting up a high-availability cluster at work: things keep breaking there (something malfunctioned a bit today, too), the users are unhappy, so the managers asked me to look into increasing reliability, which is something I'd gladly do. I started by asking fellow programmers to compose lists of the services they develop and maintain (I composed such a list myself as an example, along with a UML deployment diagram with data flows, in a few hours), and asked the managers for lists of past major issues. I haven't received anything in almost 2 weeks though, and have asked the managers about reliability requirements on top of that. But I am familiar with some parts of the system (those that I maintain, interact with, or helped to debug), as well as with some of the issues, and can guess that it's best to minimize the downtime of all the services as much as possible with reasonable effort and hardware.

So now I'm considering options: eyeing Pacemaker, PAF for PostgreSQL with streaming replication (or maybe pgpool-II), DRBD to handle the flow of files uploaded via FTP, possibly with GFS2 and in an active/active configuration (with that fancy multicast + sorting out on hosts); configuring failover of systemd services should be easy. Maybe I could do DNS-based load balancing (and hopefully a kind of failover) too, with clusters in different places and behind different addresses.
Though before all that, I will have to try to nudge everyone to simplify the system, since currently the issues tend to arise along the long path the data travels, composed of hacks, legacy bits, not-quite-ready-though-in-development-for-a-long-time things that should replace the legacy bits, and so on: sorting it out should increase reliability by itself, and such a messy system would be hard to set up for redundancy.

So, that's the stuff I'm rather excited to play with, and I'm somewhat hopeful that the system at work will be simplified on top of that.

Other than that, I did some physical exercises today, sorted out some clothes, did a bit of cleaning and laundry, and had a nice steak for lunch. Now it's sunny outside, which is fairly uncommon for winter here, so it is relatively notable. Hopefully this day will stay about as nice.

----

:Date: 2023-02-12