Personal data storage

These are my data storage notes, targeting primarily personal data backups: regular files (documents, photo and music collections, not databases), moderate volume, added or edited rarely, backups are managed manually.

General approach

The "3-2-1 rule" for backups suggests keeping at least 3 copies of data, on at least 2 different storage devices, with at least one copy off-site.

The regular infosec CIA triad (confidentiality, integrity, availability) is desirable and fairly straightforward to apply. We'll need encryption, so that lost or decommissioned drives won't leak personal data (i.e., crypto-shredding can be employed); integrity checking, so that we'll either read back the data that was written or detect data corruption; varied and common technologies (hardware interfaces, drivers, filesystems, file formats), so that there will be a good chance that at least some of the backups can be accessed with reasonable effort in different situations in the future.

The technologies covered here are usable for both backups and working storage. I prefer to use more general tools, since they tend to be better maintained, and learning them usually is a more useful time investment than learning specialized backup systems.

Hardware

Reliable computer hardware is desirable to minimize errors and hardware failures: a UPS, ECC memory, and quality hardware (including storage) in general. External HDDs (and maybe SSDs) are cheap and handy for both local and off-site backups: they use interfaces different from those of the primary internal drives, and are easy to transfer, to plug into different machines, and to keep unplugged.

Backup operating system

I find it useful (for the peace of mind, at least) to set up a bootable operating system on at least one of the backup drives, with all the necessary software to read the backups. So there's usually an EFI system partition (ESP), an unencrypted partition for /boot (GRUB2 can handle encrypted ones, but it won't make much difference), an encrypted partition for the rest of the system (to prevent possible data leaks via cache, for instance, after backups are accessed from it), and a separate encrypted partition for the backup itself.

When installing a system using an installer, on a machine with more than one disk and some existing systems present, the installer would often use a seemingly random ESP on one of the internal disks, instead of the one on the backup drive. Fixing it may involve booting via the GRUB shell after GRUB fails to find or access its config from the /boot partition, remounting /boot/efi/ (and fixing its entry in /etc/fstab) so that it points to the correct drive's ESP, and then running grub-install to install GRUB there. It may also require removing undesirable directories from the ESP manually, and adjusting the boot entries with efibootmgr. Or one can opt for a more involved/manual installation, setting it up properly at once: see, for instance, "Installing Debian GNU/Linux from a Unix/Linux System" and "Full disk encryption, including /boot: Unlocking LUKS devices from GRUB".
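The manual fix might look roughly like the following sketch. It assumes an x86_64 UEFI system running Debian (for update-grub), booted from the system installed on the backup drive, with /dev/sdX1 as a placeholder for that drive's ESP. The function name and the DRY_RUN convention are hypothetical, and the --removable option (which installs GRUB to the fallback boot path on the ESP) is an assumption about what suits a portable drive. The /etc/fstab entry still has to be fixed by hand.

```shell
# With DRY_RUN set, print commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: mount the given ESP at /boot/efi and
# reinstall GRUB onto it. Run as root.
reinstall_grub() {
    esp=$1   # e.g. /dev/sdX1, the ESP on the backup drive (placeholder)
    run mount "$esp" /boot/efi || return 1
    run grub-install --target=x86_64-efi --efi-directory=/boot/efi --removable || return 1
    # Regenerate the GRUB config (Debian-specific wrapper).
    run update-grub || return 1
    # List the firmware boot entries for manual review.
    run efibootmgr -v
}
```

Running it with DRY_RUN=1 first (e.g. DRY_RUN=1, then reinstall_grub /dev/sdX1) only prints the commands, which is a cheap way to check the device name before touching anything.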

Setups

I do partitioning with fdisk, mostly because other common tools (or at least their fancy user interfaces) tend to be buggy. fdisk is nice, commonly available, and works well.

RAID 1 is nice to set up if there are spare disks, but it is usually not as critical for redundant personal backups as it is, for instance, for a production server.

As of 2021 and for Linux-based systems, some of the common software options are:

Below are notes and command cheatsheets for the setups I use.

LUKS and ext4

This is probably the most basic and widely supported setup for Linux-based systems. Only authenticated integrity checks are supported by cryptsetup (and those are experimental), so no CRC and no recovery from minor errors without RAID, apparently. CRC won't be useful for repairs on top of an encrypted partition either. Perhaps dm-integrity can be set separately to use CRC32C, but that would complicate the setup.

Initial setup:

cryptsetup luksFormat --integrity hmac-sha256 /dev/sdXY
cryptsetup open /dev/sdXY backup2
mkfs.ext4 /dev/mapper/backup2
cryptsetup close backup2
mkdir /var/lib/backup2

A typical session:

cryptsetup open /dev/sdXY backup2
mount -t ext4 /dev/mapper/backup2 /var/lib/backup2/
# synchronize backups
umount /var/lib/backup2/
cryptsetup close backup2
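The session above can be wrapped into a small function, so that the filesystem is unmounted and the mapping closed even if the synchronization step fails. This is only a sketch: the function name, the DRY_RUN convention, and the use of rsync -a for the "synchronize backups" step are assumptions, and it has to be run as root.

```shell
# With DRY_RUN set, print commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: open, mount, sync, then always unmount and close.
# Arguments: LUKS partition, mapping name, mount point, source directory.
luks_backup_session() {
    dev=$1 name=$2 mnt=$3 src=$4
    run cryptsetup open "$dev" "$name" || return 1
    status=0
    if run mount -t ext4 "/dev/mapper/$name" "$mnt"; then
        # Copy the contents of the source directory into the backup.
        run rsync -a "$src/" "$mnt/" || status=1
        run umount "$mnt" || status=1
    else
        status=1
    fi
    run cryptsetup close "$name" || status=1
    return "$status"
}
```

For example, luks_backup_session /dev/sdXY backup2 /var/lib/backup2 /home/user/documents, where the source directory is a made-up path.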

For RAID with mdadm, see "dm-crypt + dm-integrity + dm-raid = awesome!".

ZFS

ZFS is not modular like LUKS and friends, there are license compatibility issues, and it's generally rather unusual, but it is apparently a good filesystem with all the features needed here.

Initial setup:

# Install zfsutils-linux
apt install zfsutils-linux
# Find a partition ID
ls -l /dev/disk/by-id/ | grep sda4
# Use that ID to create a single-device pool. The "mirror" keyword
# should be added to set RAID 1.
zpool create tank usb-WD_Elements_...-part4
# Create an encrypted file system.
mkdir /var/lib/backup/
zfs create -o encryption=on -o keyformat=passphrase -o mountpoint=/var/lib/backup tank/backup

ZFS comes with its own mounting and unmounting commands, and if it's to be used from different systems, the pools should be exported and imported (or just force-imported). A typical session, assuming that it's used from different systems:

# List pools available for import
zpool import
# Import the pool
zpool import tank
# Mount an encrypted file system
zfs mount -l tank/backup
# (Synchronize backups here)
# Unmount the file system (or it'll happen on export)
zfs unmount tank/backup
# Unmount the pool's root file system too (also unnecessary to do
# manually, since it happens on export)
zfs unmount tank
# Export the pool
zpool export tank
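Since integrity checking is one of the reasons to use ZFS here, it may be worth scrubbing the pool occasionally while it is imported (that is, before the export step). A minimal sketch, assuming OpenZFS 2.0 or later for the -w option; the function name and the DRY_RUN convention are hypothetical.

```shell
# With DRY_RUN set, print commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: read and verify all data in the pool,
# then print the result.
zfs_scrub_check() {
    pool=$1
    # -w waits for the scrub to finish (assumes OpenZFS 2.0 or later).
    run zpool scrub -w "$pool" || return 1
    # -v also lists files affected by unrecoverable errors, if any.
    run zpool status -v "$pool"
}
```

On a single-device pool a scrub can detect corruption of file data but has no second copy to repair it from, unlike with a mirror.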

Other useful tools

S.M.A.R.T. monitoring and testing can be done with smartmontools, and is usually supported even by external and older USB drives.

I normally use just rsync -a, but rsync can also compare files using checksums (the -c option). That forces all the data to be read, and hence checked for integrity by the underlying storage layers, in addition to comparing file contents, which is nice to run occasionally.
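As a complement to an occasional rsync -c run, a checksum manifest stored alongside the files allows a single copy to be verified on its own. This is a sketch rather than part of the setups above: it assumes GNU coreutils and findutils, and the function names and the SHA256SUMS file name are hypothetical.

```shell
# Hypothetical helpers; assume GNU coreutils and findutils.
MANIFEST=SHA256SUMS

# Write checksums of all regular files under the given directory
# into a manifest at its top level.
make_manifest() (
    cd "$1" || exit 1
    find . -type f ! -path "./$MANIFEST" -print0 \
        | sort -z \
        | xargs -0 -r sha256sum > "$MANIFEST"
)

# Re-read every listed file and compare it against the manifest;
# returns non-zero if any file is missing or differs.
verify_manifest() (
    cd "$1" || exit 1
    sha256sum --check --quiet "$MANIFEST"
)
```

The manifest has to be regenerated after files are added or edited, which is tolerable when that happens rarely.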

For data erasure, dd is handy for wiping both disks and partitions (before decommissioning drives, or if there were unencrypted partitions before), e.g.:

dd status=progress if=/dev/urandom of=/dev/sdX bs=1M
dd status=progress if=/dev/urandom of=/dev/sdXY bs=1M
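For LUKS partitions, the crypto-shredding mentioned earlier can be much quicker than overwriting the whole device: destroying the keyslots makes the data unreadable. A hedged sketch, with a hypothetical function name and DRY_RUN convention; it is irreversible, and any header backups made elsewhere would have to be destroyed too.

```shell
# With DRY_RUN set, print commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: destroy all LUKS keyslots on a partition.
luks_shred() {
    dev=$1
    # Show the header first, so that the device can be double-checked.
    run cryptsetup luksDump "$dev" || return 1
    # luksErase asks for confirmation, then removes all keyslots.
    run cryptsetup luksErase "$dev"
}
```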