
File storage

Corrupted disks, lost data, screwed filesystems, lost files without backups – all those things are pretty sad, and not too rare. So I'm going to write down some notes on setting up relatively reliable file storage, with explanations accessible to newbie users, covering both hardware and software. The technologies are pretty much the same for home computers and servers, so there will be no distinction – just a common Lignux system is assumed when it comes to software.

Disclaimer: I'm not a system administrator, I just occasionally maintain my personal computers and some servers – so this may or may not be a good source of information on the topic.

1 Hardware

1.1 Physical disks

Apparently Western Digital disks are more reliable than Seagate, and there are HDD reliability statistics around, which suggest that HGST (which are a part of WD now) are even better.

Obviously, backups will require space, so reliable storage takes a few times more space, depending on the setup.

1.2 RAID

RAID may be either hardware or software, and perhaps a hardware RAID 1 (disk mirroring) is the nicest solution for a few reasons:

  • It's very simple. Simple things tend to be done right, and are therefore reliable.
  • Software RAIDs consume additional system resources and may be affected by software bugs [1].
  • RAID 5 is a simple way to save space, but it is prone to bugs [2], may leave you with an exotic format [3], and apparently the disks are prone to dying at about the same time. A simply mirrored disk, on the other hand, can be plugged into any other computer at any time.
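
To show where the space saving of RAID 5 comes from, here is a minimal sketch of the XOR parity idea behind it, in plain shell arithmetic. The single-byte "disks" and the variable names are made up for the example; real arrays work on whole stripes and rotate the parity between disks, so this is only the principle.

```shell
#!/bin/sh
# Two data "disks" and one parity "disk", one byte each (illustrative values).
d1=$(( 0xA5 ))
d2=$(( 0x3C ))

# The parity disk stores the XOR of the data disks.
parity=$(( d1 ^ d2 ))

# If the disk holding d1 dies, XOR of the survivors gives its content back.
recovered=$(( parity ^ d2 ))

printf 'parity=0x%02X recovered=0x%02X\n' "$parity" "$recovered"
```

Three disks hold two disks' worth of data here, while a mirror would need four.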

Once everything is plugged in, RAID setup is pretty easy in UEFI (or BIOS on older machines): there's an intuitive UI, and you just group physical devices into RAID arrays, which are later seen from the operating system as regular single disks.

A hardware RAID is not always available, but it's usually supported by workstation motherboards, and servers usually have it.

1.3 Hot spare

Optionally and depending on how much you care about availability, a hot spare is a neat thing to use: once a mirrored disk dies, the data from the other one gets mirrored to a new disk that just waited there all along.

2 Software

There are efforts to make software newbie-friendly, but usually they lead to the situation where newbies don't use it anyway, advanced users can't do what they want to do and have to learn new tools on each system, and the software itself may be buggy and/or overcomplicated under the hood. Even if you are not used to the console, chances are that it will still be easier to use standard TUI tools. They are also usable via SSH, or when a windowing system is not set up yet or is dead already, which makes them available more often, so they will be used here.

2.1 Partitioning

2.1.1 Things to know

There are a few utilities, with fdisk(8), cfdisk(8), and parted(8) perhaps being the most popular ones. They are easy to use (none of them even requires reading the man pages), but before jumping in, there are a few basic things to know:

  • Disk devices (physical disks) live in /dev/sdX, where X is a letter. NVMe drives are named differently (e.g. /dev/nvme0n1), but the idea is the same.
  • Partitions live in /dev/sdXY, where X is a disk letter, and Y is a partition number. They are roughly like subdisks: you can do pretty much anything with one, without affecting the others.
  • A partition table is a small amount of metadata that says what partitions are available, where they are on the disk, etc. There are a few variations, but GUID Partition Table (GPT) is usually used as of 2016.

Once upon a time, there was a distinction between primary and logical partitions, but it doesn't matter now. Some software, which supports obsolete partition tables, will mention them, but it can usually be ignored with GPT.

2.1.2 Partitions

One may consider setting up an EFI System Partition (or a BIOS boot partition) on more than one disk, maybe along with alternative bootloaders and operating systems at once: it doesn't take much space, but may come in handy if a boot disk dies. ArchWiki provides useful information on UEFI bootloaders, including partitioning instructions and requirements.

Modern filesystems allow adding new devices to existing filesystems, so it's fine to leave some space unpartitioned and distribute it later, judging by the actual usage, since it can rarely be predicted accurately.

Partitioning strategies may vary a lot: if one plans to use a single filesystem on a disk, it's even possible to create a filesystem on the disk directly, without any partitioning at all. And if one wants a few filesystems on that disk (to mount them with different options and/or at different times, to not lose all the data at once, to use different filesystems, etc.), one should partition it accordingly: say, separate partitions for /home, for /var altogether and/or some big chunks of it – such as a database.

2.1.3 Programs

  1. fdisk

    Just run fdisk /dev/sdX, and it helps along the way, always reminding you about the m command to get help. The common steps are:

    1. Print the partition table (p) to check it.
    2. If it's the initial formatting (not just adding new partitions), create a GPT (g).
    3. Create a new partition (n), which will ask you to:
      • Pick any partition number to refer to it later, the default is just the next unused in order.
      • Pick the first sector, the default is also usually good if you don't care about physical on-disk placement.
      • Set a size (or an end sector).
    4. Optionally, change the partition type (t): the default is usually "Linux filesystem", but sometimes you want to select e.g. "EFI System" or "Linux swap".
    5. Print (p) again to verify that everything is right.
    6. Write the changes to disk (w), or quit without writing anything (q).
  2. parted

    parted is also easy to use, though I find it more awkward: legacy primary/logical/extended partitions in non-interactive versions of commands, writing the changes at once without an explicit write command, and various other unpleasant bits. It does the job, though.

  3. cfdisk

    cfdisk is curses-based, and is somewhat nice, probably a bit more user-friendly – though the purple color it uses for "free space" is rather ugly and hard to read on a regular black background, and its current version from Debian stable repositories (2.25.2) segfaults when trying to invoke help.
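
The fdisk steps above can also be scripted. Here is a minimal sketch, assuming sfdisk(8) (shipped in util-linux alongside fdisk) is available; the gpt_layout function and the sizes are hypothetical, and the block only prints the script rather than touching any disk. The two long identifiers are the standard GPT type GUIDs for "EFI System" and "Linux filesystem".

```shell
#!/bin/sh
# Standard GPT partition type GUIDs.
EFI_SYSTEM=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
LINUX_FS=0FC63DAF-8483-4772-8E79-3D69D8477DE4

# gpt_layout ESP_SIZE DATA_SIZE (hypothetical helper): print an sfdisk script
# describing a GPT with an EFI System Partition and one Linux partition.
# Whatever space is left stays unpartitioned.
gpt_layout() {
    printf 'label: gpt\n'
    printf 'size=%s, type=%s\n' "$1" "$EFI_SYSTEM"
    printf 'size=%s, type=%s\n' "$2" "$LINUX_FS"
}

# Only print the script; nothing is written anywhere.
gpt_layout 512MiB 20GiB

# To actually apply it (destructive, as root):
# gpt_layout 512MiB 20GiB | sfdisk /dev/sdX
```

Printing the layout first makes it easy to review, just like p before w in fdisk.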

2.2 Filesystems

Once we are done with partitions, it's time to create filesystems. mkfs(8) does that, but it's better to check man pages for specific variations, such as mkfs.btrfs(8).

Filesystems are quite different, providing different sets of features, but btrfs is among the nice ones with fancy features, including snapshots, so I'll write about it. mkfs.btrfs /dev/sdXY creates a filesystem on a single device (usually a disk partition), the commonly used -L (or --label) option allows labeling it, and other options can be found on its man page or on the btrfs wiki.

2.2.1 Mounting

Basic manual mounting is done with mount(8), as mount /dev/sdXY /path/to/mount/point; unmounting is done with umount(8), just umount /path/to/mount/point. To mount it automatically on boot, or simply with mount -a, a single line should be added into /etc/fstab. The file itself usually explains the format in its comments and refers to the fstab(5) man page, but to make this section complete:

  1. Check the filesystem UUID (a unique identifier used to refer to it) with blkid /dev/sdXY.
  2. Add a line like this into /etc/fstab:

    UUID=<uuid here> <path to mount point here> btrfs defaults 0 0
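
The two steps above can be combined in a small helper. This is just a sketch: fstab_line is a made-up name, and it only prints the line, so that it can be reviewed before being appended to /etc/fstab.

```shell
#!/bin/sh
# fstab_line UUID MOUNTPOINT (hypothetical helper): print an fstab entry
# for a btrfs filesystem, in the same form as the line shown above.
fstab_line() {
    printf 'UUID=%s %s btrfs defaults 0 0\n' "$1" "$2"
}

# Usage, as root (check the printed line before appending it):
# fstab_line "$(blkid -s UUID -o value /dev/sdXY)" /mnt/foo
# fstab_line "$(blkid -s UUID -o value /dev/sdXY)" /mnt/foo >> /etc/fstab
# mount -a
```

Running mount -a right away is a cheap way to find out about a typo before the next reboot does.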
    

2.2.2 Snapshots

Here is a non-essential, but fun and useful part.

Snapshots are awesome: they capture the state of a subvolume (possibly the whole filesystem) without copying everything, since the data is shared until it gets modified (copy-on-write). One can create them before particular actions (such as a system update) or regularly, to be able to roll back if something goes wrong, or simply to inspect previous states, easily looking into some points in the past.

Regular read-only snapshots are very useful as backups, too. Let's try them with a btrfs filesystem mounted at /mnt/foo/. The simplest way is to do something like this:

# cd /mnt/foo
# ls
# touch f1
# mkdir snapshot
# btrfs subvolume snapshot -r /mnt/foo /mnt/foo/snapshot/$(date -uIseconds)
Create a readonly snapshot of '/mnt/foo' in '/mnt/foo/snapshot/2016-08-15T00:55:10+0000'
# touch f2
# ls
f1  f2  snapshot
# ls snapshot/2016-08-15T00\:55\:10+0000/
f1  snapshot
# rm f1
# ls
f2  snapshot
# ls snapshot/2016-08-15T00\:55\:10+0000/
f1  snapshot

Now, if anything goes wrong, we can get the old versions of files from the snapshot; a quick way to copy them is with the following command:

cp --reflink=always -a /mnt/foo/snapshot/2016-08-15T00:55:10+0000/. /mnt/foo

The existing files can be removed beforehand, to get a clean copy of the previous state. Such copying will still take some time, though: maybe about 10 minutes for 400G. A nicer and more proper way is to only store snapshots and subvolumes in the btrfs root itself [4], and to store files in a subvolume, which can be easily replaced with a snapshot when the time comes.

Let's clean up the snapshot:

# btrfs subvolume delete snapshot/2016-08-15T00\:55\:10+0000
Transaction commit: none (default)
Delete subvolume '/mnt/foo/snapshot/2016-08-15T00:55:10+0000'

And start anew, storing files in a subvolume this time:

# btrfs subvolume create snapshot/first
Create subvolume 'snapshot/first'
# mv f2 snapshot/first/
# btrfs subvolume list .
ID 256 gen 32 top level 5 path snapshot/first
# btrfs subvolume set-default 256 .
# cd .. && umount foo && mount /dev/sda1 foo && cd foo && ls
f2

To get back to the btrfs root, set-default 0 could be used (the top-level subvolume has ID 5, and 0 is accepted as an alias for it). If we had a lot of files by the time we decided to use it that way, perhaps creating a snapshot subvolume would have been faster, but that's fine for an example.

Now, if we want to take snapshots without remounting, but to store them in root, we should mount the root elsewhere:

# mkdir /mnt/bar
# mount /dev/sda1 /mnt/bar -o 'subvolid=0'
# ls /mnt/bar
snapshot

And to take a read-only snapshot:

# btrfs subvolume snapshot -r /mnt/bar/snapshot/first/ /mnt/bar/snapshot/second
Create a readonly snapshot of '/mnt/bar/snapshot/first/' in '/mnt/bar/snapshot/second'
# touch f3
# ls
f2  f3
# ls /mnt/bar/snapshot/second
f2
# ls /mnt/bar/snapshot/first
f2  f3

Let's finally lose the data and roll back to the snapshot:

# rm *
# btrfs subvolume snapshot /mnt/bar/snapshot/second /mnt/bar/snapshot/third
Create a snapshot of '/mnt/bar/snapshot/second' in '/mnt/bar/snapshot/third'
# btrfs subvolume list .
ID 256 gen 41 top level 5 path snapshot/first
ID 257 gen 42 top level 5 path snapshot/second
ID 258 gen 42 top level 5 path snapshot/third
# btrfs subvolume set-default 258 .
# cd .. && umount foo && mount /dev/sda1 foo && cd foo && ls
f2
# btrfs subvolume delete /mnt/bar/snapshot/first
Transaction commit: none (default)
Delete subvolume '/mnt/bar/snapshot/first'

Our data is back, almost instantly (not just because there's a single empty file: snapshots simply work that way), and the old subvolume is removed, no longer taking any space. We also didn't change our read-only snapshot (ID 257): it's just as if bad things didn't happen (well, apart from losing f3, which wasn't backed up – though if the problem was just with f2, we could have merged the files from different snapshots).

Subvolumes can be renamed easily, with mv:

# mv /mnt/bar/snapshot/second/ /mnt/bar/snapshot/2nd
# btrfs subvolume list .
ID 257 gen 42 top level 5 path snapshot/2nd
ID 258 gen 45 top level 5 path snapshot/third

So composing a basic shell script (to set as a cron job) for snapshot rotation is trivial: say, delete backup-old, move backup-new into backup-old, take a new snapshot into backup-new. Or make a nicer and longer queue with a few more lines of code.
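
A minimal sketch of such a rotation script follows. The rotate_snapshots name, the BTRFS override variable (handy for dry runs) and the paths in the usage comments are my assumptions; only the delete/move/snapshot sequence comes from the description above.

```shell
#!/bin/sh
# The btrfs command to run; set BTRFS="echo btrfs" for a dry run.
BTRFS=${BTRFS:-btrfs}

# rotate_snapshots SOURCE_SUBVOLUME SNAPSHOT_DIR (hypothetical helper):
# keep two read-only snapshots, backup-old and backup-new.
rotate_snapshots() {
    src=$1
    dir=$2
    # Drop the oldest snapshot, if there is one.
    if [ -d "$dir/backup-old" ]; then
        $BTRFS subvolume delete "$dir/backup-old" || return 1
    fi
    # Subvolumes are renamed with plain mv.
    if [ -d "$dir/backup-new" ]; then
        mv "$dir/backup-new" "$dir/backup-old" || return 1
    fi
    # Take a fresh read-only snapshot.
    $BTRFS subvolume snapshot -r "$src" "$dir/backup-new"
}

# Usage, as root, with the layout from the examples above:
# rotate_snapshots /mnt/bar/snapshot/first /mnt/bar/snapshot
# A crontab entry (the script path is a placeholder):
# 0 3 * * * /usr/local/sbin/rotate-snapshots.sh
```

A longer queue would just need a few more mv calls, or timestamps in the names as in the first example.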

2.2.3 Incremental backups

There are a few scenarios in which having file backups is handy. While snapshots help to capture a moment in time into a file archive, they can also be used for incremental backups: just the changes between two read-only snapshots can be stored into a file and sent to another filesystem. Let's create an initial backup file, a new snapshot (since we only have one read-only snapshot at this point), and an incremental backup file:

# mkdir /var/backup
# btrfs send -f /var/backup/2nd /mnt/bar/snapshot/2nd
At subvol /mnt/bar/snapshot/2nd
# touch f4
# btrfs subvolume snapshot -r /mnt/bar/snapshot/third /mnt/bar/snapshot/4th
# btrfs send -f /var/backup/2nd-4th -p /mnt/bar/snapshot/2nd /mnt/bar/snapshot/4th
At subvol /mnt/bar/snapshot/4th

To receive it into a newly created filesystem mounted at /mnt/baz/:

# btrfs receive -f /var/backup/2nd /mnt/baz/
At subvol 2nd
# btrfs receive -f /var/backup/2nd-4th /mnt/baz/
At snapshot 4th
# ls /mnt/baz/
2nd  4th
# btrfs subvolume list /mnt/baz/
ID 256 gen 10 top level 5 path 2nd
ID 257 gen 11 top level 5 path 4th
# ls /mnt/baz/4th/
f2  f4
# ls /mnt/baz/2nd/
f2

We're getting all the same snapshots on the second filesystem.
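
The intermediate files are optional: btrfs send writes the stream to standard output when -f is not given, and btrfs receive reads it from standard input, so the two can be piped together, including over SSH. A rough sketch, where send_snapshot, the BTRFS override variable and backup-host are placeholders of mine:

```shell
#!/bin/sh
# The btrfs command to run; set BTRFS="echo btrfs" for a dry run.
BTRFS=${BTRFS:-btrfs}

# send_snapshot SNAPSHOT [PARENT] (hypothetical helper): write a send stream
# to stdout, incremental against PARENT when one is given.
send_snapshot() {
    if [ -n "${2:-}" ]; then
        $BTRFS send -p "$2" "$1"
    else
        $BTRFS send "$1"
    fi
}

# Usage, as root (backup-host is a placeholder):
# send_snapshot /mnt/bar/snapshot/2nd | ssh backup-host btrfs receive /mnt/baz/
# send_snapshot /mnt/bar/snapshot/4th /mnt/bar/snapshot/2nd | ssh backup-host btrfs receive /mnt/baz/
```

The parent snapshot has to be present on the receiving side already, as 2nd is in the example above.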

2.3 DVCS

Sometimes backing up whole filesystems (or subvolumes/directories) at once may be handy, but other times DVCS (git, darcs, mercurial, etc.) are much more appropriate. Though they should not be considered "backup systems with timestamps", they are very useful in regular work, and it wouldn't harm to push the changes to a remote server even if you are developing something locally: GitHub and VPS are there for hobby projects, work servers are usually used for work projects, and just a neighbouring home machine may serve as a small backup server.
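
With git, for instance, pushing to such a backup machine could look like the sketch below. The backup_push helper, the remote name and the URL are made up for the example; the git subcommands themselves are standard.

```shell
#!/bin/sh
# backup_push NAME URL (hypothetical helper): run inside a git repository to
# add (or update) a remote called NAME and push all branches and tags to it.
backup_push() {
    name=$1
    url=$2
    if git remote get-url "$name" >/dev/null 2>&1; then
        git remote set-url "$name" "$url"
    else
        git remote add "$name" "$url"
    fi
    git push --quiet --all "$name" && git push --quiet --tags "$name"
}

# Usage (the URL is a placeholder for a bare repository on another machine):
# backup_push backup ssh://backup-host/srv/git/project.git
```

The remote side only needs a bare repository, created once with git init --bare.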

3 Putting it all together

  • RAID helps if an HDD dies.
  • Snapshots help if the data gets corrupted by software.
  • Incremental backups help to keep copies on remote servers, useful if a server dies in a fire.

And it's all pretty easy to set up.

Footnotes:

1

Hardware and its firmware are usually less volatile, and one may hope that basic mirroring is done right and/or well-tested there.

2

It doesn't seem like it should be, since it's also pretty simple, but it is: every bit of complexity is turned by humans into bugs.

3

Because why would we use simple XOR when we can do something crazy? See also: "RAID 5 is no longer recommended for any business critical information on any drive type".