I’ve written a lot lately about ZFS, and one of its very nice features is the ability to make snapshots that are lightweight, space-efficient, and don’t hurt performance (unlike, say, LVM snapshots).
ZFS also has “zfs send” and “zfs receive” commands that can send the content of the snapshot, or a delta between two snapshots, as a data stream – similar in concept to an amped-up tar file. These can be used to, for instance, very efficiently send backups to another machine. Rather than having to stat() every single file on a filesystem as rsync has to, it sends effectively an intelligent binary delta — which is also intelligent about operations such as renames.
Since my last search for backup tools, I’d been using BackupPC for my personal systems. But since I switched them to ZFS on Linux, I’ve been wanting to try something better.
There are a lot of tools out there to take ZFS snapshots and send them to another machine, and I summarized them on my wiki. I found zfSnap to work well for taking and rotating snapshots, but I didn’t find anything that matched my criteria for sending them across the network. It seemed par for the course for these tools to think nothing of opening up full root access to a machine from others, whereas I would much rather lock it down with command= in authorized_keys.
So I wrote my own, called simplesnap. As usual, I wrote extensive documentation for it as well, even though it is very simple to set up and use.
So, with BackupPC, a backup of my workstation took almost 8 hours. (Its “incremental” might take as few as 3 hours) With ZFS snapshots and simplesnap, it takes 25 seconds. 25 seconds!
So right now, instead of backing up once a day, I back up once an hour. There’s no reason I couldn’t back up every 5 minutes, in fact. The data consumes less space, is far faster to manage, and doesn’t require a nightly hours-long cleanup process like BackupPC does — zfs destroy on a snapshot just takes a few seconds.
I use a pair of USB disks for backups, and rotate them to offsite storage periodically. They simply run ZFS atop dm-crypt (for security) and it works quite well even on those slow devices.
Although ZFS doesn’t do file-level dedup like BackupPC does, and the lz4 compression I’ve set ZFS to use is less efficient than the gzip-like compression BackupPC uses, still the backups are more space-efficient. I am not quite sure why, but I suspect it’s because there is a lot less metadata to keep track of, and perhaps also because BackupPC has to store a new copy of a file if even a byte changes, whereas ZFS can store just the changed blocks.
Incidentally, I’ve packaged both zfSnap and simplesnap for Debian and both are waiting in NEW.