Backing up every few minutes with simplesnap

February 13th, 2014

I’ve written a lot lately about ZFS, and one of its very nice features is the ability to make snapshots that are lightweight, space-efficient, and don’t hurt performance (unlike, say, LVM snapshots).

ZFS also has “zfs send” and “zfs receive” commands that can send the content of a snapshot, or a delta between two snapshots, as a data stream, similar in concept to an amped-up tar file. These can be used, for instance, to send backups to another machine very efficiently. Rather than having to stat() every single file on a filesystem as rsync does, ZFS effectively sends an intelligent binary delta, one that also handles operations such as renames efficiently.
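To make the send/receive idea concrete, here is a minimal dry-run sketch of an incremental transfer. The dataset tank/home, the snapshot names, the host backuphost, and the target zbackup/home are all hypothetical, and the function only prints the pipeline rather than running it. It shows the general zfs send -i pattern, not how simplesnap drives the transfer internally.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a pipeline that sends the delta
# between two snapshots to another machine. All names are hypothetical.
incremental_send_cmd() {
    dataset=$1   # source dataset, e.g. tank/home
    old=$2       # snapshot the receiver already has
    new=$3       # newer snapshot to bring the receiver up to
    host=$4      # machine that stores the backup
    target=$5    # dataset on the receiving side
    printf 'zfs send -i %s@%s %s@%s | ssh %s zfs receive -F %s\n' \
        "$dataset" "$old" "$dataset" "$new" "$host" "$target"
}

incremental_send_cmd tank/home monday tuesday backuphost zbackup/home
```

Only the blocks that changed between the two snapshots travel over the wire, which is why no per-file scan is needed.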

Since my last search for backup tools, I’d been using BackupPC for my personal systems. But since I switched them to ZFS on Linux, I’ve been wanting to try something better.

There are a lot of tools out there to take ZFS snapshots and send them to another machine, and I summarized them on my wiki. I found zfSnap to work well for taking and rotating snapshots, but I didn’t find anything that matched my criteria for sending them across the network. It seemed par for the course for these tools to think nothing of opening up full root access to a machine from others, whereas I would much rather lock it down with command= in authorized_keys.
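As an illustration of the command= lockdown, the sketch below builds an authorized_keys entry that pins a key to a single wrapper command and switches off forwarding and terminal allocation. The wrapper path /usr/sbin/simplesnapwrap and the key text are assumptions for the example; check where the wrapper is installed on your system and use your real public key.

```shell
#!/bin/sh
# Sketch: build an authorized_keys entry that pins a key to one command.
# The wrapper path and the key text are placeholders, not real values.
restricted_key_line() {
    wrapper=$1   # the only command this key may run
    pubkey=$2    # public key as found in a .pub file
    printf 'command="%s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
        "$wrapper" "$pubkey"
}

restricted_key_line /usr/sbin/simplesnapwrap 'ssh-rsa AAAA...placeholder root@backuphost'
```

With such an entry in place, whatever command the connecting side asks for, sshd runs only the wrapper, so the backup machine never gets a general root shell.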

So I wrote my own, called simplesnap. As usual, I wrote extensive documentation for it as well, even though it is very simple to set up and use.

With BackupPC, a backup of my workstation took almost 8 hours. (Its “incremental” might take as few as 3 hours.) With ZFS snapshots and simplesnap, it takes 25 seconds. 25 seconds!

So right now, instead of backing up once a day, I back up once an hour. There’s no reason I couldn’t back up every 5 minutes, in fact. The data consumes less space, is far faster to manage, and doesn’t require a nightly hours-long cleanup process like BackupPC does — zfs destroy on a snapshot just takes a few seconds.
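Because destroying a snapshot is so cheap, pruning old ones can be a very small script. The following is a dry-run sketch, not how zfSnap or simplesnap rotate snapshots: it reads snapshot names oldest-first and prints a zfs destroy command for all but the newest few. The function name prune_cmds and the snapshot names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: read snapshot names on stdin, oldest first, and print a
# "zfs destroy" command for every snapshot except the newest $1.
prune_cmds() {
    keep=$1
    awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print "zfs destroy " names[i] }
    '
}

# Hypothetical snapshot names; on a real system the list could come from
#   zfs list -H -t snapshot -o name -s creation
printf '%s\n' tank/home@h01 tank/home@h02 tank/home@h03 | prune_cmds 2
```

Printing the commands first makes it easy to review what would be removed before piping the output to a shell.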

I use a pair of USB disks for backups, and rotate them to offsite storage periodically. They simply run ZFS atop dm-crypt (for security) and it works quite well even on those slow devices.

Although ZFS doesn’t do file-level dedup like BackupPC does, and the lz4 compression I’ve set ZFS to use is less efficient than the gzip-like compression BackupPC uses, the backups are still more space-efficient. I am not quite sure why, but I suspect it’s because there is a lot less metadata to keep track of, and perhaps also because BackupPC has to store a new copy of a file if even a byte changes, whereas ZFS can store just the changed blocks.

Incidentally, I’ve packaged both zfSnap and simplesnap for Debian and both are waiting in NEW.

Categories: Linux

5 Comments

  1. István Pongrácz

    Hi,
    Thanks for your tool!
    I tried it and found that the initial backup (first run) failed, because the first dataset on activehost did not exist on the backuphost.
    Example:
    activehost, dataset required to back up: tank/home/user/Dropbox
    The invocation on the backuphost should be this one:
    simplesnap --host activehost --setname mainset --store zbackup/backup --sshcmd "ssh -i /root/.ssh/id_rsa_simplesnap"

    The error message in the syslog:
    Apr 3 20:01:30 fp3pro01 simplesnap[9294]: Invoked as: /usr/local/sbin/simplesnap --host 192.168.0.114 --setname mainset --store zbackup/backup --sshcmd ssh -i /root/.ssh/id_rsa_simplesnap
    Apr 3 20:01:31 fp3pro01 simplesnap[9294]: Store zbackup/backup is mounted at /zbackup/backup
    Apr 3 20:01:31 fp3pro01 simplesnap[9294]: Running /sbin/zfs create zbackup/backup/192.168.0.114
    Apr 3 20:01:31 fp3pro01 simplesnap[9294]: /sbin/zfs exited successfully
    Apr 3 20:01:31 fp3pro01 simplesnap[9294]: Lock obtained at /zbackup/backup/192.168.0.114/.lock with dotlockfile
    Apr 3 20:01:31 fp3pro01 simplesnap[9294]: Finding remote datasets to back up
    Apr 3 20:01:31 fp3pro01 simplesnap[9294]: Running ssh -i /root/.ssh/id_rsa_simplesnap 192.168.0.114 simplesnapwrap listfs
    Apr 3 20:01:32 fp3pro01 simplesnap[9294]: ssh exited successfully
    Apr 3 20:01:32 fp3pro01 simplesnap[9294]: Running ssh -i /root/.ssh/id_rsa_simplesnap 192.168.0.114 simplesnapwrap sendback mainset tank/home/user/Dropbox
    Apr 3 20:01:32 fp3pro01 simplesnap[9294]: Running /sbin/zfs receive -F zbackup/backup/192.168.0.114/tank/home/user/Dropbox
    Apr 3 20:01:33 fp3pro01 simplesnap[9294//sbin/zfs]: cannot open 'zbackup/backup/192.168.0.114/tank/home/user/Dropbox': dataset does not exist
    Apr 3 20:01:33 fp3pro01 simplesnap[9294//sbin/zfs]: cannot receive new filesystem stream: dataset does not exist
    Apr 3 20:01:33 fp3pro01 simplesnap[9294]: /sbin/zfs exited with error 1

    I assumed simplesnap would create the missing datasets on the backuphost, but from checking the code, it looks like it will not?
    Am I correct?
    Thank you again!
    István

  2. John Goerzen

    It sounds as if you excluded tank/home/user on the remote side. Did you set the org.complete.simplesnap:exclude property there? If so, then it’s doing what you asked it to. (You could zfs create zbackup/backup/192.168.0.114/tank, tank/home, tank/home/user, etc. on the local side.)

    István Pongrácz Reply:

    Thanks for your quick answer!
    Correct, I excluded everything else :)
    Ok, in this case this was a user error. I tried to avoid the manual dataset creation on the backuphost.
    Anyway, as the final structure will be different, there will be no problem with it, but in this test system it will create 200GB vs. 2GB if I enable the user’s home, too :)
    Thanks!

  3. Stan

    John,

    I run Nas4free and want to send snapshots to a remote Freenas box and your description of your script looks enticing! I’d love to try it out. Would you please point me at it?

    Thanks,

    Stan

http://changelog.complete.org / Backing up every few minutes with simplesnap