Search for Backup Tools

Since the last time I went looking for backup software, I’ve still been using rdiff-backup.

It’s nice, except for one thing: it always keeps an uncompressed copy of your current state on the disk. This is becoming increasingly annoying.

I did some tests with dar and BackupPC, and both saved considerable disk space over rdiff-backup. The problem with dar, or compressed full/incrementals with tar, is that eventually you have to make a new full backup. You have to do that, *then* delete all your old fulls and incrementals, so there will be times when you have to store a full backup twice.
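
To make that concrete, here’s roughly what the cycle looks like with GNU tar’s listed-incremental mode (illustrative only; dar’s syntax differs, and the paths here are made up):

    # Initial full backup, starting a fresh snapshot file:
    tar --create --gzip --listed-incremental=/backup/home.snar \
        --file=/backup/home-full-0.tar.gz /home

    # Daily incrementals against that snapshot file:
    tar --create --gzip --listed-incremental=/backup/home.snar \
        --file=/backup/home-incr-1.tar.gz /home

    # Eventually you need a fresh full -- and until it completes and the
    # old chain is deleted, two full backups sit on the disk at once:
    tar --create --gzip --listed-incremental=/backup/home-new.snar \
        --file=/backup/home-full-1.tar.gz /home
    rm /backup/home-full-0.tar.gz /backup/home-incr-*.tar.gz /backup/home.snar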

The hard-linking approach sounds good, but it’s got a few problems, too. One is that it can lose metadata about, ironically enough, hard links. Another is that few of the hard-linking programs offer a compressed on-disk format. (The basic rotation trick these tools share is sketched under rsnapshot below.) Here’s what I’ve been looking at:

BackupPC

Nice on the surface. I’m a bit annoyed that it’s web-driven rather than command-line-driven, but I can look past that. I can also look past the fact that it won’t let me clamp down on ssh access as much as I’d like.

BackupPC writes metadata to disk alongside files, so it can restore hard links, symlinks, device entries, and the like. It also has the nice feature of being able to hard link identical files across machines, so if you’re backing up /usr on a bunch of machines and have the same files installed, you save space. Nice.

BackupPC also can compress the files on your disk. It uses pre-compression md5sums for identifying files to hard link, which is nice.
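
The pooling idea boils down to something like this (a sketch of the concept with invented paths, not BackupPC’s actual code, which also has to deal with hash collisions):

    #!/bin/sh
    # Store file $1 into the backup as $2, pooling by the md5 of the
    # *uncompressed* content so identical files become hard links.
    src="$1" dest="$2"
    sum=$(md5sum < "$src" | awk '{print $1}')
    pool="/var/backups/pool/$sum"
    [ -e "$pool" ] || gzip -c "$src" > "$pool"   # first copy goes into the pool
    ln "$pool" "$dest"                           # every duplicate is just a link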

Here’s where I get nervous.

BackupPC doesn’t just use regular compression from, say, gzip or bzip2. It uses its own low-level scheme built on the Perl zlib deflate bindings, and it does so in a nonstandard way owing to a supposed memory issue with zlib. Why they don’t just pipe it through gzip or equivalent is beyond me.

This means, first off, that it’s using a nonstandard compression format, which makes me nervous to begin with. And if that weren’t annoying enough, you have to install Perl plus a bunch of modules just to extract anything.

Dirvish

Doesn’t support compression.

faubackup

Doesn’t support compression.

rdup

Supports compression and encryption. Does not preserve file ownership unless the destination filesystem does (meaning you must run as root to store your backups).

Killer missing feature: it does not preserve knowledge of what was hard-linked on the source system, so when you restore your backup, all hard links are lost. Epic fail.

rsnapshot

Doesn’t support compression.
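
For reference, the rotation trick rsnapshot and its cousins use boils down to roughly this (a sketch, not rsnapshot’s actual implementation):

    rm -rf /snapshots/daily.6                      # expire the oldest snapshot
    mv /snapshots/daily.5 /snapshots/daily.6       # shift the others down
    # ...and likewise for daily.4 through daily.1...
    cp -al /snapshots/daily.0 /snapshots/daily.1   # hard-link "copy", nearly free
    rsync -a --delete /home/ /snapshots/daily.0/   # changed files get fresh inodes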

StoreBackup

Does support compression, and appears to restore metadata in a sane way. Supports backing up to a different machine on the LAN, but only if you set up NFS, so it looks inappropriate for doing backups over a VPN. The manual is comprehensive but confusing, and the whole thing looks like a bit of an oddball design.

So, any suggestions?

17 thoughts on “Search for Backup Tools”

  1. Write a script that decompresses the backup, runs rdiff-backup, and compresses again.

    I have done similar things and it works perfectly (and let’s hope Murphy doesn’t get involved if I need the backup :-) )
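
    Something like this, say (a sketch with made-up paths, and every bit as fragile as the Murphy remark implies):

        set -e
        find /backup/mirror -name '*.gz' -exec gunzip {} +   # unpack the mirror
        rdiff-backup /home /backup/mirror                    # update it
        find /backup/mirror -type f ! -path '*rdiff-backup-data*' \
            -exec gzip {} +                                  # compress it again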

  2. stlman says:

    There is a Perl script called flexbackup. It is very good and… flexible :-) It supports many archivers and compressors, and adding your own isn’t too hard either.
    However, it hasn’t been developed for a few years; that said, there isn’t anything it lacked for me.

    I converted to rsync(1) incremental backups with hard links because it was quite annoying to search for files in, e.g., a 20 GiB bzipped tar (at 3 MB/s).

    http://freshmeat.net/projects/flexbackup/

  3. toupeira says:

    At my workplace we’re also using rdiff-backup to manage almost 3TB now, and rsync to replicate the backups to other, identical systems. But currently we’re considering moving everything to a central OpenSolaris storage system, and just do backups using ZFS’ snapshots, which are basically instantaneous.

    1. stlman says:

      Tell me, Mr. Toupeira, what good is a backup when you’re unable to restore it?

      Snapshots are not backups; they’re just checkpoints that prevent overwriting blocks “created” before the checkpoint. If data in such a block is to be updated, copy-on-write is performed, hence the instantaneousness. Snapshots are very useful for backups because you get a consistent filesystem state, even at the application level. But they are NOT in any way safer than the data you work with. They are on the same hardware.

      1. thedward says:

        One of the neat things about ZFS is zfs send / zfs receive: you can serialize and deserialize snapshots, or diffs between snapshots. It makes it easy to synchronize your local snapshots with a remote system.

      2. toupeira says:

        I’m aware how ZFS snapshots work, and they *can* be restored. We’re planning to still have 3 separate systems and replicate the data, using zfs send/receive as thedward mentioned.
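
        For the record, that pattern looks roughly like this (pool, dataset, and host names made up):

            zfs snapshot tank/home@today    # instantaneous checkpoint
            zfs send tank/home@today | ssh backuphost zfs receive backup/home
            # Later runs ship only the delta between two snapshots:
            zfs send -i tank/home@yesterday tank/home@today |
                ssh backuphost zfs receive backup/home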

  4. alex says:

    rdiff-backup on a compressed FS?

  5. John,

    I think your nervousness over BackupPC may be unwarranted.

    Looking at the code, what it’s producing is simply headerless gzip compression—anything that provides an interface to zlib should be able to chew on it.

    The allowances it’s making for memory usage would appear to have no more impact than perhaps lowering the compression ratio a bit by flushing excessively.

    Oh, and it appears it appends a record of rsync meta-information that is presumably there to allow it to avoid unnecessary transfers without having to decompress unchanged files.
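
    If that’s right, something along these lines ought to unpack a pool file with nothing but stock Perl (an untested sketch; the path is invented):

        perl -MCompress::Zlib -e '
            undef $/;                        # slurp the whole stream
            my $data = <STDIN>;
            my $i = inflateInit() or die;    # plain zlib-format inflate
            my ($out) = $i->inflate($data);
            print $out;
        ' < /var/lib/backuppc/pool/somefile > recovered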

    Honestly, I might look at using it for myself.

  6. chris burkhardt says:

    duplicity is by the same author as rdiff-backup, but instead of keeping a mirror of the current state, everything is tar’d and gpg’d:

    http://duplicity.nongnu.org/

    1. chris burkhardt says:

      Oh, I just read the other entry you referenced and see you’ve already considered and rejected duplicity.

  7. ramune says:

    There’s also Bacula. We currently use it at work, and it supports restoring hard links, manages pools of media, can auto-label tapes, can back up to files (a la VTL), and so on.

    The main gripe I have is the user interface. It’s pretty much unusable without running it under rlfe. Despite that, though, I found its features and reliability good enough to stick with it, klunky interface and all.

    It is lacking a bit in configuration options and tweaks compared to commercial products, but I found it more reliable than any other open-source backup software out there.

  8. solrize says:

    I just use tar and/or rsync, but have been wanting to look at veracity:

    http://taobackup.org

  9. Miek Gieben says:

    Hi,

    I’m the author of rdup, and rdup DOES support hard links as of version 0.6.0. This of course only works when the hard-linked files are all contained in your backup.

  10. Kai Hendry says:

    I stole some ideas from Stuart Langridge.

    Here are the backup scripts I use at work.

    OK, it doesn’t use compression for storage, but disk space is cheap and I’d rather have fast, non-CPU-intensive backup runs.

  11. Alec Berryman says:

    I’ve been looking around for an alternative to rsnapshot. I think that gibak is promising – http://eigenclass.org/hiki/gibak-backup-system-introduction.

    It’s based on git, appears to get all the metadata issues right, and stores no uncompressed copies (just use a bare repository). It doesn’t do encryption, but you can use encfs or something similar for that.

    The downside is that it’s rough and inflexible. I think it will take a nontrivial amount of work to get it to do daily/weekly/monthly schemes, expire content, back up more than just the home directory, and other important features.
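
    The bare-repository point, in miniature (the underlying git idea with made-up paths, not gibak’s actual commands):

        cd ~ && git init && git add -A && git commit -m baseline
        git clone --bare ~ /backup/home.git   # history only, no checked-out copy
        # Subsequent runs:
        git add -A && git commit -m daily
        git push /backup/home.git master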

  12. Theo Band says:

    Dump and restore is what I have used for several years without any problem. All the data resides on an ext3 filesystem. Dump creates full compressed backups and can also make incremental backups. Before I dump the filesystem, I first make a snapshot using LVM so I have a consistent state of the filesystem during the backup. I am now playing with rdiff-backup. Its advantage is that data can be quickly found and retrieved, which is more cumbersome with the compressed dumps.
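
    The cycle looks roughly like this (volume and file names made up):

        lvcreate --snapshot --size 1G --name homesnap /dev/vg0/home
        dump -0 -z -f /backup/home-full-0.dump /dev/vg0/homesnap   # compressed full
        dump -1 -z -f /backup/home-incr-1.dump /dev/vg0/homesnap   # later: incremental
        lvremove -f /dev/vg0/homesnap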
