A person can find all sorts of implementations of backups using hardlink trees to save space for incrementals. Some of them are fairly rudimentary, using rsync --link-dest. Others, like BackupPC, are more sophisticated, doing file-level dedup to a storage pool indexed by a hash.
While these are fairly space-efficient, they are really inefficient in other ways, because they create tons of directory entries. It would not be surprising to find millions of directory entries consumed very quickly. And while any given backup set can be deleted without impact on the others, the act of doing so can be very time-intensive, since often a full directory tree is populated with every day’s backup.
Much better is possible on modern filesystems. ZFS has been around for quite awhile now, and is stable on Solaris, FreeBSD and derivatives, and Linux. btrfs is also being used for real workloads and is considered stable on Linux.
Both have cheap copy-on-write snapshot operations that would work well with a simple rsync --inplace to achieve the same effect ad hardlink farms, but without all the performance penalties. When creating and destroying snapshots is a virtually instantaneous operation, and the snapshots work at a file block level instead of an entire file level, and preserve changing permissions and such as well (which rsync --link-dest can have issues with), why are we not using it more?
BackupPC has a very nice scheduler, a helpful web interface, and a backend that doesn’t have a mode to take advantage of these more modern filesystems. The only tool I see like this is dirvish, which someone made patches for btrfs snapshots three years ago that never, as far as I can tell, got integrated.
A lot of folks are rolling a homegrown solution involving rsync and snapshots. Some are using zfs send / btrfs send, but those mechanisms require the same kind of FS on the machine being backed up as on the destination, and do not permit excluding files from the backup set.
Is this an area that needs work, or am I overlooking something?
Incidentally, hats off to liw’s obnam. It doesn’t exactly do this, but sort of implements its own filesystem with CoW semantics.