Results with btrfs and zfs

The recent news that openSUSE considers btrfs safe for users prompted me to consider using it. And indeed I did. I was already familiar with zfs, so I considered this a good opportunity to experiment with btrfs.

btrfs makes an intriguing filesystem for all sorts of workloads. The benefits of btrfs and zfs are well-documented elsewhere. There are a number of features btrfs has that zfs lacks; for instance (a few of the corresponding commands are sketched just after this list):

  • The ability to shrink a device that’s a member of a filesystem/pool
  • The ability to remove a device from a filesystem/pool entirely, assuming enough free space exists elsewhere for its data to be moved over.
  • Asynchronous deduplication that imposes neither a synchronous performance hit nor a heavy RAM burden
  • Copy-on-write copies down to the individual file level with cp --reflink
  • Live conversion of data between different profiles (single, dup, RAID0, RAID1, etc)
  • Live conversion between on-the-fly compression methods, including none at all
  • Numerous SSD optimizations, including alignment and both synchronous and asynchronous TRIM options
  • Proper integration with the VM subsystem
  • Proper support across the many Linux architectures, including 32-bit ones (zfs is currently only flagged stable on amd64)
  • Does not require excessive amounts of RAM
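
A rough sketch of a few of these in action; the device names and mount points here are hypothetical:

    # Copy-on-write copy of a single file
    cp --reflink=always disk.img disk-clone.img

    # Shrink a mounted filesystem by 10 GiB
    btrfs filesystem resize -10g /mnt

    # Remove a device from a pool; its data is migrated off first
    btrfs device delete /dev/sdb /mnt

    # Convert data and metadata to a different profile on the fly
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

    # Switch compression method live, and trim free space on an SSD
    mount -o remount,compress=lzo /mnt
    fstrim /mnt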

The feature set of zfs that btrfs lacks is likewise well-documented elsewhere, but there are a few odd btrfs missteps (illustrated just after this list):

  • There is no way to see how much space a subvolume/filesystem is using without turning on quotas. Even then, it is cumbersome and not reported by df as it should be.
  • When a maximum size for a subvolume is set via a quota, it is not reported via df; applications have no idea when they are about to hit the maximum size of a filesystem.
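
To illustrate, the quota workaround looks roughly like this (the paths are hypothetical):

    # Enable quotas, then inspect per-subvolume usage
    btrfs quota enable /mnt
    btrfs qgroup show /mnt

    # Cap a subvolume at 10 GiB; df will not reflect the limit
    btrfs qgroup limit 10g /mnt/subvol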

btrfs would be fine if it worked reliably. I should say at the outset that I have never lost any data due to it, but it has caused enough kernel panics that I’ve lost count. Several times I had a file that produced a panic when I tried to delete it; several times it took more than 12 hours to unmount a btrfs filesystem; hardlink-heavy workloads took days longer to complete than on zfs or ext4; and those are just the problems I wrote about. I tried to use btrfs balance to change the metadata allocation on the filesystem, and never did get it to complete; it seemed to go into an endless I/O pattern after the first 1GB of metadata and never got past that. I didn’t bother trying the live migration of data from one disk to another on this filesystem.
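
For reference, the balance invocation in question was along these lines; the exact target profile here is illustrative:

    # Rewrite only the metadata block groups, converting them to the dup profile
    btrfs balance start -mconvert=dup /mnt

    # Check progress from another terminal
    btrfs balance status /mnt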

I wanted btrfs to work. I really, really did. But I just can’t see it working. I tried it on my laptop, but had to turn off CoW on my virtual machine’s disk because of the rm bug. I tried it on my backup devices, but it was unusable there due to being so slow. (Also, the hardlink behavior is broken by default and requires btrfstune -r. Yipe.)
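
For the record, the two workarounds just mentioned look roughly like this; the paths and device are hypothetical:

    # Disable CoW for a directory; newly created files inherit the flag
    chattr +C /var/lib/libvirt/images

    # Enable extended inode refs, raising the per-file hardlink limit
    # (run against an unmounted filesystem)
    btrfstune -r /dev/sdb1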

At this point, I don’t think it is worth bothering with. I think the openSUSE decision is misguided and ill-informed. btrfs will be an awesome filesystem someday; I am quite sure of that, and in time it will probably displace zfs as the most advanced filesystem out there. But that time is not yet here.

In the meantime, I’m going to build a Debian Live Rescue CD with zfsonlinux on it. Because I don’t ever set up a system I can’t repair.
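
A minimal sketch of building such an image with live-build, assuming a zfsonlinux package repository has been added to the image’s apt sources (the package names are assumptions from that repository):

    mkdir zfs-rescue && cd zfs-rescue
    lb config --distribution wheezy
    # Pull the ZFS userland and kernel modules into the live image
    echo "zfs-dkms zfsutils" > config/package-lists/zfs.list.chroot
    lb build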

20 thoughts on “Results with btrfs and zfs”

  1. Thomas Gouverneur says:

    The fact that zfs is now more than 10 yrs old is also an advantage over btrfs. That is probably also why it works better than btrfs nowadays :-) I remember the beginning of zfs, and that wasn’t all rosy either :-D

  2. Steven C. says:

    Bear in mind that SuSE may have a more recent btrfs implementation than Debian kernels, and/or many patches on top of it. My first and last experience of btrfs was in Debian squeeze, where I lost data to a filesystem that, after the first unclean shutdown, would trigger a panic on mount and would nondeterministically trigger segfaults in the only recovery tool available at the time.

    BTW, a wheezy GNU/kFreeBSD install CD can be used to manually rescue a ZFS pool if needed; just drop to a shell after loading the zfs d-i components. I’m only mentioning it as an option for anyone who ends up in this situation; a pre-prepared live CD with zfsonlinux would be ideal in this case, though.
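
    For anyone trying that, the manual steps might look roughly like this; the pool name is hypothetical:

      # From the d-i shell, after the zfs components are loaded
      zpool import -f -R /target tank   # import the pool under an alternate root
      zfs mount -a                      # mount its datasets for repair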

  3. jrssnet says:

    John, what version of the Linux kernel and of btrfs-progs (sometimes also known as btrfs-tools) were you using? I’ve been testing pretty heavily using Ubuntu Precise with a 3.11 kernel (from Saucy) and the version of btrfs-tools from Debian Sid (because the one in Ubuntu is from 2010!), and I have had MUCH more stable results than you’re reporting here, in the course of flogging the bloody hell out of btrfs TRYING to kill it, having it in “test production” on my main work laptop and main work workstation, and recently on one client’s VM hosting servers.

    1. jgoerzen says:

      I was testing with 3.10 and 3.11 in Debian, with the latest tools. But from what I can tell, the bugs I have run into have not been fixed, and some border on design deficiencies. The only way to see how much space a subvolume uses is to turn on quotas (qgroups), and qgroups seem to be quite buggy (try btrfs quota enable on your VM hosting server and you’ll see what I mean).

      My test using btrfs for backuppc storage did not go well at all. It has maybe 10 million files, each with between 10 and 40 hardlinks to it. My problems with umount taking 12+ hours, balance never finishing, etc. happened there.

      But even on the laptop running 3.11, I had a kernel oops when running btrfs fi defrag -clzo.

      There are also well-understood issues with commands failing with ENOSPC even when there is space available on disk, etc.

      I do not buy the argument that it is suddenly production-ready with one latest kernel rev or one more batch of fixes. It will get there, but not this fast.

      I have no doubt that your experience is accurate. But stable for a filesystem doesn’t just mean stable for one; it should mean stable for all.

  4. Christian Herzog says:

    Hi John,

    we are a bit astonished at your poor BTRFS experience. We use BTRFS (admittedly on a self-compiled 3.12 kernel and with current btrfs-tools) on our backup SAN to back up 230+ TB of data using compressed snapshots (see https://admin.phys.ethz.ch/lvm2FS/#phd-bkp-gw), and so far BTRFS has done a magnificent job. If we do have problems, it’s always hardware- or iSCSI-related, and even then BTRFS has been very lenient and stable. As someone else suggested earlier: BTRFS progress is so fast that you probably need to use a current kernel.

    thanks,
    -Christian


    Dr. Christian Herzog
    IT Services Group, HPT H 8
    Department of Physics, ETH Zurich
    8093 Zurich, Switzerland
    support: +41 44 633 26 68
    voice: +41 44 633 39 50
    http://nic.phys.ethz.ch/

  5. Joel Johnson says:

    I’m using btrfs on a single-drive desktop volume; however, I’ll add an additional strong caution against using it in any RAID setup: in short, DON’T.

    My experiences with trying it on a simple mirror on two drives are in the following list thread:

    http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg26778.html

  6. jrssnet says:

    No argument about space reporting. =) That’s not really a “design deficiency” problem, though; it’s more split halfway between “this is a lot more complex on a next-gen filesystem than it was on traditional filesystems” and “devs aren’t really interested in tackling that yet because they have more interesting stuff to do.” (I smile, but I’m serious here…)

    The kernel panics you’re reporting are what was surprising me. I haven’t encountered anything like that, even when running scrubs against several terabytes of data, or rebalancing filesystems from one RAID implementation to the other.

    The only “hard error” I’ve personally encountered so far has been in doing a btrfs send | btrfs receive between two boxes with a little under a terabyte of data; one time – and only one time – the sending machine crashed the btrfs send process with an inexplicable ioctl cannot allocate memory error (inexplicable b/c this is a 32GB box with only 9-ish GB used).
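
    For reference, that kind of pipeline looks roughly like this; the hostname and paths are hypothetical:

      # The snapshot must be read-only to be sent
      btrfs subvolume snapshot -r /data /data/snap-1
      btrfs send /data/snap-1 | ssh backup-host btrfs receive /backup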

    My personal take on where it is right now is it’s ready for “production testing” – I have it on two of my own machines that I rely on, and have it at one client – but I don’t think it’s ready for production *in scale* just yet, or at the very least *I’m* not ready to support it in production in scale.

    The rock-and-hard-place dilemma is that if your choice is btrfs or [anything else other than zfs], a LOT of hassle starts looking worth it… and distributing zfs presents thorny problems because of license conflicts, so… *sigh*.
