Tag Archives: btrfs

More Topics on Store-And-Forward (Possibly Airgapped) ZFS and Non-ZFS Backups with NNCP

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.

In my previous post, I introduced a way to use ZFS backups over NNCP. In this post, I’ll expand on that and also explore non-ZFS backups.

Use of nncp-file instead of nncp-exec

The previous example used nncp-exec (like UUCP’s uux), which lets you pipe stdin in, then queues up a request to run a given command with that input on a remote. I discussed that NNCP doesn’t guarantee order of execution, but that for the ZFS use case, that was fine since zfs receive would just fail (causing NNCP to try again later).

At present, nncp-exec stores the data piped to it in RAM before generating the outbound packet (the author plans to fix this shortly) [Update: This is now fixed; use -use-tmp with nncp-exec!). That made it unusable for some of my backups, so I set it up another way: with nncp-file, the tool to transfer files to a remote machine. A cron job then picks them up and processes them.

On the machine being backed up, we have to find a way to encode the dataset to be received. I chose to do that as part of the filename, so the updated simplesnap-queue could look like this:

#!/bin/bash

set -e
set -o pipefail

DEST="`echo $1 | sed 's,^tank/simplesnap/,,'`"
FILE="bakfsfmt2-`date "+%s.%N".$$`_`echo "$DEST" | sed 's,/,@,g'`"

echo "Processing $DEST to $FILE" >&2
# stdin piped to this
zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 012345...  \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:$FILE'" >&2

echo "Queued $DEST to $FILE" >&2

I’ve added compression and encryption here as well; more on that below.

On the backup server, we would define a different incoming directory for each node in nncp.hjson. For instance:

host1: {
...
   incoming: "/var/local/nncp-bakcups-incoming/host1"
}

host2: {
...
   incoming: "/var/local/nncp-backups-incoming/host2"
}

I’ll present the scanning script in a bit.

Offsite Backup Rotation

Most of the time, you don’t want just a single drive to store the backups. You’d like to have a set. At minimum, one wouldn’t be plugged in so lightning wouldn’t ruin all your backups. But maybe you’d store a second drive at some other location you have access to (friend’s house, bank box, etc.)

There are several ways you could solve this:

  • If the remote machine is at a location with network access and you trust its physical security (remember that although it will store data encrypted at rest and will transport it encrypted, it will — in most cases — handle un-encrypted data during processing), you could of course send NNCP packets to it over the network at the same time you send them to your local backup system.
  • Alternatively, if the remote location doesn’t have network access or you want to keep it airgapped, you could transport the NNCP packets by USB drive to the remote end.
  • Or, if you don’t want to have any kind of processing capability remotely — probably a wise move — you could rotate the hard drives themselves, keeping one plugged in locally and unplugging the other to take it offsite.

The third option can be helped with NNCP, too. One way is to create separate NNCP installations for each of the drives that you store data on. Then, whenever one is plugged in, the appropriate NNCP config will be loaded and appropriate packets received and processed. The neighbor machine — the spooler — would just store up packets for the offsite drive until it comes back onsite (or, perhaps, your airgapped USB transport would do this). Then when it’s back onsite, all the queued up ZFS sends get replayed and the backups replicated.

Now, how might you handle this with NNCP?

The simple way would be to have each system generating backups send them to two destinations. For instance:

zstd -8 - | gpg --compress-algo none --cipher-algo AES256 -e -r 07D5794CD900FAF1D30B03AC3D13151E5039C9D5 \
  | tee >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk1:$FILE'") \
        >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk2:$FILE'") \
   > /dev/null

You could probably also more safely use pee(1) (from moreutils) to do this.

This has an unfortunate result of doubling the network traffic from every machine being backed up. So an alternative option would be to queue the packets to the spooling machine, and run a distribution script from it; something like this, in part:

INCOMINGDIR="/var/local/nncp-bakfs-incoming"
LOCKFILE="$INCOMINGDIR/.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
  logit "Lock obtained at ${LOCKFILE} with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi


logit "Scanning queue directory..."
cd "$INCOMINGDIR"
for HOST in *; do
   cd "$INCOMINGDIR/$HOST"
   for FILE in bakfsfmt2-*; do
           if [ -f "$FILE" ]; then
                   for BAKFS in backupdisk1 backupdisk2; do
                           runcommand nncp-file -nice B+5 -noprogress "$FILE" "$BAKFS:$HOST/$FILE"
                   done
                   runcommand rm "$FILE"
           else
                   logit "$HOST: Skipping $FILE since it doesn't exist"
           fi
   done

done
logit "Scan complete."

Security Considerations

You’ll notice that in my example above, the encryption happens as the root user, but nncp is called under su. This means that even if there is a vulnerability in NNCP, the data would still be protected by GPG. I’ll also note here that many sites run ssh as root unnecessarily; the same principles should apply there. (ssh has had vulnerabilities in the past as well). I could have used gpg’s built-in compression, but zstd is faster and better, so we can get good performance by using fast compression and piping that to an algorithm that can use hardware acceleration for encryption.

I strongly encourage considering transport, whether ssh or NNCP or UUCP, to be untrusted. Don’t run it as root if you can avoid it. In my example, the nncp user, which all NNCP commands are run as, has no access to the backup data at all. So even if NNCP were compromised, my backup data wouldn’t be. For even more security, I could also sign the backup stream with gpg and validate that on the receiving end.

I should note, however, that this conversation assumes that a network- or USB-facing ssh or NNCP is more likely to have an exploitable vulnerability than is gpg (which here is just processing a stream). This is probably a safe assumption in general. If you believe gpg is more likely to have an exploitable vulnerability than ssh or NNCP, then obviously you wouldn’t take this particular approach.

On the zfs side, the use of -F with zfs receive is avoided; this could lead to a compromised backed-up machine generating a malicious rollback on the destination. Backup zpools should be imported with -R or -N to ensure that a malicious mountpoint property couldn’t be used to cause an attack. I choose to use “zfs receive -u -o readonly=on” which is compatible with both unmounted backup datasets and zpools imported with -R (or both). To access the data in a backup dataset, you would normally clone it and access it there.

The processing script

So, put this all together and look at an example of a processing script that would run from cron as root and process the incoming ZFS data.

#!/bin/bash
set -e
set -o pipefail

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}

STORE=backups/simplesnap
INCOMINGDIR=/backups/nncp/incoming

if ! [ -d "$INCOMINGDIR" ]; then
        logerror "$INCOMINGDIR doesn't exist"
        exit 0
fi

LOCKFILE="/backups/nncp/.nncp-backups-zfs-scan.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
  logit "Lock obtained at ${LOCKFILE} with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi

EXITCODE=0


cd "$INCOMINGDIR"
logit "Scanning queue directory..."
for HOST in *; do
    HOSTPATH="$INCOMINGDIR/$HOST"
    # files like backupsfmt2-134.13134_dest
    for FILE in "$HOSTPATH"/backupsfmt2-[0-9]*_?*; do
        if [ ! -f "$FILE" ]; then
            logit "Skipping non-existent $FILE"
            continue
        fi

        # Now, $DEST will be HOST/DEST.  Strip off the @ also.
        DEST="`echo "$FILE" | sed -e 's/^.*backupsfmt2[^_]*_//' -e 's,@,/,g'`"

        if [ -z "$DEST" ]; then
            logerror "Malformed dest in $FILE"
            continue
        fi
        HOST2="`echo "$DEST" | sed 's,/.*,,g'`"
        if [ -z "$HOST2" ]; then
            logerror "Malformed DEST $DEST in $FILE"
            continue
        fi

        if [ ! "$HOST" = "$HOST2" ]; then
            logerror "$DIR: $HOST doesn't match $HOST2"
            continue
        fi

        logit "Processing $FILE to $STORE/$DEST"
            if runcommand gpg -q -d < "$FILE" | runcommand zstdcat | runcommand zfs receive -u -o readonly=on "$STORE/$DEST"; then
                logit "Successfully processed $FILE to $STORE/$DEST"
                runcommand rm "$FILE"
        else
                logerror "FAILED to process $FILE to $STORE/$DEST"
                EXITCODE=15
        fi

Applying These Ideas to Non-ZFS Backups

ZFS backups made our job easier in a lot of ways:

  • ZFS can calculate a diff based on an efficiently-stored previous local state (snapshot or bookmark), rather than a comparison to a remote state (rsync)
  • ZFS "incremental" sends, while less efficient than rsync, are reasonably efficient, sending only changed blocks
  • ZFS receive detects and enforces that the incremental source on the local machine must match the incremental source of the original stream, enforcing ordering
  • Datasets using ZFS encryption can be sent in their encrypted state
  • Incrementals can be done without a full scan of the filesystem

Some of these benefits you just won't get without ZFS (or something similar like btrfs), but let's see how we could apply these ideas to non-ZFS backups. I will explore the implementation of them in a future post.

When I say "non ZFS", I am being a bit vague as to whether the source, the destination, or both systems are running a non-ZFS filesystem. In general I'll assume that neither are ZFS.

The first and most obvious answer is to just tar up the whole system and send that every day. This is, of course, only suitable for small datasets on a fast network. These tarballs could be unpacked on the destination and stored more efficiently via any number of methods (hardlink trees, a block-level deduplicator like borg or rdedup, or even just simply compressed tarballs).

To make the network trip more efficient, something like rdiff or xdelta could be used. A signature file could be stored on the machine being backed up (generated via tee/pee at stream time), and the next run could simply send an rdiff delta over NNCP. This would be quite network-efficient, but still would require reading every byte of every file on every backup, and would also require quite a bit of temporary space on the receiving end (to apply the delta to the previous tarball and generate a new one).

Alternatively, a program that generates incremental backup files such as rdup could be used. These could be transmitted over NNCP to the backup server, and unpacked there. While perhaps less efficient on the network -- every file with at least one modified byte would be retransmitted in its entirety -- it avoids the need to read every byte of unmodified files or to have enormous temporary space. I should note here that GNU tar claims to have an incremental mode, but it has a potential data loss bug.

There are also some tools with algorithms that may apply well in this use care: syrep and fssync being the two most prominent examples, though rdedup (mentioned above) and the nascent asuran project may also be combinable with other tools to achieve this effect.

I should, of course, conclude this section by mentioning btrfs. Every time I've tried it, I've run into serious bugs, and its status page indicates that only some of them have been resolved. I would not consider using it for something as important as backups. However, if you are comfortable with it, it is likely to be able to run in more constrained environments than ZFS and could probably be processed in much the same way as zfs streams.

Debian-Live Rescue image with ZFS On Linux; Ditched btrfs

I’m a geek. I enjoy playing with different filesystems, version control systems, and, well, for that matter, radios.

I have lately started to worry about the risks of silent data corruption, and as such, looked to switch my personal systems to either ZFS or btrfs, both of which offer built-in checksumming of all data and metadata. I initially opted for btrfs, because of its tighter integration into the Linux kernel and ability to shrink an existing btrfs filesystem.

However, as I wrote last month, that experiment was not a success. I had too many serious performance regressions and one too many kernel panics and decided it wasn’t worth it. And that the SuSE people got it wrong, deeply wrong, when they declared btrfs ready for production. I never lost any data, to its credit. But it simply reduces uptime too much.

That left ZFS. Before I build a system, I always want to make sure I can repair it. So I started with the Debian Live rescue image, and added the zfsonlinux.org repository to it, along with some key packages to enable the ZFS kernel modules, GRUB support, and initramfs support. The resulting image is described, and can be downloaded from, my ZFS Rescue Disc wiki page, which also has a link to my source tree on github.

In future blog posts in the series, I will describe the process of converting existing Debian installations to use ZFS, of getting them to boot from ZFS, some bugs I encountered along the way, and some surprising performance regressions in ZFS compared to ext4 and btrfs.

Results with btrfs and zfs

The recent news that openSUSE considers btrfs safe for users prompted me to consider using it. And indeed I did. I was already familiar with zfs, so considered this a good opportunity to experiment with btrfs.

btrfs makes an intriguing filesystem for all sorts of workloads. The benefits of btrfs and zfs are well-documented elsewhere. There are a number of features btrfs has that zfs lacks. For instance:

  • The ability to shrink a device that’s a member of a filesystem/pool
  • The ability to remove a device from a filesystem/pool entirely, assuming enough free space exists elsewhere for its data to be moved over.
  • Asynchronous deduplication that imposes neither a synchronous performance hit nor a heavy RAM burden
  • Copy-on-write copies down to the individual file level with cp --reflink
  • Live conversion of data between different profiles (single, dup, RAID0, RAID1, etc)
  • Live conversion between on-the-fly compression methods, including none at all
  • Numerous SSD optimizations, including alignment and both synchronous and asynchronous TRIM options
  • Proper integration with the VM subsystem
  • Proper support across the many Linux architectures, including 32-bit ones (zfs is currently only flagged stable on amd64)
  • Does not require excessive amounts of RAM

The feature set of ZFS that btrfs lacks is well-documented elsewhere, but there are a few odd btrfs missteps:

  • There is no way to see how much space subvolume/filesystem is using without turning on quotas. Even then, it is cumbersome and not reported with df like it should be.
  • When a maxmium size for a subvolume is set via a quota, it is not reported via df; applications have no idea when they are about to hit the maximum size of a filesystem.

btrfs would be fine if it worked reliably. I should say at the outset that I have never lost any data due to it, but it has caused enough kernel panics that I’ve lost count. I several times had a file that produced a panic when I tried to delete it, several times when it took more than 12 hours to unmount a btrfs filesystem, behaviors where hardlink-heavy workloads take days longer to complete than on zfs or ext4, and that’s just the ones I wrote about. I tried to use btrfs balance to change the metadata allocation on the filesystem, and never did get it to complete; it seemed to go into an endless I/O pattern after the first 1GB of metadata and never got past that. I didn’t bother trying the live migration of data from one disk to another on this filesystem.

I wanted btrfs to work. I really, really did. But I just can’t see it working. I tried it on my laptop, but had to turn of CoW on my virtual machine’s disk because of the rm bug. I tried it on my backup devices, but it was unusable there due to being so slow. (Also, the hardlink behavior is broken by default and requires btrfstune -r. Yipe.)

At this point, I don’t think it is really all that worth bothering with. I think the SuSE decision is misguided and ill-informed. btrfs will be an awesome filesystem. I am quite sure it will, and will in time probably displace zfs as the most advanced filesystem out there. But that time is not yet here.

In the meantime, I’m going to build a Debian Live Rescue CD with zfsonlinux on it. Because I don’t ever set up a system I can’t repair.