In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups, laid out the foundations for this. Now let’s dig into the details.
Today’s post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later.
The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I’m going to try to go into more detail here.
Assumptions
I am assuming a setup where:
- The machines being backed up run ZFS
- The disk(s) that hold the backups are also running ZFS
- zfs send / receive is desired as an efficient way to transport the backups
- The machine that holds the backups may have no network connection whatsoever
- Backups will be sent encrypted over some sort of network to a spooling machine, which temporarily holds them until they are transported to the destination backup system and ingested there. This system will be unable to decrypt the data streams it temporarily stores.
Hardware
Let’s start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi4 benchmarks at about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don’t need encrypted backups or don’t care that much about performance (many people probably fall into this category), you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server.
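If you want to gauge a candidate machine’s encryption speed yourself, a rough check is easy to run (both tools are standard parts of OpenSSL and cryptsetup; exact numbers will vary with kernel and library versions):

openssl speed -evp aes-256-cbc
cryptsetup benchmark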
I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it’s been quite reliable.
For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 “toaster” (device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive.
Then, there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick – plug it in, all the spooled data is moved over, then plug it in to the backup system and it’s processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup.
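To give a flavor of the airgapped variant, here is a minimal sketch of the USB round trip; the mount point /mnt/usbstick is an assumption:

# On the spooler, with the stick mounted: swap inbound/outbound packets
nncp-xfer -mkdir /mnt/usbstick

# On the backup system, after moving the stick there:
nncp-xfer /mnt/usbstick
nncp-toss    # process the newly-arrived packets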
Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I’m going to focus on how to integrate ZFS with NNCP.
Operating System
Of course, it should be no surprise that I set this up on Debian.
As an added step, I did all the configuration in Ansible stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for safekeeping, and it takes the system from freshly-installed to ready-to-use state.
Security
There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, from a privilege separation standpoint, I have chosen to run everything relating to NNCP as a nncp user. NNCP already does encryption, but if you prefer to have zero knowledge of the data even to NNCP, it’s trivial to add gpg to the pipeline as well, and in fact I’ll be demonstrating that in a future post for other reasons.
Software
Besides NNCP, there needs to be a system that generates the zfs send streams. For this project, I looked at quite a few. Most were designed to inspect the list of snapshots on a remote end, compare it to a list on the local end, and calculate a difference from there. This, of course, won’t work for this situation.
I realized my own simplesnap project was very close to being able to do this. It already used an algorithm of using specially-named snapshots on the machine being backed up, so never needed any communication about what snapshots were present where. All it needed was a few more options to permit sending to a stream instead of zfs receive. I made those changes and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well.
Preparing NNCP
I’m going to assume three hosts in this setup:
- laptop is the machine being backed up. Of course, you may have quite a few of these.
- spooler holds the backup data until the backup system picks it up
- backupsvr holds the backups
The basic NNCP workflow documentation covers the initial steps. You’ll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You’ll copy the public key sets to the configurations of the other machines as usual. On the laptop, you’ll add a via line like this:
backupsvr: {
  id: ...
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]
}
This tells NNCP that data destined for backupsvr should always be sent via spooler first.
You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Alternatively, you can go airgapped between the two with nncp-xfer.
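A minimal sketch of the online variant (the port number is an arbitrary choice; the spooler’s address would live in the backupsvr’s neighbor config):

# On spooler: listen for connections from the backup server
nncp-daemon -bind "[::]:5400"

# On backupsvr: connect to the spooler and exchange queued packets
nncp-call spooler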
Generating Backup Data
Now, on the laptop, install simplesnap (2.0.0 or above). Although you won’t be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepare a dataset for it:
zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap
Then, create a script /usr/local/bin/runsimplesnap like this:
#!/bin/bash

set -e

simplesnap --store tank/simplesnap --setname backups --local --host `hostname` \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap

su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'

if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi
The call to simplesnap sets it up to send the data to simplesnap-queue, which we’ll create in a moment. The --receivecmd option, plus --noreap, sets it up to run without ZFS on the local system.
The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we’re on the LAN (checking 192.168.65.64), and if so, will establish a connection to the spooler to transmit the data. Of course, you could also do this over the Internet, with Tor, or whatever, but in my case, I don’t want to do it automatically in case I’m tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options.
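For reference, a hypothetical /etc/cron.d entry that runs the wrapper nightly (the schedule is an arbitrary choice):

# /etc/cron.d/simplesnap (hypothetical): run the backup wrapper at 02:17 daily
17 2 * * * root /usr/local/bin/runsimplesnap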
Now, here’s what /usr/local/bin/simplesnap-queue looks like:
#!/bin/bash

set -e
set -o pipefail

DEST="`echo $1 | sed 's,^tank/simplesnap/,,'`"

echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2
This is a pretty simple script. simplesnap will call it with a path based on the --store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it just pipes the stream to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), and the request passes it to whatever the backupsvr defines as zfsreceive.
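For example, the stripping step alone does this (the dataset name is hypothetical):

echo tank/simplesnap/laptop/rpool/home | sed 's,^tank/simplesnap/,,'
# prints: laptop/rpool/home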
Receiving ZFS backups
In the NNCP configuration on the recipient’s side, in the laptop section, we define what command it’s allowed to run as zfsreceive:
exec: {
  zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
}
We authorize the nncp user to run this under sudo in /etc/sudoers.d/local-nncp:
Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive
The NNCP_SENDER is the public key ID of the sending node when nncp-toss processes the incoming data. We can use that for sanity checking later.
Now, here’s a basic nncp-zfs-receive script:
#!/bin/bash

set -e
set -o pipefail

STORE=backups/simplesnap
DEST="$1"

# now process stdin
zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"
And there you have it — all the basics are in place.
Update 2020-12-30: An earlier version of this article had “zfs receive -F” instead of “zfs receive -o readonly=on -x mountpoint”. These changed arguments are more robust.
Update 2021-01-04: I am now recommending “zfs receive -u -o readonly=on”; see my successor article for more.
Enhancements
You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:
#!/bin/bash

set -e
set -o pipefail

STORE=backups/simplesnap
# $1 will be the host/dataset
DEST="$1"
HOST="`echo "$1" | sed 's,/.*,,g'`"
if [ -z "$HOST" ]; then
   echo "Malformed command line"
   exit 5
fi

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
      RETVAL="$?"
      logerror "$1 exited with error $RETVAL"
      return "$RETVAL"
   fi
}

exiterror () {
   logerror "$1"
   echo "$1" 1>&2
   exit 10
}

# Sanity check: make sure the sending node's key ID matches the host
if [ "$HOST" = "laptop" ]; then
   if [ "$NNCP_SENDER" != "12345678" ]; then
      exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
   fi
else
   exiterror "Unknown host $HOST"
fi

runcommand zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"
Now you’ll capture the ZFS receive output in syslog in a friendly way, so you can look back later to see why things failed, if they did.
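To review those logs later, something like this works on a typical syslog setup (the log file path varies by distribution):

grep 'nncp-zfs-receive' /var/log/syslog | tail -n 50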
Further notes on NNCP
nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the dependent snapshots have been received. For non-ZFS backups, a simple sequence number can handle this issue.
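As a sketch of that idea (the paths and wrapper are hypothetical), the receiving command could track a sequence number and exit nonzero on out-of-order arrivals, which makes nncp-toss requeue the packet and retry later:

#!/bin/bash
# Hypothetical receiver wrapper: defer out-of-order backup streams.
set -e
SEQFILE=/var/local/backup.seq             # hypothetical state file
LAST="`cat "$SEQFILE" 2>/dev/null || echo 0`"
if [ "$1" -ne "$((LAST + 1))" ]; then
   echo "Expected sequence $((LAST + 1)), got $1; deferring" >&2
   exit 1   # nonzero exit: nncp-toss keeps the packet and retries
fi
cat > "/backups/incoming/backup-$1.tar"   # consume the stream from stdin
echo "$1" > "$SEQFILE"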
Thanks for the details! Assuming one doesn’t have the asynchronous requirement, how is this superior to simply having a cron job enable the NIC, pull backups from the machines (and update/patch the server), and then disable the NIC? That seems much easier IMO
That is pretty much how this could be described. NNCP gives an asynchronous transport all the way to the spooler machine, and then a transport-agnostic hop (LAN, airgapped, whatever) from there to the processing machine. I’m also going to write a followup article about separate backup disks and how they can be handled from the spooler machine’s perspective.
OK, & yeah please do. Pretty informative so far
Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.
In my previous post, I introduced a way to use ZFS backups over NNCP. In this post, I’ll expand on that and also explore non-ZFS backups.
Use of nncp-file instead of nncp-exec
The previous example used nncp-exec (like UUCP’s uux), which lets you pipe stdin in, then queues up a request to run a given command with that input on a remote. I discussed that NNCP doesn’t guarantee order of execution, but that for the ZFS use case, that was fine since zfs receive would just fail (causing NNCP to try again later).
At present, nncp-exec stores the data piped to it in RAM before generating the outbound packet (the author plans to fix this shortly) [Update: This is now fixed; use -use-tmp with nncp-exec!]. That made it unusable for some of my backups, so I set it up another way: with nncp-file, the tool to transfer files to a remote machine. A cron job then picks them up and processes them.
On the machine being backed up, we have to find a way to encode the dataset to be received. I chose to do that as part of the filename, so the updated simplesnap-queue could look like this:
#!/bin/bash

set -e
set -o pipefail

DEST="`echo $1 | sed 's,^tank/simplesnap/,,'`"
FILE="bakfsfmt2-`date "+%s.%N".$$`_`echo "$DEST" | sed 's,/,@,g'`"
echo "Processing $DEST to $FILE" >&2
# stdin piped to this
zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 012345... \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:$FILE'" >&2

echo "Queued $DEST to $FILE" >&2
I’ve added compression and encryption here as well; more on that below.
On the backup server, we would define a different incoming directory for each node in nncp.hjson. For instance:
host1: {
   ...
   incoming: "/var/local/nncp-backups-incoming/host1"
}

host2: {
   ...
   incoming: "/var/local/nncp-backups-incoming/host2"
}
I’ll present the scanning script in a bit.
Offsite Backup Rotation
Most of the time, you don’t want just a single drive to store the backups. You’d like to have a set. At minimum, you’d keep one unplugged so that lightning couldn’t ruin all your backups. But maybe you’d store a second drive at some other location you have access to (friend’s house, bank box, etc.)
There are several ways you could solve this:
- If the remote machine is at a location with network access and you trust its physical security (remember that although it will store data encrypted at rest and will transport it encrypted, it will, in most cases, handle unencrypted data during processing), you could of course send NNCP packets to it over the network at the same time you send them to your local backup system.
- Alternatively, if the remote location doesn’t have network access or you want to keep it airgapped, you could transport the NNCP packets by USB drive to the remote end.
- Or, if you don’t want to have any kind of processing capability remotely (probably a wise move), you could rotate the hard drives themselves, keeping one plugged in locally and unplugging the other to take it offsite.
The third option can be helped with NNCP, too. One way is to create separate NNCP installations for each of the drives that you store data on. Then, whenever one is plugged in, the appropriate NNCP config will be loaded and appropriate packets received and processed. The neighbor machine — the spooler — would just store up packets for the offsite drive until it comes back onsite (or, perhaps, your airgapped USB transport would do this). Then when it’s back onsite, all the queued up ZFS sends get replayed and the backups replicated.
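One hedged sketch of that: every NNCP command accepts -cfg to point at an alternate configuration file, so a mount hook could pick the config matching whichever drive just appeared (the paths and argument convention here are assumptions):

#!/bin/bash
# Hypothetical hook run after a backup drive is mounted; $1 names the drive.
set -e
DRIVE="$1"                          # e.g. backupdisk1
CFG="/etc/nncp-$DRIVE.hjson"        # hypothetical per-drive config path
su nncp -c "/usr/local/nncp/bin/nncp-toss -cfg '$CFG' -noprogress"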
Now, how might you handle this with NNCP?
The simple way would be to have each system generating backups send them to two destinations. For instance:
zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 07D5794CD900FAF1D30B03AC3D13151E5039C9D5 \
  | tee >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk1:$FILE'") \
        >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk2:$FILE'") \
  > /dev/null
You could probably also more safely use pee(1) (from moreutils) to do this.
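That could look roughly like this (a sketch, assuming the same $FILE variable as above; pee passes a copy of stdin to each command string):

zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 012345... \
  | pee "su nncp -c '/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - backupdisk1:$FILE'" \
        "su nncp -c '/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - backupdisk2:$FILE'"

Unlike the tee approach, pee will report a failure if either command fails, rather than silently discarding errors from a process substitution.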
This has an unfortunate result of doubling the network traffic from every machine being backed up. So an alternative option would be to queue the packets to the spooling machine, and run a distribution script from it; something like this, in part:
INCOMINGDIR="/var/local/nncp-bakfs-incoming"
LOCKFILE="$INCOMINGDIR/.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
   logit "Lock obtained at ${LOCKFILE} with dotlockfile"
   trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
   logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
   exit 0
fi

logit "Scanning queue directory..."
cd "$INCOMINGDIR"
for HOST in *; do
   cd "$INCOMINGDIR/$HOST"
   for FILE in bakfsfmt2-*; do
      if [ -f "$FILE" ]; then
         for BAKFS in backupdisk1 backupdisk2; do
            runcommand nncp-file -nice B+5 -noprogress "$FILE" "$BAKFS:$HOST/$FILE"
         done
         runcommand rm "$FILE"
      else
         logit "$HOST: Skipping $FILE since it doesn't exist"
      fi
   done
done
logit "Scan complete."
Security Considerations
You’ll notice that in my example above, the encryption happens as the root user, but nncp is called under su. This means that even if there is a vulnerability in NNCP, the data would still be protected by GPG. I’ll also note here that many sites run ssh as root unnecessarily; the same principle applies there (ssh has had vulnerabilities in the past as well). I could have used gpg’s built-in compression, but zstd is faster and better, so we can get good performance by using fast compression and piping that to an algorithm that can use hardware acceleration for encryption.
I strongly encourage considering transport, whether ssh or NNCP or UUCP, to be untrusted. Don’t run it as root if you can avoid it. In my example, the nncp user, which all NNCP commands are run as, has no access to the backup data at all. So even if NNCP were compromised, my backup data wouldn’t be. For even more security, I could also sign the backup stream with gpg and validate that on the receiving end.
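A hedged sketch of the signing idea: gpg can sign and encrypt in one pass with -se, assuming root holds a signing key whose public half the backup server trusts (the key IDs below are placeholders). On the receiving end, gpg -d verifies the signature automatically, and its --status-fd output can be checked for GOODSIG before trusting the stream.

# Sender side: sign while encrypting
zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -se -r 012345... \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:$FILE'" >&2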
I should note, however, that this conversation assumes that a network- or USB-facing ssh or NNCP is more likely to have an exploitable vulnerability than is gpg (which here is just processing a stream). This is probably a safe assumption in general. If you believe gpg is more likely to have an exploitable vulnerability than ssh or NNCP, then obviously you wouldn’t take this particular approach.
On the zfs side, I avoid the use of -F with zfs receive, since it could let a compromised backed-up machine generate a malicious rollback on the destination. Backup zpools should be imported with -R or -N to ensure that a malicious mountpoint property couldn’t be used to cause an attack. I choose to use "zfs receive -u -o readonly=on", which is compatible with both unmounted backup datasets and zpools imported with -R (or both). To access the data in a backup dataset, you would normally clone it and access it there.
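For instance (pool, dataset, and snapshot names here are assumptions):

# Import the backup pool with an alternate root so no dataset can mount over /
zpool import -R /mnt/backuppool backups

# To inspect backed-up data, clone a snapshot within the pool rather than
# mounting the backup dataset itself
zfs clone backups/simplesnap/laptop/home@bak-12345 backups/restore-home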
The processing script
So, let’s put this all together and look at an example of a processing script that would run from cron as root and process the incoming ZFS data.
#!/bin/bash

set -e
set -o pipefail

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
      RETVAL="$?"
      logerror "$1 exited with error $RETVAL"
      return "$RETVAL"
   fi
}

STORE=backups/simplesnap
INCOMINGDIR=/backups/nncp/incoming

if ! [ -d "$INCOMINGDIR" ]; then
   logerror "$INCOMINGDIR doesn't exist"
   exit 0
fi

LOCKFILE="/backups/nncp/.nncp-backups-zfs-scan.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
   logit "Lock obtained at ${LOCKFILE} with dotlockfile"
   trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
   logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
   exit 0
fi

EXITCODE=0

cd "$INCOMINGDIR"
logit "Scanning queue directory..."
for HOST in *; do
   HOSTPATH="$INCOMINGDIR/$HOST"
   # files look like bakfsfmt2-TIMESTAMP.PID_host@dataset
   for FILE in "$HOSTPATH"/bakfsfmt2-[0-9]*_?*; do
      if [ ! -f "$FILE" ]; then
         logit "Skipping non-existent $FILE"
         continue
      fi

      # Now, $DEST will be HOST/DEST.  Strip off the @ also.
      DEST="`echo "$FILE" | sed -e 's/^.*bakfsfmt2[^_]*_//' -e 's,@,/,g'`"

      if [ -z "$DEST" ]; then
         logerror "Malformed dest in $FILE"
         continue
      fi

      HOST2="`echo "$DEST" | sed 's,/.*,,g'`"
      if [ -z "$HOST2" ]; then
         logerror "Malformed DEST $DEST in $FILE"
         continue
      fi

      if [ ! "$HOST" = "$HOST2" ]; then
         logerror "$FILE: $HOST doesn't match $HOST2"
         continue
      fi

      logit "Processing $FILE to $STORE/$DEST"
      # Decrypt, decompress, and receive; delete the spool file on success
      if runcommand gpg -q -d < "$FILE" | runcommand zstd -dc \
            | runcommand zfs receive -u -o readonly=on "$STORE/$DEST"; then
         logit "Processed $FILE to $STORE/$DEST"
         runcommand rm "$FILE"
      else
         logerror "Failed to process $FILE to $STORE/$DEST"
         EXITCODE=15
      fi
   done
done

exit "$EXITCODE"
Applying These Ideas to Non-ZFS Backups
ZFS backups made our job easier in a lot of ways:
- ZFS can calculate a diff based on an efficiently-stored previous local state (snapshot or bookmark), rather than a comparison to a remote state (rsync)
- ZFS “incremental” sends, while less efficient than rsync, are reasonably efficient, sending only changed blocks
- ZFS receive detects and enforces that the incremental source on the local machine must match the incremental source of the original stream, enforcing ordering
- Datasets using ZFS encryption can be sent in their encrypted state
- Incrementals can be done without a full scan of the filesystem
Some of these benefits you just won’t get without ZFS (or something similar like btrfs), but let’s see how we could apply these ideas to non-ZFS backups. I will explore the implementation of them in a future post.
When I say “non-ZFS”, I am being a bit vague as to whether the source, the destination, or both systems are running a non-ZFS filesystem. In general, I’ll assume that neither is running ZFS.
The first and most obvious answer is to just tar up the whole system and send that every day. This is, of course, only suitable for small datasets on a fast network. These tarballs could be unpacked on the destination and stored more efficiently via any number of methods (hardlink trees, a block-level deduplicator like borg or rdedup, or even just simply compressed tarballs).
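A minimal sketch of that approach, reusing the compress/encrypt/queue pipeline from earlier (the key ID is a placeholder, and --one-file-system is a choice you may or may not want):

# Nightly full backup as a tarball, shipped over NNCP
tar --one-file-system -cpf - / \
  | zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 012345... \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:full-`date +%s`.tar.zst.gpg'"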
To make the network trip more efficient, something like rdiff or xdelta could be used. A signature file could be stored on the machine being backed up (generated via tee/pee at stream time), and the next run could simply send an rdiff delta over NNCP. This would be quite network-efficient, but still would require reading every byte of every file on every backup, and would also require quite a bit of temporary space on the receiving end (to apply the delta to the previous tarball and generate a new one).
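A rough sketch of how that could look with rdiff from librsync (the signature path is an assumption, and a real setup would also regenerate the signature after each run):

# First run: send the full tarball, keeping an rdiff signature locally
tar -cpf - / \
  | tee >(rdiff signature - /var/local/backup.sig) \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:full.tar'"

# Later runs: send only an rdiff delta against the stored signature
tar -cpf - / \
  | rdiff delta /var/local/backup.sig - - \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:delta-`date +%s`.rdiff'"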
Alternatively, a program that generates incremental backup files such as rdup could be used. These could be transmitted over NNCP to the backup server, and unpacked there. While perhaps less efficient on the network — every file with at least one modified byte would be retransmitted in its entirety — it avoids the need to read every byte of unmodified files or to have enormous temporary space. I should note here that GNU tar claims to have an incremental mode, but it has a potential data loss bug.
There are also some tools with algorithms that may apply well in this use case: syrep and fssync being the two most prominent examples, though rdedup (mentioned above) and the nascent asuran project may also be combinable with other tools to achieve this effect.
I should, of course, conclude this section by mentioning btrfs. Every time I’ve tried it, I’ve run into serious bugs, and its status page indicates that only some of them have been resolved. I would not consider using it for something as important as backups. However, if you are comfortable with it, it is likely to be able to run in more constrained environments than ZFS and could probably be processed in much the same way as zfs streams.