Tag Archives: asynchronous

How Gapped is Your Air?

Sometimes we want better-than-firewall security for things. For instance:

  1. An industrial control system for a municipal water-treatment plant should never have data come in or out
  2. Or, a variant of the industrial control system: it should only permit telemetry and monitoring data out, and nothing else in or out
  3. A system dedicated to keeping your GPG private keys secure should only have material to sign (or decrypt) come in, and signatures (or decrypted data) go out
  4. A system keeping your tax records should normally only have new records go in, but may on occasion have data go out (eg, to print a copy of an old record)

In this article, I’ll talk about the “high side” (the high-security or high-sensitivity systems) and the “low side” (the lower-sensitivity or general-purpose systems). For the sake of simplicity, I’ll assume the high side is a single machine, but it could just as well be a whole network.

Let’s focus on examples 3 and 4 to make things simpler. Let’s consider the primary concern to be data exfiltration (someone stealing your data), with a secondary concern of data integrity (somebody modifying or destroying your data).

You might think the safest possible approach is Airgapped – that is, there is literally no physical network connection to the machine at all. This helps! But then the problem becomes: how do we deal with the inevitable need to legitimately get things on or off the system? As I wrote in Dead USB Drives Are Fine: Building a Reliable Sneakernet, tools such as NNCP let you create a “sneakernet” that uses USB drives as transport.

While this is a very secure setup, as with most things in security, it’s less than perfect. The Wikipedia airgap article discusses some ways airgapped machines can still be exploited. It mentions that security holes relating to removable media have been exploited in the past. There are also other ways to get data out; for instance, Debian ships with gensio and minimodem, both of which can transfer data acoustically.

But let’s back up and think about why we think of airgapped machines as so much more secure, and what the failure modes of other approaches might be.

What about firewalls?

You could very easily set up a high-side machine that is on a network, but is restricted to only one outbound TCP port. There could be a local firewall, and perhaps also a special port on an external firewall that implements the same restrictions. A variant on this approach would be two computers connected directly by a crossover cable, though this doesn’t necessarily make it more secure.

Of course, the concern about a local firewall is that it could potentially be compromised. An external firewall could be too; for instance, if your credentials to it were on a machine that got compromised. This kind of dual compromise may be unlikely, but it is possible.

We can also think about the complexity in a network stack and firewall configuration, and think that there may be various opportunities to have things misconfigured or buggy in a system of that complexity. Another consideration is that data could be sent at any time, potentially making it harder to detect. On the other hand, network monitoring tools are commonplace.

Still, this approach is convenient and cheap.

I use a system along those lines to do my backups. Data is sent, gpg-encrypted and then encrypted again at the NNCP layer, to the backup server. The NNCP process on the backup server runs as an untrusted user, and dumps the gpg-encrypted files to a secure location that is then processed by a cron job using Filespooler. The backup server is on a dedicated firewall port, with a dedicated subnet. The only outbound ports allowed are for NNCP, NTP, and offsite backups. There is no default gateway. Not even DNS is permitted out (the firewall does the appropriate redirection). There is one pinhole allowed out, where a subset of the backup data is sent offsite.

I initially used USB drives as transport, and the backup server had no network connection at all. But there were disadvantages to doing this for backups – particularly that I’d have no backups for as long as I forgot to move the drives. The backup system also suffered from clock drift, and the offsite backup picture was more challenging. (The clock drift was a problem because I use 2FA on the system: a password, plus a TOTP generated by a Yubikey.)

This is “pretty good” security, I’d think.

What are the weak spots? Well, if there were somehow a bug in the NNCP client, and the remote NNCP were compromised, that could lead to a compromise of the NNCP account. But this itself would accomplish little; some other vulnerability would have to be exploited on the backup server, because the NNCP account can’t see plaintext data at all. I use borgbackup to send a subset of backup data offsite over ssh. borgbackup has to run as root to be able to access all the files, but the ssh it calls runs as a separate user. An ssh vulnerability is therefore unlikely to cause much damage. If, somehow, the remote offsite system were compromised and it was able to exploit a security issue in the local borgbackup, that would be a problem. But that sounds like a remote possibility.

borgbackup itself can’t even be used over a sneakernet since it is not asynchronous. A more secure solution would probably be using something like dar over NNCP. This would eliminate the ssh installation entirely, allow complete isolation between the data-access and the communication stacks, and notably not require bidirectional communication. Logical separation matters too. My Roundup of Data Backup and Archiving Tools may be helpful here.

Other attack vectors could be a vulnerability in the kernel’s networking stack, local root exploits that could be combined with exploiting NNCP or borgbackup to gain root, or local misconfiguration that makes the sandboxes around NNCP and borgbackup less secure.

Because this system is in my basement in a utility closet with no chairs and no good place for a console, I normally manage it via a serial console. While it’s a dedicated line between the system and another machine, if the other machine is compromised or an adversary gets access to the physical line, credentials (and perhaps even data) could leak, albeit slowly.

But we can do much better with serial lines. Let’s take a look.

Serial lines

Some of us remember RS-232 serial lines and their once-ubiquitous DB-9 connectors. Traditionally, their speed maxed out at 115.2Kbps.

Serial lines have the benefit that they can be a direct application-to-application link. In my backup example above, a serial line could directly link the NNCP daemon on one system with the NNCP caller on another, with no firewall or anything else necessary. It is simply up to those programs to open the serial device appropriately.

This isn’t perfect, however. Unlike TCP over Ethernet, a serial line has no inherent error checking. Modern programs such as NNCP and ssh assume that a lower layer is making the link completely clean and error-free for them, and will treat any corruption as tampering and sever the connection. However, there is a solution to that: gensio. In my page Using gensio and ser2net, I discuss how to run NNCP and ssh over gensio. gensio is a generic framework that can add framing, error checking, and retransmit to an unreliable link such as a serial port. It can also add encryption and authentication using TLS, which could be particularly useful for applications that aren’t already doing that themselves.

More traditional solutions for serial communications have their own built-in error correction. For instance, UUCP and Kermit both were designed in an era of noisy serial lines and might be an excellent fit for some use cases. The ZModem protocol also might be, though it offers somewhat less flexibility and automation than Kermit.

I have found that certain USB-to-serial adapters by Gearmo will actually run at up to 2Mbps on a serial line! Look for the ones on their spec pages with an FTDI chipset rated at 920Kbps. It turns out they can successfully be driven faster, especially if gensio’s relpkt is used. I’ve personally verified 2Mbps operation (Linux port speed 2000000) on Gearmo’s USA-FTDI2X and the USA-FTDI4X. (I haven’t seen any single-port options from Gearmo with the 920Kbps chipset, but they may exist.)

Still, even at 2Mbps, speed may well be a limiting factor with some applications. If what you need is a console and some textual or batch data, it’s probably fine. If you are sending 500GB backup files, you might look for something else. In theory, this USB to RS-422 adapter should work at 10Mbps, but I haven’t tried it.
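To put rough numbers on that, here is a back-of-the-envelope sketch in bash. It assumes 8N1 framing (10 bits on the wire per byte) and ignores all protocol overhead from gensio, NNCP, and the like, so real transfers will be somewhat slower. The function names are hypothetical.

```shell
#!/bin/bash
# Rough transfer-time estimate for a serial link.
# Assumption: 8N1 framing, so each byte costs 10 bits on the wire
# (1 start bit + 8 data bits + 1 stop bit). Overhead from gensio,
# NNCP, retransmits, etc. is ignored.

# transfer_seconds BYTES BITS_PER_SECOND -> whole seconds (rounded down)
transfer_seconds() {
    local bytes="$1" bps="$2"
    echo $(( bytes * 10 / bps ))
}

# human_duration SECONDS -> "Dd Hh Mm"
human_duration() {
    local s="$1"
    printf '%dd %dh %dm\n' $(( s / 86400 )) $(( s % 86400 / 3600 )) $(( s % 3600 / 60 ))
}

# 500 GB (decimal) at 2 Mbps
human_duration "$(transfer_seconds 500000000000 2000000)"
```

As written it prints `28d 22h 26m`: about four weeks for the 500GB case, before any overhead. For comparison, `human_duration "$(transfer_seconds 1000000 1200)"` gives `0d 2h 18m` for a single megabyte at 1200bps.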

But if the speed works, running a dedicated application over a serial link could be a nice and fairly secure option.

One of the benefits of the airgapped approach is that data never leaves unless you are physically aware of transporting a USB stick. Of course, you may not be physically aware of what is ON that stick in the event of a compromise. This could easily be solved with a serial approach by, say, only plugging in the cable when you have data to transfer.

Data diodes

A traditional diode lets electrical current flow in only one direction. A data diode is the same concept, but for data: a hardware device that allows data to flow in only one direction.

This could be useful, for instance, in the tax records system that should only receive data, or the industrial system that should only send it.

Wikipedia claims that the simplest kind of data diode is a fiber link with transceivers connected in only one direction. I think you could go one simpler: a serial cable with only ground and TX connected at one end, wired to ground and RX at the other. (I haven’t tried this.)

This approach does have some challenges:

  • Many existing protocols assume a bidirectional link and won’t be usable

  • There is a challenge of confirming data was successfully received. For a situation like telemetry, maybe it doesn’t matter; another observation will come along in a minute. But for sending important documents, one wants to make sure they were properly received.

In some cases, the solution might be simple. For instance, with telemetry, just writing out data down the serial port in a simple format may be enough. For sending files, various mitigations, such as sending them multiple times, etc., might help. You might also look into FEC-supporting infrastructure such as blkar and flute, but these don’t provide an absolute guarantee. There is no perfect solution to knowing when a file has been successfully received if the data communication is entirely one-way.
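For the telemetry case, one simple approach is to make each record self-checking, so the receiver can at least discard corrupted lines and notice missing ones, even though it can never ask for a resend. This is a minimal sketch under hypothetical conventions: the `SEQ PAYLOAD CRC` line format and the function names are made up for illustration, and `cksum` (a CRC, not a cryptographic hash) only guards against accidental corruption, not tampering.

```shell
#!/bin/bash
# One-way telemetry framing sketch for a data diode.
# Hypothetical line format: "SEQ PAYLOAD CRC", where CRC is the POSIX
# cksum of "SEQ PAYLOAD". The receiver can never request a resend, so it
# drops lines that fail the check and reports gaps in the sequence.

crc_of() {
    printf '%s' "$1" | cksum | cut -d' ' -f1
}

# emit_record SEQ PAYLOAD  (sender side; payload must not contain newlines)
emit_record() {
    local body="$1 $2"
    printf '%s %s\n' "$body" "$(crc_of "$body")"
}

# check_records: read framed lines on stdin, print verified payloads on
# stdout, and warn on stderr about corrupt lines or missing sequence numbers.
check_records() {
    local line body crc seq expected=""
    while IFS= read -r line; do
        crc="${line##* }"      # last space-separated field
        body="${line% *}"      # everything before it
        if [ "$(crc_of "$body")" != "$crc" ]; then
            echo "dropping corrupt line" >&2
            continue
        fi
        seq="${body%% *}"
        if [ -n "$expected" ] && [ "$seq" -ne "$expected" ]; then
            echo "gap: expected $expected, got $seq" >&2
        fi
        expected=$(( seq + 1 ))
        printf '%s\n' "${body#* }"
    done
}
```

On the sending side you might redirect `emit_record` output to the serial device, and on the receiving side run `check_records` with the device as stdin; the device paths depend on your hardware.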

Audio transport

I hinted above that minimodem and gensio both are software audio modems. That is, you could literally use speakers and microphones, or alternatively audio cables, as a means of getting data into or out of these systems. This is pretty limited; it is 1200bps, and often half-duplex, and could literally be disrupted by barking dogs in some setups. But hey, it’s an option.

Airgapped with USB transport

This is the scenario I began with, and named some of the possible pitfalls above as well. In addition to those, note also that USB drives aren’t necessarily known for their error-free longevity. Be prepared for failure.

Concluding thoughts

I wanted to lay out a few things in this post. First, that simply being airgapped is generally a step forward in security, but is not perfect. Secondly, that both physical and logical separation matter. And finally, that while tools like NNCP can make airgapped-with-USB-drive-transport a doable reality, there are also alternatives worth considering – especially serial ports, firewalled hard-wired Ethernet, data diodes, and so forth. I think serial links, in particular, have been largely forgotten these days.

Note: This article also appears on my website, where it may be periodically updated.

Building an Asynchronous, Internet-Optional Instant Messaging System

I loaded up this title with buzzwords. The basic idea is that IM systems shouldn’t have to only use the Internet. Why not let them be carried across LoRa radios, USB sticks, local Wifi networks, and yes, the Internet? I’ll first discuss how, and then why.

How to set it up

I’ve talked about most of the pieces here already:

So, putting this together:

  • All Delta Chat needs is access to an SMTP and IMAP server. This server could easily reside on localhost.
  • Existing email servers support transport of email using non-IP transports, including batch transports that can easily store it in files.
  • These batches can be easily carried by NNCP, Syncthing, Filespooler, etc. Or, if the connectivity is good enough, via traditional networking using Yggdrasil.
    • Side note: Both NNCP and email servers support various routing arrangements, and can easily use intermediary routing nodes. Syncthing can also mesh. NNCP supports asynchronous multicast, letting your messages opportunistically find the best way to their destination.

OK, so why would you do it?

You might be thinking, “doesn’t asynchronous mean slow?” Well, not necessarily. Asynchronous means “reliability is more important than speed”; that is, slow (even to the point of weeks) is acceptable, but not required. NNCP and Syncthing, for instance, can easily deliver within a couple of seconds.

But let’s step back a bit. Let’s say you’re hiking in the wilderness in an area with no connectivity. You get back to your group at a campsite at the end of the day, and have taken some photos of the forest and sent them to some friends. Some of those friends are at the campsite; when you get within signal range, they get your messages right away. Some of those friends are in another country. So one person from your group drives into town and sits at a coffee shop for a few minutes, connected to their wifi. All the messages from everyone in the group go out, all the messages from outside the group come in. Then they go back to camp and the devices exchange messages.

Pretty slick, eh?


Note: this article also has a more permanent home on my website, where it may be periodically updated.

Dead USB Drives Are Fine: Building a Reliable Sneakernet

“OK,” you’re probably thinking. “John, you talk a lot about things like Gopher and personal radios, and now you want to talk about building a reliable network out of… USB drives?”

Well, yes. In fact, I’ve already done it.

What is sneakernet?

Normally, “sneakernet” is a sort of tongue-in-cheek reference to using disconnected storage to transport data or messages. By “disconnected storage” I mean anything like CD-ROMs, hard drives, SD cards, USB drives, and so forth. There are times when loading up 12TB on a device and driving it across town is just faster and easier than using the Internet for the same. And, sometimes you need to get data to places that have no Internet at all.

Another reason for sneakernet is security. For instance, if your backup system is online, and your systems being backed up are online, then it could become possible for an attacker to destroy both your primary copy of data and your backups. Or, you might use a dedicated computer with no network connection to do GnuPG (GPG) signing.

What about “reliable” sneakernet, then?

TCP is often considered a “reliable” protocol. That means that the sending side is generally able to tell if its message was properly received. As with most reliable protocols, we have these components:

  1. After transmitting a piece of data, the sender retains it.
  2. After receiving a piece of data, the receiver sends an acknowledgment (ACK) back to the sender.
  3. Upon receiving the acknowledgment, the sender removes its buffered copy of the data.
  4. If no acknowledgment is received at the sender, it retransmits the data, in case it gets lost in transit.
  5. The receiver reorders any packets that arrive out of order, so that its data stream is ordered correctly.
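To make steps 1 through 4 concrete for sneakernet, here is a toy model in bash that uses plain directories to stand in for the sender's buffer, the USB drive, and the receiver. The directory layout and function names are hypothetical, and unlike NNCP it does no encryption, authentication, or integrity checking.

```shell
#!/bin/bash
# Toy model of a reliable sneakernet (steps 1-4 above, no reordering).
# Hypothetical layout: OUTBOX holds packets sent but not yet acknowledged,
# MEDIA is the USB drive (any directory), INBOX is the receiver's store.

# send_all OUTBOX MEDIA: copy (not move) every unacknowledged packet to the
# media. Running it again on the next trip is the retransmit (step 4).
send_all() {
    local outbox="$1" media="$2" pkt
    for pkt in "$outbox"/*; do
        [ -e "$pkt" ] || continue      # empty outbox: glob did not match
        cp -- "$pkt" "$media/"
    done
}

# receive_all MEDIA INBOX ACKFILE: receiver takes each packet and records an
# ACK naming it (step 2). Duplicates from retransmits are harmless here.
receive_all() {
    local media="$1" inbox="$2" ackfile="$3" pkt
    for pkt in "$media"/*; do
        [ -e "$pkt" ] || continue
        mv -- "$pkt" "$inbox/"
        basename -- "$pkt" >> "$ackfile"
    done
}

# receive_acks OUTBOX ACKFILE: once ACKs make it back, drop the buffered
# copies they name (step 3).
receive_acks() {
    local outbox="$1" ackfile="$2" name
    [ -f "$ackfile" ] || return 0
    while IFS= read -r name; do
        name="${name##*/}"             # ignore any directory component
        if [ -n "$name" ]; then
            rm -f -- "$outbox/$name"
        fi
    done < "$ackfile"
}
```

If a drive dies in transit, nothing is lost: the unacknowledged packets are still in the outbox and go out again on the next `send_all`.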

Now, a lot of the things I just mentioned for sneakernet are legendarily unreliable. USB drives fail, CD-ROMs get scratched, hard drives get banged up. Think about putting these things in a bicycle bag or airline luggage. Some of them are going to fail.

You might think, “well, I’ll just copy files to a USB drive instead of move them, and once I get them onto the destination machine, I’ll delete them from the source.” Congratulations! You are a human retransmit algorithm! We should be able to automate this!

And we can.

Enter NNCP

NNCP is one of those things that almost defies explanation. It is a toolkit for building asynchronous networks. It can use any of these as a carrier: a pipe, a TCP network connection, a mounted filesystem (specifically intended for cases like this), and much more. It also supports multi-hop asynchronous routing and asynchronous meshing, but these are beyond the scope of this particular article.

NNCP’s transports that involve live communication between two hops already had all the hallmarks of being reliable; there was a positive ACK and retransmit. As of version 8.7.0, NNCP’s ACKs themselves can also be asynchronous – meaning that every NNCP transport can now be reliable.

Yes, that’s right. Your ACKs can flow over tapes and USB drives if you want them to.

I use this for archiving and backups.

If you aren’t already familiar with NNCP, you might take a look at my NNCP page. I also have a lot of blog posts about NNCP.

Those pages describe the basics of NNCP: the “packet” (the unit of transmission in NNCP, which can be tiny or many TB), the end-to-end encryption, and so forth. The new command we will now be interested in is nncp-ack.

The Basic Idea

Here are the basic steps to processing this stuff with NNCP:

  1. First, we use nncp-xfer -rx to process incoming packets from the USB (or other media) device. This moves them into the NNCP inbound queue, deletes them from the media device, and verifies packet integrity.
  2. We use nncp-ack -node $NODE to create ACK packets responding to the packets we just loaded into the rx queue. It writes a list of generated ACKs onto fd 4, which we save off for later use.
  3. We run nncp-toss -seen to process the incoming queue. The use of -seen causes NNCP to remember the hashes of packets seen before, so a duplicate of an already-seen packet will not be processed twice. This command also processes incoming ACKs for packets we’ve sent out previously; if they pass verification, the relevant packets are removed from the local machine’s tx queue.
  4. Now, we use nncp-xfer -keep -tx -mkdir -node $NODE to send outgoing packets to a given node by writing them to a given directory on the media device. -keep causes them to remain in the outgoing queue.
  5. Finally, we use the list of generated ACK packets saved off in step 2 above. That list is passed to nncp-rm -node $NODE -pkt < $FILE to remove those specific packets from the outbound queue. The reason is that there will never be an ACK of an ACK packet (that would create an infinite loop), so if we don’t delete them in this manner, they would hang around forever.

You can see these steps follow the same basic outline on upstream’s nncp-ack page.

One thing to keep in mind: if anything else is running nncp-toss, there is a chance of a race condition between steps 1 and 2 (if nncp-toss processes a packet first, no ACK may be generated for it). This would presumably sort itself out eventually, as the sender would retransmit and the packet would be ACKed later.

Further ideas

NNCP guarantees the integrity of packets, but not ordering between packets; if you need that, you might look into my Filespooler program. It is designed to work with NNCP and can provide ordered processing.

An example script

Here is a script you might try for this sort of thing. It may have more logic than you need – really, you just need the steps above – but hopefully it is clear.

#!/bin/bash

set -eo pipefail

MEDIABASE="/media/$USER"

# The local node name
NODENAME="$(hostname)"

# All nodes.  NODENAME should be in this list.
ALLNODES="node1 node2 node3"

RUNNNCP=""
# If you need to sudo, use something like RUNNNCP="sudo -Hu nncp"
NNCPPATH="/usr/local/nncp/bin"

ACKPATH="$(mktemp -d)"

# Process incoming packets.
#
# Parameters: $1 - the path to scan.  Must contain a directory
# named "nncp".
procrxpath () {
    while [ -n "$1" ]; do
        BASEPATH="$1/nncp"
        shift
        if ! [ -d "$BASEPATH" ]; then
            echo "$BASEPATH doesn't exist; skipping"
            continue
        fi

        echo " *** Incoming: processing $BASEPATH"
        # Not named TMPDIR: mktemp honors an exported TMPDIR,
        # and this directory is deleted below.
        WORKDIR="$(mktemp -d)"

        # This rsync and the one below can help with
        # certain permission issues from weird foreign
        # media.  You could just eliminate it and
        # always use $BASEPATH instead of $WORKDIR below.
        rsync -rt "$BASEPATH/" "$WORKDIR/"

        # You may need these next two lines if using sudo as above.
        # chgrp -R nncp "$WORKDIR"
        # chmod -R g+rwX "$WORKDIR"
        echo "     Running nncp-xfer -rx"
        $RUNNNCP $NNCPPATH/nncp-xfer -progress -rx "$WORKDIR"

        for NODE in $ALLNODES; do
            if [ "$NODE" != "$NODENAME" ]; then
                echo "     Running nncp-ack for $NODE"

                # Now, we generate ACK packets for each node we will
                # process.  nncp-ack writes a list of the created
                # ACK packets to fd 4.  We'll use them later.
                # If using sudo, add -C 5 after $RUNNNCP.
                $RUNNNCP $NNCPPATH/nncp-ack -progress -node "$NODE" \
                    4>> "$ACKPATH/$NODE"
            fi
        done

        rsync --delete -rt "$WORKDIR/" "$BASEPATH/"
        rm -fr "$WORKDIR"
    done
}


proctxpath () {
    while [ -n "$1" ]; do
        BASEPATH="$1/nncp"
        shift
        if ! [ -d "$BASEPATH" ]; then
            echo "$BASEPATH doesn't exist; skipping"
            continue
        fi

        echo " *** Outgoing: processing $BASEPATH"
        # Not named TMPDIR: mktemp honors an exported TMPDIR,
        # and this directory is deleted below.
        WORKDIR="$(mktemp -d)"
        rsync -rt "$BASEPATH/" "$WORKDIR/"
        # You may need these two lines if using sudo:
        # chgrp -R nncp "$WORKDIR"
        # chmod -R g+rwX "$WORKDIR"

        for DESTHOST in $ALLNODES; do
            if [ "$DESTHOST" = "$NODENAME" ]; then
                continue
            fi

            # Copy outgoing packets to this node, but keep them in the outgoing
            # queue with -keep.
            $RUNNNCP $NNCPPATH/nncp-xfer -keep -tx -mkdir -node "$DESTHOST" -progress "$WORKDIR"

            # Here is the key: that list of ACK packets we made above - now we delete them.
            # There will never be an ACK for an ACK, so they'd keep sending forever
            # if we didn't do this.
            if [ -f "$ACKPATH/$DESTHOST" ]; then
                echo "nncp-rm for node $DESTHOST"
                $RUNNNCP $NNCPPATH/nncp-rm -debug -node "$DESTHOST" -pkt < "$ACKPATH/$DESTHOST"
            fi

        done

        rsync --delete -rt "$WORKDIR/" "$BASEPATH/"
        rm -rf "$WORKDIR"

        # We only want to write stuff once.
        return 0
    done
}

procrxpath "$MEDIABASE"/*

echo " *** Initial tossing..."

# We make sure to use -seen to rule out duplicates.
$RUNNNCP $NNCPPATH/nncp-toss -progress -seen

proctxpath "$MEDIABASE"/*

echo "You can unmount devices now."

echo "Done."

This post is also available on my website, where it may be periodically updated.

Fast, Ordered Unixy Queues over NNCP and Syncthing with Filespooler

It seems that lately I’ve written several shell implementations of a simple queue that enforces ordered execution of jobs that may arrive out of order. After writing this for the nth time in bash, I decided it was time to do it properly. But first, a word on the why of it all.

Why did I bother?

My needs arose primarily from handling Backups over Asynchronous Communication methods – in this case, NNCP. When backups contain incrementals that are unpacked on the destination, they must be applied in the correct order.

In some cases, like ZFS, the receiving side will detect an out-of-order backup file and exit with an error. In those cases, processing in random order is acceptable but can be slow if, say, hundreds or thousands of hourly backups have stacked up over a period of time. The same goes for using gitsync-nncp to synchronize git repositories. In both cases, a best effort based on creation date is sufficient to produce a significant performance improvement.
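A best-effort ordering like that can be as simple as processing files oldest-first. This sketch assumes the file modification time approximates the creation time, which not every transport preserves; it also relies on GNU `find -printf` and on file names without newlines. The function names are hypothetical. Files whose command fails are left in place for a later run, which suits receivers like ZFS that reject an out-of-order stream.

```shell
#!/bin/bash
# Best-effort ordering sketch: process queued files oldest-first by mtime.
# Assumption: mtime approximates creation time (not true for every transport).
# Requires GNU find for -printf; file names must not contain newlines.

# oldest_first DIR: print regular files in DIR, oldest mtime first
oldest_first() {
    find "$1" -maxdepth 1 -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-
}

# process_best_effort DIR CMD...: run CMD FILE for each file in that order.
# Successful files are deleted; failed ones stay for a later retry.
process_best_effort() {
    local dir="$1" f
    shift
    while IFS= read -r f; do
        if "$@" "$f"; then
            rm -f -- "$f"
        fi
    done < <(oldest_first "$dir")
}
```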

With other cases, such as tar or dar backups, the receiving side cannot detect out-of-order incrementals. In those situations, the incrementals absolutely must be applied in strict order. Many other situations have these same needs. Filespooler is the answer to these.

Existing Work

Before writing my own program, I of course looked at what was out there already. I looked at celery, gearman, nq, rq, cctools work queue, ts/tsp (task spooler), filequeue, dramatiq, GNU parallel, and so forth.

Unfortunately, none of these met my needs at all. They all tended to have properties like:

  • An extremely complicated client/server system that was incompatible with piping data over existing asynchronous tools
  • A large bias to processing of small web requests, resulting in terrible inefficiency or outright incompatibility with jobs in the TB range
  • An inability to enforce strict ordering of jobs, especially if they arrive in a different order from how they were queued

Many also lacked some nice-to-haves that I implemented for Filespooler:

  • Support for the encryption and cryptographic authentication of jobs, including metadata
  • First-class support for arbitrary compressors
  • Ability to use both stream transports (pipes) and filesystem-like transports (eg, rclone mount, S3, Syncthing, or Dropbox)

Introducing Filespooler

Filespooler is a tool in the Unix tradition: that is, do one thing well, and integrate nicely with other tools using the fundamental Unix building blocks of files and pipes. Filespooler itself doesn’t provide transport for jobs, but instead is designed to cooperate extremely easily with transports that can be written to as a filesystem or piped to – which is to say, almost anything of interest.

Filespooler is written in Rust and has an extensive Filespooler Reference as well as many tutorials on its homepage. To give you a few examples, here are some links:

Basics of How it Works

Filespooler is intentionally simple:

  • The sender maintains a sequence file that includes a number for the next job packet to be created.
  • The receiver also maintains a sequence file that includes a number for the next job to be processed.
  • fspl prepare creates a Filespooler job packet and emits it to stdout. It includes a small header (<100 bytes in most cases) that includes the sequence number, creation timestamp, and some other useful metadata.
  • You get to transport this job packet to the receiver in any of many simple ways, which may or may not involve Filespooler’s assistance.
  • On the receiver, Filespooler (when running in the default strict ordering mode) will simply look at the sequence file and process jobs in incremental order until it runs out of jobs to process.
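The strict ordering described in those bullets can be modeled in a few lines of bash. This is a toy, not Filespooler: the `job-SEQ` file naming and directory layout are hypothetical, there is no header, metadata, compression, or atomic write, and the sender and receiver share one machine instead of being linked by a transport. As in the description above, the sender's sequence file is kept outside the queue.

```shell
#!/bin/bash
# Toy model of strictly ordered job processing (NOT Filespooler's format).
# Hypothetical layout: jobs live in QUEUE/jobs/job-SEQ, and QUEUE/nextseq
# holds the next sequence number the receiver is allowed to run.

# next_seq SEQFILE: sender side; print the current number and store n+1.
next_seq() {
    local seqfile="$1" n=1
    [ -f "$seqfile" ] && n="$(cat "$seqfile")"
    echo $(( n + 1 )) > "$seqfile"
    echo "$n"
}

# prepare_job SEQFILE QUEUE: store stdin as the next numbered job.
prepare_job() {
    local seq
    seq="$(next_seq "$1")"
    mkdir -p "$2/jobs"
    cat > "$2/jobs/job-$seq"
}

# process_strict QUEUE CMD...: feed each job to CMD on stdin in sequence
# order, stopping at the first job that has not arrived yet.
process_strict() {
    local queue="$1" n=1
    shift
    [ -f "$queue/nextseq" ] && n="$(cat "$queue/nextseq")"
    while [ -f "$queue/jobs/job-$n" ]; do
        "$@" < "$queue/jobs/job-$n" || return 1
        rm -f -- "$queue/jobs/job-$n"
        n=$(( n + 1 ))
        echo "$n" > "$queue/nextseq"
    done
}
```

If jobs 2 and 3 arrive before job 1, `process_strict` does nothing; once job 1 shows up, all three run in order.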

The name of a job file on disk matches a pattern for identification, but beyond that pattern the name carries no meaning; only the header matters.

You can send job data in three ways:

  1. By piping it to fspl prepare
  2. By setting certain environment variables when calling fspl prepare
  3. By passing additional command-line arguments to fspl prepare, which can optionally be passed to the processing command at the receiver.

Data piped in is added to the job “payload”, while environment variables and command-line parameters are encoded in the header.

Basic usage

Here I will excerpt part of the Using Filespooler over Syncthing tutorial; consult it for further detail. As a bit of background, Syncthing is a FLOSS decentralized directory synchronization tool akin to Dropbox (but with a much richer feature set in many ways).

Preparation

First, on the receiver, you create the queue (passing the directory name to -q):

receiver$ fspl queue-init -q ~/sync/b64queue

Now, we can send a job like this:

sender$ echo Hi | fspl prepare -s ~/b64seq -i - | fspl queue-write -q ~/sync/b64queue

Let’s break that down:

  • First, we pipe “Hi” to fspl prepare.
  • fspl prepare takes two parameters:
    • -s seqfile gives the path to a sequence file used on the sender side. This file has a simple number in it that increments a unique counter for every generated job file. It is matched with the nextseq file within the queue to make sure that the receiver processes jobs in the correct order. It MUST be separate from the file that is in the queue and should NOT be placed within the queue. There is no need to sync this file, and it would be ideal to not sync it.
    • The -i option tells fspl prepare to read a file for the packet payload. -i - tells it to read stdin for this purpose. So, the payload will consist of three bytes: “Hi\n” (that is, including the terminating newline that echo wrote)
  • Now, fspl prepare writes the packet to its stdout. We pipe that into fspl queue-write:
    • fspl queue-write reads stdin and writes it to a file in the queue directory in a safe manner. The file will ultimately match the fspl-*.fspl pattern and have a random string in the middle.

At this point, wait a few seconds (or however long it takes) for the queue files to be synced over to the recipient.

On the receiver, we can see if any jobs have arrived yet:

receiver$ fspl queue-ls -q ~/sync/b64queue
ID                   creation timestamp          filename
1                    2022-05-16T20:29:32-05:00   fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl

Let’s say we’d like some information about the job. Try this:

receiver$ fspl queue-info -q ~/sync/b64queue -j 1
FSPL_SEQ=1
FSPL_CTIME_SECS=1652940172
FSPL_CTIME_NANOS=94106744
FSPL_CTIME_RFC3339_UTC=2022-05-17T01:29:32Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-16T20:29:32-05:00
FSPL_JOB_FILENAME=fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
FSPL_JOB_QUEUEDIR=/home/jgoerzen/sync/b64queue
FSPL_JOB_FULLPATH=/home/jgoerzen/sync/b64queue/jobs/fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl

This information is intentionally emitted in a format convenient for parsing.
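For example, that output can be loaded into a bash associative array without resorting to `eval`. A minimal sketch, assuming bash 4 or later; `parse_info` and `JOBINFO` are hypothetical names, and only lines that look like `FSPL_NAME=value` are accepted.

```shell
#!/bin/bash
# Parse KEY=VALUE lines (as printed by fspl queue-info) into an
# associative array. Requires bash 4+. Nothing is passed to eval.
declare -A JOBINFO

parse_info() {
    local line key
    while IFS= read -r line; do
        key="${line%%=*}"                                # text before the first '='
        [[ "$key" =~ ^FSPL_[A-Z0-9_]+$ ]] || continue    # skip anything else
        JOBINFO["$key"]="${line#*=}"                     # value may itself contain '='
    done
}
```

Feed it with process substitution, such as `parse_info < <(fspl queue-info -q ~/sync/b64queue -j 1)`, rather than a pipe: a pipe would run `parse_info` in a subshell and the array would be lost. Afterwards `"${JOBINFO[FSPL_JOB_FULLPATH]}"` holds the job's path.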

Now let’s run the job!

receiver$ fspl queue-process -q ~/sync/b64queue --allow-job-params base64
SGkK

There are two new parameters here:

  • --allow-job-params says that the sender is trusted to supply additional parameters for the command we will be running.
  • base64 is the name of the command that we will run for every job. It will:
    • Have environment variables set as we just saw in queue-info
    • Have the text we previously prepared – “Hi\n” – piped to it

By default, fspl queue-process doesn’t do anything special with the output; see Handling Filespooler Command Output for details on other options. So, the base64-encoded version of our string is “SGkK”. We successfully sent a packet using Syncthing as a transport mechanism!
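Any command can stand in for `base64`. Here is a sketch of a handler that also uses the environment variables mentioned above; `b64_handler` and its output format are hypothetical. As a shell function it is easy to try locally; to use it with `fspl queue-process`, you would save the body as an executable script and pass its path instead of `base64`.

```shell
#!/bin/bash
# Hypothetical receiver-side handler: the job payload arrives on stdin,
# and FSPL_* variables (as shown by queue-info) describe the job.
b64_handler() {
    local encoded
    encoded="$(base64)" || return 1
    printf 'job %s (%s): %s\n' \
        "${FSPL_SEQ:-?}" "${FSPL_CTIME_RFC3339_UTC:-unknown time}" "$encoded"
}
```

Trying it by hand, `echo Hi | FSPL_SEQ=1 b64_handler` prints `job 1 (unknown time): SGkK`, the same encoding seen above.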

At this point, if you do a fspl queue-ls again, you’ll see the queue is empty. By default, fspl queue-process deletes jobs that have been successfully processed.

For more

See the Filespooler homepage.


This blog post is also available as a permanent, periodically-updated page.

Tools for Communicating Offline and in Difficult Circumstances

Note: this post is also available on my website, where it will be updated periodically.

When things are difficult – maybe there’s been a disaster, or an invasion (this page is being written in 2022 just after Russia invaded Ukraine), or maybe you’re just backpacking off the grid – there are tools that can help you keep in touch, or move your data around. This page aims to survey some of them, roughly in order from easiest to more complex.

Simple radios

Handheld radios shouldn’t be forgotten. They are cheap, small, and easy to operate. Their range isn’t huge – maybe a couple of miles in rural areas, much less in cities – but they can be a useful place to start. They tend to have no actual encryption features (the “privacy” features really aren’t.) In the USA, options are FRS/GMRS and CB.

Syncthing

With Syncthing, you can share files among your devices or with your friends. Syncthing essentially builds a private mesh for file sharing. Devices will auto-discover each other when on the same LAN or Wifi network, and opportunistically sync.

I wrote more about offline uses of Syncthing, and its use with NNCP, in my blog post A simple, delay-tolerant, offline-capable mesh network with Syncthing (+ optional NNCP). Yes, it is a form of a Mesh Network!

Homepage: https://syncthing.net/

Briar

Briar is an instant messaging service based around Android. It’s IM with a twist: it can use a mesh of Bluetooth devices. Or, if Internet is available, Tor. It has even been extended to support the use of SD cards and USB sticks to carry your messages.

Like some others here, it can relay messages for third parties as well.

Homepage: https://briarproject.org/

Manyverse and Scuttlebutt

Manyverse is a client for Scuttlebutt, which is a sort of asynchronous, offline-friendly social network. You can use it to keep in touch with your family and friends, and it supports syncing over Bluetooth and Wifi even in the absence of Internet.

Homepages: https://www.manyver.se/ and https://scuttlebutt.nz/

Yggdrasil

Yggdrasil is a self-healing, fully end-to-end Encrypted Mesh Network. It can work among local devices or on the global Internet. It has network services that can egress onto things like Tor, I2P, and the public Internet. Yggdrasil makes a perfect companion to ad-hoc wifi as it has auto peer discovery on the local network.

I talked about it in more detail in my blog post Make the Internet Yours Again With an Instant Mesh Network.

Homepage: https://yggdrasil-network.github.io/

Ad-Hoc Wifi

Few people know about the ad-hoc wifi mode. Ad-hoc wifi lets devices in range talk to each other without an access point. You just all set your devices to the same network name and password and there you go. However, there often isn’t DHCP, so IP configuration can be a bit of a challenge. Yggdrasil helps here.

NNCP

Moving now to more advanced tools, NNCP lets you assemble a network of peers that can use Asynchronous Communication over sneakernet, USB drives, radios, CD-Rs, Internet, Tor, NNCP over Yggdrasil, Syncthing, Dropbox, S3, you name it. NNCP supports multi-hop file transfer and remote execution. It is fully end-to-end encrypted. Think of it as the offline version of ssh.

Homepage: https://nncp.mirrors.quux.org/

Meshtastic

Meshtastic uses long-range, low-power LoRa radios to build a long-distance, encrypted, instant messaging system that is a Mesh Network. It requires specialized hardware, about $30, but will tend to get much better range than simple radios, and with very little power.

Homepages: https://meshtastic.org/ and https://meshtastic.letstalkthis.com/

Portable Satellite Communicators

You can get portable satellite communicators that can send SMS from anywhere on earth with a clear view of the sky. The Garmin InReach mini and Zoleo are two credible options. Subscriptions range from about $10 to $40 per month depending on usage. They also have global SOS features.

Telephone Lines

If you have a phone line and a modem, UUCP can get through just about anything. It’s an older protocol that lacks modern security, but will deal with slow and noisy serial lines well. XBee SX radios also have a serial mode that can work well with UUCP.

Additional Suggestions

It is probably useful to keep a Linux live USB stick handy with whatever software you want to use. Debian can be installed from the live environment, or you could use a security-focused distribution such as Tails or Qubes.

References

This page originated in my Mastodon thread and incorporates some suggestions I received there.

It also formed a post on my blog.

Distributed, Asynchronous Git Syncing with NNCP

I have a problem.

I have a directory that I use with org-mode and org-roam. I want it to be synced across multiple machines. I also want to keep the history with git. And, I want to use end-to-end encryption (no storing a plain git repo on a remote server), have a serverless setup, not require any two machines to be up simultaneously, and be resilient in the face of races and conflicts.

Whew.

I’ve tried a number of setups – git-remote-gcrypt on a remote server (fragile), some complicated scripts around a separate repo in syncthing (requires one machine to be “in charge”), etc. They all were subpar.

Then NNCP introduced asynchronous multicast and I was intrigued.

So, I wrote gitsync-nncp, which uses NNCP to distribute git bundles to all the participating machines. The comprehensive documentation for gitsync-nncp goes into a lot more detail about how it works and what problems it solves. It’s working quite well for me!

A Simple, Delay-Tolerant, Offline-Capable Mesh Network with Syncthing (+ optional NNCP)

A little while back, I spent a week in a remote area. It had no Internet and no cell phone coverage. Sometimes, I would drive in to town where there was a signal to get messages, upload photos, and so forth. I had to take several devices with me: my phone, my wife’s, maybe a laptop or a tablet too. It seemed there should have been a better way. And there is.

I’ll use this example to talk about a mesh network, but it could just as well apply to people wanting to communicate on a 12-hour flight that has no in-flight wifi, or spacecraft with an intermittent connection, or a person traveling.

Syncthing makes a wonderful solution for things like these. Here are some interesting things about Syncthing:

  • You can think of Syncthing as a serverless, peer-to-peer, open source alternative to Dropbox. Machines sync directly with each other without a server, though you can add a server if you want.
  • It can operate completely without Internet access or any central server, though if Internet access is available, it can readily be used.
  • Syncthing devices connected to the same LAN or Wifi will detect each other’s presence and automatically communicate.
  • Syncthing is capable of handling a constantly-changing topology. It can also, for instance, handle two disconnected clusters of nodes with one node that “travels” between them — perhaps just a phone.
  • Syncthing scales from a single phone to thousands of nodes.
  • Syncthing normally performs syncs in every direction, but can also do single-direction syncs.
  • An individual Syncthing node can register its interest or disinterest in certain files or directories based on filename patterns.

Syncthing works by having you define devices and folders. You can choose which devices to share folders with. A shared folder has an ID that is unique across Syncthing. You can share a folder from device A to device B, and then device B can share it with device C, even if A and C don’t know about each other or have no way to communicate. More commonly, though, all the devices would know about each other and will opportunistically communicate the best way they can.

Syncthing uses something akin to a Bittorrent protocol. Say you’re syncing videos from your phone, and they’re going to 3 machines. It doesn’t mean that Syncthing has to send it three times from the phone. Syncthing will send each block, most likely, just once; the other nodes in the swarm will register the block availability from the first other node to get it and will exchange blocks with each other.
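This works because any peer holding a block can serve it: blocks are identified by a hash of their content. The example below is only a toy illustration of that idea. It is not Syncthing's wire format, and Syncthing's real block sizes are far larger than the 4-byte block size assumed here. It uses coreutils' `split` and `sha256sum` to show that identical data yields identical block hashes:

```shell
#!/bin/sh
# Illustration only: split a file into fixed-size blocks and hash each one.
# The 4-byte block size is an arbitrary demo value.
BLOCK_SIZE=4

block_hashes() {
    tmp=$(mktemp -d)
    # split names the pieces blk.aa, blk.ab, ... so the glob keeps their order.
    split -b "$BLOCK_SIZE" "$1" "$tmp/blk."
    for b in "$tmp"/blk.*; do
        [ -f "$b" ] || continue
        sha256sum < "$b" | cut -d ' ' -f 1
    done
    rm -rf "$tmp"
}

work=$(mktemp -d)
printf 'hello, world! ' > "$work/phone.txt"
cp "$work/phone.txt" "$work/laptop.txt"
printf 'hello, WORLD! ' > "$work/other.txt"

phone=$(block_hashes "$work/phone.txt")
laptop=$(block_hashes "$work/laptop.txt")
other=$(block_hashes "$work/other.txt")
rm -rf "$work"

# Same content, same block list: a peer can fetch block N from anyone holding it.
[ "$phone" = "$laptop" ] && echo "phone and laptop hold identical blocks"
```

In this toy model, `other.txt` shares its first and last blocks with `phone.txt`, so only the two middle blocks differ between them.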

Syncthing will typically look for devices on the local LAN. Failing that, it will use a global discovery server to find their addresses and see if it can reach them directly using P2P. Failing that, perhaps due to restrictive firewalls or NAT, communication can be relayed through volunteer-run Syncthing servers on the Internet. All Syncthing communications are cryptographically encrypted and verified. You can also configure Syncthing arbitrarily; for instance, to run over ssh or Tor tunnels.

So, let’s look at how Syncthing might help with the example I laid out up front.

All the devices at the remote location could communicate with each other. The Android app is quite capable of syncing photos and videos using Syncthing, for instance. Then one device could be taken to the Internet location and it would transmit data on behalf of all the others – perhaps back to a computer at your home, or to a server somewhere. Perhaps a script running on the remote server would then move files out of the Syncthing synced folder into permanent storage elsewhere, triggering a deletion to be sent to the phone to free up storage. When the phone gets back to the other devices, the deletion can be propagated to them to free up storage there too.

Or maybe you have a computer out in a shed or somewhere without Internet access that you go to periodically, and need to get files to it. Again, your phone could be a carrier.

Taking it a step further

If you envision a file as a packet, you could, conceivably, do something like tunnel TCP/IP over Syncthing, assuming generous-enough timeouts. Syncthing can genuinely serve as a communication channel, not just a file-sync tool.

But you don’t need TCP/IP for this. Consider some other things you could do:

  • Drop a script in a special directory that gets picked up by a remote server and run
  • Drop emails in a special directory that get transmitted and then deleted by a remote system when they’re seen
  • Drop files (eg, photos or videos) in a directory that a remote system will copy or move out of there
  • Drop messages (perhaps gpg-encrypted) — which could be text files — for someone to see and process.
  • Drop NNTP bundles for group communication
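On the receiving side, several of these ideas reduce to the same pattern. Look in a synced directory, hand each new file to some command, and delete the file only once that command succeeds; Syncthing then propagates the deletion. Below is a minimal single-pass sketch. The `.msg` suffix and the `process_dropbox` name are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Hypothetical one-pass processor for a synced "drop" directory.
# Usage: process_dropbox DIR COMMAND [ARGS...]
# Each DIR/*.msg file is fed to COMMAND on stdin and deleted only if
# COMMAND succeeds, so failures are retried on the next pass.
process_dropbox() {
    dir="$1"; shift
    for f in "$dir"/*.msg; do
        # With no matches the glob stays literal; skip that case.
        [ -f "$f" ] || continue
        if "$@" < "$f"; then
            rm -f -- "$f"
        else
            echo "failed: $f (left in place)" >&2
        fi
    done
}

drop=$(mktemp -d)
printf 'first\n'  > "$drop/a.msg"
printf 'second\n' > "$drop/b.msg"
out=$(process_dropbox "$drop" cat)
echo "$out"
```

To run it repeatedly, you could call it from cron; the command argument might be a mail injector, an unpacker, or anything else that reads stdin.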

You can start to see how there are a lot of possibilities here that extend beyond just file synchronization, though they are built upon a file synchronization tool.

Enter NNCP

Let’s look at a tool that’s especially suited for this: NNCP, which I’ve been writing about a lot lately.

NNCP is designed to handle file exchange and remote execution with remote computers in an asynchronous, store-and-forward manner. NNCP packets are themselves encrypted and authenticated. NNCP traditionally is source-routed (that is, you configure it so that machine A reaches machine D by relaying through B and C), and the packets are onion-routed. NNCP packets can be exchanged by a TCP call, a tar-like stream, copying files to something like a USB stick and physically transporting it to the remote, etc.

This works really well and I’ve been using it myself. But it gets complicated if the network topology isn’t fixed; it is difficult to reroute packets due to the onion routing, for instance. There are various workarounds that could be used — but why not just use Syncthing as a transport in those cases?

nncp-xfer is the command that exchanges packets by writing them to, and reading them from, a directory. It is what you’d use to exchange packets on a USB stick. And what you’d use to exchange packets via Syncthing. It writes packets in a RECIPIENT/SENDER/PACKET directory structure, so it is perfectly fine to have multiple systems exchanging packets in a single Syncthing synced folder tree. This structure also allows leaf nodes to only carry the particular packets they’re interested in. The packets are all encrypted, so they can be freely synced wherever.
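That layout makes it easy for a node to find only its own traffic. The sketch below builds a throwaway tree in the RECIPIENT/SENDER/PACKET shape and lists what is addressed to one recipient. The names `alice`, `bob`, and `carol` are placeholders; in a real nncp-xfer tree the directories are named after NNCP node IDs. nncp-xfer itself does all of this for you:

```shell
#!/bin/sh
# Build a toy tree shaped like RECIPIENT/SENDER/PACKET.
xfer=$(mktemp -d)
mkdir -p "$xfer/alice/bob" "$xfer/alice/carol" "$xfer/bob/alice"
: > "$xfer/alice/bob/pkt1"
: > "$xfer/alice/carol/pkt2"
: > "$xfer/bob/alice/pkt3"

# List packets addressed to one recipient, printed as SENDER/PACKET.
packets_for() {
    root="$1"; recipient="$2"
    for p in "$root/$recipient"/*/*; do
        [ -f "$p" ] || continue
        # Strip the root and recipient prefix, leaving SENDER/PACKET.
        echo "${p#"$root/$recipient/"}"
    done
}

packets_for "$xfer" alice
```

A leaf node that only syncs its own RECIPIENT subtree would never even see `bob/alice/pkt3`.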

Since Syncthing opportunistically syncs a shared folder with any device the folder is shared with, a phone could very easily be the NNCP transport, even if it has no idea what NNCP is. It could carry NNCP packets back and forth between sites, or to the Internet, or whatever.

NNCP supports file transmission, file request, and remote execution, all subject to controls, of course. It is easy to integrate with Exim or Postfix to use as a mail transport, Git transport, and so forth. I use it for backups. It would be quite easy to have it send those backups (encrypted zfs send) via nncp-xfer to Syncthing instead of the usual method, and then if I’ve shared the Syncthing folder with my phone, all I need to do is bring the phone into Internet range and they get sent. nncp-xfer will normally remove the packets out of the xfer directory as it ingests them, so the space will only be consumed on the phone (and laptop) until we know the packets made it to their destination.

Pretty slick, eh?

Remote Directory Tree Comparison, Optionally Asynchronous and Airgapped

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.

In the previous installment on store-and-forward backups, I mentioned how easy it is to do with ZFS, and some of the tools that can be used to do it without ZFS. A lot of those tools are a bit less robust, so we need some sort of store-and-forward mechanism to verify backups. To be sure, verifying backups is good with ANY scheme, and this could be used with ZFS backups also.

So let’s say you have a shiny new backup scheme in place, and you’d like to verify that it’s working correctly. To do that, you need to compare the source directory tree on machine A with the backed-up directory tree on machine B.

Assuming a conventional setup, here are some ways you might consider to do that:

  • Just copy everything from machine A to machine B and compare locally
  • Or copy everything from machine A to a USB drive, plug that into machine B, and compare locally
  • Use rsync in dry-run mode and see if it complains about anything

The first two options are not particularly practical for large datasets, though I note that the second is compatible with airgapping. Using rsync requires both systems to be online at the same time to perform the comparison.

What would be really nice here is a tool that would write out lots of information about the files on a system: their names, sizes, last modified dates, maybe even sha256sum and other data. This file would be far smaller than the directory tree itself, would compress nicely, and could be easily shipped to an airgapped system via NNCP, UUCP, a USB drive, or something similar.
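Before looking at purpose-built tools, you can sketch the idea with GNU find and sha256sum alone. This is a simplified stand-in, not a replacement for the tools below. It assumes GNU find's `-printf`, does not handle newlines in file names, and records only path, type, size, mtime, symlink target, and a hash for regular files:

```shell
#!/bin/sh
# Print a sorted, text-only manifest of the tree rooted at $1.
# Assumes GNU find (for -printf); names containing newlines are not handled.
manifest() {
    (
        cd "$1" || exit 1
        # Directories: path and type only (their sizes and mtimes are noisy).
        # Everything else: path, type, size, mtime, and symlink target if any.
        find . -type d -printf '%p\td\n' -o -printf '%p\t%y\t%s\t%T@\t%l\n' |
            LC_ALL=C sort
        # Content hashes of regular files ("HASH  PATH"), sorted by path.
        find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2
    )
}

src=$(mktemp -d)
printf 'hi\n' > "$src/hi"
ln -s hi "$src/there"
mkdir "$src/emptydir"

manifest "$src" > "$src.before"
manifest "$src" > "$src.again"
printf 'bye\n' > "$src/hi"
manifest "$src" > "$src.after"

# Identical trees give identical manifests; any change shows up in diff.
cmp -s "$src.before" "$src.again" && echo "no changes"
diff "$src.before" "$src.after" || true
```

The manifest compresses well and can travel by NNCP, UUCP, or USB drive. Expect spurious mtime differences if a restore doesn't preserve sub-second timestamps; the mtree-family tools handle details like that far more carefully.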

Tool choices

It turns out there are already quite a few tools in Debian (and other Free operating systems) to do this, and half of them are named mtree (though, of course, not all mtrees are compatible with each other.) We’ll look at some of the options here.

I’ve made a simple test directory for illustration purposes with these commands:

mkdir test
cd test
echo hi > hi
ln -s hi there
ln hi foo
touch empty
mkdir emptydir
mkdir somethingdir
cd somethingdir
ln -s ../there

I then also used touch to set all files to a consistent timestamp for illustration purposes.

Tool option: getfacl (Debian package: acl)

This comes with the acl package, but can be used for purposes other than ACLs. Unfortunately, it doesn’t come with a tool to directly compare its output with a filesystem (setfacl, for instance, can apply the permissions listed but won’t compare). It ignores symlinks and doesn’t show sizes or dates, so it is ineffective for our purposes.

Example output:

$ getfacl --numeric -R test
...
# file: test/hi
# owner: 1000
# group: 1000
user::rw-
group::r--
other::r--
...

Tool option: fmtree, the FreeBSD mtree (Debian package: freebsd-buildutils)

fmtree can prepare a “specification” based on a directory tree, and compare a directory tree to that specification. The comparison also is aware of files that exist in a directory tree but not in the specification. The specification format is a bit on the odd side, but works well enough with fmtree. Here’s a sample output with defaults:

$ fmtree -c -p test
...
# .
/set type=file uid=1000 gid=1000 mode=0644 nlink=1
.               type=dir mode=0755 nlink=4 time=1610421833.000000000
    empty       size=0 time=1610421833.000000000
    foo         nlink=2 size=3 time=1610421833.000000000
    hi          nlink=2 size=3 time=1610421833.000000000
    there       type=link mode=0777 time=1610421833.000000000 link=hi

... skipping ...

# ./somethingdir
/set type=file uid=1000 gid=1000 mode=0777 nlink=1
somethingdir    type=dir mode=0755 nlink=2 time=1610421833.000000000
    there       type=link time=1610421833.000000000 link=../there
# ./somethingdir
..

..

You might be wondering here what it does about special characters, and the answer is that it has octal escapes, so it is 8-bit clean.

To compare, you can save the output of fmtree to a file, then run like this:

cd test
fmtree < ../test.fmtree

If there is no output, then the trees are identical. Change something and you get a line of output explaining each difference. You can also use fmtree -U to change things like modification dates to match the specification.

fmtree also supports quite a few optional keywords you can add with -K. They include things like file flags, user/group names, various types of hashes, and so forth. I'll note that none of the options can let you determine which files are hardlinked together.

Here's an excerpt with -K sha256digest added:

    empty       size=0 time=1610421833.000000000 \
                sha256digest=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    foo         nlink=2 size=3 time=1610421833.000000000 \
                sha256digest=98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4

If you include a sha256digest in the spec, then when you verify it with fmtree, the verification will also include the sha256digest. Obviously fmtree -U can't correct a mismatch there, but of course it will detect and report it.

Tool option: mtree, the NetBSD mtree (Debian package: mtree-netbsd)

mtree produces (by default) output very similar to fmtree. With minor differences (such as the name of the sha256digest in the output), the discussion above about fmtree also applies to mtree.

There are some differences, and the most notable is that mtree adds a -C option which reads a spec and converts it to a "format that's easier to parse with various tools." Here's an example:

$ mtree -c -K sha256digest -p test | mtree -C
. type=dir uid=1000 gid=1000 mode=0755 nlink=4 time=1610421833.0 flags=none 
./empty type=file uid=1000 gid=1000 mode=0644 nlink=1 size=0 time=1610421833.0 flags=none 
./foo type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./hi type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=hi time=1610421833.0 flags=none 
./emptydir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir/there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=../there time=1610421833.0 flags=none 

Most definitely an improvement in both space and convenience, while still retaining the relevant information. Note that if you want the sha256digest in the formatted output, you need to pass the -K to both mtree invocations. I could have done that here, but it is easier to read without it.

mtree can verify a specification in either format. Given what I'm about to show you about bsdtar, this should illustrate why I bothered to package mtree-netbsd for Debian.

Unlike fmtree, the mtree -U command will not adjust modification times based on the spec, but it will report on differences.

Tool option: bsdtar (Debian package: libarchive-tools)

bsdtar is a fascinating program that can work with many formats other than just tar files. Among the formats it supports is the NetBSD mtree "pleasant" format (mtree -C compatible).

bsdtar can also convert between the formats it supports. So, put this together: bsdtar can convert a tar file to an mtree specification without extracting the tar file. bsdtar can also use an mtree specification to override the permissions on files going into tar -c, so it is a way to prepare a tar file with things owned by root without resorting to tools like fakeroot.

Let's look at how this can work:

$ cd test
$ bsdtar --numeric -cf - --format=mtree .
#mtree
. time=1610472086.318593729 mode=755 gid=1000 uid=1000 type=dir
./empty time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=0
./foo nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./hi nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./ormat\075mtree time=1610472086.318593729 mode=644 gid=1000 uid=1000 type=file size=5632
./there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=hi
./emptydir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir/there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=../there

You can use mtree -U to verify that as before. With the --options mtree: set, you can also add hashes and similar to the bsdtar output. Since bsdtar can use input from tar, pax, cpio, zip, iso9660, 7z, etc., this capability can be used to create verification of the files inside quite a few different formats. You can convert with bsdtar -cf output.mtree --format=mtree @input.tar. There are some foibles with directly using these converted files with mtree -U, but usually minor changes will get it there.

Side mention: stat(1) (Debian package: coreutils)

This tool isn't included because it won't operate recursively, but it belongs in the same toolbox.

Putting It Together

I will still be developing a complete non-ZFS backup system for NNCP (or UUCP) in a future post. But in the meantime, here are some ideas you can reflect on:

  • Let's say your backup scheme involves sending a full backup every night. On the source system, you could pipe the generated tar file through something like tee >(bsdtar -cf backup.mtree --format=mtree @-) to generate an mtree file in-band while generating the tar file. This mtree file could be shipped over for verification.
  • Perhaps your backup scheme involves sending incremental backup data via rdup or even ZFS, but you would like to periodically verify that everything is good -- that an incremental didn't miss something. Something like mtree -K sha256 -c -x -p / | mtree -C -K sha256 would let you accomplish that.
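The in-band idea in the first bullet doesn't strictly depend on bsdtar. The rough sketch below uses only tar and tee, so it produces a plain file listing rather than an mtree spec; the file names are placeholders. It produces the archive and its table of contents in one pass, with no second read of the source tree:

```shell
#!/bin/sh
# Placeholder source tree for the demo.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
printf 'hi\n' > "$work/src/hi"
printf 'there\n' > "$work/src/sub/there"

# One pass over the source: tee saves the archive while a second tar
# reads the very same stream and writes its table of contents.
( cd "$work/src" && tar -cf - . ) \
    | tee "$work/backup.tar" \
    | tar -tf - > "$work/backup.list"

cat "$work/backup.list"
```

Swapping the second `tar -tf -` for the bsdtar invocation in the bullet above would give a full mtree spec instead of a bare listing.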

I will further develop at least one of these ideas in a future post.

Bonus: cross-tool comparisons

In my mtree-netbsd packaging, I added tests like this to compare between tools:

fmtree -c -K $(MTREE_KEYWORDS) | mtree
mtree -c -K $(MTREE_KEYWORDS) | sed -e 's/\(md5\|sha1\|sha256\|sha384\|sha512\)=/\1digest=/' -e 's/rmd160=/ripemd160digest=/' | fmtree
bsdtar -cf - --options 'mtree:uname,gname,md5,sha1,sha256,sha384,sha512,device,flags,gid,link,mode,nlink,size,time,uid,type,uname' --format mtree . | mtree
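The sed expression in the second test is what makes NetBSD mtree output acceptable to fmtree: it renames the hash keywords to the "...digest" spellings. Below, it is isolated as a small function; the name `mtree_to_fmtree_keywords` is just for illustration. This version uses `sed -E` instead of GNU's `\|` alternation in a basic regex. It also adds `g` flags, so that a line carrying several hash keywords has all of them renamed. The sample line is made up, and its hash values are placeholders:

```shell
#!/bin/sh
# Rename NetBSD mtree hash keywords to the "...digest" names fmtree expects.
mtree_to_fmtree_keywords() {
    sed -E -e 's/(md5|sha1|sha256|sha384|sha512)=/\1digest=/g' \
           -e 's/rmd160=/ripemd160digest=/g'
}

# Made-up sample line; prints:
# ./hi type=file size=3 md5digest=aa sha256digest=bb ripemd160digest=cc
echo './hi type=file size=3 md5=aa sha256=bb rmd160=cc' | mtree_to_fmtree_keywords
```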

More Topics on Store-And-Forward (Possibly Airgapped) ZFS and Non-ZFS Backups with NNCP

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.

In my previous post, I introduced a way to use ZFS backups over NNCP. In this post, I’ll expand on that and also explore non-ZFS backups.

Use of nncp-file instead of nncp-exec

The previous example used nncp-exec (like UUCP’s uux), which lets you pipe stdin in, then queues up a request to run a given command with that input on a remote. I discussed that NNCP doesn’t guarantee order of execution, but that for the ZFS use case, that was fine since zfs receive would just fail (causing NNCP to try again later).

At present, nncp-exec stores the data piped to it in RAM before generating the outbound packet (the author plans to fix this shortly). [Update: This is now fixed; use -use-tmp with nncp-exec!] That made it unusable for some of my backups, so I set it up another way: with nncp-file, the tool to transfer files to a remote machine. A cron job then picks them up and processes them.

On the machine being backed up, we have to find a way to encode the dataset to be received. I chose to do that as part of the filename, so the updated simplesnap-queue could look like this:

#!/bin/bash

set -e
set -o pipefail

DEST="`echo $1 | sed 's,^tank/simplesnap/,,'`"
FILE="bakfsfmt2-`date "+%s.%N".$$`_`echo "$DEST" | sed 's,/,@,g'`"

echo "Processing $DEST to $FILE" >&2
# stdin piped to this
zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 012345...  \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:$FILE'" >&2

echo "Queued $DEST to $FILE" >&2

I’ve added compression and encryption here as well; more on that below.
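The file name carries the destination dataset through NNCP, which only knows about file names. Below is a minimal sketch of both directions, using hypothetical function names. `encode_name` mirrors the FILE= line in the script above. For portability it assumes plain `date +%s` instead of GNU's `%N` nanoseconds. `decode_dest` applies the same transformation the processing script further down uses on the backup server:

```shell
#!/bin/sh
# Sender side: tank/simplesnap/host1/tank/home -> bakfsfmt2-<time>.<pid>_host1@tank@home
encode_name() {
    dest=$(printf '%s\n' "$1" | sed 's,^tank/simplesnap/,,')
    printf 'bakfsfmt2-%s.%s_%s\n' "$(date +%s)" "$$" \
        "$(printf '%s\n' "$dest" | sed 's,/,@,g')"
}

# Receiver side: drop everything through the first "_" after the prefix,
# then turn each "@" back into "/".
decode_dest() {
    basename "$1" | sed -e 's/^bakfsfmt2[^_]*_//' -e 's,@,/,g'
}

name=$(encode_name tank/simplesnap/host1/tank/home)
dest=$(decode_dest "/var/local/incoming/host1/$name")
host=${dest%%/*}
echo "$name -> $dest (host $host)"
```

Using "@" as the stand-in for "/" is convenient because ZFS reserves "@" for snapshot names, so it cannot appear in a dataset name.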

On the backup server, we would define a different incoming directory for each node in nncp.hjson. For instance:

host1: {
...
   incoming: "/var/local/nncp-backups-incoming/host1"
}

host2: {
...
   incoming: "/var/local/nncp-backups-incoming/host2"
}

I’ll present the scanning script in a bit.

Offsite Backup Rotation

Most of the time, you don’t want just a single drive to store the backups. You’d like to have a set. At minimum, one wouldn’t be plugged in so lightning wouldn’t ruin all your backups. But maybe you’d store a second drive at some other location you have access to (friend’s house, bank box, etc.)

There are several ways you could solve this:

  • If the remote machine is at a location with network access and you trust its physical security (remember that although it will store data encrypted at rest and will transport it encrypted, it will — in most cases — handle un-encrypted data during processing), you could of course send NNCP packets to it over the network at the same time you send them to your local backup system.
  • Alternatively, if the remote location doesn’t have network access or you want to keep it airgapped, you could transport the NNCP packets by USB drive to the remote end.
  • Or, if you don’t want to have any kind of processing capability remotely — probably a wise move — you could rotate the hard drives themselves, keeping one plugged in locally and unplugging the other to take it offsite.

The third option can be helped with NNCP, too. One way is to create separate NNCP installations for each of the drives that you store data on. Then, whenever one is plugged in, the appropriate NNCP config will be loaded and appropriate packets received and processed. The neighbor machine — the spooler — would just store up packets for the offsite drive until it comes back onsite (or, perhaps, your airgapped USB transport would do this). Then when it’s back onsite, all the queued up ZFS sends get replayed and the backups replicated.

Now, how might you handle this with NNCP?

The simple way would be to have each system generating backups send them to two destinations. For instance:

zstd -8 - | gpg --compress-algo none --cipher-algo AES256 -e -r 07D5794CD900FAF1D30B03AC3D13151E5039C9D5 \
  | tee >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk1:$FILE'") \
        >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk2:$FILE'") \
   > /dev/null

You could probably also more safely use pee(1) (from moreutils) to do this.
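The `tee >(...)` form has a known drawback: bash neither waits for process substitutions nor checks their exit status. The script can therefore carry on, and even report success, while an nncp-file is still running or after one has failed. As an illustration of fan-out with explicit waiting, here is a POSIX sh sketch using named pipes (mkfifo). The `fanout` function is hypothetical; in real use, the two consumer strings would be the two `su nncp -c ...` commands:

```shell
#!/bin/sh
# Hypothetical fan-out helper: copy stdin to two consumer commands (each
# given as a string for sh -c), wait for both, and return non-zero if tee
# or either consumer fails.
fanout() {
    fifodir=$(mktemp -d) || return 1
    mkfifo "$fifodir/a" "$fifodir/b" || return 1
    # Start the readers first; each blocks until tee opens its FIFO.
    sh -c "$1" < "$fifodir/a" &
    pid_a=$!
    sh -c "$2" < "$fifodir/b" &
    pid_b=$!
    tee "$fifodir/a" "$fifodir/b" > /dev/null
    tee_status=$?
    # Unlike bash's >(...), both consumers are waited for explicitly.
    wait "$pid_a"; status_a=$?
    wait "$pid_b"; status_b=$?
    rm -rf "$fifodir"
    [ "$tee_status" -eq 0 ] && [ "$status_a" -eq 0 ] && [ "$status_b" -eq 0 ]
}

out=$(mktemp -d)
printf 'backup stream\n' | fanout "cat > '$out/disk1'" "cat > '$out/disk2'"
echo "fanout exit status: $?"
```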

This has an unfortunate result of doubling the network traffic from every machine being backed up. So an alternative option would be to queue the packets to the spooling machine, and run a distribution script from it; something like this, in part:

INCOMINGDIR="/var/local/nncp-bakfs-incoming"
LOCKFILE="$INCOMINGDIR/.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
  logit "Lock obtained at ${LOCKFILE} with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi


logit "Scanning queue directory..."
cd "$INCOMINGDIR"
for HOST in *; do
   cd "$INCOMINGDIR/$HOST"
   for FILE in bakfsfmt2-*; do
           if [ -f "$FILE" ]; then
                   for BAKFS in backupdisk1 backupdisk2; do
                           runcommand nncp-file -nice B+5 -noprogress "$FILE" "$BAKFS:$HOST/$FILE"
                   done
                   runcommand rm "$FILE"
           else
                   logit "$HOST: Skipping $FILE since it doesn't exist"
           fi
   done

done
logit "Scan complete."

Security Considerations

You’ll notice that in my example above, the encryption happens as the root user, but nncp is called under su. This means that even if there is a vulnerability in NNCP, the data would still be protected by GPG. I’ll also note here that many sites run ssh as root unnecessarily; the same principles should apply there. (ssh has had vulnerabilities in the past as well). I could have used gpg’s built-in compression, but zstd is faster and better, so we can get good performance by using fast compression and piping that to an algorithm that can use hardware acceleration for encryption.

I strongly encourage considering transport, whether ssh or NNCP or UUCP, to be untrusted. Don’t run it as root if you can avoid it. In my example, the nncp user, which all NNCP commands are run as, has no access to the backup data at all. So even if NNCP were compromised, my backup data wouldn’t be. For even more security, I could also sign the backup stream with gpg and validate that on the receiving end.

I should note, however, that this conversation assumes that a network- or USB-facing ssh or NNCP is more likely to have an exploitable vulnerability than is gpg (which here is just processing a stream). This is probably a safe assumption in general. If you believe gpg is more likely to have an exploitable vulnerability than ssh or NNCP, then obviously you wouldn’t take this particular approach.

On the zfs side, the use of -F with zfs receive is avoided; this could lead to a compromised backed-up machine generating a malicious rollback on the destination. Backup zpools should be imported with -R or -N to ensure that a malicious mountpoint property couldn’t be used to cause an attack. I choose to use “zfs receive -u -o readonly=on” which is compatible with both unmounted backup datasets and zpools imported with -R (or both). To access the data in a backup dataset, you would normally clone it and access it there.

The processing script

So, put this all together and look at an example of a processing script that would run from cron as root and process the incoming ZFS data.

#!/bin/bash
set -e
set -o pipefail

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}

STORE=backups/simplesnap
INCOMINGDIR=/backups/nncp/incoming

if ! [ -d "$INCOMINGDIR" ]; then
        logerror "$INCOMINGDIR doesn't exist"
        exit 0
fi

LOCKFILE="/backups/nncp/.nncp-backups-zfs-scan.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
  logit "Lock obtained at ${LOCKFILE} with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi

EXITCODE=0


cd "$INCOMINGDIR"
logit "Scanning queue directory..."
for HOST in *; do
    HOSTPATH="$INCOMINGDIR/$HOST"
    # files like backupsfmt2-134.13134_dest
    for FILE in "$HOSTPATH"/backupsfmt2-[0-9]*_?*; do
        if [ ! -f "$FILE" ]; then
            logit "Skipping non-existent $FILE"
            continue
        fi

        # Now, $DEST will be HOST/DEST.  Strip off the @ also.
        DEST="`echo "$FILE" | sed -e 's/^.*backupsfmt2[^_]*_//' -e 's,@,/,g'`"

        if [ -z "$DEST" ]; then
            logerror "Malformed dest in $FILE"
            continue
        fi
        HOST2="`echo "$DEST" | sed 's,/.*,,g'`"
        if [ -z "$HOST2" ]; then
            logerror "Malformed DEST $DEST in $FILE"
            continue
        fi

        if [ ! "$HOST" = "$HOST2" ]; then
            logerror "$FILE: $HOST doesn't match $HOST2"
            continue
        fi

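        logit "Processing $FILE to $STORE/$DEST"
        # Decrypt, decompress, and receive; with pipefail, a failure in any stage fails the whole pipeline
        if runcommand gpg -q -d < "$FILE" | runcommand zstdcat | runcommand zfs receive -u -o readonly=on "$STORE/$DEST"; then
            logit "Successfully processed $FILE to $STORE/$DEST"
            runcommand rm "$FILE"
        else
            # Leave the file in place so the next run retries it
            logerror "FAILED to process $FILE to $STORE/$DEST"
            EXITCODE=15
        fi
    done
done

exit "$EXITCODE"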
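The filename parsing in that loop is the part most worth testing on its own, since the destination dataset is derived from data that arrived from another machine. Here is a minimal sketch that pulls the same sed logic into a function. The name parse_queue_file is hypothetical, and it assumes queue files are named like backupsfmt2-134.13134_laptop@root, following the comment in the script.

```shell
#!/bin/bash
# Sketch: derive and validate the destination (HOST/DATASET) from a queue
# file path, using the same sed expression as the processing script.
# parse_queue_file QUEUEHOST FILE
#   prints HOST/DATASET on success
#   returns 1 if the name is malformed, 2 if the host doesn't match QUEUEHOST
parse_queue_file () {
    local queuehost="$1" file="$2" dest host2
    # Drop everything through "backupsfmt2<anything but _>_", then map @ to /
    dest="$(printf '%s\n' "$file" | sed -e 's/^.*backupsfmt2[^_]*_//' -e 's,@,/,g')"
    [ -n "$dest" ] || return 1
    # The first path component must name the host
    host2="${dest%%/*}"
    [ -n "$host2" ] || return 1
    # The host inside the file name must match the queue directory it sits in
    [ "$host2" = "$queuehost" ] || return 2
    printf '%s\n' "$dest"
}

# Example:
#   parse_queue_file laptop /backups/nncp/incoming/laptop/backupsfmt2-134.13134_laptop@root
#   prints laptop/root
```

Returning distinct codes for a malformed name (1) and a host mismatch (2) lets the caller log the two cases differently before moving on with continue.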

Applying These Ideas to Non-ZFS Backups

ZFS backups made our job easier in a lot of ways:

  • ZFS can calculate a diff based on an efficiently-stored previous local state (snapshot or bookmark), rather than a comparison to a remote state (rsync)
  • ZFS "incremental" sends, while less efficient than rsync, are reasonably efficient, sending only changed blocks
  • ZFS receive detects and enforces that the incremental source on the local machine must match the incremental source of the original stream, enforcing ordering
  • Datasets using ZFS encryption can be sent in their encrypted state
  • Incrementals can be done without a full scan of the filesystem

Some of these benefits you just won't get without ZFS (or something similar like btrfs), but let's see how we could apply these ideas to non-ZFS backups. I will explore the implementation of them in a future post.

When I say "non ZFS", I am being a bit vague as to whether the source, the destination, or both systems are running a non-ZFS filesystem. In general I'll assume that neither is ZFS.

The first and most obvious answer is to just tar up the whole system and send that every day. This is, of course, only suitable for small datasets on a fast network. These tarballs could be unpacked on the destination and stored more efficiently via any number of methods (hardlink trees, a block-level deduplicator like borg or rdedup, or even just compressed tarballs).

To make the network trip more efficient, something like rdiff or xdelta could be used. A signature file could be stored on the machine being backed up (generated via tee/pee at stream time), and the next run could simply send an rdiff delta over NNCP. This would be quite network-efficient, but still would require reading every byte of every file on every backup, and would also require quite a bit of temporary space on the receiving end (to apply the delta to the previous tarball and generate a new one).

Alternatively, a program that generates incremental backup files such as rdup could be used. These could be transmitted over NNCP to the backup server, and unpacked there. While perhaps less efficient on the network -- every file with at least one modified byte would be retransmitted in its entirety -- it avoids the need to read every byte of unmodified files or to have enormous temporary space. I should note here that GNU tar claims to have an incremental mode, but it has a potential data loss bug.

There are also some tools with algorithms that may apply well in this use case: syrep and fssync being the two most prominent examples, though rdedup (mentioned above) and the nascent asuran project may also be combinable with other tools to achieve this effect.

I should, of course, conclude this section by mentioning btrfs. Every time I've tried it, I've run into serious bugs, and its status page indicates that only some of them have been resolved. I would not consider using it for something as important as backups. However, if you are comfortable with it, it is likely to be able to run in more constrained environments than ZFS, and its send streams could probably be processed in much the same way as zfs streams.

Airgapped / Asynchronous Backups with ZFS over NNCP

In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups, laid out the foundations for this. Now let’s dig into the details.

Today’s post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later.

The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I’m going to try to go into more detail here.

Assumptions

I am assuming a setup where:

  • The machines being backed up run ZFS
  • The disk(s) that hold the backups are also running ZFS
  • zfs send / receive is desired as an efficient way to transport the backups
  • The machine that holds the backups may have no network connection whatsoever
  • Backups will be sent encrypted over some sort of network to a spooling machine, which temporarily holds them until they are transported to the destination backup system and ingested there. This system will be unable to decrypt the data streams it temporarily stores.

Hardware

Let’s start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi4 benchmarks at about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don’t need encrypted backups or don’t care that much about performance (many people probably fall into this category), you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server.

I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it’s been quite reliable.

For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 “toaster” (device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive.

Then, there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick – plug it in, all the spooled data is moved over, then plug it in to the backup system and it’s processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup.

Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I’m going to focus on how to integrate ZFS with NNCP.

Operating System

Of course, it should be no surprise that I set this up on Debian.

As an added step, I did all the configuration in Ansible stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for storage, and it takes the system from a fresh install to a ready-to-use state.

Security

There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, for privilege separation, I have chosen to run everything relating to NNCP as an nncp user. NNCP already does encryption, but if you prefer that even NNCP have zero knowledge of the data, it’s trivial to add gpg to the pipeline as well, and in fact I’ll be demonstrating that in a future post for other reasons.

Software

Besides NNCP, there needs to be a tool that generates the zfs send streams. I looked at quite a few for this project. Most were designed to inspect the list of snapshots on the remote end, compare it to the list on the local end, and calculate the difference from there. That can’t work here, since the machine being backed up never talks directly to the backup system, which may not even be on a network.

I realized my own simplesnap project was very close to being able to do this. It already relied on specially-named snapshots on the machine being backed up, so it never needed any communication about which snapshots were present where. All it needed was a few more options to permit sending to a stream instead of to zfs receive. I made those changes, and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well.
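To illustrate why no round trip is needed: if the sending side always names its backup snapshots in a recognizable way, the newest such snapshot on the local dataset can serve as the base for the next incremental, assuming every stream generated earlier is eventually delivered and received. Below is a simplified sketch of that idea, not simplesnap’s actual code; the __backup_<setname>_<timestamp>__ naming pattern and the function name plan_send are assumptions for illustration. It only prints the zfs send command it would run.

```shell
#!/bin/bash
# Sketch: decide how to send a new snapshot using only the local snapshot
# list, with no query to the receiving side.
# plan_send SETNAME NEWSNAP
#   stdin: existing snapshot names, oldest first, e.g. from
#          zfs list -H -t snapshot -o name -s creation tank/home
#          (taken before NEWSNAP was created)
#   prints the zfs send command that would be used (dry run only)
# The __backup_<setname>_<timestamp>__ naming pattern is hypothetical.
plan_send () {
    local setname="$1" newsnap="$2" base
    # Newest snapshot belonging to this backup set, if any.
    # "|| true" keeps a no-match grep from failing under pipefail.
    base="$( { grep -F "@__backup_${setname}_" || true; } | tail -n 1)"
    if [ -n "$base" ]; then
        # Incremental from the previous backup snapshot
        printf 'zfs send -i %s %s\n' "$base" "$newsnap"
    else
        # No earlier backup snapshot: a full stream is needed
        printf 'zfs send %s\n' "$newsnap"
    fi
}
```

Snapshots that don’t match the pattern (such as ordinary daily snapshots, or another backup set’s) are simply ignored, so several backup sets can coexist on the same dataset.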

Preparing NNCP

I’m going to assume three hosts in this setup:

  • laptop is the machine being backed up. Of course, you may have quite a few of these.
  • spooler holds the backup data until the backup system picks it up
  • backupsvr holds the backups

The basic NNCP workflow documentation covers the basic steps. You’ll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You’ll copy the public key sets to the configurations of the other machines as usual. On the laptop, you’ll add a via line like this:

backupsvr: {
  id: ....
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]
}

This tells NNCP that data destined for backupsvr should always be sent via spooler first.

You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Or, alternatively, airgapped between the two with nncp-xfer.

Generating Backup Data

Now, on the laptop, install simplesnap (2.0.0 or above). Although you won’t be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepare a dataset for it:

zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap

Then, create a script /usr/local/bin/runsimplesnap like this:

#!/bin/bash

set -e

simplesnap --store tank/simplesnap --setname backups --local --host `hostname` \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap

su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'

if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi

The call to simplesnap sets it up to send the data to simplesnap-queue, which we’ll create in a moment. The --receivecmd option, plus --noreap, sets it up to run without a ZFS backup store on the local system.

The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we’re on the LAN (checking for 192.168.65.64), and if so, establish a connection to the spooler to transmit the data. Of course, you could also do this over the Internet, with tor, or whatever, but in my case, I don’t want to do this automatically in case I’m tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options.
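The LAN test in runsimplesnap is deliberately simple. If you want it a bit stricter, here is a small sketch (the helper name has_address is hypothetical) using grep’s -F and -w options: -F stops the dots from acting as regex wildcards, and -w rejects matches that are only part of a longer address, so a pattern like 192.168.65.6 would not match 192.168.65.64. It reads ip addr output on stdin, so it can be tested without touching the network.

```shell
#!/bin/bash
# Sketch: stricter version of  ip addr | grep -q 192.168.65.64
# has_address ADDR   (reads "ip addr"-style text on stdin)
#   -F: match ADDR literally, so "." is not a regex wildcard
#   -w: require whole-word matches, so ADDR can't match part of a longer address
#   -q: no output, just the exit status
has_address () {
    grep -qwF -- "$1"
}

# In runsimplesnap this could read:
#   if ip -4 addr | has_address 192.168.65.64; then
#     su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
#   fi
```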

Now, here’s what /usr/local/bin/simplesnap-queue looks like:

#!/bin/bash

set -e
set -o pipefail

DEST="`echo "$1" | sed 's,^tank/simplesnap/,,'`"

echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2

This is a pretty simple script. simplesnap will call it with a path based on the --store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it pipes stdin to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), and the stream is passed to whatever the backupsvr defines as zfsreceive.
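If you would rather not spawn echo and sed, or want the script to refuse odd input before it ends up inside the single quotes of the su -c string, the prefix strip can be done with bash parameter expansion. This is a minimal sketch; the function name queue_dest and the character whitelist are assumptions, so adjust the whitelist to match your own dataset names.

```shell
#!/bin/bash
# Sketch: the prefix strip from simplesnap-queue, done with parameter
# expansion instead of echo | sed, plus a sanity check on the result.
# queue_dest PATH   e.g. tank/simplesnap/laptop/root -> laptop/root
queue_dest () {
    local dest="${1#tank/simplesnap/}"
    # Assumed whitelist of characters in dataset paths; anything else
    # (notably a single quote, which would break the su -c '...' string)
    # is rejected, as is an empty result. Adjust to suit your datasets.
    [[ $dest =~ ^[A-Za-z0-9_.:/-]+$ ]] || return 1
    printf '%s\n' "$dest"
}

# In simplesnap-queue:
#   DEST="$(queue_dest "$1")" || { echo "Refusing odd dataset name: $1" >&2; exit 1; }
```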

Receiving ZFS backups

In the NNCP configuration on the recipient’s side, in the laptop section, we define what command it’s allowed to run as zfsreceive:

      exec: {
        zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
      }

We authorize the nncp user to run this under sudo in /etc/sudoers.d/local-nncp:

Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive

When nncp-toss processes the incoming data, NNCP_SENDER holds the public key ID of the sending node. We can use that for sanity checking later.

Now, here’s a basic nncp-zfs-receive script:

#!/bin/bash
set -e
set -o pipefail

STORE=backups/simplesnap
DEST="$1"

# now process stdin
zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"

And there you have it — all the basics are in place.

Update 2020-12-30: An earlier version of this article had “zfs receive -F” instead of “zfs receive -o readonly=on -x mountpoint”. These changed arguments are more robust.
Update 2021-01-04: I am now recommending “zfs receive -u -o readonly=on”; see my successor article for more.

Enhancements

You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:

#!/bin/bash

set -e
set -o pipefail

STORE=backups/simplesnap
# $1 will be the host/dataset

DEST="$1"
HOST="`echo "$1" | sed 's,/.*,,g'`"
if [ -z "$HOST" ]; then
   echo "Malformed command line" >&2
   exit 5
fi

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}
exiterror () {
   logerror "$1"
   echo "$1" 1>&2
   exit 10
}

# Sanity check

if [ "$HOST" = "laptop" ]; then
  if [ "$NNCP_SENDER" != "12345678" ]; then
    exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
  fi
else
  exiterror "Unknown host $HOST"
fi

runcommand zfs receive -u -o readonly=on "$STORE/$DEST"

Now the ZFS receive output will be captured in syslog in a friendly way, so you can look back later to see why things failed, if they did.
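With more than one machine being backed up, the if/else sanity check grows quickly. One option is a lookup table. This sketch assumes bash 4 or later for associative arrays, and the host names and IDs in it are placeholders, just like 12345678 above.

```shell
#!/bin/bash
# Sketch: table-driven version of the NNCP_SENDER sanity check.
# Requires bash 4+ (associative arrays). Host names and node IDs are
# placeholders; substitute the real IDs from your NNCP configuration.
declare -A EXPECTED_SENDER=(
    [laptop]="12345678"
    [desktop]="87654321"
)

# sender_ok HOST SENDER: succeed only if HOST is known and SENDER is its ID
sender_ok () {
    local host="$1" sender="$2"
    # An empty key is not a valid associative-array subscript
    [ -n "$host" ] || return 1
    # Unknown host: no entry in the table
    [ -n "${EXPECTED_SENDER[$host]+set}" ] || return 1
    [ "${EXPECTED_SENDER[$host]}" = "$sender" ]
}

# In nncp-zfs-receive, this replaces the if/else block:
#   sender_ok "$HOST" "$NNCP_SENDER" || exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
```

The trade-off is a less specific error message: an unknown host and a mismatched sender both end up at the same exiterror call.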

Further notes on NNCP

nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the snapshots it depends on have been received. For non-ZFS backups, a simple sequence number can handle this issue.
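For the non-ZFS case, here is a minimal sketch of what that sequence-number check could look like. The function name apply_in_order and the state-file approach are hypothetical. It runs an increment only when its number is exactly one past the last one applied. A packet that arrives early makes it return nonzero, so nncp-toss keeps it for a later run, while one that was already applied is skipped with success so it does not loop forever.

```shell
#!/bin/bash
# Sketch: enforce ordering of non-ZFS incrementals with a sequence number.
# apply_in_order STATEFILE SEQ CMD [ARGS...]
#   STATEFILE holds the last sequence number applied (treated as 0 if absent).
#   - SEQ <= last: already applied; skip it and return 0 so nncp-toss treats it as done
#   - SEQ == last + 1: run CMD, and record SEQ only if CMD succeeds
#   - otherwise (arrived early, CMD failed, or malformed SEQ): return 1 so
#     nncp-toss keeps the packet and retries on a later run
apply_in_order () {
    local statefile="$1" seq="$2" last=0
    shift 2
    # Plain positive decimal numbers only (no leading zeros, no signs)
    [[ $seq =~ ^[1-9][0-9]*$ ]] || return 1
    if [ -f "$statefile" ]; then
        last="$(cat "$statefile")"
    fi
    if [ "$seq" -le "$last" ]; then
        return 0
    fi
    if [ "$seq" -ne $((last + 1)) ]; then
        return 1
    fi
    "$@" || return 1
    # Write via a temp file and rename, so the counter is never half-written
    printf '%s\n' "$seq" > "$statefile.tmp" && mv "$statefile.tmp" "$statefile"
}
```

For example, a receiving script might call apply_in_order /var/lib/backups/laptop.seq "$SEQ" some-restore-command, where the path and the command are placeholders for your own setup.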