Category Archives: Technology

Airgapped / Asynchronous Backups with ZFS over NNCP

December 30, 2020Debian, Linux, Softwareairgap, asynchronous, nncp, zfsJohn Goerzen

In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups laid out the foundations for this. Now let’s dig into the details.

Today’s post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later.

The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I’m going to try to go into more detail here.

Assumptions

I am assuming a setup where:

The machines being backed up run ZFS
The disk(s) that hold the backups are also running ZFS
zfs send / receive is desired as an efficient way to transport the backups
The machine that holds the backups may have no network connection whatsoever
Backups will be sent encrypted over some sort of network to a spooling machine, which temporarily holds them until they are transported to the destination backup system and ingested there. This system will be unable to decrypt the data streams it temporarily stores.

Hardware

Let’s start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi4 benchmarks as about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don’t need encrypted backups or don’t care that much about performance — may people probably fall into this category — you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server.

I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it’s been quite reliable.

For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 “toaster” (device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive.

Then, there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick – plug it in, all the spooled data is moved over, then plug it in to the backup system and it’s processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup.

Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I’m going to focus on how to integrate ZFS with NNCP.

Operating System

Of course, it should be no surprise that I set this up on Debian.

As an added step, I did all the configuration in Ansible stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for storage and takes the system from freshly-installed to ready-to-use state.

Security

There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, from a privilege separation standpoint, I have chosen to run everything relating to NNCP as a nncp user. NNCP already does encryption, but if you prefer to have zero knowledge of the data even to NNCP, it’s trivial to add gpg to the pipeline as well, and in fact I’ll be demonstrating that in a future post for other reasons.

Software

Besides NNCP, there needs to be a system that generates the zfs send streams. For this project, I looked at quite a few. Most were designed to inspect the list of snapshots on a remote end, compare it to a list on the local end, and calculate a difference from there. This, of course, won’t work for this situation.

I realized my own simplesnap project was very close to being able to do this. It already used an algorithm of using specially-named snapshots on the machine being backed up, so never needed any communication about what snapshots were present where. All it needed was a few more options to permit sending to a stream instead of zfs receive. I made those changes and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well.

Preparing NNCP

I’m going to assume three hosts in this setup:

laptop is the machine being backed up. Of course, you may have quite a few of these.
spooler holds the backup data until the backup system picks it up
backupsvr holds the backups

The basic NNCP workflow documentation covers the basic steps. You’ll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You’ll copy the public key sets to the configurations of the other machines as usual. On the laptop, you’ll add a via line like this:

backupsvr: {
  id: ....
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]

This tells NNCP that data destined for backupsvr should always be sent via spooler first.

You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Or, alternatively, airgapped between the two with nncp-xfer.

Generating Backup Data

Now, on the laptop, install simplesnap (2.0.0 or above). Although you won’t be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepate a dataset for it:

zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap

Then, create a script /usr/local/bin/runsimplesnap like this:

#!/bin/bash

set -e

simplesnap --store tank/simplesnap --setname backups --local --host `hostname` \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap

su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'

if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi

The call to simplesnap sets it up to send the data to simplesnap-queue, which we’ll create in a moment. The –receivmd, plus –noreap, sets it up to run without ZFS on the local system.

The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we’re on the LAN (checking 192.168.65.64), and if so, will establish a connection to the spooler to transmit the data. If course, you could also do this over the Internet, with tor, or whatever, but in my case, I don’t want to automatically do this in case I’m tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options.

Now, here’s what /usr/local/bin/simplesnap-queue looks like:

#!/bin/bash

set -e
set -o pipefail

DEST="`echo $1 | sed 's,^tank/simplesnap/,,'`"

echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2

This is a pretty simple script. simplesnap will call it with a path based on the –store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it just pipes it to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), then passes it to whatever the backupsvr defines as zfsreceive.

Receiving ZFS backups

In the NNCP configuration on the recipient’s side, in the laptop section, we define what command it’s allowed to run as zfsreceive:

      exec: {
        zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
      }

We authorize the nncp user to run this under sudo in /etc/sudoers.d/local–nncp:

Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive

The NNCP_SENDER is the public key ID of the sending node when nncp-toss processes the incoming data. We can use that for sanity checking later.

Now, here’s a basic nncp-zfs-receive script:

#!/bin/bash
set -e
set -o pipefail

STORE=backups/simplesnap
DEST="$1"

# now process stdin
runcommand zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"

And there you have it — all the basics are in place.

Update 2020-12-30: An earlier version of this article had “zfs receive -F” instead of “zfs receive -o readonly=on -x mountpoint”. These changed arguments are more robust.
Update 2021-01-04: I am now recommending “zfs receive -u -o readonly=on”; see my successor article for more.

Enhancements

You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:

#!/bin/bash

set -e
set -o pipefail

STORE=backups/simplesnap
# $1 will be the host/dataset

DEST="$1"
HOST="`echo "$1" | sed 's,/.*,,g'`"
if [ -z "$HOST" ]; then
   echo "Malformed command line"
   exit 5
fi

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}
exiterror () {
   logerror "$1"
   echo "$1" 1>&2
   exit 10
}

# Sanity check

if [ "$HOST" = "laptop" ]; then
  if [ "$NNCP_SENDER" != "12345678" ]; then
    exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
  fi
else
  exiterror "Unknown host $HOST"
fi

runcommand zfs receive -F "$STORE/$DEST"

Now you’ll capture the ZFS receive output in syslog in a friendly way, so you can look back later why things failed if they did.

Further notes on NNCP

nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the dependent snapshots have been received. For non-ZFS backups, a simple sequence number can handle this issue.

Asynchronous Email: Exim over NNCP (or UUCP)

December 26, 2020Softwareasynchronous, email, exim, nncpJohn Goerzen

Following up to yesterday’s article about how NNCP rehabilitates asynchronous communication with modern encryption and onion routing, here is the first of my posts showing how to put it into action.

Email is a natural fit for async; in fact, much of early email was carried by UUCP. It is useful for an airgapped machine to be able to send back messages; errors from cron, results of handling incoming data, disk space alerts, etc. (Of course, this would apply to a non-airgapped machine also).

The NNCP documentation already describes how to do this for Postfix. Here I will show how to do it for Exim.

A quick detour to UUCP land

When you encounter a system such as email that has instructions for doing something via UUCP, that should be an alert to you that “here is some very relevant information for doing this same thing via NNCP.” The syntax is different, but broadly, here’s a table of similar NNCP commands:

Purpose	UUCP	NNCP
Connect to remote system	uucico -s, uupoll	nncp-call, nncp-caller
Receive connection (pipe, daemon, etc)	uucico (-l or similar)	nncp-daemon
Request remote execution, stdin piped in	uux	nncp-exec
Copy file to remote machine	uucp	nncp-file
Copy file from remote machine	uucp	nncp-freq
Process received requests	uuxqt	nncp-toss
Move outbound requests to dir (for USB stick, airgap, etc)	N/A	nncp-xfer
Create streaming package of outbound requests	N/A	nncp-bundle

If you used UUCP back in the day, you surely remember bang paths. I will not be using those here. NNCP handles routing itself, rather than making the MTA be aware of the network topology, so this simplifies things considerably.

Sending from Exim to a smarthost

One common use for async email is from a satellite system: one that doesn’t receive mail, or have local mailboxes, but just needs to get email out to the Internet. This is a common situation even for conventionally-connected systems; in Exim speak, this is a “satellite system that routes mail via a smarthost.” That is, every outbound message goes to a specific target, which then is responsible for eventual delivery (over the Internet, LAN, whatever).

This is fairly simple in Exim.

We actually have two choices for how to do this: bsmtp or rmail mode. bsmtp (batch SMTP) is the more modern way, and is essentially a derivative of SMTP that explicitly can be queued asynchronously. Basically it’s a set of SMTP commands that can be saved in a file. The alternative is “rmail” (which is just an alias for sendmail these days), where the data is piped to rmail/sendmail with the recipients given on the command line. Both can work with Exim and NNCP, but because we’re doing shiny new things, we’ll use bsmtp.

These instructions are loosely based on the Using outgoing BSMTP with Exim HOWTO. Some of these may assume Debianness in the configuration, but should be easily enough extrapolated to other configs as well.

First, configure Exim to use satellite mode with minimal DNS lookups (assuming that you may not have working DNS anyhow).

Then, in the Exim primary router section for smarthost (router/200_exim4-config_primary in Debian split configurations), just change transport = remote_smtp_smarthost to transport = nncp.

Now, define the NNCP transport. If you are on Debian, you might name this transports/40_exim4-config_local_nncp:

nncp:
  debug_print = "T: nncp transport for $local_part@$domain"
  driver = pipe
  user = nncp
  batch_max = 100
  use_bsmtp
  command = /usr/local/nncp/bin/nncp-exec -noprogress -quiet hostname_goes_here rsmtp
.ifdef REMOTE_SMTP_HEADERS_REWRITE
  headers_rewrite = REMOTE_SMTP_HEADERS_REWRITE
.endif
.ifdef REMOTE_SMTP_RETURN_PATH
  return_path = REMOTE_SMTP_RETURN_PATH
.endif

This is pretty straightforward. We pipe to nncp-exec, run it as the nncp user. nncp-exec sends it to a target node and runs whatever that node has called rsmtp (the command to receive bsmtp data). When the target node processes the request, it will run the configured command and pipe the data in to it.

More complicated: Routing to various NNCP nodes

Perhaps you would like to be able to send mail directly to various NNCP nodes. There are a lot of ways to do that.

Fundamentally, you will need a setup similar to the UUCP example in Exim’s manualroute manual, which lets you define how to reach various hosts via UUCP/NNCP. Perhaps you have a star topology (every NNCP node exchanges email with a central hub). In the NNCP world, you have two choices of how you do this. You could, at the Exim level, make the central hub the smarthost for all the side nodes, and let it redistribute mail. That would work, but requires decrypting messages at the hub to let Exim process. The other alternative is to configure NNCP to just send to the destinations via the central hub; that takes advantage of onion routing and doesn’t require any Exim processing at the central hub at all.

Receiving mail from NNCP

On the receiving side, first you need to configure NNCP to authorize the execution of a mail program. In the section of your receiving host where you set the permissions for the client, include something like this:

      exec: {
        rsmtp: ["/usr/sbin/sendmail", "-bS"]
      }

The -bS option is what tells Exim to receive BSMTP on stdin.

Now, you need to tell Exim that nncp is a trusted user (able to set From headers arbitrarily). Assuming you are running NNCP as the nncp user, then add MAIN_TRUSTED_USERS = nncp to a file such as /etc/exim4/conf.d/main/01_exim4-config_local-nncp. That’s it!

Some hosts, of course, both send and receive mail via NNCP and will need configurations for both.

Rehabilitating Asynchronous Communication with NNCP: A Cross Between Tor, ssh, and UUCP

December 25, 2020Softwareasynchronous, nncp, offlineJohn Goerzen

Have you ever been traveling, shot a ton of photos and videos, but were annoyed to find it was saturating the terrible wifi you had access to? Maybe you’d wish the upload to pause until you get somewhere else, but then pausing syncing on your Nextcloud/Syncthing/Dropbox would also pause other syncing you didn’t want to pause. Or you have trouble backing up your laptop when not at home, in a way that won’t accidentaly eat up your cell phone data.

There are ways to help with this: asynchronous transfer.

Here’s a lot of background. If you want to see how encrypted, onion-routed UUCP looks, skip ahead to the “NNCP” section!

There is an old saying: “When all you have is a hammer, every problem looks like a nail.” We have this wonderful tool called ssh available, and it is pervasive and well-understood, so we tend to use it. But we’ve missed out on some benefits of asynchronous processing that we actually used to have more frequently.

Of course, we are all used to some asynchronous services in our lives. Email is a popular example: most mail clients work offline and will transmit stored messages when the mail server becomes reachable. Mail servers themselves work that way, too. Many instant messaging platforms do as well.

Even some backup systems do. Bacula/Bareos, for instance, spools all backup data to disk on the system connected to the tape drive, and from there to the tape itself. They do this for several reasons, but primarily the fact that if tape drives are not fed with data at their design speed, it can cause physical damage to the tape or even the drive. It causes the drive to have to pause, and seek backwards to reposition for the next write. This creates excessive travel of the tape over the write heads, causing a condition known as “tape shine” where the tape is damaged prematurely.

Here are some problems people often run into when sending data across a network (or the Internet) synchronously:

One side of the communication is much faster than the other
Internet issues interrupting communications mid-stream
Slow Internet causing processes to take much longer than planned, resulting in unexpected results or locking issues
Physical damage due to performance issues

Of course, there are plenty of situations where synchronous communication is a must. For instance:

When the status of the transaction at the remote end must be known immediately
When there is insufficient space to spool a job’s data

I suspect that the reason we don’t do more asynchronous processing these days, despite it being strong in the Unix heritage, is the lack of modern tools to do it. Let’s explore some more.

Some of my use cases

I run ZFS on all my systems that support it: file server, laptops, workstations, etc. It is only natural to use ZFS send/receive to do backups, and I do. However, when I am traveling, my laptop never gets backed up, because the backups are pulled from the backup system. Sure, there are ways around that; a VPN, for instance. But then we have the situation where sometimes I do not want to send the backup even if I have a working Internet connection: perhaps I’m tethered to a mobile connection and it would be expensive to do so, or I’m on hotel Wifi that is flaky and slow and I don’t want to give up any of its meager bandwidth.

I have another backup-related problem. I have a remote server, which until recently was using extremely slow disks. If I made significant changes, the backup would take the better part of a day. That’s annoying when I try to back up hourly. So of course I had to implement locking, but then that means none of my other machines would back up that day either.

Once I needed to transmit about 2TB of data. My home Internet connection was terribly slow, and I calculated it would take multiple months to do this. So I took to manually copying parts of the data to my laptop, and whenever I’d find an airport or coffee shop with faster Internet than at home, I’d send off those bits from it. But it took a ton of work.

The bespoke asynchronous problem

And that “ton of work” is perhaps why we aren’t doing more of this. There’s been no great standard solution, so it’s all “roll your own” when you need to. So we just use ssh, because it’s easier and usually “good enough”. But as I wrote in my recent article on airgapped backups, there are reasons to go async.

Solutions

Wouldn’t it be great to be able to queue up data for a machine, and let it get there in whatever way it can? Maybe a fast Internet connection is found, or via Tor, or via copying to a USB stick, or via radio broadcast? It would make many of these scenarios a lot easier. And there are ways for this now, with modern security!

We have some tools on Linux for this: git-annex for storage and migration, syrep for synchronization, and NNCP for file transfer and remote execution (could be combined with some of these other tools). Let’s dive in to NNCP.

NNCP

If you already know UUCP, think of NNCP as UUCP brought into the modern era, with modern security and tools.

Basically, NNCP permits you to send files to a remote system, request files from a remote system, and pipe data to an NNCP command that requests execution remotely. So you could, say, pipe a zfs send to NNCP which sends it to the remote and pipes it to zfs receive when it gets there.

NNCP has a delay-tolerant, resumable protocol that can run over just about any reliable connection: TCP, serial, Tor, radios of various kinds, you name it. But that’s not all; it also can dump its queue onto something like a USB stick for transport, or even make a tar-style stream that could be munged however you like. If you want to get fancy, you can assign priorities to data packets, so that, for instance, outbound email will always get sent before that 1TB file you’ve got to send also. You can also configure it so that certain carriers handle certain priorities of data; your cell phone would only handle the most urgent, but a USB stick would take anything.

NNCP is source-routed; you can tell it that the way that Bob reaches Alice is via Carl, then Betty. Bob can generate a message that will be sent along that route, fully encrypted and authenticated at each step of the way; Carl can’t see the content of the message or even anything about it other than its next hop.

How this helps

Let’s revisit some of my scnearios with NNCP.

For the laptop being backed up, while traveling it can queue up its backups, or photos, or videos, or whatever. They could be triggered by a command when on a good connection, or automatically. The data could be copied to USB and given to a friend to transmit; perfectly safe due to encryption. Or it could all wait until arriving at home, safely out of your other syncing directories. The NNCP documentation has an example of this.

For the server being backed up slowly, that’s easily solved; the slow backup would simply be queued up, and transmitted and processed when it’s ready. This wouldn’t interrupt other backups.

How about the 2TB transmission problem? That’s also made a lot easier. A command could be run to fill up a USB stick with parts of the queue, then that USB stick plugged in and transmitted whenever at a fast location. Repeat as needed while the slow system continues its upload of the remaining bits.

NNCP has a lot of interesting use cases documented as well.

If you are already familiar with how public keys work in SSH, then NNCP should be immediately familiar as well. It is a similar concept (though arguably somewhat easier to set up).

I am working on setting up a NNCP network, and will have more posts on how to do so once I’ve got it going. In the meantime, the documentation for the project is also pretty good.

How & Why To Use Airgapped Backups

December 23, 2020Technologyairgap, asynchronous, backups, nncp, zfsJohn Goerzen

A good backup strategy needs to consider various threats to the integrity of data. For instance:

Building catches fire
Accidental deletion
Equipment failure
Security incident / malware / compromise

It’s that last one that is of particular interest today. A lot of backup strategies are such that if a user (or administrator) has their local account or network compromised, their backups could very well be destroyed as well. For instance, do you ssh from the account being backed up to the system holding the backups? Or rsync using a keypair stored on it? Or access S3 buckets, etc? It is trivially easy in many of these schemes to totally ruin cloud-based backups, or even some other schemes. rsync can be run with –delete (and often is, to prune remotes), S3 buckets can be deleted, etc. And even if you try to lock down an over-network backup to be append-only, still there are vectors for attack (ssh credentials, OpenSSL bugs, etc). In this post, I try to explore how we can protect against them and still retain some modern conveniences.

A backup scheme also needs to make a balance between:

Cost
Security
Accessibility
Efficiency (of time, bandwidth, storage, etc)

My story so far…

About 20 years ago, I had an Exabyte tape drive, with the amazing capacity of 7GB per tape! Eventually as disk prices fell, I had external disks plugged in to a server, and would periodically rotate them offsite. I’ve also had various combinations of partial or complete offsite copies over the Internet as well. I have around 6TB of data to back up (after compression), a figure that is growing somewhat rapidly as I digitize some old family recordings and videos.

Since I last wrote about backups 5 years ago, my scheme has been largely unchanged; at present I use ZFS for local and to-disk backups and borg for the copies over the Internet.

Let’s take a look at some options that could make this better.

Tape

The original airgapped backup. You back up to a tape, then you take the (fairly cheap) tape out of the drive and put in another one. In cost per GB, tape is probably the cheapest medium out there. But of course it has its drawbacks.

Let’s start with cost. To get a drive that can handle capacities of what I’d be needing, at least LTO-6 (2.5TB per tape) would be needed, if not LTO-7 (6TB). New, these drives cost several thousand dollars, plus they need LVD SCSI or Fibre Channel cards. You’re not going to be hanging one off a Raspberry Pi; these things need a real server with enterprise-style connectivity. If you’re particularly lucky, you might find an LTO-6 drive for as low as $500 on eBay. Then there are tapes. A 10-pack of LTO-6 tapes runs more than $200, and provides a total capacity of 25TB – sufficient for these needs (note that, of course, you need to have at least double the actual space of the data, to account for multiple full backups in a set). A 5-pack of LTO-7 tapes is a little more expensive, while providing more storage.

So all-in, this is going to be — in the best possible scenario — nearly $1000, and possibly a lot more. For a large company with many TB of storage, the initial costs can be defrayed due to the cheaper media, but for a home user, not so much.

Consider that 8TB hard drives can be found for $150 – $200. A pair of them (for redundancy) would run $300-400, and then you have all the other benefits of disk (quicker access, etc.) Plus they can be driven by something as cheap as a Raspberry Pi.

Fancier tape setups involve auto-changers, but then you’re not really airgapped, are you? (If you leave all your tapes in the changer, they can generally be selected and overwritten, barring things like hardware WORM).

As useful as tape is, for this project, it would simply be way more expensive than disk-based options.

Fundamentals of disk-based airgapping

The fundamental thing we need to address with disk-based airgapping is that the machines being backed up have no real-time contact with the backup storage system. This rules out most solutions out there, that want to sync by comparing local state with remote state. If one is willing to throw storage efficiency out the window — maybe practical for very small data sets — one could just send a full backup daily. But in reality, what is more likely needed is a way to store a local proxy for the remote state. Then a “runner” device (a USB stick, disk, etc) could be plugged into the network, filled with queued data, then plugged into the backup system to have the data dequeued and processed.

Some may be tempted to short-circuit this and just plug external disks into a backup system. I’ve done that for a long time. This is, however, a risk, because it makes those disks vulnerable to whatever may be attacking the local system (anything from lightning to ransomware).

ZFS

ZFS is, it should be no surprise, particularly well suited for this. zfs send/receive can send an incremental stream that represents a delta between two checkpoints (snapshots or bookmarks) on a filesystem. It can do this very efficiently, much more so than walking an entire filesystem tree.

Additionally, with the recent addition of ZFS crypto to ZFS on Linux, the replication stream can optionally reflect the encrypted data. Yes, as long as you don’t need to mount them, you can mostly work with ZFS datasets on an encrypted basis, and can directly tell zfs send to just send the encrypted data instead of the decrypted data.

The downside of ZFS is the resource requirements at the destination, which in terms of RAM are higher than most of the older Raspberry Pi-style devices. Still, one could perhaps just save off zfs send streams and restore them later if need be, but that implies a periodic resend of a full stream, an inefficient operation. dedpulicating software such as borg could be used on those streams (though with less effectiveness if they’re encrypted).

Tar

Perhaps surprisingly, tar in listed incremental mode can solve this problem for non-ZFS users. It will keep a local cache of the state of the filesystem as of the time of the last run of tar, and can generate new tarballs that reflect the changes since the previous run (even deletions). This can achieve a similar result to the ZFS send/receive, though in a much less elegant way.

Bacula / Bareos

Bacula (and its fork Bareos) both have support for a FIFO destination. Theoretically this could be used to queue of data for transfer to the airgapped machine. This support is very poorly documented in both and is rumored to have bitrotted, however.

rdiff and xdelta

rdiff and xdelta can be used as sort of a non-real-time rsync, at least on a per-file basis. Theoretically, one could generate a full backup (with tar, ZFS send, or whatever), take an rdiff signature, and send over the file while keeping the signature. On the next run, another full backup is piped into rdiff, and on the basis of the signature file of the old and the new data, it produces a binary patch that can be queued for the backup target to update its stored copy of the file.

This leaves history preservation as an exercise to be undertaken on the backup target. It may not necessarily be easy and may not be efficient.

rsync batches

rsync can be used to compute a delta between two directory trees and express this as a single-file batch that can be processed by a remote rsync. Unfortunately this implies the sender must always keep an old tree around (barring a solution such as ZFS snapshots) in order to compute the delta, and of course it still implies the need for history processing on the remote.

Getting the Data There

OK, so you’ve got an airgapped system, some sort of “runner” device for your sneakernet (USB stick, hard drive, etc). Now what?

Obviously you could just copy data on the runner and move it back off at the backup target. But a tool like NNCP (sort of a modernized UUCP) offer a lot of help in automating the process, returning error reports, etc. NNCP can be used online over TCP, over reliable serial links, over ssh, with offline onion routing via intermediaries or directly, etc.

Imagine having an airgapped machine at a different location you go to frequently (workplace, friend, etc). Before leaving, you put a USB stick in your pocket. When you get there, you pop it in. It’s despooled and processed while you want, and return emails or whatever are queued up to be sent when you get back home. Not bad, eh?

Future installment…

I’m going to try some of these approaches and report back on my experiences in the next few weeks.

Non-Creepy Technology Purchasing & Gifting Guides

December 16, 2020Freedom, Online Life, Technologyeff, ethicaltech, fsf, privacy, surveillanceJohn Goerzen

This time of year, a lot of people are thinking of buying gadgets and phones as gifts. But there are a lot of tech companies that have unethical practices, from terrible working conditions in their factories to spying on their users. Here are some buying guides to help you find gadgets that are fun – and not creepy.

The Free Software Foundation’s Ethical Tech Giving Guide is a fantastic resource from what’s probably the pickiest organization out there when it comes to tech. Not only do they highlight good devices, they also explain why and why you should, for instance, avoid the iPhone (their history of silencing political activists and spying on users).

The FSF also has a Guide to DRM-Free Living talks about books, video, audio, and software that respects your freedom by letting you make your own backups, move it to other devices, and continue to use your purchases even if you have no Internet or the company you bought them from goes bankrupt. This is a fantastic and HUGE resource; there are hundreds of organizations out there that provide content in a way that respects your rights — and many of them do it for free, legally, as well.

PrivacyTools has a fantastic series of guides on everything from email providers to operating systems, as well as links to a number of other guides.

The DeGoogle wiki on Reddit (as well as the sidebar) has a lot of fantastic alternatives to things like Chromebooks, Chrome, Gmail, etc.

Related resources

Here are some resources for education (what the issues are) and information about what companies and products to avoid.

In addition to the FSF’s other fantastic resources above, they also have a list of proprietary malware. It lists things, practices, and companies to avoid, and talks about the reasons why. Their addictions page is particularly good and relevant to my recent post on the problems of the attention economy.

The Surveillance Self-Defense site from the Electronic Frontier Foundation is a fantastic introduction into how corporate surveillance works and how to defend against it.

Use with a grain of salt:

Mozilla, the people behind Firefox, have a site called Privacy Not Included that rates products by how “creepy” they are. They focus more narrowly on privacy than the more expansive set of freedoms the FSF considers (privacy is one of a number of things the FSF looks at), and in some cases I would say Mozilla is too generous (eg, with the Amazon Kindle, a number of their data points are just incorrect.)

How the Attention Economy Hurts You via Social Media Sites like Facebook

December 10, 2020Online Life, Technologyanger, facebook, fear, fediverse, socialmediaJohn Goerzen

Note: This post is also available on my website, where it will be periodically updated.

There is a whole science to manipulating our attention. And because there is a lot of money to be made by doing this well, it means we all encounter attempts to manipulate what we pay attention to each day. What is this, and how is it harmful? This post will be the first on a series on the topic.

Why is attention so important?

When people use Facebook, they use it for free. Facebook generally doesn’t even try to sell them anything, yet has billions in revenues. What, then, is Facebook’s product?

Well, really, it’s you. Or, more specifically, your attention. Facebook sells your attention to advertisers. Everything they do is in service to that. They want you to spend more time on the site so they can show you more ads.

(I should say here that I’m using Facebook as an example, but this applies to other social media companies too.)

Seeking to maximize attention

So if your attention is so important to their profit, it follows naturally that they would seek ways to get people to spend more time on their site. And they do. They track all sorts of metrics, including “engagement” (if you click “like”, comment, share, or otherwise interact with content). They know which sorts of things are likely to capture your (and I mean you in specific!) attention and show you that. Your neighbor may have different interests and Facebook judges different things are likely to capture their attention.

Manipulating your attention

Attention turning into money isn’t unique for social media. In fact, in the article If It Bleeds, It Leads: Understanding Fear-Based Media, Psychology Today writes:

In previous decades, the journalistic mission was to report the news as it actually happened, with fairness, balance, and integrity. However, capitalistic motives associated with journalism have forced much of today’s television news to look to the spectacular, the stirring, and the controversial as news stories. It’s no longer a race to break the story first or get the facts right. Instead, it’s to acquire good ratings in order to get advertisers, so that profits soar.

News programming uses a hierarchy of if it bleeds, it leads. Fear-based news programming has two aims. The first is to grab the viewer’s attention. In the news media, this is called the teaser. The second aim is to persuade the viewer that the solution for reducing the identified fear will be in the news story. If a teaser asks, “What’s in your tap water that YOU need to know about?” a viewer will likely tune in to get the up-to-date information to ensure safety.

You’ve probably seen fear-based messages a lot on Facebook. They will highlight messages to liberals about being afraid of what Trump is doing, and to conservatives about being afraid of what Biden is doing. They may or may not even intentionally be doing this; it is their algorithm predicts that those would maximize time and engagement for certain people, so that’s what they see.

Fear leads to controversy

It’s not just fear, though. Social media also loves controversy. There’s nothing that makes people really want to stay on Facebook like anger. See something controversial and you’ll see hundreds or thousands of people are there arguing about it — and in the process, giving Facebook their attention. A quick Internet search will show you numerous articles on how marketing companes can leverage controvery to get attention and engagement with their campaigns.

Consequences of maximizing fear and controversy

What does it mean to society at large — and to you personally — that large companies make a lot of money by maximizing fear and controversy?

The most obvious way is it leads to less common ground. If the posts and reactions that show common ground are never seen because they don’t drive engagement, it poisons the well; left and right hate each other with ever more vigor — a profitable outcome to Facebook, but a poisonous one to all of us.

I have had several friendships lost because I — a liberal in agreement with these friends on political matters — still talk to Trump voters. On the other side, we’ve seen people storm the Michigan statehouse with weapons. How did that level of disagreement — and even fear behind it — get so firmly embedded in our society? Surely the fact that social media shows us things designed to stimulate fear and anger must play a role.

What does it do to our ability to have empathy for, and understand, others? The Facebook groups I’ve been in for like-minded people have largely been flooded with memes calling the President “rump” and other things clearly designed to make people angry or fearful. It’s a worthless experience, and not just that, but it’s a harmful experience.

When our major media — TV and social networks — all are optimizing for fear, anger, and controvesry, we have a society beholden to fear, anger, and controvesy.

In my next installment, I’m going to talk about what to do about this, including the decentralized social networks of the Fediverse that are specifically designed to put you back in charge of your attention.

Update 2020-12-16: There are two followup articles for this: how to join the Fediverse and non-creepy technology purchasing and gifting guides. The latter references the FSF’s page on software manipulation towards addiction, which is particularly relevant to this topic.

The Fundamental Problem in Python 3

November 22, 2019TechnologypythonJohn Goerzen

“In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.”

– Douglas Adams

This expands on my recent post The Incredible Disaster of Python 3. I seem to have annoyed the Internet…

Back in the mists of time, Unix was invented. Today the descendants of Unix, whether literal or in spirit, power the majority of the world’s cell phones, most of the most popular sites on the Internet, etc. And among this very popular architecture, there lies something that has made people very angry at times: on a Unix filesystem, 254 bytes are valid in filenames. The two that are not are 0x00 and the slash character. Otherwise, they are valid in virtually any combination (the special entries “.” and “..” being the exception).

This property has led to a whole host of bugs, particularly in shell scripts. A filename with a leading dash might look like a parameter to a tool. Filenames can contain newline characters, space characters, control characters, and so forth; running ls in a directory with maliciously-named files could certainly scramble one’s terminal. These bugs continue to persist, though modern shells offer techniques that — while optional — can be used to avoid most of these classes of bugs.

It should be noted here that not every valid stream of bytes constitutes a stream of bytes that can be decoded as UTF-8. This is a departure from earlier encoding schemes such as iso-8859-1 and cp437; you might get gibberish, but “garbage in, garbage out” was a thing and if your channel was 8-bit clean, your gibberish would survive unmodified.

Unicode brings many advantages, and has rightly become the predominant standard for text encoding. But the previous paragraph highlights one of the challenges, and this challenge, along with some others, are at the heart of the problem with Python 3. That is because, at a fundamental level, Python 3’s notion of a filename is based on a fiction. Not only that, but it tries to introduce strongly-typed strings into what is fundamentally a weakly-typed, dynamic language.

A quick diversion: The Rust Perspective

The Unicode problem is a problematic one, and it may be impossible to deal with it with complete elegance. Many approaches exist; here I will describe Rust’s string-related types, of which there are three for our purposes:

The String (and related &str) is used for textual data and contains bytes that are guaranteed to be valid UTF-8 at all times
The Vec<u8> (and related [u8]) is a representation of pure binary bytes, of which all 256 possible characters are valid in any combination, whether or not it forms valid UTF-8
And the Path, which represents a path name on the system.

The Path uses the underlying operating system’s appropriate data type (here I acknowledge that Windows is very different from POSIX in this regard, though I don’t go into that here). Compile-time errors are generated when these types are mixed without proper safe conversion.

The Python Fiction

Python, in contrast, has only two types; roughly analogous to the String and the Vec<u8> in Rust. Critically, most of the Python standard library treats a filename as a String – that is, a sequence of valid Unicode code points, which is a subset of the valid POSIX filenames.

Do you see what we just did here? We’ve set up another shell-like situation in which filenames that are valid on the system create unexpected behaviors in a language. Only this time, it’s not \n, it’s things like \xF7.

From a POSIX standpoint, the correct action would have been to use the bytes type for filenames; this would mandate proper encode/decode calls by the user, but it would have been quite clear. It should be noted that some of the most core calls in Python, such as open(), do accept both bytes and strings, but this behavior is by no means consistent in the standard library, and some parts of the library that process filenames (for instance, listdir in its most common usage) return strings.

The Plot Thickens

At some point, it was clearly realized that this behavior was leading to a lot of trouble on POSIX systems. Having a listdir() function be unable (in its common usage; see below) to handle certain filenames was clearly not going to work. So Python introduced its surrogate escape. When using surrogate escapes, when attempting to decode a binary byte that is not valid in UTF-8, it is replaced with a multibyte UTF-8 sequence from Unicode code space that is otherwise rarely used. Then, when converted back to a binary sequence, this Unicode code point is converted to the same original byte. However, this is not a systemwide default and in many cases must be specifically requested.

And now you see this is both an ugly kludge and a violation of the promise of what a string is supposed to be in Python 3, since this doesn’t represent a valid Unicode character at all, but rather a token for the notion that “there was a byte here that we couldn’t convert to Unicode.” Now you have a string that the system thinks is Unicode, that looks like Unicode, that you can process as Unicode — substituting, searching, appending, etc — but which is actually at least partially representing things that should rightly be unrepresentable in Unicode.

And, of course, surrogate escapes are not universally used by even the Python standard library either. So we are back to the problem we had in Python 2: what the heck is a string, anyway? It might be all valid Unicode, it might have surrogate escapes in it, it might have been decoded from the wrong locale (because life isn’t perfect), and so forth.

Unicode Realities

The article pragmatic Unicode highlights these facts:

Computers are built on bytes
The world needs more than 256 symbols
You cannot infer the encoding of bytes — you must be told, or have to guess
Sometimes you are told wrong

I have no reason to quibble with this. How, then, does that stack up with this code from Python? (From zipfile.py, distributed as part of Python)

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                filename = filename.decode('cp437')

There is a reason that Python can’t extract a simple ZIP file properly. The snippet above violated the third rule by inferring a cp437 encoding when it shouldn’t. But it’s worse; the combination of factors leads extracall() to essentially convert a single byte from CP437 to a multibyte Unicode code point on extraction, rather than simply faithfully reproducing the bytestream that was the filename. Oh, and it doesn’t use surrogate escapes. Good luck with that one.

It gets even worse

Let’s dissect Python’s disastrous documentation on Unicode filenames.

First, we begin with the premise that there is no filename encoding in POSIX. Filenames are just blobs of bytes. There is no filename encoding!

What about $LANG and friends? They give hints about the environment, languages for interfaces, and terminal encoding. They can often be the best HINT as to how we should render characters and interpret filenames. But they do not subvert the fundamental truth, which is that POSIX filenames do not have to conform to UTF-8.

So, back to the Python documentation. Here are the problems with it:

It says that there will be a filesystem encoding if you set LANG or LC_CTYPE, falling back to UTF-8 if not specified. As we have already established, UTF-8 can’t handle POSIX filenames.
It gets worse: “The os.listdir() function returns filenames, which raises an issue: should it return the Unicode version of filenames, or should it return bytes containing the encoded versions? os.listdir() can do both”. So we are somewhat tacitly admitting here that str was a poor choice for filenames, but now we try to have it every which way. This is going to end badly.
And then there’s this gem: “Note that on most occasions, you should can just stick with using Unicode with these APIs. The bytes APIs should only be used on systems where undecodable file names can be present; that’s pretty much only Unix systems now.” Translation: Our default advice is to pretend the problem doesn’t exist, and will cause your programs to be broken or crash on POSIX.

Am I just living in the past?

This was the most common objection raised to my prior post. “Get over it, the world’s moved on.” Sorry, no. I laid out the case for proper handling of this in my previous post. But let’s assume that your filesystems are all new, with shiny UTF-8 characters. It’s STILL a problem. Why? Because it is likely that an errant or malicious non-UTF-8 sequence will cause a lot of programs to crash or malfunction.

We know how this story goes. All the shell scripts that do the wrong thing when “; rm” is in a filename, for instance. Now, Python is not a shell interpreter, but if you have a program that crashes on a valid filename, you have — at LEAST — a vector for denial of service. Depending on the circumstances, it could turn into more.

Conclusion

Some Python 3 code is going to crash or be unable to process certain valid POSIX filenames.
Some Python 3 code might use surrogate escapes to handle them.
Some Python 3 code — part of Python itself even — just assumes it’s all from cp437 (DOS) and converts it that way.
Some people recommend using latin-1 instead of surrogate escapes – even official Python documentation covers this.

The fact is: A Python string is the WRONG data type for a POSIX filename, and so numerous, incompatible kludges have been devised to work around this problem. There is no consensus on which kludge to use, or even whether or not to use one, even within Python itself, let alone the wider community. We are going to continue having these problems as long as Python continues to use a String as the fundamental type of a filename.

Doing the right thing in Python 3 is extremely hard, not obvious, and rarely taught. This is a recipe for a generation of buggy code. Easy things should be easy; hard things should be possible. Opening a file correctly should be easy. Sadly I fear we are in for many years of filename bugs in Python, because this would be hard to fix now.

Resources

(For even more fun, consider command line parameters and environment variables! I’m annoyed enough with filenames to leave those alone for now.)

The Incredible Disaster of Python 3

November 20, 2019SoftwarepythonJohn Goerzen

Update 2019-11-22: A successor article to this one dives into some of the underlying complaints.

I have long noted issues with Python 3’s bytes/str separation, which is designed to have a type “bytes” that is a simple list of 8-bit characters, and “str” which is a Unicode string. After apps started using Python 3, I started noticing issues: they couldn’t open filenames that were in ISO-8859-1, gpodder couldn’t download podcasts with 8-bit characters in their title, etc. I have files on my system dating back to well before widespread Unicode support in Linux.

Due to both upstream and Debian deprecation of Python 2, I have been working to port pygopherd to Python 3. I was not looking forward to this task. It turns out that the string/byte types in Python 3 are even more of a disaster than I had at first realized.

Background: POSIX filenames

On POSIX platforms such as Unix, a filename consists of one or more 8-bit bytes, which may be any 8-bit value other than 0x00 or 0x2F (‘/’). So a file named “test\xf7.txt” is perfectly acceptable on a Linux system, and in ISO-8859-1, that filename would contain the division sign ÷. Any language that can’t process valid filenames has serious bugs – and Python is littered with these bugs.

Inconsistencies in Types

Before we get to those bugs, let’s look at this:

>>> "/foo"[0]
'/'
>>> "/foo"[0] == '/'
True
>>> b"/foo"[0]
47
>>> b"/foo"[0] == '/'     # this will fail anyhow because bytes never equals str
False
>>> b"/foo"[0] == b'/'
False
>>> b"/foo"[0] == b'/'[0]
True

Look at those last two items. With the bytes type, you can’t compare a single element of a list to a single character, even though you still can with a str. I have no explanation for this mysterious behavior, though thankfully the extensive tests I wrote in 2003 for pygopherd did cover it.

Bugs in the standard library

A whole class of bugs arise because parts of the standard library will accept str or bytes for filenames, while other parts accept only str. Here are the particularly egregious examples I ran into.

Python 3’s zipfile module is full of absolutely terrible code. As I reported in Python bug 38861, even a simple zipfile.extractall() fails to faithfully reproduce filenames contained in a ZIP file. Not only that, but there is egregious code like this in zipfile.py:

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                filename = filename.decode('cp437')

I can assure you that zip on Unix was not mystically converting filenames from iso-8859-* to cp437 (which was from DOS, and almost unheard-of on Unix). Or how about this gem:

    def _encodeFilenameFlags(self):
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800

This combines to a situation where perfectly valid filenames cannot be processed by the zipfile module, valid filenames are mangled on extraction, and unwanted and incorrect character set conversions are performed. zipfile has no mechanism to access ZIP filenames as bytes.

How about the dbm module? It simply has no way to specify a filename as bytes, and absolutely can’t open a file named “text\x7f”. There is simply no way to make that happen. I reported this in Python bug 38864.

Update 2019-11-20: As is pointed out in the comments, there is a way to encode this byte in a Unicode string in Python, so “absolutely can’t open” was incorrect. However, I strongly suspect that little code uses that approach and it remains a problem.

I should note that a simple open(b"foo\x7f.txt", "w") works. The lowest-level calls are smart enough to handle this, but the ecosystem built atop them is uneven at best. It certainly doesn’t help that things like b"foo" + "/" are runtime crashers.

Larger Consequences of These Issues

I am absolutely convinced that these are not the only two modules distributed with Python itself that are incapable of opening or processing valid files on a Unix system. I fully expect that these issues are littered throughout the library. Nobody appears to be testing for them. Nobody appears to care about them.

It is part of a worrying trend I have been seeing lately of people cutting corners and failing to handle valid things that have been part of the system for years. We are, by example and implementation, teaching programmers that these shortcuts are fine, that it’s fine to use something that is required to be utf-8 to refer to filenames on Linux, etc. A generation of programmers will grow up writing code that is incapable of processing files with perfectly valid names. I am thankful that grep, etc. aren’t written in Python, because if they were, they’d crash all the time.

Here are some other examples:

When running “git status” on my IBM3151 terminal connected to Linux, I found it would clear the screen each time. Huh. Apparently git assumes that if you’re using it from a terminal, the terminal supports color, and it doesn’t bother using terminfo; it just sends ANSI sequences assuming that everything uses them. The IBM3151 doesn’t by default. (GNU tools like ls get this right) This is but one egregious example of a whole suite of tools that fail to use the ncurses/terminfo libraries that we’ve had for years to properly abstract these things.
A whole suite of tools, including ssh, tmux, and so forth, blindly disable handling of XON/XOFF on the terminal, neglecting the fact that this is actually quite important for some serial lines. Thankfully I can at least wrap things in GNU Screen to get proper XON/XOFF handling.
The Linux Keyspan USB serial driver doesn’t even implement XON/XOFF handling at all.

Now, you might make an argument “Well, ISO-8859-* is deprecated. We’ve all moved on to Unicode!” And you would be, of course, wrong. Unix had roughly 30 years of history before xterm supported UTF-8. It would be quite a few more years until UTF-8 reached the status of default for many systems; it wasn’t until Debian etch in 2007 that Debian used utf-8 by default. Files with contents or names in other encoding schemes exist and people find value in old files. “Just rename them all!” you might say. In some situations, that might work, but consider — how many symlinks would it break? How many scripts that refer to things by filenames would it break? The answer is most certainly nonzero. There is no harm in having files laying about the system in other encoding schemes — except to buggy software that can’t cope. And this post doesn’t even concern the content of files, which is a whole additional problem, though thankfully the situation there is generally at least somewhat better.

There are also still plenty of systems that can’t handle multibyte characters (and in various embedded or mainframe contexts, can’t even handle 8-bit characters). Not all terminals support ANSI. It requires only correct thinking (“What is a valid POSIX filename? OK, our datatypes better support that then”) to do the right thing.

Update 1, 2019-11-21: Here is an article dating back to 2014 about the Unicode issues in Python 3, which goes into quite a bit of detail about it. It lays out a compelling case for the issues with its attempt to implement a replacement for cat in python 2 and 3. The Practical Python porting for systems programmers is also relevant and, like me, highlights many of these same issues. Finally, this is not the first time I raised issues; I wrote The Python Unicode Mess more than a year ago. Unfortunately, as I am now working to port a larger codebase, the issues I raised before are more acute, and I have discovered more. At this point, I am extremely unlikely to use Python for any new project due to these issues.

TCP/IP over LoRa radios

October 31, 2019Technologyamateur radio, lora, radioJohn Goerzen

As I wrote yesterday, I have been experimenting with LoRa radios. Today, I got TCP/IP working over them!

The AX.25 protocol did indeed turn out to be well-suited to this. It’s simple and works. The performance is, predictably, terrible; ping times around 500-600ms, but it does work. I fired up ssh, ran emacs, did a bit with bash, and — yep! Very cool. I tried mosh as well, thinking it would be great for this, but for some reason it just flooded the link with endless packets and was actually rather terrible.

I wrote up how to use it. It’s not even all that hard!

Pretty satisfying seeing this work.

Long-Range Radios: A Perfect Match for Unix Protocols From The 70s

October 30, 2019Linux, Programmingamateur radio, lora, radioJohn Goerzen

It seems I’ve been on a bit of a vintage computing kick lately. After connecting an original DEC vt420 to Linux and resurrecting some old operating systems, I dove into UUCP.

In fact, it so happened that earlier in the week, my used copy of Managing UUCP & Usenet (its author list includes none other than Tim O’Reilly) arrived. I was reading about the challenges of networking in the 70s: half-duplex lines, slow transmission rates, and modems that had separate dialers. And then I stumbled upon long-distance radio. It turns out that a lot of modern long-distance radio has much in common with the challenges of communication in the 1970s – 1990s, and some of our old protocols might be particularly well-suited for it. Let me explain — I’ll start with the old software, and then talk about the really cool stuff going on in hardware (some radios that can send a signal for 10-20km or more with very little power!), and finally discuss how to bring it all together.

UUCP

UUCP, for those of you that may literally have been born after it faded in popularity, is a batch system for exchanging files and doing remote execution. For users, the uucp command copies files to or from a remote system, and uux executes commands on a remote system. In practical terms, the most popular use of this was to use uux to execute rmail on the remote system, which would receive an email message on stdin and inject it into the system’s mail queue. All UUCP commands are queued up and transmitted when a “call” occurs — over a modem, TCP, ssh pipe, whatever.

UUCP had to deal with all sorts of line conditions: very slow lines (300bps), half-duplex lines, noisy and error-prone communication, poor or nonexistent flow control, even 7-bit communication. It supports a number of different transport protocols that can accommodate these varying conditions. It turns out that these mesh fairly perfectly with some properties of modern long-distance radio.

AX.25

The AX.25 stack is a frame-based protocol used by amateur radio folks. Its air speed is 300bps, 1200bps, or (rarely) 9600bps. The Linux kernel has support for the AX.25 protocol and it is quite possible to run TCP/IP atop it. I have personally used AX.25 to telnet to a Linux box 15 miles away over a 1200bps air speed, and have also connected all the way from Kansas to Texas and Indiana using 300bps AX.25 using atmospheric skip. AX.25 has “connected” packets (as TCP) and unconnected/broadcast ones (similar to UDP) and is a error-detected protocol with retransmit. The radios generally used with AX.25 are always half-duplex and some of them have iffy carrier detection (which means collision is frequent). Although the whole AX.25 stack has grown rare in recent years, a subset of it is still in wide use as the basis for APRS.

A lot of this is achieved using equipment that’s not particularly portable: antennas on poles, radios that transmit with anywhere from 1W to 100W of power (even 1W is far more than small portable devices normally use), etc. Also, under the regulations of the amateur radio service, transmitters must be managed by a licensed operator and cannot be encrypted.

Nevertheless, AX.25 is just a protocol and it could, of course, run on other kinds of carriers than traditional amateur radios.

Long-range low-power radios

There is a lot being done with radios these days, much of which I’m not going to discuss. I’m not covering very short-range links such as Bluetooth, ZigBee, etc. Nor am I covering longer-range links that require large and highly-directional antennas (such as some are doing in the 2.4GHz and 5GHz bands). What I’m covering is long-range links that can be used by portable devices.

There is always a compromise in radios, and if we are going to achieve long-range links with poor antennas and low power, the compromise is going to be in bitrate. These technologies may scale down to as low at 300bps or up to around 115200bps. They can, as a side bonus, often be quite cheap.

HC-12 radios

HC-12 is a radio board, commonly used with Arduino, that sports 500bps to 115200bps communication. According to the vendor, in 500bps mode, the range is 1800m or 0.9mi, while at 115200bps, the range is 100m or 328ft. They’re very cheap, at around $5 each.

There are a few downsides to HC-12. One is that the lowest air bitrate is 500bps, but the lowest UART bitrate is 1200bps, and they have no flow control. So, if you are running in long-range mode, “only small packets can be sent: max 60 bytes with the interval of 2 seconds.” This would pose a challenge in many scenarios: though not much for UUCP, which can be perfectly well configured to have a 60-byte packet size and a window size of 1, which would wait for a remote ACK before proceeding.

Also, they operate over 433.4-473.0 MHz which appears to fall outside the license-free bands. It seems that many people using HC-12 are doing so illegally. With care, it would be possible to operate it under amateur radio rules, since this range is mostly within the 70cm allocation, but then it must follow amateur radio restrictions.

LoRa radios

LoRa is a set of standards for long range radios, which are advertised as having a range of 15km (9mi) or more in rural areas, and several km in cities.

LoRa can be done in several ways: the main LoRa protocol, and LoRaWAN. LoRaWAN expects to use an Internet gateway, which will tell each node what frequency to use, how much power to use, etc. LoRa is such that a commercial operator could set up roughly one LoRaWAN gateway per city due to the large coverage area, and some areas have good LoRa coverage due to just such operators. The difference between the two is roughly analogous to the difference between connecting two machines with an Ethernet crossover cable, and a connection over the Internet; LoRaWAN includes more protocol layers atop the basic LoRa. I have yet to learn much about LoRaWAN; I’ll follow up later on that point.

The speed of LoRa ranges from (and different people will say different things here) about 500bps to about 20000bps. LoRa is a packetized protocol, and the maximum packet size depends

LoRa sensors often advertise battery life in the months or years, and can be quite small. The protocol makes an excellent choice for sensors in remote or widely dispersed areas. LoRa transceiver boards for Arduino can be found for under $15 from places like Mouser.

I wound up purchasing two LoStik USB LoRa radios from Amazon. With some experimentation, with even very bad RF conditions (tiny antennas, one of them in the house, the other in a car), I was able to successfully decode LoRa packets from 2 miles away! And these aren’t even the most powerful transmitters available.

Talking UUCP over LoRa

In order to make this all work, I needed to write interface software; the LoRa radios don’t just transmit things straight out. So I wrote lorapipe. I have successfully transmitted files across this UUCP link!

Developing lorapipe was somewhat more challenging than I expected. For one, the LoRa modem raw protocol isn’t well-suited to rapid fire packet transmission; after receiving each packet, the modem exits receive mode and must be told to receive again. Collisions with protocols that ACKd data and had a receive window — which are many — were a problem so bad that it rendered some of the protocols unusable. I wound up adding a “expect more data after this packet” byte to every transmission, and have the receiver not transmit until it believes the sender is finished. This dramatically improved things. There’s more detail on this in my lorapipe documentation.

So far, I have successfully communicated over LoRa using UUCP, kermit, and YMODEM. KISS support will be coming next.

I am also hoping to discover the range I can get from this thing if I use more proper antennas (outdoor) and transmitters capable of transmitting with more power.

All in all, a fun project so far.

The Changelog

Comments on family, technology, and society