Have you ever been traveling, shot a ton of photos and videos, and been annoyed to find the upload saturating the terrible wifi you had access to? Maybe you wished the upload would pause until you got somewhere else, but pausing syncing on your Nextcloud/Syncthing/Dropbox would also pause other syncing you didn’t want to pause. Or maybe you have trouble backing up your laptop when not at home, in a way that won’t accidentally eat up your cell phone data.
There’s a way to help with all of this: asynchronous transfer.
What follows is a lot of background. If you just want to see what encrypted, onion-routed UUCP looks like, skip ahead to the “NNCP” section!
There is an old saying: “When all you have is a hammer, every problem looks like a nail.” We have this wonderful tool called ssh available, and it is pervasive and well-understood, so we tend to use it. But we’ve missed out on some benefits of asynchronous processing that we actually used to have more frequently.
Of course, we are all used to some asynchronous services in our lives. Email is a popular example: most mail clients work offline and will transmit stored messages when the mail server becomes reachable. Mail servers themselves work that way, too. Many instant messaging platforms do as well.
Even some backup systems do. Bacula/Bareos, for instance, spools all backup data to disk on the system connected to the tape drive, and writes it from there to the tape itself. It does this for several reasons, but primarily because a tape drive that is not fed data at its design speed must repeatedly pause and seek backwards to reposition for the next write. That makes the tape travel excessively over the write heads, causing a condition known as “shoe-shining” that damages the tape, or even the drive, prematurely.
Here are some problems people often run into when sending data across a network (or the Internet) synchronously:
- One side of the communication is much faster than the other
- Internet issues interrupting communications mid-stream
- Slow Internet causing processes to take much longer than planned, leading to unexpected outcomes or locking issues
- Physical damage due to performance issues
Of course, there are plenty of situations where synchronous communication is a must. For instance:
- When the status of the transaction at the remote end must be known immediately
- When there is insufficient space to spool a job’s data
I suspect that the reason we don’t do more asynchronous processing these days, despite its strong presence in the Unix heritage, is the lack of modern tools for it. Let’s explore some more.
Some of my use cases
I run ZFS on all my systems that support it: file server, laptops, workstations, etc. It is only natural to use ZFS send/receive to do backups, and I do. However, when I am traveling, my laptop never gets backed up, because backups are pulled by the backup system. Sure, there are ways around that; a VPN, for instance. But then we have the situation where sometimes I do not want to send the backup even if I have a working Internet connection: perhaps I’m tethered to a mobile connection and it would be expensive to do so, or I’m on hotel wifi that is flaky and slow and I don’t want to give up any of its meager bandwidth.
I have another backup-related problem. I have a remote server, which until recently was using extremely slow disks. If I made significant changes, the backup would take the better part of a day. That’s annoying when I’m trying to back up hourly. So of course I had to implement locking, but that meant none of my other machines would be backed up that day either.
Once I needed to transmit about 2TB of data. My home Internet connection was terribly slow, and I calculated it would take multiple months to do this. So I took to manually copying parts of the data to my laptop, and whenever I’d find an airport or coffee shop with faster Internet than at home, I’d send off those bits from it. But it took a ton of work.
The bespoke asynchronous problem
And that “ton of work” is perhaps why we aren’t doing more of this. There’s been no great standard solution, so it’s all “roll your own” when you need to. So we just use ssh, because it’s easier and usually “good enough”. But as I wrote in my recent article on airgapped backups, there are reasons to go async.
Solutions
Wouldn’t it be great to be able to queue up data for a machine, and let it get there in whatever way it can? Maybe a fast Internet connection is found, or via Tor, or via copying to a USB stick, or via radio broadcast? It would make many of these scenarios a lot easier. And there are ways to do this now, with modern security!
We have some tools on Linux for this: git-annex for storage and migration, syrep for synchronization, and NNCP for file transfer and remote execution (which could be combined with some of these other tools). Let’s dive into NNCP.
NNCP
If you already know UUCP, think of NNCP as UUCP brought into the modern era, with modern security and tools.
Basically, NNCP permits you to send files to a remote system, request files from a remote system, and pipe data to an NNCP command that requests execution remotely. So you could, say, pipe a zfs send to NNCP which sends it to the remote and pipes it to zfs receive when it gets there.
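To make that concrete, here’s a minimal sketch of the ZFS case. The node name backupsrv, the handle name zfsreceive, and the dataset names are all hypothetical; the key point is that the sender just pipes into nncp-exec, and the destination’s config decides what that handle is allowed to run:

    # On the sender: queue an incremental ZFS stream for backupsrv.
    # This returns immediately; the encrypted packet waits in the local spool.
    zfs send -i tank/data@yesterday tank/data@today | nncp-exec backupsrv zfsreceive

    # On backupsrv, in its nncp.hjson, the sender's neigh entry maps the
    # handle to a real command (hypothetical dataset name):
    #   exec: { zfsreceive: ["/sbin/zfs", "receive", "backups/data"] }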
NNCP has a delay-tolerant, resumable protocol that can run over just about any reliable connection: TCP, serial, Tor, radios of various kinds, you name it. But that’s not all; it can also dump its queue onto something like a USB stick for transport, or even produce a tar-style stream that can be munged however you like. If you want to get fancy, you can assign priorities to data packets so that, for instance, outbound email always gets sent before that 1TB file you’ve also got to send. You can also configure certain carriers to handle certain priorities of data: your cell phone would carry only the most urgent, but a USB stick would take anything.
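A few commands illustrate those transports. A sketch, with hypothetical node and path names (BULK is one of NNCP’s named niceness levels):

    # Exchange queued packets with node "remote" via a USB stick;
    # -mkdir creates the on-stick directory layout if needed.
    nncp-xfer -mkdir -node remote /mnt/usb

    # Or emit the outbound queue as a tar-style stream to store or pipe anywhere:
    nncp-bundle -tx remote > /tmp/remote.bundle

    # Queue a big file at low priority so urgent traffic goes first:
    nncp-file -nice BULK big-video.mp4 remote: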
NNCP is source-routed; you can tell it that the way that Bob reaches Alice is via Carl, then Betty. Bob can generate a message that will be sent along that route, fully encrypted and authenticated at each step of the way; Carl can’t see the content of the message or even anything about it other than its next hop.
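On Bob’s machine, that route is just configuration. A sketch of the relevant neigh entry in nncp.hjson, with the keys elided:

    neigh: {
      alice: {
        id: ...
        exchpub: ...
        signpub: ...
        # Route packets for alice through the listed intermediate hops;
        # each hop adds one more layer of encryption around the packet.
        via: ["carl", "betty"]
      }
    }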
How this helps
Let’s revisit some of my scenarios with NNCP.
For the laptop being backed up, while traveling it can queue up its backups, or photos, or videos, or whatever. They could be triggered by a command when on a good connection, or automatically. The data could be copied to USB and given to a friend to transmit; perfectly safe due to encryption. Or it could all wait until arriving at home, safely out of your other syncing directories. The NNCP documentation has an example of this.
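A sketch of what that looks like, with a hypothetical homeserver node:

    # While traveling: queue up the photos. Instant, and works offline.
    nncp-file photos-2021-06.tar homeserver:

    # See what's waiting in the spool:
    nncp-stat

    # When you're on a connection you're happy with, push it manually:
    nncp-call homeserver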
For the server being backed up slowly, that’s easily solved; the slow backup would simply be queued up, and transmitted and processed when it’s ready. This wouldn’t interrupt other backups.
How about the 2TB transmission problem? That’s also made a lot easier. A command could be run to fill up a USB stick with parts of the queue, then that USB stick plugged in and transmitted whenever at a fast location. Repeat as needed while the slow system continues its upload of the remaining bits.
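With NNCP, that loop is roughly two commands; the mount points here are hypothetical:

    # At home: dump part of the outbound queue onto the stick.
    nncp-xfer -tx /media/usb

    # At the fast location: ingest the stick into the laptop's spool,
    # from which the packets can be sent onward over the network.
    nncp-xfer -rx /media/usb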
NNCP has a lot of interesting use cases documented as well.
If you are already familiar with how public keys work in SSH, then NNCP should be immediately familiar as well. It is a similar concept (though arguably somewhat easier to set up).
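Getting started looks something like this (the config path is a common default; your packaging may differ):

    # Generate a fresh configuration, including this node's keypairs:
    nncp-cfgnew > /etc/nncp.hjson

    # The self section holds the private keys. Peers only ever see the
    # public halves, which you paste into each other's neigh sections,
    # much like exchanging entries in authorized_keys.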
I am working on setting up an NNCP network, and will have more posts on how to do so once I’ve got it going. In the meantime, the documentation for the project is also pretty good.
@jgoerzen This was the first time I heard of NNCP. Thanks, looks interesting.
@jtr @debian @kensanata So #NNCP at its core is different because it doesn’t require the source and destination to be online simultaneously. With TCP, the origin of a packet is responsible for ensuring it gets delivered, retransmitted if dropped, etc. With NNCP, you send data to the next hop and then that hop takes over responsibility. It may be days before that hop is able to deliver it. That hop may be a computer, or it may be a USB stick or a radio. 2/
@jtr @debian @kensanata #NNCP can be used to send files and execute things remotely. Think of it like ssh/scp and authorized_keys. I can say: tar -cpf - /usr | nncp-exec dest untar -C /backups where I defined untar on the dest node as something that runs “tar -xpf -” and then we add a -C telling it where to unpack. You could run a very similar command with ssh. Difference: dest doesn’t have to be online with NNCP. 3/
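Concretely, the dest side of that example is one line in dest’s config, inside the neigh entry for the calling node (a sketch; the tar path is an assumption):

    # In dest's nncp.hjson:
    #   exec: { untar: ["/bin/tar", "-xpf", "-"] }
    # Extra arguments given to nncp-exec are appended at execution time,
    # so dest ends up running: /bin/tar -xpf - -C /backups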
@jtr @debian @kensanata In that nncp-exec example, the data piped to nncp-exec is saved, along with the command line to run, in a data packet that is encrypted using the public key of the destination. Now it can be transported, directly or indirectly (via other nodes), over the network, USB drives, CD-ROMs, tapes, radios, laptops, phones, a combination of these, whatever. It can traverse multiple NNCP hops along the way, and is onion-routed like Tor. 4/
@jtr @debian @kensanata So let’s make this practical. Say you have slow Internet at home but fast Internet at a coffee shop. You have 10GB to send out. On your desktop at home, you queue it up to go like this: desktop->laptop->remote_server. Your desktop encrypts the 10GB to remote_server, wrapping that in a packet encrypted to the laptop. The laptop decrypts the outer encryption when it gets the data, but still can’t see the actual data. Laptop sends it out when you get to coffee shop. 5/
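In config terms that route is one line on the desktop, and the laptop relays with two commands. A sketch using the node names from this example:

    # Desktop's nncp.hjson: reach remote_server through the laptop.
    #   neigh: { remote_server: { ..., via: ["laptop"] } }

    # On the laptop, once at the coffee shop:
    nncp-toss                # unwrap relay packets, re-queue for remote_server
    nncp-call remote_server  # transmit over the fast connection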
@jtr @debian @kensanata I use #NNCP for backups. I take hourly snapshots with #ZFS. I have a backup system with no Internet. My backups are sent to a staging box, and then I can get those packets over to the backup box with the various methods I’ve mentioned. On my laptops, NNCP is configured to only automatically send packets to staging when they’re on the home LAN, because I may be tethered to 4G otherwise. But I can manually send them off when I’m on fast Internet away from home. 6/
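The LAN-only rule is expressed in the laptop’s config. A sketch, with a hypothetical address and schedule:

    # neigh entry for the staging box in the laptop's nncp.hjson:
    staging: {
      ...
      addrs: { lan: "192.168.1.10:5400" }
      calls: [
        # Call automatically every 10 minutes, but only the lan address,
        # so nothing goes out while tethered to 4G:
        { cron: "*/10 * * * *", addr: lan }
      ]
    }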
@jtr @debian @kensanata #NNCP also has an #asynchronous #multicast feature, which I use to sync my #orgmode #git repo as described here https://changelog.complete.org/archives/10274-distributed-asynchronous-git-syncing-with-nncp I also have more on NNCP here https://www.complete.org/nncp/ 7/
@jtr @debian @kensanata So #NNCP can be used to transport #Usenet news and #email because fundamentally those things can be transported by piping data into the rnews and rmail commands, which fits perfectly with nncp-exec. In fact, a predecessor to NNCP, #UUCP, was the way email and news often flowed in the early days, and this support is mostly still there even in modern servers. It takes just a bit of tweaking to make it use NNCP instead. 8/
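The glue is the same exec mechanism as before. A sketch of both sides (handle names follow UUCP convention; the paths and the mailhub node name are assumptions):

    # On the receiving host, in its neigh entry for the sending node:
    exec: {
      rmail: ["/usr/sbin/rmail"]   # deliver mail piped in on stdin
      rnews: ["/usr/bin/rnews"]    # inject news articles
    }

    # A sender then queues a message like this:
    #   ... | nncp-exec mailhub rmail recipient@example.org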
@jtr @debian @kensanata So I hope this helps; feel free to ask followup questions! end/