All posts by John Goerzen

Facebook’s Blocking Decisions Are Deliberate – Including Their Censorship of Mastodon

In the aftermath of my report of Facebook censoring mentions of the open-source social network Mastodon, there was a lot of conversation about whether or not this was deliberate.

That conversation seemed to focus on whether a human speficially added joinmastodon.org to some sort of blacklist. But that’s not even relevant.

OF COURSE it was deliberate, because of how Facebook tunes its algorithm.

Facebook’s algorithm is tuned for Facebook’s profit. That means it’s tuned to maximize the time people spend on the site — engagement. In other words, it is tuned to keep your attention on Facebook.

Why do you think there is so much junk on Facebook? So much anti-vax, anti-science, conspiracy nonsense from the likes of Breitbart? It’s not because their algorithm is incapable of surfacing the good content; we already know it can because they temporarily pivoted it shortly after the last US election. They intentionally undid its efforts to make high-quality news sources more prominent — twice.

Facebook has said that certain anti-vax disinformation posts violate its policies. It has an extremely cumbersome way to report them, but it can be done and I have. These reports are met with either silence or a response claiming the content didn’t violate their guidelines.

So what algorithm is it that allows Breitbart to not just be seen but to thrive on the platform, lets anti-vax disinformation survive even a human review, while banning mentions of Mastodon?

One that is working exactly as intended.

We may think this algorithm is busted. Clearly, Facebook does not. If their goal is to maximize profit by maximizing engagement, the algorithm is working exactly as designed.

I don’t know if joinmastodon.org was specifically blacklisted by a human. Nor is it relevant.

Facebook’s choice to tolerate and promote the things that service its greed for engagement and money, even if they are the lowest dregs of the web, is deliberate. It is no accident that Breitbart does better than Mastodon on Facebook. After all, which of these does its algorithm detect keep people engaged on Facebook itself more?

Facebook removes the ban

You can see all the screenshots of the censorship in my original post. Now, Facebook has reversed course:

We also don’t know if this reversal was human or algorithmic, but that still is beside the point.

The point is, Facebook intentionally chooses to surface and promote those things that drive engagement, regardless of quality.

Clearly many have wondered if tens of thousands of people have died unnecessary deaths over COVID as a result. One whistleblower says “I have blood on my hands” and President Biden said “they’re killing people” before “walking back his comments slightly”. I’m not equipped to verify those statements. But what do they think is going to happen if they prioritize engagement over quality? Rainbows and happiness?

Facebook Is Censoring People For Mentioning Open-Source Social Network Mastodon

Update: Facebook has reversed itself over this censorship, but I maintain that whether the censorship was algorithmic or human, it was intentional either way. Details in my new post.

Last November, I made a brief post to Facebook about Mastodon. Mastodon is an open-source and open social network, which is decentralized and all about user control instead of corporate control. I’ve blogged about Mastodon and the dangers of Facebook before, but rarely mentioned Mastodon on Facebook itself.

Today, I received this notice that Facebook had censored my post about Mastodon:

Facebook censoring a post

Wonder with me for a second what this one-off post I composed myself might have done to trip Facebook’s filter…. and it is probably obvious that what tripped the filter was the mention of an open source competitor, even though Facebook is much more enormous than Mastodon. I have been a member of Facebook for many years, and this is the one and only time anything like that has happened.

Why they decided today to take down that post – I have no idea.

In case you wondered about their sincerity towards stamping out misinformation — which, on the rare occasions they do something about, they “deprioritize” rather than remove as they did here — this probably answers your question. Or, are they sincere about thinking they’re such a force for good by “connecting the world’s people?” Well, only so long as the world’s people don’t say nice things about alternatives to Facebook, I guess.

“Well,” you might be wondering, “Why not appeal, since they obviously made a mistake?” Because, of course, you can’t:

Indeed I did tick a box that said I disagreed, but there was no place to ask why or to question their action.

So what would cause a non-controversial post from a long-time Facebook member that has never had anything like this happen, to disappear?

Greed. Also fear.

Maybe I’d feel sorry for them if they weren’t acting like a bully.

Edit: There are reports from several others on Mastodon of the same happening this week. I am trying to gather more information. It sounds like it may be happening on Twitter as well.

Edit 2: And here are some other reports from both Facebook and Twitter. Definitely not just me.

Edit 3: While trying to reply to someone on Facebook, that was trying to defend Facebook, I mentioned joinmastodon.org and got this:

Anyone else seeing it?

Edit 4: It is far more than just me, clearly. More reports are out there; for instance, this one and that one.

Excellent Experience with Debian Bullseye

I’ve appreciated the bullseye upgrade, like most Debian upgrades. I’m not quite sure how, since I was already running a backports kernel, but somehow the entire system is snappier. Maybe newer X or something? I’m really pleased with it. Hardware integration is even nicer now, particularly the automatic driverless support for scanners in addition to the existing support for printers.

All in all, a very nice upgrade, and pretty painless.

I experienced a few odd situations.

For one, I had been using Gnome Flashback. Since xmonad-log-applet didn’t compile there (due to bitrot in the log applet, not flashback), and I had been finding Gnome Flashback to be a rather dusty and forgotten corner of Gnome for a long time, I decided to try Mate.

Mate just seemed utterly unable to handle a situation with a laptop and an external monitor very well. I want to use only the external monitor with the laptop lid is closed, and it just couldn’t remember how to do the right thing – external monitor on, laptop monitor off, laptop not put into suspend. gdm3 also didn’t seem to be able to put the external monitor to sleep, either, causing a few nights of wasted power.

So off I went to XFCE, which I had been using for years on my workstation anyhow. Lots more settings available in XFCE, plus things Just Worked there. Odd that XFCE, the thin and light DE, is now the one that has the most relevant settings. It seems the Gnome “let’s remove a bunch of features” approach has extended to MATE as well.

When I switched to XFCE, I also removed gdm3 from my system, leaving lightdm as the only DM on it. That matched what my desktop machine was using, and also what task-xfce-desktop called for. But strangely, the XFCE settings for lightdm were completely different between the laptop and the desktop. It turns out that with lightdm, you can have the lightdm-gtk-greeter and the accompanying lightdm-gtk-greeter-settings, or slick-greeter and the accompanying lightdm-settings. One machine had one greeter and settings, and the other had the other. Why, I don’t know. But lightdm-gtk-greeter-settings had the necessary options for putting monitors to sleep on the login screen, so I went with it.

This does highlight a bit of a weakness in Debian upgrades. There is SO MUCH choice in Debian, which I highly value. At some point, almost certainly without my conscious choice, one machine got one greeter and another got the other. Despite both having task-xfce-desktop installed, they got different desktop experiences. There isn’t a great way to say “OK, I know I had a bunch of things installed before, but NOW I want the default bullseye experience”.

But overall, it is an absolutely fantastic distribution. It is great to see this nonprofit community distribution continue to have such quality on such an immense scale. And hard to believe I’ve been a Debian developer for 25 years. That seems almost impossible!

Distributed, Asynchronous Git Syncing with NNCP

I have a problem.

I have a directory that I use with org-mode and org-roam. I want it to be synced across multiple machines. I also want to keep the history with git. And, I want to use end-to-end encryption (no storing a plain git repo on a remote server), have a serverless setup, not require any two machines to be up simultaneously, and be resilient in the face of races and conflicts.

Whew.

I’ve tried a number of setups – git-remote-gcrypt on a remote server (fragile), some complicated scripts around a separate repo in syncthing (requires one machine to be “in charge”), etc. They all were subpar.

Then NNCP introdoced asynchronous multicast and I was intrigued.

So, I wrote gitsync-nncp, which uses NNCP to distribute git bundles to all the participating machines. The comprehensive documentation for gitsync-nncp goes into a lot more detail about how it works and what problems it solves. It’s working quite well for me!

Roundup of Unique Data/Storage Hosting Options

Recently I have been taking another look at the services at rsync.net and it got me thinking: what would I do with a lot of storage? What might I want to run with it, if it were fairly cheap?

  • Backups are an obvious place to start. Borgbackup makes a pretty compelling option: very bandwidth-efficient thanks to block-level rolling hash dedup, encryption fully on the client side, etc. Borg can run over ssh, though does need a server-side program.
  • Nextcloud is another option. With Google Photos getting quite expensive now, if you could have a TB of storage that you control, what might you do with it? Nextcloud also includes IM, video chat, and online document editing similar to Google Docs.
  • I’ve written before about the really neat properties of Syncthing: distributed synchronization that needs no server component. It also supports untrusted nodes in the mesh, where all content is encrypted before it reaches them. Sometimes an intermediary node is useful; for instance, if nodes A and C are to sync but are rarely online at the same time, an untrusted node B that is always online can facilitate synchronization. A server with some space could help with this.
  • A relay for NNCP or UUCP.
  • More broadly, you could self-host your photo or video cllection.

Let’s start taking a look at what’s out there. I’m going to try to focus on things that are unique for some reason: pricing, features, etc. Incidentally, good reviews are hard to find due to the proliferation of affiliate links. I have no affiliate relationships with anyone mentioned here and there are no affiliate links in this post.

I’ll start with the highest-end community and commercial options (though both are quite competitive on price for what they are), and then move on to the cheaper options.

Community option: SDF

SDF is somewhat hard to define. “What is SDF?” could prompt answers like:

  • A community-run network offering free Unix shells to the public
  • A diverse community of people that connect with unique tools. A social network in the 80s sense, sort of.
  • A provider of… let me see… VPN, DSL, and even dialup access.
  • An organization that runs various Open Source social network services, including Mastodon, Pixelfed (image sharing), PeerTube (video sharing), WordPress, even Minecraft.
  • A provider of various services for a nominal charge: $3/mo gets you access to the MetaArray with 800GB of storage space which you have shell access to, and can store stuff on with Nextcloud, host public webpages, etc.
  • Thriving communities around amateur radio, musicians, Plan 9, and even – brace yourself – TOPS-20, a DEC operating system first released in 1976 and not updated since 1988.
  • There’s even a Wikipedia article about SDF.

There’s a lot there. SDF lets you use things for yourself, of course, but you can also join a community. It’s not a commercial service backed by SLAs — it’s best-effort — but it’s been around more than 30 years and has a great track record.

Top commercial option for backup storage: rsync.net

rsync.net offers storage broadly over SSH: sftp, rsync, scp, borg, rclone, restic, git-annex, git, and such. You do not get a shell, but you do get to run a few noninteractive commands via ssh. You can, for instance, run git clone on the rsync server.

The rsync special sauce is in ZFS. They run raidz3 on their arrays (and also offer dual location setups for an additional fee), offer both free and paid ZFS snapshots, etc. The service is designed to be extremely reliable, particularly for backups, and it seems to me to meet those goals.

Basic storage is $0.025 per GB/mo, but with certain account types such as borg, can be had for $0.015 per GB/mo. The minimum size is 400GB or $10/mo. There are no bandwidth charges. This makes it quite economical even compared to, say, S3. Additional discounts start at 10TB, so 10TB with rsync.net would cost $204.80/mo or $81.92 on the borg plan.

You won’t run Nextcloud on this thing, but for backups that must be reliable, or even a photo collection or something, it makes perfect sense.

When you look into other options, you’ll find that other providers are a lot more vague about their storage setup than rsync.net.

Various offerings from Hetzner

Hetzner is one of Europe’s large hosting companies, and they have several options of interest.

Their Storage Box competes directly with the rsync.net service. Their per-GB storage cost is lower than rsync.net, and although they do include a certain amount of free bandwidth with each account, bandwidth is not unlimited and could result in charges. Still, if you don’t drive 2x or more your storage usage in bandwidth each month, it would be cheaper than rsync. The Storage Box also uses ZFS with some kind of redundancy, though they don’t specifcy details.

What differentiates them from rsync.net is the protocol support. They support sftp, scp, Borg, ssh, rsync, etc. just as rsync.net does. But then they also throw in Samba/CIFS, FTPS, HTTPS, and WebDAV – all optionally enabled or disabled by you. Although things like sshfs exist, they aren’t particularly optimal for some use cases, and CIFS support may just be what you need in some situations.

10TB with Hetzner would cost EUR 39.90/mo, or about $48.84/mo. (This figure is higher for Europeans, who also have to pay VAT.)

Hetzner also offers a Storage Share, which is a private Nextcloud instance. 10TB of that is exactly the same cost as 10TB of the Storage Box. You can add your own users, groups, etc. to this as your are the Nextcloud admin of your instance. Hetzner throws in automatic updates (which is great, as updates have been a pain in my side for a long time). Nextcloud is ideal for things like photo sharing, even has email and chat built in, etc. For about the same price at 2TB of Google One, you can have 2TB of Nextcloud with all those services for yourself. Not bad. You can also mount a Nextcloud instance with WebDAV.

Interestingly, Nextcloud supports “external storages” as backend for the data. It supports another Nextcloud instance, OpenStack or S3 object storage, and SFTP, SMB/CIFS, and WebDAV. If you’re thinking you’d like both SFTP and Nextcloud access to a pool of storage, I imagine you could always get a large Storage Box from Hetzner (internal transfer is free), pair it with a small Nextcloud instance, and link the two with Nextcloud external storage.

Dedicated Servers

If you want a more DIY approach, you can find some interesting deals on actual dedicated server hardware – you get the entire machine to yourself. I’ve been using OVH’s SoYouStart for a number of years, with good experienaces, and they have a number of server configurations available. For instance, for $45.99, you can get a Xeon box with 4x2TB drives and 32GB RAM. With RAID5 or raidz1, that’s 6TB of available space – and cheaper than the 6TB from rsync.net (though less redundant) plus you get the whole box to yourself too. OVH directly has some more storage servers; for instance, you can get a box with 4x4TB + 1x500GB SSD for $86.75/mo, giving you 12TB available with RAID5/raidz1, plus a 16GB server to do what you want with.

Hetzner also has some larger options available, for instance 2x4TB at EUR39 or 2x8TB at EUR54, both with 64GB of RAM.

Bargain Corner

Yes, you can find 10TB for $25/mo. It’s hosted on ceph, by what appears to be mostly a single person (though with a lot of experience and a fair bit of transparency). You’re not going to have the round-the-clock support experience as with rsync.net, nor its raidz3 level of redundancy – but if you don’t need that, there are quite a few options.

Let’s start with Lima Labs. Yes, 10TB is $25/mo, and they support sftp, rsync, borg, and even NFS mounts on storage backed by Ceph. The owner, Sam, seems to be a nice guy but the service isn’t going to be on the scale of rsync.net or Hetzner. That may or may not be OK for your needs – I mean, you can even get 1TB for $5/mo, so there are some fantastic deals to be had here.

BorgBase does Borg hosting and borg hosting only. You can get 1TB for $6.67/mo or, for instance, 10TB for $53.46. They don’t say much about their infrastructure and it’s hard to get a read on the company, but for Borg backups, it could be a nice option.

Bargain Corner Part 2: Seedboxes

There’s a market out there of companies offering BitTorrent seeding and downloading services. Typically, these services offer you Unix ssh access to a shell, give you a bunch of space on completely non-redundant drives (theory being that the data on them is transient), lots of bandwidth, for a low price. Some people use them for BitTorrent, others for media serving and such.

If you are willing to take the lowest in drive redundancy, there are some deals to be had. Whatbox is a popular leader here, and has an extensive wiki with info. Or you can find some seedbox.io “shared storage” plans – for instance, 12TB for $32.49/mo. But it’s completely non-redundant drives.

Seedbox has a partner company, Walker Servers, with some interesting deals; for instance, 4x8TB for EUR 52.45. Not bad for 24TB usable with RAID5 – but Walker Servers is completely unknown to me and doesn’t publish a phone number. So, YMMV.

Conclusion

I’m sure I’ve left out many quality options here, but hopefully this is enough to lay out a general lay of the land. Leave other suggestions in the comments.

Recovering Our Lost Free Will Online: Tools and Techniques That Are Available Now

As I’ve been thinking and writing about privacy and decentralization lately, I had a conversation with a colleague this week, and he commented about how loss of privacy is related to loss of agency: that is, loss of our ability to make our own choices, pursue our own interests, and be master of our own attention.

In terms of telecommunications, we have never really been free, though in terms of Internet and its predecessors, there have been times where we had a lot more choice. Many are too young to remember this, and for others, that era is a distant memory.

The irony is that our present moment is one of enormous consolidation of power, and yet also one of a proliferation of technologies that let us wrest back some of that power. In this post, I hope to enlighten or remind us of some of the choices we have lost — and also talk about the ways in which we can choose to regain them, already, right now.

I will talk about the possibilities, the big dreams that are possible now, and then go into more detail about the solutions.

The Problems & Possibilities

The limitations of “online”

We make the assumption that we must be “online” to exchange data. This is reinforced by many “modern” protocols; Twitter clients, for instance, don’t tend to let you make posts by relaying them through disconnected devices.

What would it be like if you could fully participate in global communities without a constant Internet connection? If you could share photos with your friends, read the news, read your email, etc. even if you don’t have a connection at present? Even if the device you use to do that never has a connection, but can route messages via other devices that do?

Would it surprise you to learn that this was once the case? Back in the days of UUCP, much email and Usenet news — a global discussion forum that didn’t require an Internet connection — was relayed via occasional calls over phone lines. This technology remains with us, and has even improved.

Sadly, many modern protocols make no effort in this regard. Some email clients will let you compose messages offline to send when you get online later, but the assumption always is that you will be connected to an IP network again soon.

NNCP, on the other hand, lets you relay messages over TCP, a radio, a satellite, or a USB stick. Email and Usenet, since they were designed in an era where store-and-forward was valued, can actually still be used in an entirely “offline” fashion (without ever touching an IP-based network). All it takes is for someone to care to make it happen. You can even still do it over UUCP if you like.

The physical and data link layers

Many of us just accept that we communicate in a few ways: Wifi for short distances, and then cable modems or DSL for our local Internet connection, and then many people are fuzzy about what happens after that. Or, alternatively, we have 4G phones that are the local Internet connection, and the same “fuzzy” things happen after.

Think about this for a moment. Which of these do you control in any way? Sometimes just wifi, sometimes maybe you have choices of local Internet providers. After that, your traffic is handled by enormous infrastructure companies.

There is choice here.

People in ham radio have been communicating digitally over long distances without the support of the traditional Internet for decades, but the technology to do this is now more accessible to anyone. Long-distance radio has had tremendous innovation in the last decade; cheap radios can now communicate over several miles/km without any other infrastructure at all. We all carry around radios (Wifi and Bluetooth) in our pockets that don’t have to be used as mere access points to the Internet or as drivers of headphones, but can also form their own networks directly (Briar).

Meshtastic is an example; it’s an instant messenger that can form a mesh over many miles/km and requires no IP infrastructure at all. Briar is similar. XBee radios form a mesh in hardware, allowing peers to reach each other (also over many miles/km) with a serial or framed protocol.

Loss of peer-to-peer

Back in the late 90s, I worked at a university. I had a 386 on my desk for a workstation – not a powerful computer even then. But I put the boa webserver on it and could just serve pages on the Internet. I didn’t have to get permission. Didn’t have to pay a hosting provider. I could just DO it.

And of course that is because the university had no firewall and no NAT. Every PC at the university was a full participant on the Internet as much as the servers at Microsoft or DEC. All I needed was a DNS entry. I could run my own SMTP server if I wanted, run a web or Gopher server, and that was that.

There are many reasons why this changed. Nowadays most residential ISPs will block SMTP for their customers, and if they didn’t, others would; large email providers have decided not to federate with IPs in residential address spaces. Most people have difficulty even getting a static IP address in the first place. Many are behind firewalls, NATs, or both, meaning that incoming connections of any kind are problematic.

Do you see what that means? It has weakened the whole point of the Internet being a network of peers. While IP still acts that way, as a practical matter, there are clients that are prevented from being servers by administrative policy they have no control over.

Imagine if you, a person with an Internet connection to your laptop or phone, could just decide to host a website, or a forum on it. For moderate levels of load, they are certainly capable of this. The only thing in the way is the network management policies you can’t control.

Elaborate technologies exist to try to bridge this divide, and some, like Tor or cjdns, can work quite well. More on this below.

Expense of running something popular

Related to the loss of peer-to-peer infrastructure is the very high cost of hosting something popular. Do you want to share videos with lots of people? That almost certainly is going to require expensive equipment and bandwidth.

There is a reason that there are only a small handful of popular video streaming sites online. It requires a ton of money to host videos at scale.

What if it didn’t? What if you could achieve economies of scale so much that you, an individual, could compete with the likes of YouTube? You wouldn’t necessarily have to run ads to support the service. You wouldn’t have to have billions of dollars or billions of viewers just to make it work.

This technology exists right now. Of course many of you are aware of how Bittorrent leverages the swarm for files. But projects like IPFS, Dat, and Peertube have taken this many steps further to integrate it into a global ecosystem. And, at least in the case of Peertube, this is a thing that works right now in any browser already!

Application-level “walled gardens”

I was recently startled at how much excitement there was when Github introduced “dark mode”. Yes, Github now offers two colors on its interface. Already back in the 80s and 90s, many DOS programs had more options than that.

Git is a decentralized protocol, but Github has managed to make it centralized.

Email is a decentralized protocol — pick your own provider, and they all communicate — but Facebook and Twitter aren’t. You can’t just pick your provider for Facebook. It’s Facebook or nothing.

There is a profit motive in locking others out; these networks want to keep you using their platforms because their real customers are advertisers, and they want to keep showing you ads.

Is it possible to have a world where you get to pick your own app for sharing photos, and it works even if your parents use a different one? Yes, yes it is.

Mastodon and the Fediverse are fantastic examples for social media. Pixelfed is specifically designed for photos, Mastodon for short-form communication, there’s Pleroma for more long-form communication, and they all work together. You can use Mastodon to read Pleroma content or look at Pixelfed photos, and there are many (free) providers of each.

Freedom from manipulation

I recently wrote about the dangers of the attention economy, so I won’t go into a lot of detail here. Fundamentally, you are not the customer of Facebook or Google; advertisers are. They optimize their site to keep you on it as much as possible so that they can show you as many ads as possible which makes them as much money as possible. Ads, of course, are fundamentally seeking to manipulate your behavior (“buy this product”).

By lowering the cost of running services, we can give a huge boost to hobbyists and nonprofits that want to do so without an ultimate profit motive. For-profit companies benefit also, with a dramatically reduced cost structure that frees them to pursue their mission instead of so many ads.

Freedom from snooping (privacy and anonymity)

These days, it’s not just government snooping that people think about. It’s data stolen by malware, spies at corporations (whether human or algorithmic), and even things like basic privacy of one’s own security footage. Here the picture is improving; encryption in transit, at least at a basic level, has become much more common with TLS being a standard these days. Sadly, end-to-end encryption (E2EE) is not nearly as much, perhaps because corporations have a profit motive to have access to your plaintext and metadata.

Closely related to privacy is anonymity: that is, being able to do things in an anonymous fashion. The two are not necessarily equal: you could send an encrypted message but reveal who the correspondents are, as with email; or, you could send a plaintext message over a Tor exit node that hides who the correspondents are. It is sometimes difficult to achieve both.

Nevertheless, numerous answers exist here that tackle one or both problems, from the Signal messenger to Tor.

Solutions That Exist Today

Let’s dive in to some of the things that exist today.

One concept you’ll see in many of these is integrated encryption with public keys used for addressing. In other words, your public key is akin to an IP address (and in some cases, is literally your IP address.)

Data link and networking technologies (some including P2P)

  • Starting with the low-power and long-distance technologies, I’ve written quite a bit about LoRA, which are low-power long-distance radios. They can easily achieve several miles/km while still using much less than 1W of power. LoRA is a common building block of mesh off-the-grid messenger systems such as meshtastic, which forms an ad-hoc mesh of LoRA devices with days-long battery life and miles-long communication abilities. LoRA trades speed for bandwidth; in its longest-distance modes, it may operate at 300bps or less. That is not a typo. Some LoRAWAN devices have battery life measured in years (usually one-way sensors and such). Also, the Pine64 folks are working to integrate LoRA on nearly all their product line, which includes single-board computers, phones, and laptops.
  • Similar to LoRA is XBee SX from Digi. While not quite as long-distance as LoRA, it does still do quite a bit with low power and also goes many miles. XBee modules have automatic mesh routing in firmware, and can be used in either frame mode or “serial cable emulation” mode in which they act as if they’re a serial cable. Unlike plain LoRA, XBee radios do hardware retransmit. They also run faster, at up to about 150Kbps – though that is still a lot slower than wifi.
  • I’ve written about secure mesh messengers recently. One of them, Briar, particularly stands out in that it is able to form an ad-hoc mesh using phone’s Bluetooth radios. It can also route messages over the public Internet, which it does exclusively using Tor.
  • I’ve also written a lot about NNCP, the sort of modernized UUCP. NNCP is completely different than the others here in that it is a store-and-forward network – sort of a modern UUCP. NNCP has easy built-in support for routing packets using USB drives, clean serial interfaces, TCP, basically anything you can pipe to, even broadcast satellite and such. And you don’t even have to pick one; you can use all of the above: Internet when it’s available, USB sticks or portable hard drives when not, etc. It uses Tor-line onion routing with E2EE. You’re not going to run TCP over NNCP, but files (including videos), backups, email, even remote execution are all possible. It is the most “Unixy” of the modern delay-tolerant networks and makes an excellent choice for a number of use cases where store-and-forward and extreme flexibility in transportation make a lot of sense.
  • Moving now into the range of speeds and technologies we’re more used to, there is a lot of material out there on building mesh networks on Wifi or Wifi-adjacent technology. Amateur radio operators have been active in this area for years, and even if you aren’t a licensed ham and don’t necessarily flash amateur radio firmware onto your access points, a lot of the ideas and concepts they cover could be of interest. For instance, the Amateur Radio Emergency Data Network covers both permanent and ad-hoc meshs, and this AREDN video covers device selection for AREDN — which also happens to be devices that would be useful for quite a few other mesh or long-distance point-to-point setups.
  • Once you have a physical link of some sort, cjdns and the Hyperboria network have the goals of literally replacing the Internet – but are fully functional immediately. cjdns assigns each node an IPv6 address based on its public key. The network uses DHT for routing between nodes. It can run directly atop Ethernet (and Wifi) as its own native protocol, without an IP stack underneath. It can also run as a layer atop the current Internet. And it can optionally be configured to let nodes find an exit node to reach the current public Internet, which they can do opportunistically if given permission. All traffic is E2EE. One can run an isolated network, or join the global Hyperboria network. The idea is that local meshes could be formed, and then geographically distant meshes can be linked together by simply using the current public Internet as a dumb transport. This, actually, strongly resembles the early days of Internet buildout under NSFNet. The Torento Mesh is a prominent user of cjdns, and they publish quite a bit of information online. cjdns as a standalone identity is in decline, but forms the basis of the pkt network, which is designed to foster an explosion in WISPs.
  • Similar in concept to cjdns is Yggdrasil, which uses a different routing algorithm. It is now more active than cjdns and has active participants and developers.
  • Althea is a startup in this space, hoping to encourage communities to build meshes whose purpose is to provide various routes to access to the traditional Internet, including digital currency micropayments. This story documents how one rural community is using it.
  • Tor is a somewhat interesting case. While it doesn’t provide kernel-level routing, it does provide a SOCKS5 proxy. Traditionally, Tor is used to achieve anonymity while browsing the public Internet via an exit node. However, you can stay entirely in-network by using onion services (basically ports that are open to Tor). All Tor traffic is onion-routed so that the originating IP cannot be discovered. Data within Tor is E2EE, though if you are using an exit node to the public Internet, that of course can’t apply there.
  • GNUnet is a large suite of tools for P2P communication. It includes file downloading, Tor-like IP over the network, a DNS replacement, and facilitates quite a few of the goals discussed here. (Added in a 2021-02-22 update)

P2P Infrastructure

While some of the technologies above, such as cjdns, explicitly facitilitate peer-to-peer communication, there are some other application-level technologies to look at.

  • IPFS has been having a lot of buzz lately, since the Brave browser integrated support. IPFS headlines as “powers the distributed web”, but it is actually more than that; various other apps layer atop it. The core idea is that content you request gets reshared by your node for some period of time, somewhat akin to Bittorrent. IPFS runs atop the regular Internet and is typically accessed through an app.
  • The Dat Protocol is somewhat similar in concept to IPFS, though the approach is somewhat different; it emphasizes efficient distribution of updates at the expense of requiring a git-like history.
  • IPFS itself is based on libp2p, which is designed to be a generic infrastructure for adding P2P capabilities to your own code. It is probably fair to say libp2p is still quite complex compared to ordinary TCP, and the language support is in its infancy, but nevertheless it is quite an exciting development to watch.
  • Of course almost all of us are familiar with Bittorrent, the software that first popularized the idea of a distributed mesh sharing knowledge about which chunks of a dataset they have in order to maximize the efficiency of distributing the whole thing. Bittorrent is still in wide use (and, despite its reputation, that wide use includes legitimate users such as archive.org and Debian).
  • I recently wrote about building a delay-tolerant offline-capable mesh with Syncthing. Syncthing, on its surface, is something like an open source Dropbox. But look into a bit and you realize it’s fully P2P, serverless, can support various network topologies including intermittent connectivity between network parts, and such. My article dives into that in more detail. If your needs are mostly related to files, Syncthing can make a fine mesh infrastructure that is auto-healing and is equally at home on the public Internet, a local wifi access point with no Internet at all, a private mesh like cjdns, etc.
  • Also showing some promise is Secure Scuttlebutt (SSB). Its most well-known application is a social network, but in my opinion some of the other applications atop SSB are more interesting. SSB is designed to be offline-friendly, can do things like automatically exchange data with peers on the same Wifi (eg, a coffee shop), etc., though it is an append-only log that can be unwieldy on mobile sometimes.

Instant Messengers and Chat

I won’t go into a lot of detail here since I recently wrote a roundup of secure mesh messengers and also a followup article about Signal and some hidden drawbacks of P2P. Please refer to those articles for some interesting things that are happening in this space.

Matrix is a distributed IM platform similar in concept to Slack or IRC, but globally distributed in a mesh. It supports optional E2EE.

Social Media

I wrote recently about how to join the Fediverse, which covered joining Mastodon, a federeated, decentralized social network. Mastodon is the largest of these, with several million users, and is something of a much nicer version of Twitter.

Mastodon is also part of what is known as the “Fediverse”, which are applications that are loosely joined together by their support of the ActivityPub protocol. Other popular Fediverse applications include Pixelfed (similar to Instagram) and Peertube for sharing video. Peertube is particularly interesting in that it supports Webtorrent for efficiently distributing popular videos. Webtorrent is akin to Bittorrent running efficiently inside your browser.

Concluding Remarks

Part of my goal with this is encouraging people to dream big, to ask questions like:

What could you do if offline were easy?

What is possible if you have freedom in the physical and data link layers? Dream big.

We’re so used to thinking that it’s quite difficult for two devices on the Internet to talk to each other. What would be possible if this were actually quite easy?

The assumption that costs rise dramatically as popularity increases is also baked into our thought processes. What if that weren’t the case — could you take on Youtube from your garage? Would lowering barriers to entry lower the ad economy and let nonprofits have more equal footing with large corporations?

We have so many walled gardens, from Github to Facebook, that we almost forget it doesn’t have to be that way.

So having asked these questions, my secondary point is to suggest that these aren’t pie-in-the-sky notions. These possibilites are with us right now.

You’ll notice from this list that virtually every one of these technologies is ad-free at its heart (though some would be capable of serving ads). They give you back your attention. Many preserve privacy, anonymity, or both. Many dramatically improve your freedom of association and communication. Technologies like IPFS and Bittorrent ease the burden of running something popular.

Some are quite easy to use (Mastodon or Peertube) while others are much more complex (libp2p or the lower-level mesh network systems).

Clearly there is still room for improvement in many areas.

But my fundamental point is this: good technology is here, right now. Technical people can vote with their feet and wallets and start using it. Early adopters will help guide the way for the next set of improvements. Join us!

A Simple, Delay-Tolerant, Offline-Capable Mesh Network with Syncthing (+ optional NNCP)

A little while back, I spent a week in a remote area. It had no Internet and no cell phone coverage. Sometimes, I would drive in to town where there was a signal to get messages, upload photos, and so forth. I had to take several devices with me: my phone, my wife’s, maybe a laptop or a tablet too. It seemed there should have been a better way. And there is.

I’ll use this example to talk about a mesh network, but it could just as well apply to people wanting to communicate on a 12-hour flight that has no in-flight wifi, or spacecraft with an intermittent connection, or a person traveling.

Syncthing makes a wonderful solution for things like these. Here are some interesting things about Syncthing:

  • You can think of Syncthing as a serverless, peer-to-peer, open source alternative to Dropbox. Machines sync directly with each other without a server, though you can add a server if you want.
  • It can operate completely without Internet access or any central server, though if Internet access is available, it can readily be used.
  • Syncthing devices connected to the same LAN or Wifi will detect each other’s presence and automatically communicate.
  • Syncthing is capable of handling a constantly-changing topology. It can also, for instance, handle two disconnected clusters of nodes with one node that “travels” between them — perhaps just a phone.
  • Syncthing scales from everything from a phone to thousands of nodes.
  • Syncthing normally performs syncs in every direction, but can also do single-direction syncs
  • An individual Syncthing node can register its interest or disinterest in certain files or directories based on filename patterns

Syncthing works by having you define devices and folders. You can choose which devices to share folders with. A shared folder has an ID that is unique across Sycnthing. You can share a folder from device A to device B, and then device B can share it with device C, even if A and C don’t know about each other or have no way to communicate. More commonly, though, all the devices would know about each other and will opportunistically communicate the best way they can.

Syncthing uses something akin to a Bittorrent protocol. Say you’re syncing videos from your phone, and they’re going to 3 machines. It doesn’t mean that Syncthing has to send it three times from the phone. Syncthing will send each block, most likely, just once; the other nodes in the swarm will register the block availability from the first other node to get it and will exchange blocks with themselves.

Syncthing will typically look for devices on the local LAN. Failing that, it will use an introduction server to see if it can reach them directly using P2P. Failing that, perhaps due to restrictive firewalls or NAT, communication can be relayed through volunteer-run Syncthing servers on the Internet. All Syncthing communications are cryptographically encrypted and verified. You can also configure Syncthing arbitrarily; for instance, to run over ssh or Tor tunnels.

So, let’s look at how Syncthing might help with the example I laid out up front.

All the devices at the remote location could communicate with each other. The Android app is quite capable of syncing photos and videos using Syncthing, for instance. Then one device could be taken to the Internet location and it would transmit data on behalf of all the others – perhaps back to a computer at your home, or to a server somewhere. Perhaps a script running on the remote server would then move files out of the syncthing synced folder into permanent storage elsewhere, triggering a deletion to be sent to the phone to free up storage. When the phone gets back to the other devices, the deletion can be propagated to them to free up storage there too.

Or maybe you have a computer out in a shed or somewhere without Internet access that you go to periodically, and need to get files to it. Again, your phone could be a carrier.

Taking it a step further

If you envision a file as a packet, you could, conceivably, do something like tunnel TCP/IP over Syncthing, assuming generous-enough timeouts. It can truly handle communication.

But you don’t need TCP/IP for this. Consider some other things you could do:

  • Drop a script in a special directory that gets picked up by a remote server and run
  • Drop emails in a special directory that get transmitted and then deleted by a remote system when they’re seen
  • Drop files (eg, photos or videos) in a directory that a remote system will copy or move out of there
  • Drop messages (perhaps gpg-encrypted) — which could be text files — for someone to see and process.
  • Drop NNTP bundles for group communication

You can start to see how there are a lot of possibilities here that extend beyond just file synchronization, though they are built upon a file synchronization tool.

Enter NNCP

Let’s look at a tool that’s especially suited for this: NNCP, which I’ve been writing about a lot lately.

NNCP is designed to handle file exchange and remote execution with remote computers in an asynchronous, store-and-forward manner. NNCP packets are themselves encrypted and authenticated. NNCP traditionally is source-routed (that is, you configure it so that machine A reaches machine D by relaying through B and C), and the packets are onion-routed. NNCP packets can be exchanged by a TCP call, a tar-like stream, copying files to something like a USB stick and physically transporting it to the remote, etc.

This works really well and I’ve been using it myself. But it gets complicated if the network topology isn’t fixed; it is difficult to reroute packets due to the onion routing, for instance. There are various workarounds that could be used — but why not just use Syncthing as a transport in those cases?

nncp-xfer is the command that exchanges packets by writing them to, and reading them from, a directory. It is what you’d use to exchange packets on a USB stick. And what you’d use to exchange packets via Syncthing. It writes packets in a RECIPIENT/SENDER/PACKET directory structure, so it is perfectly fine to have multiple systems exchanging packets in a single Syncthing synced folder tree. This structure also allows leaf nodes to only carry the particular packets they’re interested in. The packets are all encrypted, so they can be freely synced wherever.

Since Syncthing opportunistically syncs a shared folder with any device the folder is shared with, a phone could very easily be the NNCP transport, even if it has no idea what NNCP is. It could carry NNCP packets back and forth between sites, or to the Internet, or whatever.

NNCP supports file transmission, file request, and remote execution, all subject to controls, of course. It is easy to integrate with Exim or Postfix to use as a mail transport, Git transport, and so forth. I use it for backups. It would be quite easy to have it send those backups (encrypted zfs send) via nncp-xfer to Syncthing instead of the usual method, and then if I’ve shared the Syncthing folder with my phone, all I need to do is bring the phone into Internet range and they get sent. nncp-xfer will normally remove the packets out of the xfer directory as it ingests them, so the space will only be consumed on the phone (and laptop) until we know the packets made it to their destination.

Pretty slick, eh?

The Hidden Drawbacks of P2P (And a Defense of Signal)

Not long ago, I posted a roundup of secure messengers with off-the-grid capabilities. Some conversation followed, which led me to consider some of the problems with P2P protocols.

P2P and Privacy

Brave adopting IPFS has driven a lot of buzz lately. IPFS is essentially a decentralized, distributed web. This concept has a lot of promise. But take a look at the IPFS privacy document. Some things to highlight:

  • “Nodes announce a variety of information essential to the DHT’s function — including their unique node identifiers (PeerIDs) and the CIDs of data that they’re providing — and because of this, information about which nodes are retrieving and/or reproviding which CIDs is publicly available.”
  • “those DHT queries happen in public. Because of this, it’s possible that third parties could be monitoring this traffic to determine what CIDs are being requested, when, and by whom.”
  • “nodes’ unique identifiers are themselves public…your PeerID is still a long-lived, unique identifier for your node. Keep in mind that it’s possible to do a DHT lookup on your PeerID and, particularly if your node is regularly running from the same location (like your home), find your IP address…Additionally, longer-term monitoring of the public IPFS network could yield information about what CIDs your node is requesting and/or reproviding and when.”

So in this case, you have traded giving information about what you request to specific sites to giving it to potentially hundreds of untrusted peers, some of which may be logging this for nefarious purposes. Worse, you have a durable PeerID that can be used for tracking and tied to your IP address — a data collector’s dream. This PeerID, combined with DHT requests and the CIDs (Content ID) of the things you host (implying you viewed them in the past), can be used to establish a picture of what you are requesting now and requested recently.

Similar can be said from everything like Scuttlebutt to GNU Jami; any service that operates on a P2P basis will likely reveal your IP, and tie your identity to it (and your IP address history). In some cases, as with Jami, this would be limited to friends you add; in others, as with Scuttlebutt and IPFS, it could be revealed to anyone.

The advantages of P2P are undeniable and profound, but few are effectively addressing the privacy implications. The one I know of that is, Briar, routes all traffic over Tor; every node is reached by a Tor onion service.

Federation: somewhat better

In a federated model, every client connects to a server, and there are many servers participating in a federation with each other. Matrix and Mastodon are examples of a federated model. In this scenario, only one server — your own homeserver — can track you by IP. End-to-end encryption is certainly possible in a federated model, and Matrix supports it. This does give a third party (the specific server you use) knowledge of your IP, but that knowledge can be significantly limited.

A downside of this approach is that if your particular homeserver is down, you are unable to communicate. Truly decentralized P2P solutions don’t have that problem — thought they do have a related one, which is that clients communicating with each other must both be online simultaneously in order for messages to be transmitted, and this can be a real challenge for mobile devices.

Centralization and Signal

Signal is centralized; it has one central server farm, and if it is down, you can’t communicate or choose any other server, either. We saw it go down recently after Elon Musk mentioned it.

Still, I recommend Signal for the general public. Here’s why.

Signal brings encryption and privacy to meet people where they’re at, not the other way around. People don’t have to choose a server, it can automatically recognize contacts that use Signal, it has emojis, attachments, secure voice and video calling, and (aside from the Musk incident), it all just works. It feels like, and is, a polished, modern experience with the bells and whistles people are used to.

I’m a huge fan of Matrix (aka Element) and even run my own instance. It has huge promise. But it is Not. There. Yet. Why do I saw this about Matrix?

  • Synapse, the only currently viable Matrix server, is not ready. My Matrix instance hosts ONE person, me. Synapse uses many GB of RAM and 10+GB of disk space. Despite extensive tuning, nothing helped much. It’s caused OOMs more than once. It can’t be hosted on a Raspberry Pi or even one of the cheaper VPSs.
  • Now then, how about choosing a Matrix instance? Well, you could just tell a person to use matrix.org. But then it spent a good portion of last year unable to federate with other popular nodes due to Synapse limitations. Or you could pick a random node, but will it be up when someone needs to say “my car broke down?” Some are run from a dorm computer, some by a team in a datacenter, some by one person with EC2, and you can’t really know. Will your homeserver be stable and long-lived? Hard to say.
  • Voice and video calling are not there yet in Matrix. Matrix has two incompatible video calling methods (Jitsi and built-in), neither work consistently well, both are hard to manage, and both have NAT challenges.
  • Matrix is so hard to set up on a server that there is matrix-docker-ansible-deploy. This makes it much better, but it is STILL terribly hard to deploy, and very simple things like “how do I delete a user” or “let me shrink down this 30GB database” are barely there yet, if at all.
  • Encryption isn’t mandatory in Matrix. E2EE has been getting dramatically better in the last few releases, but it is still optional, especially for what people would call “group chats” (rooms). Signal is ALWAYS encrypted. Always. (Unless, I guess, you set it as your SMS provider on Android). You’ve got to take the responsibility off the user to verify encryption status, and instead make it the one and only way to use the ecosystem.

Again, I love MAtrix. I use it every day to interact with Matrix, IRC, Slack, and Discord channels. It has a ton of promise. But would I count on it to carry a “my car’s broken down and I’m stranded” message? No.

How about some of the other options out there? I mentioned Briar above. It’s fantastic and its offline options are novel and promising. But in common usage, it can’t deliver a message unless both devices are online simultaneously, and doesn’t run on iOS (though both are being worked on). It also can’t send photos or do voice or video calling.

Some of these same limitations apply to most of the other Signal alternatives also. either that, or they are encryption-optional, or terribly hard to set up and use. I recently mentioned Status, which shows a ton of promise, but has no voice or video calling capabilities. Scuttlebutt is a fantastic protocol with extremely difficult onboarding (lengthy process, error-prone finding a pub, multi-GB initial download, etc.) And many of these leak IP addresses as discussed above.

So Signal gives people:

  • Dead-simple setup
  • Store-and-forward delivery (devices need not be online simultaneously)
  • Encrypted everything, including voice and video calls, and the ability to send photos and video encrypted

If you are going to tell someone, “it’s so EASY to get your texts away from Facebook and AT&T”, then Signal is the thing you’ve got to point them to. It may not be in two years, but for now, it is. Do not let the perfect be the enemy of the good. It advances the status quo without harming usability, which nothing else does yet.

I am aware of all of the very legitimate criticisms of Signal. They are real and they are why I am excited that there are so many alternatives with promise, some of which I use actively. Let us technical people use, debug, contribute to, and evangelize the alternatives.

And while we’re doing that, tell Grandma to contact us on Signal.

Roundup of Secure Messengers with Off-The-Grid Capabilities (Distributed/Mesh Messengers)

Amid all the conversation about Signal, and the debate over decentralization, one thing has often not been raised: all of these things require an Internet connection.

“Of course,” you might say. “Internet is everywhere these days.” Well, not so much, and it turns out there are some very good reasons that people might want messengers that work offline. Here are some examples:

  • Internet-using messengers leak certain metadata (eg, that a person is using it, or perhaps a sophisticated adversary could use timing analysis to determine that two people are talking using it)
  • Cell signal outages due to natural disaster, large influx of people (protests, unusual sporting events, festivals, etc), or other factors
  • Locations where cell signals are not available (rural areas, camping locations, wilderness areas, etc.)
  • Devices that don’t have cell data capability (many tablets, phones that have had service expire, etc.)

How do they work?

These all use some form of local radio signal. Some, such as Briar, may use short-range Bluetooth and Wifi, while others use radios such as LoRa that can reach several miles with low power. I’ve written quite a bit about LoRa before, and its unique low-speed but extreme-distance radio capabilities even on low power.

One common thread through these is that most of them are Android-only, though many are compatible with F-Droid and privacy-enhanced Android distributions.

Every item on this list uses full end-to-end encryption (E2EE).

Let’s dive on in.

Briar

Of all the options mentioned here, Briar is the one that bridges the traditional Internet-based approach with alternative options the best. It offers three ways for distributing data:

  • Over the Internet, via Tor onion services
  • Via Bluetooth to nearby devices
  • Via Wifi, to other devices connected to the same access point, even if Internet isn’t wokring on that AP

As far as I can tell, there is no centralized server in Briar at all. Your “account”, such as it is, lives entirely within your device; if you wipe your device, you will have to make a new account and re-establish contacts. The use of Tor is also neat to see; it ensures that an adversary can’t tell, just from that, that you’re using Briar at all, though of course timing analysis may still be possible (and Bluetooth and Wifi uses may reval some of who is communicating).

Briar features several types of messages (detailed in the manual), which really are just different spins on communication, which they liken to metaphors people are familiar with:

  • Basic 1-to-1 private messaging
  • “Private groups”, in which one particular person invites people to the chat group, and can dissolve it at any time
  • “Forums”, similar to private groups, but any existing member can invite more people to them, and they continue to exist until the last member leaves (founder isn’t special)
  • “Blogs”, messages that are automatically shared with all your contacts

By default, Briar raises an audible notification for incoming messages of all types. This is configurable for each type.

“Blogs” have a way to reblog (even a built-in RSS reader to facilitate that), but framed a different way, they are broadcast messages. They could, for instance, be useful for a “send help” message to everyone (assuming that people haven’t all shut off notifications of blogs due to others using them different ways).

Briar’s how it works page has an illustration specifically of how blogs are distributed. I’m unclear on some of the details, and to what extent this applies to other kinds of messages, but one thing that you can notice from this is that a person A could write a broadcast message without Internet access, person B could receive it via Bluetooth or whatever, and then when person B gets Internet access again, the post could be distributed more widely. However, it doesn’t appear that Briar is really a full mesh, since only known contacts in the distribution path for the message would repeat it.

There are some downsides to Briar. One is that, since an account is fully localized to a device, one must have a separate account for each device. That can lead to contacts having to pick a specific device to send a message to. There is an online indicator, which may help, but it’s definitely not the kind of seamless experience you get from Internet-only messengers. Also, it doesn’t support migrating to a new phone, live voice/video calls, or attachments, but attachments are in the works.

All in all, a solid communicator, and is the only one on this list that works 100% with the hardware everyone already has. While Bluetooth and Wifi have far more limited range than the other entries, there is undeniably convenience in not needing any additional hardware, and it may be particularly helpful when extra bags/pockets aren’t available. Also, Briar is fully Open Source.

Meshtastic

Meshtastic is a radio-first LoRa mesh project. What do I mean by radio-first? Well, basically cell phones are how you interact with Meshtastic, but they are optional. The hardware costs about $30 and the batteries last about 8 days. Range between nodes is a few miles in typical conditions (up to 11km / 7mi in ideal conditions), but nodes act as repeaters, so it is quite conceivable to just drop a node “in the middle” if you and contacts will be far apart. The project estimates that around 2000 nodes are in operation, and the network is stronger the more nodes are around.

The getting started site describes how to build one.

Most Meshtastic device builds have a screen and some buttons. They can be used independently from the Android app to display received messages, distance and bearing to other devices (assuming both have a GPS enabled), etc. This video is an introduction showing it off, this one goes over the hardware buttons. So even if your phone is dead, you can at least know where your friends are. Incidentally, the phone links up to the radio board using Bluetooth, and can provide a location source if you didn’t include one in your build. There are ideas about solar power for Meshtastic devices, too.

Meshtastic doesn’t, as far as I know, have an option for routing communication over the Internet, but the devices appear to be very thoughtfully-engineered and easy enough to put together. This one is definitely on my list to try.

Ripple-based devices

This is based on the LoRa Mesh Radio Instructables project, and is similar in concept to Meshtastic. It uses similar hardware, a similar app, but also has an option with a QWERTY hardware keyboard available, for those that want completely phone-free operation while still being able to send messages.

There are a number of related projects posted at Instructables: a GPS tracker, some sensors, etc. These are variations on the same basic concept.

These use the Ripple firmware, which is not open source, so I haven’t pursued it further.

GoTenna

For people that want less of a DIY model, and don’t mind proprietary solutions, there are two I’ll mention. The first is GoTenna Mesh, which is LoRa-based and sells units for $90 each. However, there are significant community concerns about the longevity of the project, as GoTenna has re-focused on government and corporate work. The Android app hasn’t been updated in 6 monnths despite a number of reviews citing issues, and the iOS app is also crusty.

Beartooth

Even more expensive at $125 each is the Beartooth. Also a proprietary option, I haven’t looked into it more, but they are specifically targetting backwoods types of markets.

Do not use: Bridgefy

Bridgefy was briefly prominent since it was used during the Hong Kong protests. However, numerous vulnerabilities have been demonstrated, and the developers have said they are re-working the app to address them. I wouldn’t recommend it for now.

Alternatives: GMRS handhelds

In the USA, GMRS voice handhelds are widely available. Although a license is required, it is simple (no exam) and cheap ($35) and extends to a whole family. GMRS radios also interoperate with FRS radios, which require no license and share some frequencies, but are limited to lower power (though are often sufficient).

Handheld GMRS radios that use up to 5W of power are readily available. A voice signal is a lot harder to carry for a long distance than a very low-bandwidth digital one, so even with much more power you will probably not get the same kind of range you will with something like Meshtastic, and they don’t come with any kind of security or encryption at all. However, for basic communication, they are often a useful tool.

Remote Directory Tree Comparison, Optionally Asynchronous and Airgapped

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.

In the previous installment on store-and-forward backups, I mentioned how easy it is to do with ZFS, and some of the tools that can be used to do it without ZFS. A lot of those tools are a bit less robust, so we need some sort of store-and-forward mechanism to verify backups. To be sure, verifying backups is good with ANY scheme, and this could be used with ZFS backups also.

So let’s say you have a shiny new backup scheme in place, and you’d like to verify that it’s working correctly. To do that, you need to compare the source directory tree on machine A with the backed-up directory tree on machine B.

Assuming a conventional setup, here are some ways you might consider to do that:

  • Just copy everything from machine A to machine B and compare locally
  • Or copy everything from machine A to a USB drive, plug that into machine B, and compare locally
  • Use rsync in dry-run mode and see if it complains about anything

The first two options are not particularly practical for large datasets, though I note that the second is compatible with airgapping. Using rsync requires both systems to be online at the same time to perform the comparison.

What would be really nice here is a tool that would write out lots of information about the files on a system: their names, sizes, last modified dates, maybe even sha256sum and other data. This file would be far smaller than the directory tree itself, would compress nicely, and could be easily shipped to an airgapped system via NNCP, UUCP, a USB drive, or something similar.

Tool choices

It turns out there are already quite a few tools in Debian (and other Free operating systems) to do this, and half of them are named mtree (though, of course, not all mtrees are compatible with each other.) We’ll look at some of the options here.

I’ve made a simple test directory for illustration purposes with these commands:

mkdir test
cd test
echo hi > hi
ln -s hi there
ln hi foo
touch empty
mkdir emptydir
mkdir somethingdir
cd somethingdir
ln -s ../there

I then also used touch to set all files to a consistent timestamp for illustration purposes.

Tool option: getfacl (Debian package: acl)

This comes with the acl package, but can be used with other than ACL purposes. Unfortunately, it doesn’t come with a tool to directly compare its output with a filesystem (setfacl, for instance, can apply the permissions listed but won’t compare.) It ignores symlinks and doesn’t show sizes or dates, so is ineffective for our purposes.

Example output:

$ getfacl --numeric -R test
...
# file: test/hi
# owner: 1000
# group: 1000
user::rw-
group::r--
other::r--
...

Tool option: fmtree, the FreeBSD mtree (Debian package: freebsd-buildutils)

fmtree can prepare a “specification” based on a directory tree, and compare a directory tree to that specification. The comparison also is aware of files that exist in a directory tree but not in the specification. The specification format is a bit on the odd side, but works well enough with fmtree. Here’s a sample output with defaults:

$ fmtree -c -p test
...
# .
/set type=file uid=1000 gid=1000 mode=0644 nlink=1
.               type=dir mode=0755 nlink=4 time=1610421833.000000000
    empty       size=0 time=1610421833.000000000
    foo         nlink=2 size=3 time=1610421833.000000000
    hi          nlink=2 size=3 time=1610421833.000000000
    there       type=link mode=0777 time=1610421833.000000000 link=hi

... skipping ...

# ./somethingdir
/set type=file uid=1000 gid=1000 mode=0777 nlink=1
somethingdir    type=dir mode=0755 nlink=2 time=1610421833.000000000
    there       type=link time=1610421833.000000000 link=../there
# ./somethingdir
..

..

You might be wondering here what it does about special characters, and the answer is that it has octal escapes, so it is 8-bit clean.

To compare, you can save the output of fmtree to a file, then run like this:

cd test
fmtree < ../test.fmtree

If there is no output, then the trees are identical. Change something and you get a line of of output explaining each difference. You can also use fmtree -U to change things like modification dates to match the specification.

fmtree also supports quite a few optional keywords you can add with -K. They include things like file flags, user/group names, various tipes of hashes, and so forth. I'll note that none of the options can let you determine which files are hardlinked together.

Here's an excerpt with -K sha256digest added:

    empty       size=0 time=1610421833.000000000 \
                sha256digest=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    foo         nlink=2 size=3 time=1610421833.000000000 \
                sha256digest=98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4

If you include a sha256digest in the spec, then when you verify it with fmtree, the verification will also include the sha256digest. Obviously fmtree -U can't correct a mismatch there, but of course it will detect and report it.

Tool option: mtree, the NetBSD mtree (Debian package: mtree-netbsd)

mtree produces (by default) output very similar to fmtree. With minor differences (such as the name of the sha256digest in the output), the discussion above about fmtree also applies to mtree.

There are some differences, and the most notable is that mtree adds a -C option which reads a spec and converts it to a "format that's easier to parse with various tools." Here's an example:

$ mtree -c -K sha256digest -p test | mtree -C
. type=dir uid=1000 gid=1000 mode=0755 nlink=4 time=1610421833.0 flags=none 
./empty type=file uid=1000 gid=1000 mode=0644 nlink=1 size=0 time=1610421833.0 flags=none 
./foo type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./hi type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=hi time=1610421833.0 flags=none 
./emptydir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir/there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=../there time=1610421833.0 flags=none 

Most definitely an improvement in both space and convenience, while still retaining the relevant information. Note that if you want the sha256digest in the formatted output, you need to pass the -K to both mtree invocations. I could have done that here, but it is easier to read without it.

mtree can verify a specification in either format. Given what I'm about to show you about bsdtar, this should illustrate why I bothered to package mtree-netbsd for Debian.

Unlike fmtree, the mtree -U command will not adjust modification times based on the spec, but it will report on differences.

Tool option: bsdtar (Debian package: libarchive-tools)

bsdtar is a fascinating program that can work with many formats other than just tar files. Among the formats it supports is is the NetBSD mtree "pleasant" format (mtree -C compatible).

bsdtar can also convert between the formats it supports. So, put this together: bsdtar can convert a tar file to an mtree specification without extracting the tar file. bsdtar can also use an mtree specification to override the permissions on files going into tar -c, so it is a way to prepare a tar file with things owned by root without resorting to tools like fakeroot.

Let's look at how this can work:

$ cd test
$ bsdtar --numeric -cf - --format=mtree .

. time=1610472086.318593729 mode=755 gid=1000 uid=1000 type=dir
./empty time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=0
./foo nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./hi nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./ormat\075mtree time=1610472086.318593729 mode=644 gid=1000 uid=1000 type=file size=5632
./there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=hi
./emptydir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir/there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=../there

You can use mtree -U to verify that as before. With the --options mtree: set, you can also add hashes and similar to the bsdtar output. Since bsdtar can use input from tar, pax, cpio, zip, iso9660, 7z, etc., this capability can be used to create verification of the files inside quite a few different formats. You can convert with bsdtar -cf output.mtree --format=mtree @input.tar. There are some foibles with directly using these converted files with mtree -U, but usually minor changes will get it there.

Side mention: stat(1) (Debian package: coreutils)

This tool isn't included because it won't operate recursively, but is a tool in the similar toolbox.

Putting It Together

I will still be developing a complete non-ZFS backup system for NNCP (or UUCP) in a future post. But in the meantime, here are some ideas you can reflect on:

  • Let's say your backup scheme involves sending a full backup every night. On the source system, you could pipe the generated tar file through something like tee >(bsdtar -cf bcakup.mtree @-) to generate an mtree file in-band while generating the tar file. This mtree file could be shipped over for verification.
  • Perhaps your backup scheme involves sending incremental backup data via rdup or even ZFS, but you would like to periodically verify that everything is good -- that an incremental didn't miss something. Something like mtree -K sha256 -c -x -p / | mtree -C -K sha256 would let you accomplish that.

I will further develop at least one of these ideas in a future post.

Bonus: cross-tool comparisons

In my mtree-netbsd packaging, I added tests like this to compare between tools:

fmtree -c -K $(MTREE_KEYWORDS) | mtree
mtree -c -K $(MTREE_KEYWORDS) | sed -e 's/\(md5\|sha1\|sha256\|sha384\|sha512\)=/\1digest=/' -e 's/rmd160=/ripemd160digest=/' | fmtree
bsdtar -cf - --options 'mtree:uname,gname,md5,sha1,sha256,sha384,sha512,device,flags,gid,link,mode,nlink,size,time,uid,type,uname' --format mtree . | mtree