Category Archives: Online Life

I’m Not Very Popular, Thankfully. That Makes The Internet Fun Again

“Like and subscribe!”

“Help us get our next thousand (or million) followers!”

I was using Linux before it was popular. Back in the day where you had to write Modelines for your XF86Config file — and do it properly, or else you might ruin your monitor. Back when there wasn’t a word processor (thankfully; that forced me to learn LaTeX, which I used to write my papers in college).

I then ran Linux on an Alpha, a difficult proposition in an era when web browsers were either closed-source or too old to be useful; it took all sorts of workarounds, including emulating Digital UNIX.

Recently I wrote a deep dive into the DOS VGA text mode and how to achieve it on a modern UEFI Linux system.

Nobody can monetize things like this. I am one of maybe a dozen or two people globally that care about that sort of thing. That’s fine.

Today, I’m interested in things like asynchronous communication, NNCP, and Gopher. Heck, I’m posting these words on a blog. Social media displaced those, right?

Some of the things I write about here have maybe a few dozen people on the planet interested in them. That’s fine.

I have no idea how many people read my blog. I have no idea where people hear about my posts from. I guess I can check my Mastodon profile to see how many followers I have, but it’s not something I tend to do. I don’t know if the number is going up or down, or if it is all that much in Mastodon terms (probably not).

Thank goodness.

Since I don’t have to care about what’s popular, or spend hours editing video, or thousands of dollars on video equipment, I can just sit down and write about what interests me. If that also interests you, then great. If not, you can find what interests you — also fine.

I once had a colleague that was one of these “plugged into Silicon Valley” types. He would periodically tell me, with a mixture of excitement and awe, that one of my posts had made Hacker News.

This was always news to me, because I never paid a lot of attention over there. Occasionally that would bring in some excellent discussion, but more often than not, it was comments from people that hadn’t read or understood the article trying to appear smart by arguing with what it — or rather, what they imagined it said, I guess.

The thing I value isn’t subscriber count. It’s discussion. A little discussion in the comments or on Mastodon – that’s perfect, even if only 10 people read the article. I have the most fun in a community.

And I’ll go on writing about NNCP and Gopher and non-square DOS pixels, with audiences of dozens globally. I have no advertisers to keep happy, and I enjoy it, so why not?

A Twisty Maze of Ill-Behaved Bots

As it has for many others, bot traffic has been causing significant issues for my hosted server recently. I’ve been noticing a dramatic increase in bots that do not respect robots.txt, especially the crawl-delay I have set there. Not only that, but many of them send user-agent strings that quite precisely match what desktop browsers send. That is, they don’t identify themselves.

They posed a particular problem on two sites: my blog, and the lists.complete.org archives.

The list archive is a completely static site, but it has many pages, so ill-behaved bots absolutely hammer it by following links.

My blog runs WordPress. It has fewer pages, but because it uses PHP, it doesn’t need as many hits to start to bog down. Also, there is a Mastodon thundering herd problem (when a link is posted, many Fediverse servers fetch it for a preview at nearly the same moment), and since I participate on Mastodon, this hits my server.

The solution was one of layers.

I had already added a crawl-delay line to robots.txt. It helped a bit, but many bots these days aren’t well-behaved. Next, I added WP Super Cache to my WordPress installation. I also enabled APCu in PHP and installed APCu Manager. Again, each step helped. Again, not quite enough.
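For reference, here is a minimal sketch of what a well-behaved bot is expected to do with a crawl-delay line, using Python's standard urllib.robotparser. The robots.txt content, the 10-second delay, the bot name, and the URLs are all assumptions made up for illustration; they are not the actual contents of the file described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the delay and the disallowed path are assumed values.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /wp-admin/
"""

def crawl_policy(robots_txt, user_agent, url):
    """Return (allowed, delay_seconds) for the given bot and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)

# A polite crawler checks both values and sleeps for `delay` seconds between requests.
allowed, delay = crawl_policy(ROBOTS_TXT, "ExampleBot", "https://example.org/blog/")
blocked, _ = crawl_policy(ROBOTS_TXT, "ExampleBot", "https://example.org/wp-admin/")
```

Nothing enforces any of this on the server side, which is why the remaining layers were needed for bots that simply ignore the file.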

Finally, I added Anubis. Installing it (especially if using the Docker container) was under-documented, but I figured it out. By default, it is designed to block AI bots and try to challenge everything with “Mozilla” in its user-agent (which is most things) with a JavaScript challenge.

That’s not quite what I want. If a bot is well-behaved, AI or otherwise, it will respect my robots.txt and I can more precisely control it there. Also, I intentionally support non-JavaScript browsers on many of the sites I host, so I wanted to be judicious. Eventually I configured Anubis to only challenge things that present a user-agent that looks fully like a real browser. In other words, real browsers should pass right through, and bad bots pretending to be real browsers will fail.
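The decision rule can be sketched roughly as below. This is not Anubis configuration syntax and not the actual rule set in use; the two regular expressions and the function name should_challenge are hypothetical, chosen only to illustrate the idea of challenging clients that claim to be a full browser while letting self-identified bots and simple text browsers through.

```python
import re

# Hypothetical heuristic: the general shape of a mainstream browser user-agent.
BROWSER_SHAPE = re.compile(r"^Mozilla/5\.0 \(.+\).*(Firefox/|Chrome/|Safari/)")
# Hypothetical hints that a client is identifying itself as a crawler.
BOT_HINTS = re.compile(r"bot|crawler|spider", re.IGNORECASE)

def should_challenge(user_agent):
    """Challenge only clients that present themselves as a full browser."""
    if BOT_HINTS.search(user_agent):
        return False  # self-identified bot; robots.txt is the control point
    return bool(BROWSER_SHAPE.match(user_agent))

firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
honest_bot = "Mozilla/5.0 (compatible; ExampleBot/1.0; +http://example.org/bot)"
text_browser = "Lynx/2.9.0 libwww-FM/2.14"
```

Under these assumptions a real Firefox gets the challenge and passes it, a bot faking the same string gets the challenge and fails it, and Lynx is never challenged at all.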

That was quite effective. It reduced load further to the point where things are ordinarily fairly snappy.

I had previously been using mod_security to block some bots, but it seemed to be getting in the way of the Fediverse at times. When I disabled it, I observed another increase in speed. Anubis was likely going to get rid of those annoying bots itself anyhow.

As a final step, I migrated to a faster hosting option. This post will show me how well it survives the Mastodon thundering herd!

Update: Yes, it handled it quite nicely now.

Memoirs of the Early Internet

The Internet is an amazing place, and occasionally you can find things on the web that have somehow lingered online for decades longer than you might expect.

Today I’ll take you on a tour of some parts of the early Internet.

The Internet, of course, is a “network of networks” and part of its early (and continuing) promise was to provide a common protocol that all sorts of networks can use to interoperate with each other. In the early days, UUCP was one of the main ways universities linked with each other, and eventually UUCP and the Internet sort of merged (but that’s a long story).

Let’s start with some Usenet maps, which were an early way to document the UUCP modem links between universities. Start with this PDF. The first page is a Usenet map (which at the time mostly flowed over UUCP) from April of 1981. Notice that ucbvax, a VAX system at Berkeley, was central to the map.

ucbvax continued to be a central node for UUCP for more than a decade; on page 5 of that PDF, you’ll see that it asks for a “Path from a major node (eg, ucbvax, decvax, harpo, duke)”. Pre-Internet email addresses used a path; e.g., mark@ucbvax was duke!decvax!ucbvax!mark to someone. You had to specify the route from your system to the recipient on your email To line. If you gave out your email address on a business card, you would start it from a major node like ucbvax, and the assumption was that everyone would know how to get from their system to the major node.
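To make the bang-path idea concrete, here is a small sketch that splits a path into its relay hosts and builds a full route from a business-card address. The function names are made up for illustration, and the example assumes the sender already knows a route of duke!decvax to reach ucbvax.

```python
def parse_bang_path(address):
    """Split a UUCP bang path into (relay hosts in order, final user)."""
    *hops, user = address.split("!")
    return hops, user

def route_via(my_route_to_major_node, card_address):
    """Prepend the sender's own route to an address that starts at a major node."""
    return my_route_to_major_node + "!" + card_address

hops, user = parse_bang_path("duke!decvax!ucbvax!mark")
# A card printed "ucbvax!mark"; the sender supplies the hops before ucbvax.
full = route_via("duke!decvax", "ucbvax!mark")
```

Each host in the list hands the message to the next one, so the sender, not the network, is responsible for choosing the route.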

On August 19, 1994, ucbvax was finally turned off. TCP/IP had driven UUCP into more obscurity; by then, it was mostly used by people without a dedicated Internet connection to get on the Internet, rather than an entire communication network of its own. A few days later, Cliff Frost posted a memoir of ucbvax; an obscure bit of Internet lore that is fun to read.

UUCP was ad-hoc, and by 1984 there was an effort to make a machine-parsable map to help automate routing on UUCP. This was called the pathalias project, and there was a paper about it. The Linux network administration guide even includes a section on pathalias.
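The core idea of automated routing from a machine-parsable map can be sketched as a shortest-path search. The following is an illustration only: the link table and its costs are invented, the host name myhost is hypothetical, and the code makes no attempt to reproduce pathalias's real input format or cost model. It uses Dijkstra's algorithm and emits each route with a %s placeholder where the user name goes.

```python
import heapq

# Hypothetical link costs between hosts; not real map data.
LINKS = {
    "myhost": {"duke": 1, "harpo": 4},
    "duke": {"decvax": 1},
    "harpo": {"ucbvax": 1},
    "decvax": {"ucbvax": 1},
    "ucbvax": {},
}

def cheapest_routes(links, start):
    """Dijkstra's algorithm: map each reachable host to (cost, bang-path template)."""
    best = {}
    queue = [(0, start, [])]
    while queue:
        cost, host, path = heapq.heappop(queue)
        if host in best:
            continue  # already settled with a cheaper or equal route
        best[host] = (cost, "!".join(path + ["%s"]))
        for neighbor, link_cost in links.get(host, {}).items():
            if neighbor not in best:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return best

routes = cheapest_routes(LINKS, "myhost")
address = routes["ucbvax"][1] % "mark"
```

With a table like this, a mailer could look up the destination host and fill in the user, instead of a person typing the route by hand.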

Because UUCP mainly flowed over phone lines, long distance fees made it quite expensive. In 1985, the Stargate Project was formed, with the idea of distributing Usenet by satellite. The satellite link was short-lived, but the effort eventually morphed into UUNET. It was initially a non-profit, but eventually became a commercial backbone provider, and later ISP. Over a long series of acquisitions, UUNET is now part of Verizon. An article in ;login: is another description of this history.

IAPS has an Internet in 1990 article, which includes both pathalias data and an interesting map of domain names to UUCP paths.

As I was pondering what interesting things a person could do with NNCPNET Internet email, I stumbled across a page on getting FTP files via e-mail. Yes, that used to be a thing! I remember ftpmail@decwrl.dec.com.

It turns out that page is from a copy of EFF’s (Extended) Guide to the Internet from 1994. Wow, what a treasure! It has entries such as A Slice of Life in my Virtual Community, libraries with telnet access, Gopher, A Statement of Principle by Bruce Sterling, and I could go on. You can also get it as a PDF from Internet Archive.

UUCP is still included with modern Linux and BSD distributions. It was part of how I experienced the PC and Internet revolution in rural America. It lacks modern security, but NNCP is to UUCP what ssh is to telnet.

NNCPNET Can Optionally Exchange Internet Email

A few days ago, I announced NNCPNET, the email network based atop NNCP. NNCPNET lets anyone run a real mail server on a network that supports all sorts of topologies for transport, from Internet to USB drives. And verification is done at the NNCP protocol level, so a whole host of Internet email bolt-ons (SPF, DMARC, DKIM, etc.) are unnecessary.

Shortly after announcing NNCPNET, I added an Internet bridge. This lets you get your own DOMAIN.nncpnet.org domain, and from there route email to and from the Internet using a gateway node. Simple, effective, and a way to get real email to and from your laptop or Raspberry Pi without having to have a static IP, SPF, DMARC, DKIM, etc.

It’s a volunteer-run, free service. Give it a try!

Announcing the NNCPNET Email Network

From 1995 to 2019, I ran my own mail server. It began with a UUCP link, an expensive long-distance call for me then. Later, I ran a mail server in my apartment, then ran it as a VPS at various places.

But running an email server got difficult. You can’t just run it on a residential IP. Now there’s SPF, DKIM, DMARC, and TLS to worry about. I recently reviewed mail hosting services, and don’t get me wrong: I still use one, and probably will, because things like email from my bank are critical.

But we’ve lost the ability to tinker, to experiment, to have fun with email.

Not anymore. NNCPNET is an email system that runs atop NNCP. I’ve written a lot about NNCP, including a less-ambitious article about point-to-point email over NNCP 5 years ago. NNCP is to UUCP what ssh is to telnet: a modernization, with modern security and features. NNCP is an asynchronous, onion-routed, store-and-forward network. It can use as a transport anything from the Internet to a USB stick.

NNCPNET is a set of standards, scripts, and tools to facilitate a broader email network using NNCP as the transport. You can read more about NNCPNET on its wiki!

The “easy mode” is to use the Docker container (multi-arch, so you can use it on your Raspberry Pi) I provide, which bundles:

  • Exim mail server
  • NNCP
  • Verification and routing tools I wrote. Because NNCP packets are encrypted and signed, we get sender verification “for free”; my tools ensure the From: header corresponds with the sending node.
  • Automated nodelist tools; it will request daily nodelist updates and update its configurations accordingly, so new members can be communicated with
  • Integration with the optional, opt-in Internet email bridge
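The sender check in the list above can be pictured with a short sketch. This is not the code in the container; the function name and the ".nncp" domain suffix are assumptions for illustration. The premise is that NNCP has already authenticated which node sent the packet, so all that remains is confirming the From: header's domain belongs to that node.

```python
from email.utils import parseaddr

def from_matches_node(from_header, sending_node, suffix=".nncp"):
    """Accept a message only if its From: domain belongs to the authenticated node.

    `sending_node` is the node name NNCP verified; `suffix` is an assumed
    naming convention, not something taken from the NNCPNET documentation.
    """
    _, addr = parseaddr(from_header)
    if "@" not in addr:
        return False
    domain = addr.rsplit("@", 1)[1].lower()
    node_domain = sending_node.lower() + suffix
    # Allow the node's own domain or any subdomain beneath it.
    return domain == node_domain or domain.endswith("." + node_domain)

ok = from_matches_node("Alice <alice@quux.nncp>", "quux")
spoofed = from_matches_node("Alice <alice@quux.nncp>", "evil")
```

Because the node identity comes from the signed packet rather than from anything in the message, a forged From: line has nothing to hide behind.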

It is open to all. The homepage has a more extensive list of features.

I even have mailing lists running on NNCPNET; see the interesting addresses page for more details.

There is extensive documentation, and of course the source to the whole thing is available.

The gateway to Internet SMTP mail is off by default, but can easily be enabled for any node. It is a full participant, in both directions, with SPF, DKIM, DMARC, and TLS.

You don’t need any inbound ports for any of this. You don’t need an always-on Internet connection. You don’t even need an Internet connection at all. You can run it from your laptop and still use Thunderbird to talk to it via its optional built-in IMAP server.

Why You Should (Still) Use Signal As Much As Possible

As I write this in March 2025, there is a lot of confusion about Signal messenger due to the recent news of people using Signal in government, and subsequent leaks.

The short version is: there was no problem with Signal here. People were using it because they understood it to be secure, not the other way around.

Both the government and the Electronic Frontier Foundation recommend people use Signal. This is an unusual alliance, and in the case of the government, was prompted because it understood other countries had a persistent attack against American telephone companies and SMS traffic.

So let’s dive in. I’ll cover some basics of what security is, what happened in this situation, and why Signal is a good idea.

This post isn’t for programmers that work with cryptography every day. Rather, I hope it can make some of these concepts accessible to everyone else.

What makes communications secure?

When most people are talking about secure communications, they mean some combination of these properties:

  1. Privacy - nobody except the intended recipient can decode a message.
  2. Authentication - guarantees that the person you are chatting with really is the intended recipient.
  3. Ephemerality - preventing a record of the communication from being stored. That is, making it more like a conversation around the table than a written email.
  4. Anonymity - keeping your set of contacts to yourself and even obfuscating the fact that communications are occurring.

If you think about it, most people care the most about the first two. In fact, authentication is a key part of privacy. There is an attack known as man in the middle in which somebody pretends to be the intended recipient. The interceptor reads the messages, and then passes them on to the real intended recipient. So we can’t really have privacy without authentication.
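For the curious, here is a toy sketch of why authentication defeats that attack: if you compare a fingerprint of your contact's key over some other trusted channel (in person, say), a substituted key is exposed. This is not how Signal computes its safety numbers; the hash choice, the fingerprint format, and the key bytes are all made up for illustration.

```python
import hashlib
import hmac

def fingerprint(public_key):
    """Short, human-comparable digest of a public key (illustrative format)."""
    digest = hashlib.sha256(public_key).hexdigest()
    return " ".join(digest[i:i + 4] for i in range(0, 32, 4))

def key_is_authentic(received_key, fingerprint_from_friend):
    """Compare the key we were handed against what our friend read out to us."""
    return hmac.compare_digest(fingerprint(received_key), fingerprint_from_friend)

friend_key = b"hypothetical-friend-public-key"
attacker_key = b"hypothetical-attacker-public-key"
read_aloud = fingerprint(friend_key)  # exchanged in person, not over the network
```

An interceptor can swap the key in transit, but cannot change what your friend tells you face to face, so the mismatch shows up.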

I’ll have more to say about these later. For now, let’s discuss attack scenarios.

What compromises security?

There are a number of ways that security can be compromised. Let’s think through some of them:

Communications infrastructure snooping

Let’s say you used no encryption at all, and connected to public WiFi in a coffee shop to send your message. Who all could potentially see it?

  • The owner of the coffee shop’s WiFi
  • The coffee shop’s Internet provider
  • The recipient’s Internet provider
  • Any Internet providers along the network between the sender and the recipient
  • Any government or institution that can compel any of the above to hand over copies of the traffic
  • Any hackers that compromise any of the above systems

Back in the early days of the Internet, most traffic had no encryption. People were careful about putting their credit cards into webpages and emails because they knew it was easy to intercept them. We have been on a decades-long evolution towards more pervasive encryption, which is a good thing.

Text messages (SMS) follow a similar path to the above scenario, and are unencrypted. We know that all of the above are ways people’s texts can be compromised; for instance, governments can issue search warrants to obtain copies of texts, and China is believed to have a persistent hack into western telcos. SMS fails all four of our attributes of secure communication above (privacy, authentication, ephemerality, and anonymity).

Also, think about what information is collected from SMS and by whom. Texts you send could be retained in your phone, the recipient’s phone, your phone company, their phone company, and so forth. They might also live in cloud backups of your devices. You only have control over your own phone’s retention.

So defenses against this involve things like:

  • Strong end-to-end encryption, so no intermediate party – even the people that make the app – can snoop on it.
  • Using strong authentication of your peers
  • Taking steps to prevent even app developers from being able to see your contact list or communication history

You may see some other apps saying they use strong encryption or use the Signal protocol. But while they may do that for some or all of your message content, they may still upload your contact list, history, location, etc. to a central location where it is still vulnerable to these kinds of attacks.

When you think about anonymity, think about it like this: if you send a letter to a friend every week, every postal carrier that transports it – even if they never open it or attempt to peek inside – will be able to read the envelope and know that you communicate on a certain schedule with that friend. The same can be said of SMS, email, or most encrypted chat operators. Signal’s design prevents it from retaining even this information, though nation-states or ISPs might still be able to notice patterns (every time you send something via Signal, your contact receives something from Signal a few milliseconds later). It is very difficult to provide perfect anonymity from well-funded adversaries, even if you can provide very good privacy.

Device compromise

Let’s say you use an app with strong end-to-end encryption. This takes away some of the easiest ways someone could get to your messages. But it doesn’t take away all of them.

What if somebody stole your phone? Perhaps the phone has a password, but if an attacker pulled out the storage unit, could they access your messages without a password? Or maybe they somehow trick or compel you into revealing your password. Now what?

An even simpler attack doesn’t require them to steal your device at all. All they need is a few minutes with it to steal your SIM card. Now they can receive any texts sent to your number - whether from your bank or your friend. Yikes, right?

Signal stores your data in an encrypted form on your device. It can protect it in various ways. One of the most important protections is ephemerality - it can automatically delete your old texts. A text that is securely erased can never fall into the wrong hands if the device is compromised later.
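The retention idea itself is simple, as this sketch shows. It is only an illustration of a retention window, not Signal's implementation: the message structure and function name are hypothetical, and dropping items from a list is not the same as securely erasing them from storage.

```python
from datetime import datetime, timedelta, timezone

def purge_expired(messages, retention, now):
    """Return only the messages still inside the retention window."""
    cutoff = now - retention
    return [m for m in messages if m["sent_at"] > cutoff]

now = datetime(2025, 3, 30, tzinfo=timezone.utc)
messages = [
    {"text": "old plans", "sent_at": now - timedelta(days=30)},
    {"text": "lunch today?", "sent_at": now - timedelta(hours=2)},
]
# With a one-week window, only the recent message survives.
kept = purge_expired(messages, timedelta(days=7), now)
```

Whatever has already been purged is simply not there to be found if the device is compromised later.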

An actively-compromised phone, though, could still give up secrets. For instance, what if a malicious keyboard app sent every keypress to an adversary? Signal is only as secure as the phone it runs on – but still, it protects against a wide variety of attacks.

Untrustworthy communication partner

Perhaps you are sending sensitive information to a contact, but that person doesn’t want to keep it in confidence. There is very little you can do about that technologically; with pretty much any tool out there, nothing stops them from taking a picture of your messages and handing the picture off.

Environmental compromise

Perhaps your device is secure, but a hidden camera still captures what’s on your screen. You can take some steps against things like this, of course.

Human error

Sometimes humans make mistakes. For instance, a reporter got copies of messages recently because a participant in a group chat accidentally added him (presumably that participant meant to add someone else and just selected the wrong name). Phishing attacks can trick people into revealing passwords or other sensitive data. Humans are, quite often, the weakest link in the chain.

Protecting yourself

So how can you protect yourself against these attacks? Let’s consider:

  • Use a secure app like Signal that uses strong end-to-end encryption where even the provider can’t access your messages
  • Keep your software and phone up-to-date
  • Be careful about phishing attacks and who you add to chat rooms
  • Be aware of your surroundings; don’t send sensitive messages where people might be looking over your shoulder with their eyes or cameras

There are other methods besides Signal. For instance, you could install GnuPG (GPG) on a laptop that has no WiFi card or any other way to connect it to the Internet. You could always type your messages on that laptop, encrypt them, copy the encrypted text to a floppy disk (or USB device), take that USB drive to your Internet computer, and send the encrypted message by email or something. It would be exceptionally difficult to break the privacy of messages in that case (though anonymity would be mostly lost). Even if someone got the password to your “secure” laptop, it wouldn’t do them any good unless they physically broke into your house or something. In some ways, it is probably safer than Signal. (For more on this, see my article How gapped is your air?)

But that approach is hard to use. Many people aren’t familiar with GnuPG. You don’t have the convenience of sending a quick text message from anywhere. Security that is hard to use most often simply isn’t used. That is, you and your friends will probably just revert to using insecure SMS instead of this GnuPG approach because SMS is so much easier.

Signal strikes a unique balance of providing very good security while also being practical, easy, and useful. For most people, it is the most secure option available.

Signal is also open source; you don’t have to trust that it is as secure as it says, because you can inspect it for yourself. Also, while it’s not federated, I previously addressed that.

Government use

If you are a government, particularly one that is highly consequential to the world, you can imagine that you are a huge target. Other nations are likely spending billions of dollars to compromise your communications. Signal itself might be secure, but if some other government can add spyware to your phones, or conduct a successful phishing attack, you can still have your communications compromised.

I have no direct knowledge, but I think it is generally understood that the US government maintains communications networks that are entirely separate from the Internet and can only be accessed from secure physical locations and secure rooms. These can be even more secure than the average person using Signal because they can protect against things like environmental compromise, human error, and so forth. The scandal in March of 2025 happened because government employees were using Signal rather than official government tools for sensitive information, had taken advantage of Signal’s ephemerality (laws require records to be kept), and through apparent human error had directly shared this information with a reporter. Presumably a reporter would have lacked access to the restricted communications networks in the first place, so that wouldn’t have been possible.

This doesn’t mean that Signal is bad. It just means that somebody that can spend billions of dollars on security can be more secure than you. Signal is still a great tool for people, and in many cases defeats even those that can spend lots of dollars trying to defeat it.

And remember - to use those restricted networks, you have to go to specific rooms in specific buildings. They are still not as convenient as what you carry around in your pocket.

Conclusion

Signal is practical security. Do you want phone companies reading your messages? How about Facebook or X? Have those companies demonstrated that they are completely trustworthy throughout their entire history?

I say no. So, go install Signal. It’s the best, most practical tool we have.


This post is also available on my website, where it may be periodically updated.

Censorship Is Complicated: What Internet History Says about Meta/Facebook

In light of this week’s announcement by Meta (Facebook, Instagram, Threads, etc), I have been pondering this question: Why am I, a person that has long been a staunch advocate of free speech and encryption, leery of sites that talk about being free speech-oriented? And, more to the point, why am I, a person that has been censored by Facebook for mentioning the Open Source social network Mastodon, not cheering a “lighter touch”?

The answers are complicated, and take me back to the early days of social networking. Yes, I mean the 1980s and 1990s.

Before digital communications, there were barriers to reaching a lot of people. Especially money. This led to a sort of self-censorship: it may be legal to write certain things, but would a newspaper publish a letter to the editor containing expletives? Probably not.

As digital communications started to happen, suddenly people could have their own communities. Not just free from the same kinds of monetary pressures, but free from outside oversight (parents, teachers, peers, community, etc.) When you have a community that the majority of people lack the equipment to access — and wouldn’t understand how to access even if they had the equipment — you have a place where self-expression can be unleashed.

And, as J. C. Herz covers in what is now an unintentional history (her book Surfing on the Internet was published in 1995), self-expression WAS unleashed. She enjoyed the wit and expression of everything from odd corners of Usenet to the text-based open world of MOOs and MUDs. She even talks about groups dedicated to insults (flaming) in positive terms.

But as I’ve seen time and again, if there are absolutely no rules, then whenever a group gets big enough — more than a few dozen people, say — there are troublemakers that ruin it for everyone. Maybe it’s trolling, maybe it’s vicious attacks, you name it — it will arrive and it will be poisonous.

I remember the debates within the Debian community about this. Debian is one of the pillars of the Internet today, a nonprofit project with free speech in its DNA. And yet there were inevitably the poisonous people. Debian took too long to learn that allowing those people to run rampant was causing more harm than good, because having a well-worn Delete key and a tolerance for insults became a requirement for being a Debian developer, and that drove away people that had no desire to deal with such things. (I should note that Debian strikes a much better balance today.)

But in reality, there were never absolutely no rules. If you joined a BBS, you used it at the whim of the owner (the “sysop” or system operator). The sysop may be a 16-yr-old running it from their bedroom, or a retired programmer, but in any case they were letting you use their resources for free and they could kick you off for any or no reason at all. So if you caused trouble, or perhaps insulted their cat, you were banned. But, in all but the smallest towns, there were other options you could try.

On the other hand, sysops enjoyed having people call their BBSs and didn’t want to drive everyone off, so there was a natural balance at play. As networks like Fidonet developed, a sort of uneasy approach kicked in: don’t be excessively annoying, and don’t be easily annoyed. Like it or not, it seemed to generally work. A BBS that repeatedly failed to deal with troublemakers could risk removal from Fidonet.

On the more institutional Usenet, you generally got access through your university (or, in a few cases, employer). Most universities didn’t really even know they were running a Usenet server, and you were generally left alone. Until you did something that annoyed somebody enough that they tracked down the phone number for your dean, in which case real-world consequences would kick in. A site may face the Usenet Death Penalty — delinking from the network — if they repeatedly failed to prevent malicious content from flowing through their site.

Some BBSs let people from minority communities such as LGBTQ+ thrive in a place of peace from tormentors. A lot of them let people be themselves in a way they couldn’t be “in real life”. And yes, some harbored trolls and flamers.

The point I am trying to make here is that each BBS, or Usenet site, set their own policies about what their own users could do. These had to be harmonized to a certain extent with the global community, but in a certain sense, with BBSs especially, you could just use a different one if you didn’t like what the vibe was at a certain place.

That this free speech ethos survived was never inevitable. There were many attempts to regulate the Internet, and it was thanks to the advocacy of groups like the EFF that we have things like strong encryption and a degree of freedom online.

With the rise of the very large platforms (and here I mean CompuServe and AOL at first, and then Facebook, Twitter, and the like later), the low-friction option of just choosing a different place started to decline. You could participate on a Fidonet forum from any of thousands of BBSs, but you could only participate in an AOL forum from AOL. The same goes for Facebook, Twitter, and so forth. Not only that, but as social media became conceived of as very large sites, it became impossible for a person with enough skill, funds, and time to just start a site themselves. Instead of needing a few thousand dollars of equipment, you’d need tens or hundreds of millions of dollars of equipment and employees.

All that means you can’t really run Facebook as a nonprofit. It is a business. It should be absolutely clear to everyone that Facebook’s mission is not the one they say it is — “[to] give people the power to build community and bring the world closer together.” If that was their goal, they wouldn’t be creating AI users and AI spam and all the rest. Zuck isn’t showing courage; he’s sucking up to Trump and those that will pay the price are those that always do: women and minorities.

Really, the point of any large social network isn’t to build community. It’s to make the owners their next billion. They do that by convincing people to look at ads on their site. Zuck is as much a windsock as anyone else; he will adjust policies in whichever direction he thinks the wind is blowing so as to let him keep putting ads in front of eyeballs, and stomp all over principles — even free speech — doing it. Don’t expect anything different from any large commercial social network either. Bluesky is going to follow the same trajectory as all the others.

The problem with a one-size-fits-all content policy is that the world isn’t that kind of place. For instance, I am a pacifist. There is a place for a group where pacifists can hang out with each other, free from the noise of the debate about pacifism. And there is a place for the debate. Forcing everyone that signs up for the conversation to sign up for the debate is harmful. Preventing the debate is often also harmful. One company can’t square this circle.

Beyond that, the fact that we care so much about one company is a problem on two levels. First, it indicates how susceptible people are to misinformation and such. I don’t have much to offer on that point. Second, it indicates that we are too centralized.

We have a solution there: Mastodon. Mastodon is a modern, open source, decentralized social network. You can join any instance, easily migrate your account from one server to another, and so forth. You pick an instance that suits you. There are thousands of others you can choose from. Some aggressively defederate with instances known to harbor poisonous people; some don’t.

And, to harken back to the BBS era, if you have some time, some skill, and a few bucks, you can run your own Mastodon instance.

Personally, I still visit Facebook on occasion because some people I care about are mainly there. But it is such a terrible experience that I rarely do. Meta is becoming irrelevant to me. They are on a path to becoming irrelevant to many more as well. Maybe this is the moment to go “shrug, this sucks” and try something better.

(And when you do, feel free to say hi to me at @jgoerzen@floss.social on Mastodon.)

Review of Reputable, Functional, and Secure Email Service

I last reviewed email services in 2019. That review focused a lot of attention on privacy. At the time, I selected mailbox.org as my provider, and have been using them for these 5 years since. However, both their service and their support have gone significantly downhill since, so it is time for me to look at other options.

Here I am focusing strongly on email. Some of the providers mentioned here provide other services (IM, video calls, groupware, etc.), and to the extent they do, I am ignoring them.

What Matters in 2024

I want to start off by acknowledging that what you need in email probably depends on your circumstances and the country in which you live. For me, I begin by naming that the largest threat most of us face isn’t from state actors but from criminals: hackers, ransomware gangs, etc. It is important to take as many steps as possible to secure one’s account against that. Privacy and security are both part of the mix. I still value privacy but I am acknowledging, as Migadu does, that “Email as we know it and encryption are incompatible.” Although some of these services strongly protect parts of the conversation, the reality is that most people will be emailing people using plain old email services which don’t. For stronger security, something like Signal would be needed. (I wrote about Signal in 2021 also.)

Interestingly, OpenPGP support seems to be something of a standard feature in the providers I reviewed by this point. All or almost all of them provide integration with browser-based encryption as well as server-side encryption if you prefer that.

Although mailbox.org can automatically PGP-encrypt every message that arrives in plaintext, for general use, this is unwieldy; there isn’t good tooling for searching mailboxes where every message is encrypted, etc. So I never enabled that feature at Mailbox. I still value security and privacy, but a pragmatic approach addresses the most pressing threats first.

My criteria

The basic requirements for an email service include:

  1. Ability to use my own domains
  2. Strong privacy policy
  3. Ability for me to use my own IMAP and SMTP clients on both desktop and mobile
  4. It must be extremely reliable
  5. It must not be free
  6. It must have excellent support for those rare occasions when it is needed
  7. Support for basic aliases

Why do I say it must not be free? Because if someone is providing a service with the quality I’m talking about here, and not charging for it, it implies something is fishy: either they are unscrupulous, are financially unstable, or the product is something else like ads. I am not aware of any provider that matches the other criteria with a free account anyhow. These providers range from about $30 to $90 per year, so cheaper than a Netflix subscription.

Immediately, this rules out several options:

  • Proton doesn’t let me use my own clients on mobile (their bridge is desktop-only)
  • Tuta also doesn’t let me use my own clients
  • Posteo doesn’t let me use my own domain
  • mxroute.com lacks a strong privacy policy, and its policy has numerous causes for concern (for instance, “If you repeatedly send email to invalid/unroutable recipients, they may be published on our GitHub”)

I will have a bit more to say about a couple of these providers below.

There are some additional criteria that are strongly desired but not absolutely required:

  1. Ability to set individual access passwords for every device/app
  2. Support for two-factor authentication (2FA/TFA/TOTP) for web-based access
  3. Support for basics in filtering: ability to filter on envelope recipient (so if I get BCC’d, I can still filter), and ability to execute more than one action on filter match (eg, deliver to two folders, or deliver to a folder and forward to someone else)

IMAP and SMTP don’t really support 2FA, so by setting individual passwords for every device, you can at least limit the blast radius and cut off a specific device if something is (or might be) compromised.

The candidates

I considered these providers: Startmail, Mailfence, Runbox, Fastmail, Kolab, Mailbox.org, and Migadu. I’ll review each, and highlight the pricing of the plan I would most likely use. Each provider offers multiple plans; some may be more expensive and some may be cheaper than the one I reviewed. I included a link to each provider’s full pricing information so you can compare for your needs.

I set up trials with each of these (except Mailbox.org, with which I already had a paid account). It so happened that I had actual questions for support for each one, which gave me an opportunity to see how support responded. I did not fabricate questions, and would not have contacted support if I didn’t have real ones. (This means that I asked different questions of each provider, because they were the REAL questions I had.) I’ll jump to the spoiler right now: I eventually chose Migadu, with Fastmail and Mailfence as close seconds.

I looked for providers myself, and also solicited recommendations in a Mastodon thread.

Mailbox.org

I begin with Mailbox, as it was my top choice in 2019 and the incumbent.

Until this year, I had been quite happy with it. I had cause to reach their support less than once a year on average, and each time they replied the same day or next day. Now, however, they are failing on reliability and on support.

Their spam filter has become overly aggressive. It has blocked quite a bit of legitimate mail. When I contacted their support about a prior issue earlier this year, they initially took 4 days to reply, and then 6 days to reply after that. Ouch. They had me disable some spam settings.

It didn’t really help. I continue to lose mail. I don’t know how much, because they block a lot of it before it even hits the spam folder. One of my friends texted to say mail was dropping. I raised a new ticket with mailbox, which took them 5 days to reply to. Their reply was unhelpful. “As the Internet is not a static system, unforeseen events can always occur.” Well yes, that’s true, and I get it, false positives exist with email. But this was from an ISP’s mail system with an address that had been established for years, and it was part of a larger pattern of rejecting quite a bit of legit mail. And none of my recent interactions with them has resulted in them actually doing anything to resolve anything. It’s just a paragraph or two of reply that does nothing and helps nothing.

When I complained that it took 5 days to reply, they said “We have not been able to reply sooner as we are currently experiencing a high volume of customer enquiries.” Even though their SLA for my account is a not-great “48 business hour” turnaround, they still missed it and their reason is “we’re busy.” I finally asked what RBL had caught the blocked email, since when I checked, the sender wasn’t on any RBL. Mailbox’s reply: they only keep their logs for 7 days, so next time I should contact them within 7 days. Which, of course, I DID; it was them that kept delaying. Ugh! It’s like they’ve become a cable company.

Even worse is how they have been blocking mail from GrapheneOS’s discussion forum. See their thread about it. In short, Graphene’s mail server has a clean reputation and Mailbox has no problem with it. But because one of Graphene’s IPv6 webservers has an IPv6 allocation of a size Mailbox doesn’t like, they drop mail. It’s ridiculous, and Mailbox was dismissive of this well-known and well-regarded Open Source project. So if the likes of GrapheneOS can’t get good faith effort to deliver their mail, what chance does an individual like me have?

I’m sorry, but I’m literally paying you to deliver email for me and provide good support. If you can’t do either of those, you don’t get to push that problem down onto me. Hire appropriate staff.

On the technical side, they support aliases, my own clients, and have a reasonable privacy policy. Their 2FA support exists for the web interface (though oddly not the support site), and its implementation is somewhat weird. They do not support app passwords.

A somewhat unique feature is the @secure.mailbox.org domain. If you try to receive mail at that address, mailbox.org will block it unless it uses TLS. Same for sending. This isn’t E2EE, but it does at least require things not be in plaintext for the last hop to Mailbox.

Verdict: not recommended due to poor reliability and support.

Mailbox.Org summary:

  • Website: https://mailbox.org/en/
  • Reliability: iffy due to over-aggressive spam filtering
  • Support: Poor; takes 4-6 days for a reply and replies are unhelpful
  • Individual access passwords: No
  • 2FA: Yes, but with a PIN instead of a password as the other factor
  • Filtering: Full SIEVE feature set and GUI editor
  • Spam settings: greylisting on/off, reject some/all spam, etc. But they’re insufficient to address Mailbox’s overzealousness, which support says I cannot work around within the interface.
  • Server storage location: Germany
  • Plan as reviewed: standard [pricing link]
    • Cost per year: EUR 30 (about $33)
    • Mail storage included: 10GB
    • Limits on send/receive volume: none
    • Aliases: 50 on your domain name, 25 on mailbox.org
    • Additional mailboxes: Available; each one at the same fee as the primary mailbox

Startmail

I really wanted to like Startmail. Its “vault” is an interesting idea and should contribute to the security and privacy of an account. They clearly care about privacy.

It falls down in filtering. They have no way to filter on envelope recipient (BCC or similar). Their support confirmed this to me and that’s a showstopper.

Startmail support was also as slow as Mailbox, taking 5 days to respond to me.

Two showstoppers right there.

Verdict: Not recommended due to slow support responsiveness and weak filtering.

Startmail summary:

  • Website: https://www.startmail.com/
  • Reliability: Seems to be fine
  • Support: Mediocre; Took 5 days for a reply, but the reply was helpful
  • Individual app access passwords: Yes
  • 2FA: Yes
  • Filtering: Poor; cannot filter on envelope recipient, and can’t build filters with multiple actions
  • Spam settings: None
  • Server storage location: The Netherlands
  • Plan as reviewed: Custom domain (trial was Personal), [pricing link]
    • Cost per year: $70
    • Mail storage included: 20GB
    • Limits on send/receive volume: none
    • Aliases: unlimited, with lots of features: can set expiration, etc.
    • Additional mailboxes: not available

Kolab

Kolab Now is mainly positioned as a full groupware service, but they do have an email-only option which I investigated. There isn’t much documentation about it compared to other providers, and also not much in the way of settings. You can turn greylisting on or off. And… that’s it.

It has a full suite of filtering options. They set an X-Envelope-To header which you can use with the arbitrary header match to do the right thing even for BCC situations. Filters can have multiple conditions and multiple actions. It is SIEVE-based and you can download your SIEVE definitions.

If you enable 2FA, you disable IMAP and SMTP; not great.

Verdict: Not an impressive enough email featureset to justify going with it.

Kolab Now summary:

  • Website: https://kolabnow.com/
  • Reliability: Seems to be fine
  • Support: Fine responsiveness (next day)
  • Individual app passwords: No
  • 2FA: Yes, but if you enable it, they disable IMAP and SMTP
  • Filtering: Excellent
  • Spam settings: Only greylisting on/off
  • Server storage location: Switzerland; they have lots of details on their setup
  • Plan as reviewed: “Just email” [pricing link]
    • Cost per year: CHF 60, about $66
    • Mail storage included: 5GB
    • Limitations on send/receive volume: None
    • Aliases: Yes. Not sure if there are limits.
    • Additional mailboxes: Yes if you set up a group account. “Flexible pricing based on user count” is not documented anywhere I could find.

Mailfence

Mailfence is another option, somewhat similar to Startmail but without the unique vault. I had some questions about filters, and support was quite responsive, responding in a couple of hours.

Some of their copy on their website is a bit misleading, but support clarified when I asked them. They do not offer encryption at rest (like most of the entries here).

Mailfence’s filtering system is the kind I’d like to see. It allows multiple conditions and multiple actions for each rule, and has some unique actions as well (notify by SMS or XMPP). Support says that “Recipients” matches envelope recipients. However, one omission is that I can’t match on arbitrary headers; only the canned list of headers they provide.

They have only two spam settings:

  • spam filter on/off
  • whitelist

Given some recent complaints about their spam filter being overly aggressive, I find this lack of control somewhat concerning. (However, I discount complaints about people begging for more features in free accounts; free won’t provide the kind of service I’m looking for with any provider.) There are generally just very few settings for email as well.

Verdict: Responsive and helpful support; filtering has the right structure but lacks arbitrary header match. Could be a good option.

Mailfence summary:

  • Website: https://mailfence.com/
  • Reliability: Seems to be fine
  • Support: Excellent responsiveness and helpful replies (after some initial confusion about my question of greylisting)
  • Individual app access passwords: No. You can set a per-service password (eg, an IMAP password), but those will be shared with all devices speaking that protocol.
  • 2FA: Yes
  • Filtering: Good; only misses the ability to filter on arbitrary headers
  • Spam settings: Very few
  • Server storage location: Belgium
  • Plan as reviewed: Entry [pricing link]
    • Cost per year: $42
    • Mail storage included: 10GB, with a maximum of 50,000 messages
    • Limits on send/receive volume: none
    • Aliases: 50. Aliases can’t be deleted once created (there may be an exception to this for aliases on your own domain rather than mailfence.com)
    • Additional mailboxes: Their page on this is a bit confusing, and the pricing page lacks the information promised. It looks like you can pay the same $42/year for additional mailboxes, with a limit of up to 2 additional paid mailboxes and 2 additional free mailboxes tied to the account.

Runbox

This one came recommended in a Mastodon thread. I had some questions about it, and support response was fantastic – I heard from two people that were co-founders of the company! Even within hours, on a weekend. Incredible! This kind of response was only surpassed by Migadu.

I initially wrote to Runbox with questions about the incoming and outgoing message limits, which I hadn’t seen elsewhere, as well as the bandwidth limit. They said the bandwidth limit is no longer enforced on paid accounts. The incoming and outgoing limits are enforced, and all email (even spam) counts towards the limit. Notably the outgoing limit is per recipient, so sending 10 messages to your 50-recipient family group uses up the entire 500-recipient daily allowance. However, they also indicated a willingness to reset the limit if something happens. Unfortunately, hitting the limit results in a hard bounce (SMTP 5xx) rather than a temporary failure (SMTP 4xx) so it can result in lost mail. This means I’d be worried about some attack or other weirdness causing me to lose mail.
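To make the per-recipient counting concrete, here is a small Python sketch. The 500-recipient figure is the outgoing limit of the plan reviewed here; the function name and the example addresses are hypothetical, and the way Runbox tallies recipients internally is assumed rather than documented.

```python
# Outgoing limit of the plan reviewed here, counted in recipients per day.
DAILY_RECIPIENT_LIMIT = 500


def recipients_used(messages):
    """Sum recipient counts: one message to 50 people counts as 50."""
    return sum(len(recipients) for recipients in messages)


# Hypothetical family group with 50 members.
family_group = [f"relative{i}@example.com" for i in range(50)]

# Ten messages to the whole group in one day.
sent_today = [family_group] * 10

used = recipients_used(sent_today)
limit_reached = used >= DAILY_RECIPIENT_LIMIT
print(used, limit_reached)
```

Ten ordinary group messages are enough to reach the limit, which is why a hard bounce at that point is a concern.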

Their filter is a pain point. Here are the challenges:

  • You can’t directly match on a BCC recipient. Support advised to use a “headers” match, which will search for something anywhere in the headers. This works and is probably “good enough” since this data is in the Received: headers, but it is a little more imprecise.
  • They only have a “contains”, not an “equals” operator. So, for instance, a pattern searching for “test@example.com” would also match “newtest@example.com”. Support advised to put the email address in angle brackets to avoid this. That will work… mostly. Angle brackets aren’t always required in headers.
  • There is no way to have multiple actions on the filter (there is just no way to file an incoming message into two folders). This was the ultimate showstopper for me.

Support advised they are planning to upgrade the filter system in the future, but these are the limitations today.
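The “contains” versus “equals” problem is easy to demonstrate. The following Python sketch is only an illustration of the matching logic, not Runbox’s implementation; the function names and sample header values are made up for the example.

```python
from email.utils import getaddresses


def contains_match(header_value, pattern):
    """Substring match, like a filter with only a 'contains' operator."""
    return pattern.lower() in header_value.lower()


def equals_match(header_value, address):
    """Exact match against each address parsed out of the header value."""
    parsed = getaddresses([header_value])
    return any(addr.lower() == address.lower() for _name, addr in parsed)


wanted = "test@example.com"

# False positive: the pattern is a substring of a different address.
false_positive = contains_match("newtest@example.com", wanted)

# The angle-bracket workaround rejects that case...
bracket_rejects = not contains_match("Jo <newtest@example.com>", "<" + wanted + ">")

# ...but misses a header that carries the bare address without brackets.
bracket_misses = not contains_match("test@example.com", "<" + wanted + ">")

# Parsing the addresses first gets both cases right.
exact_ok = equals_match("test@example.com", wanted) and not equals_match(
    "Jo <newtest@example.com>", wanted
)
print(false_positive, bracket_rejects, bracket_misses, exact_ok)
```

An “equals” operator behaves like `equals_match` here: it compares whole addresses rather than searching for a run of characters.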

Verdict: A good option if you don’t need much from the filtering system. Lots of privacy emphasis.

Runbox summary:

  • Website: https://runbox.com/
  • Reliability: Seems to be fine, except returning 5xx codes if per-day limits are exceeded
  • Support: Excellent responsiveness and replies from founders
  • Individual app passwords: Yes
  • 2FA: Yes
  • Filtering: Poor
  • Spam settings: Very few
  • Server storage location: Norway
  • Plan as reviewed: Mini [pricing link]
    • Cost per year: $35
    • Mail storage included: 10GB
    • Limits on send/receive volume: Receive 5000 messages/day, Send 500 recipients/day
    • Aliases: 100 on runbox.com; unlimited on your own domain
    • Additional mailboxes: $15/yr each, also with 10GB non-shared storage per mailbox

Fastmail

Fastmail came recommended to me by a friend I’ve known for decades.

Here’s the thing about Fastmail, compared to all the services listed above: It all just works. Everything. Filtering, spam prevention, it is all there, all feature-complete, and all just does the right thing as you’d hope. Their filtering system has a canned dropdown for “To/Cc/Bcc”, it supports multiple conditions and multiple actions, and just does the right thing. (Delivering to multiple folders is a little cumbersome but possible.) It has a particularly strong feature set around administering multiple accounts, including things like whether users can prevent admins from reading their mail.

The not-so-great part of the picture is around privacy. Fastmail is based in Australia, where the government has extensive power around spying on data, even to the point of forcing companies to add wiretap capabilities. Fastmail’s privacy policy states user data may be held in Australia, USA, India, and Netherlands. By default, they share data with unidentified “spam companies”, though you can disable this in settings. On the other hand, they do make a good effort towards privacy.

I contacted support with some questions and got back a helpful response in three hours. However, one of the questions was about in which countries my particular data would be stored, and the support response said they would have to get back to me on that. It’s been several days and no word back.

Verdict: A featureful option that “just works”, with a lot of features for managing family accounts and the like, but lacking in the privacy area.

Fastmail summary:

  • Website: https://www.fastmail.com/
  • Reliability: Seems to be fine
  • Support: Good response time on most questions; dropped the ball on one that required research
  • Individual app access passwords: Yes
  • 2FA: Yes
  • Filtering: Excellent
  • Spam settings: Can set filter aggressiveness, decide whether to share spam data with “spam-fighting companies”, configure how to handle backscatter spam, and evaluate the personal learning filter.
  • Server storage locations: Australia, USA, India, and The Netherlands. Legal jurisdiction is Australia.
  • Plan as reviewed: Individual [pricing link]
    • Cost per year: $60
    • Mail storage included: 50GB
    • Limits on send/receive volume: 300/hour
    • Aliases: Unlimited from what I can see
    • Additional mailboxes: No; requires a different plan for that

Migadu

Migadu was a service I’d never heard of, but came recommended to me on Mastodon.

I listed Migadu last because it is a class of its own compared to all the other options. Every other service is basically a webmail interface with a few extra settings tacked on.

Migadu has a full-featured email admin console in addition. By that I mean you can:

  • View usage graphs (incoming, outgoing, storage) over time
  • Manage DNS (if you want Migadu to run your nameservers)
  • Manage multiple domains, and cross-domain relationships with mailboxes
  • View a limited set of logs
  • Configure accounts, reset their passwords if needed/authorized, etc.
  • Configure email address rewrite rules with wildcards and so forth

Basically, if you were the sort of person that ran your own mail servers back in the day, here is Migadu giving you most of that functionality. Effectively you have a web interface to do all the useful stuff, and they handle the boring and annoying bits. This is a really attractive model.

Migadu support has been fantastic. They are quick to respond, and went above and beyond. I pointed out that their X-Envelope-To header, which is needed for filtering by BCC, wasn’t being added on emails I sent myself. They replied 5 hours later indicating they had added the feature to add X-Envelope-To even for internal mails! Wow! I am impressed.

With Migadu, you buy a pool of resources: storage space and incoming/outgoing traffic. What you do within that pool is up to you. You can set up users (“mailboxes”), aliases, domains, whatever you like. It all just shares the pool. You can restrict users further so that an individual user has access to only a subset of the pool resources.

I was initially concerned about Migadu’s daily send/receive message count limits, but in visiting with support and reading the documentation, what really comes out is that Migadu is a service with a personal touch. Hitting the incoming traffic limit will cause a SMTP temporary fail (4xx) response so you won’t lose legit mail – and support will work with you if it’s a problem for legit uses. In other words, restrictions are “soft” and they are interpreted reasonably.
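The difference between a temporary and a permanent failure comes down to the first digit of the SMTP reply code. Here is a minimal Python sketch of how a sending server treats each class; the function name is hypothetical and the specific codes are generic examples, not the codes any provider here is known to return.

```python
def delivery_outcome(reply_code):
    """Classify an SMTP reply code by its first digit."""
    if 200 <= reply_code < 300:
        return "accepted"
    if 400 <= reply_code < 500:
        # Temporary failure: the sender keeps the message queued and retries.
        return "deferred"
    if 500 <= reply_code < 600:
        # Permanent failure: the sender gives up and bounces the message.
        return "bounced"
    return "other"


# Generic example codes, one per class.
for code in (250, 451, 550):
    print(code, delivery_outcome(code))
```

With a 4xx response the mail is normally retried later by the sending server, so a brief period over the limit delays mail instead of losing it.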

One interesting thing about Migadu is that they do not offer accounts under their domain. That is, you MUST bring your own domain. That’s pretty easy and cheap, of course. It also puts you in a position of power, because it is easy to migrate email from one provider to another if you own the domain.

Filtering is done via SIEVE. There is a GUI editor which lets you accomplish most things, though it has an odd blind spot where you can’t file a message into multiple folders. However, you can edit a SIEVE ruleset directly and you get the full SIEVE featureset, which is extensive (and does support filing a message into multiple folders). I note that the SIEVE :envelope match doesn’t work, but Migadu adds an X-Envelope-To header which is just as good.
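The logic of such a filter can be sketched outside of SIEVE. The Python below is only an illustration of matching on the X-Envelope-To header and filing into more than one folder; it is not SIEVE and not Migadu’s code. The addresses, folder names, and rule table are hypothetical, and it assumes the header holds a single bare address.

```python
from email import message_from_string

# Hypothetical message where the visible To: header differs from the
# envelope recipient, as happens with BCC or mailing lists.
RAW = """\
From: list@example.org
To: announce@example.org
X-Envelope-To: me+lists@example.com
Subject: Monthly update

Body text.
"""

# Hypothetical rules: envelope recipient -> folders to file the message into.
RULES = {"me+lists@example.com": ["Lists", "Archive"]}


def folders_for(raw_message, rules, default=("INBOX",)):
    """Return every folder a message should be filed into."""
    msg = message_from_string(raw_message)
    envelope_to = (msg.get("X-Envelope-To") or "").strip().lower()
    return list(rules.get(envelope_to, default))


print(folders_for(RAW, RULES))
```

The two requirements from my criteria both show up here: the match uses the envelope recipient rather than To:, and one match can produce two deliveries.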

I particularly love a company that tells you all the reasons you might not want to use them. Migadu’s pro/con list is an honest drawbacks list (of course, their homepage highlights all the features!).

Verdict: Fantastically powerful, excellent support, and good privacy. I chose this one.

Migadu summary:

  • Website: https://migadu.com/
  • Reliability: Excellent
  • Support: Fantastic. Good response times and they added a feature (or fixed a bug?) a few hours after I requested it.
  • Individual access passwords: Yes. Create “identities” to support them.
  • 2FA: Yes, on both the admin interface and the webmail interface
  • Filtering: Excellent, based on SIEVE. GUI editor doesn’t support multiple actions when filing into a folder, but full SIEVE functionality is exposed.
  • Spam settings:
    • On the domain level, filter aggressiveness, Greylisting on/off, black and white lists
    • On the mailbox level, filter aggressiveness, black and whitelists, action to take with spam; compatible with filters.
  • Server storage location: France; legal jurisdiction Switzerland
  • Plan as reviewed: mini [pricing link]
    • Cost per year: $90
    • Mail storage included: 30GB (“soft” quota)
    • Limits on send/receive volume: 1000 messages in/day, 100 messages out/day (“soft” quotas)
    • Aliases: Unlimited on an unlimited number of domains
    • Additional mailboxes: Unlimited and free; uses pooled quotas, but individual quotas can be set

Others

Here are a few others that I didn’t think worthy of getting a trial:

  • mxroute was recommended by several. Lots of concerning things in their policy, such as:
    • if you repeatedly send mail to unroutable recipients, they may publish the addresses on Github
    • they will terminate your account if they think you are “rude” or want to contest a charge
    • they reserve the right to cancel your service at any time for any (or no) reason.
  • Proton keeps coming up, and I will not consider it so long as I am locked into their client on mobile.
  • Skiff comes up sometimes, but they were acquired by Notion.
  • Disroot comes up; this discussion highlights a number of reasons why I avoid them. Their Terms of Service (ToS) is inconsistent with a general-purpose email account (I guess for targeting nonprofits and activists, that could make sense). Particularly laughable is that they claim to be friends of Open Source, but then would take down your account if you upload “copyrighted” material. News flash: in order for an Open Source license to be meaningful, the underlying work is copyrighted. It is perfectly legal to upload copyrighted material when you wrote it or have the license to do so!

Conclusions

There are a lot of good options for email hosting today, and in particular I appreciate the excellent personal support from companies like Migadu and Runbox. Support small businesses!

Try the Last Internet Kermit Server

$ grep kermit /etc/services
kermit          1649/tcp

What is this mysterious protocol? Who uses it and what is its story?

This story is a winding one, beginning in 1981. Kermit is, to the best of my knowledge, the oldest actively-maintained software package with an original developer still participating. It is also a scripting language, an Internet server, a (scriptable!) SSH client, and a file transfer protocol.

And my first use of it was talking to my HP-48GX calculator over a 9600bps serial link. Yes, that calculator had a Kermit server built in.

But let’s back up and talk about serial ports and modems.

Serial Ports and Modems

In my piece The PC & Internet Revolution in Rural America, I recently talked about getting a modem – what an excitement it was to get one! I realize that many people today have never used a serial line or a modem, so let’s briefly discuss.

Before Ethernet and Wifi took off in a big way, in the 1990s-2000s, two computers would talk to each other over a serial line and a modem. By modern standards, these were slow; 300bps was a common early speed. They also (at least in the beginning) had no kind of error checking. Characters could be dropped or changed. Sometimes even those speeds were faster than the receiving device could handle. Some serial links were 7-bit, and wouldn’t even pass all 7-bit characters; for instance, sending a Ctrl-S could lock up a remote until you sent Ctrl-Q.

And computers back in the 1970s and 1980s weren’t as uniform as they are now. They used different character sets, different line endings, and even had different notions of what a file is. Today’s notion of a file as whatever set of binary bytes an application wants it to be was by no means universal; some systems treated a file as a set of fixed-length records, for instance.

So there were a lot of challenges in reliably moving files between systems. Kermit was introduced to reliably move files between systems using serial lines, automatically working around the varieties of serial lines, detecting errors and retransmitting, managing transmit speeds, and adapting between architectures as appropriate. Quite a task! And perhaps this explains why it was supported on a calculator with a primitive CPU by today’s standards.
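One of the ways Kermit works around lines that swallow control characters is control-character prefixing: a control character is sent as a printable prefix followed by the character XORed with 64, so Ctrl-S travels as the two printable characters “#S”. The Python sketch below is a simplified illustration, not C-Kermit’s code. It assumes the default “#” prefix and 7-bit data, and it leaves out packet framing, checksums, 8th-bit prefixing, and repeat counts.

```python
CTL_PREFIX = "#"  # Kermit's default control-quote character


def ctl(code):
    """Toggle bit 6, mapping control codes to printable ones and back."""
    return code ^ 64


def quote(data):
    """Encode 7-bit bytes so no control character appears on the line."""
    out = []
    for b in data:
        if b > 127:
            raise ValueError("8-bit data needs 8th-bit prefixing, not shown here")
        if b < 32 or b == 127:
            out.append(CTL_PREFIX + chr(ctl(b)))
        elif chr(b) == CTL_PREFIX:
            out.append(CTL_PREFIX + CTL_PREFIX)  # a literal '#' is doubled
        else:
            out.append(chr(b))
    return "".join(out)


def unquote(text):
    """Reverse quote(), restoring the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == CTL_PREFIX:
            i += 1
            if i >= len(text):
                raise ValueError("dangling prefix at end of data")
            nxt = text[i]
            out.append(ord(nxt) if nxt == CTL_PREFIX else ctl(ord(nxt)))
        else:
            out.append(ord(ch))
        i += 1
    return bytes(out)


sample = b"A\x13B#\x11"  # contains Ctrl-S, a literal '#', and Ctrl-Q
encoded = quote(sample)
print(encoded)
print(unquote(encoded) == sample)
```

Nothing in the encoded form can be mistaken for flow control, so a line that reacts to Ctrl-S and Ctrl-Q still passes the data through.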

Serial communication, by the way, is still commonplace, though now it isn’t prominent in everyone’s home PC setup. It’s used a lot in industrial equipment, avionics, embedded systems, and so forth.

The key point about serial lines is that they aren’t inherently multiplexed or packetized. Whereas an Ethernet network is designed to let many dozens of applications use it at once, a serial line typically runs only one (unless it is something like PPP, which is designed to do multiplexing over the serial line).

So it became useful to be able to both log in to a machine and transfer files with it. That is, incidentally, still useful today.

Kermit and XModem/ZModem

I wondered: why did we end up with two diverging sets of protocols, created at about the same time? The Kermit website has the answer: essentially, BBSs could assume 8-bit clean connections, so XModem and ZModem had much less complexity to worry about. Kermit, on the other hand, was highly flexible. Although ZModem came out a few years before Kermit had its performance optimizations, by about 1993 Kermit was on par with or faster than ZModem.

Beyond serial ports

As LANs and the Internet came to be popular, people started to use telnet (and later ssh) to connect to remote systems, rather than serial lines and modems. FTP was an early way to transfer files across the Internet, but it had its challenges. Kermit added telnet support, as well as later support for ssh (as a wrapper around the ssh command you already know). Now you could easily log in to a machine and exchange files with it without missing a beat.

And so it was that the Internet Kermit Service Daemon (IKSD) came into existence. It allows a person to set up a Kermit server, which can authenticate against local accounts or present anonymous access akin to FTP.

And so I established the quux.org Kermit Server, which runs the Unix IKSD (part of the Debian ckermit package).

Trying Out the quux.org Kermit Server

There are more instructions on the quux.org Kermit Server page! You can connect to it using either telnet or the kermit program. I won’t duplicate all of the information here, but here’s what it looks like to connect:

$ kermit
C-Kermit 10.0 Beta.08, 15 Dec 2022, for Linux+SSL (64-bit)
 Copyright (C) 1985, 2022,
  Trustees of Columbia University in the City of New York.
  Open Source 3-clause BSD license since 2011.
Type ? or HELP for help.
(/tmp/t/) C-Kermit>iksd /user:anonymous kermit.quux.org
 DNS Lookup...  Trying 135.148.101.37...  Reverse DNS Lookup... (OK)
Connecting to host glockenspiel.complete.org:1649
 Escape character: Ctrl-\ (ASCII 28, FS): enabled
Type the escape character followed by C to get back,
or followed by ? to see other options.
----------------------------------------------------

 >>> Welcome to the Internet Kermit Service at kermit.quux.org <<<

To log in, use 'anonymous' as the username, and any non-empty password

Internet Kermit Service ready at Fri Aug  4 22:32:17 2023
C-Kermit 10.0 Beta.08, 15 Dec 2022
kermit

Enter e-mail address as Password: [redacted]

Anonymous login.

You are now connected to the quux kermit server.

Try commands like HELP, cd gopher, dir, and the like.  Use INTRO
for a nice introduction.

(~/) IKSD>

You can even recursively download the entire Kermit mirror: over 1GB of files!

Conclusions

So, have fun. Enjoy this experience from the 1980s.

And note that Kermit also makes a better ssh client than ssh in a lot of ways; see ideas on my Kermit page.

This page also has a permanent home on my website, where it may be periodically updated.

Recommendations for Tools for Backing Up and Archiving to Removable Media

I have several TB worth of family photos, videos, and other data. This needs to be backed up — and archived.

Backups and archives are often thought of as similar. And indeed, they may be done with the same tools at the same time. But the goals differ somewhat:

Backups are designed to recover from a disaster that you can fairly rapidly detect.

Archives are designed to survive for many years, protecting against disaster not only impacting the original equipment but also the original person that created them.

Reflecting on this, it implies that while a nice ZFS snapshot-based scheme that supports twice-hourly backups may be fantastic for that purpose, if you think about things like family members being able to access it if you are incapacitated, or accessibility in a few decades’ time, it becomes much less appealing for archives. ZFS doesn’t have the wide software support that NTFS, FAT, UDF, ISO-9660, etc. do.

This post isn’t about the pros and cons of the different storage media, nor is it about the pros and cons of cloud storage for archiving; these conversations can readily be found elsewhere. Let’s assume, for the point of conversation, that we are considering BD-R optical discs as well as external HDDs, both of which are too small to hold the entire backup set.

What would you use for archiving in these circumstances?

Establishing goals

The goals I have are:

  • Archives can be restored using Linux or Windows (even though I don’t use Windows, this requirement will ensure the broadest compatibility in the future)
  • The archival system must be able to accommodate periodic updates consisting of new files, deleted files, moved files, and modified files, without requiring a rewrite of the entire archive dataset
  • Archives can ideally be mounted on any common OS and the component files directly copied off
  • Redundancy must be possible. In the worst case, one could manually copy one drive/disc to another. Ideally, the archiving system would automatically track making n copies of data.
  • While a full restore may be a goal, simply finding one file or one directory may also be a goal. Ideally, an archiving system would be able to quickly tell me which discs/drives contain a given file.
  • Ideally, preserves as much POSIX metadata as possible (hard links, symlinks, modification date, permissions, etc). However, for the archiving case, this is less important than for the backup case, with the possible exception of modification date.
  • Must be easy enough to do, and sufficiently automatable, to allow frequent updates without error-prone or time-consuming manual hassle
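To make the "periodic updates" goal a bit more concrete, here is a minimal sketch of how the changes between two archive runs could be classified. It assumes a hypothetical manifest format (a mapping of relative path to content hash) that I made up for illustration; none of the tools discussed below necessarily work this way.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest format: relative path -> content hash (e.g. SHA-256 hex).
Manifest = Dict[str, str]


def classify_changes(old: Manifest, new: Manifest) -> Dict[str, list]:
    """Compare two manifests and sort the differences into four buckets."""
    gone = {p: h for p, h in old.items() if p not in new}
    appeared = {p: h for p, h in new.items() if p not in old}

    # A path that vanished and a path that appeared with the same hash
    # is treated as a move rather than as a delete plus an add.
    old_by_hash: Dict[str, List[str]] = {}
    for path in sorted(gone):
        old_by_hash.setdefault(gone[path], []).append(path)

    moved: List[Tuple[str, str]] = []
    for path in sorted(appeared):
        candidates = old_by_hash.get(appeared[path])
        if candidates:
            moved.append((candidates.pop(0), path))

    moved_from = {src for src, _ in moved}
    moved_to = {dst for _, dst in moved}
    return {
        "added": sorted(p for p in appeared if p not in moved_to),
        "deleted": sorted(p for p in gone if p not in moved_from),
        "moved": moved,
        "modified": sorted(p for p in old if p in new and old[p] != new[p]),
    }
```

Only the "added" and "modified" buckets need new data written to the archive media; moves and deletions are metadata-only changes, which is why a tool that has to rewrite everything on each update is a poor fit.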

I would welcome your ideas for what to use. Below, I’ll highlight different approaches I’ve looked into and how they stack up.

Basic copies of directories

The initial approach might be one of simply copying directories across. This would work well if the data set to be archived is smaller than the archival media. In that case, you could just burn or rsync a new copy with every update and be done. Unfortunately, this is much less convenient with data of the size I’m dealing with: a single rsync of the whole tree isn’t an option when the data doesn’t fit on one device. With some datasets, you could manually design some rsyncs to store individual directories on individual devices, but that gets unwieldy fast and isn’t scalable.

You could use something like my datapacker program to split the data across multiple discs/drives efficiently. However, updates will be a problem; you’d have to re-burn the entire set to get a consistent copy, or rely on external tools like mtree to reflect deletions. Not very convenient in any case.
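For a sense of what splitting data across discs involves, here is a rough sketch using the classic first-fit-decreasing heuristic for bin packing. This is an illustration of the general idea only, not a description of how datapacker works internally; the function name and the sizes in the usage note are invented for the example.

```python
from typing import Dict, List


def pack_first_fit_decreasing(sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Assign files to as few fixed-capacity discs as the heuristic manages.

    sizes maps a file name to its size in bytes; capacity is the usable
    size of one disc or drive, in bytes.
    """
    bins: List[List[str]] = []   # file names placed on each disc
    free: List[int] = []         # remaining bytes on each disc

    # Largest files first; ties broken by name so the result is stable.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) does not fit on one disc")
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(name)
                free[i] -= size
                break
        else:
            # No existing disc has room, so start a new one.
            bins.append([name])
            free.append(capacity - size)
    return bins
```

Calling `pack_first_fit_decreasing({"a": 60, "b": 50, "c": 40, "d": 30, "e": 20}, 100)` places the files on two discs. Note that the layout depends on the whole set of sizes, so adding or changing a few files can reshuffle everything, which is exactly the update problem described above.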

So I won’t be using this.

tar or zip

While you can split tar and zip files across multiple media, they have a lot of issues. GNU tar’s incremental mode is clunky and buggy; zip is even worse. tar files can’t be read randomly, making it extremely time-consuming to extract just certain files out of a tar file.

The only thing going for these formats (and especially zip) is the wide compatibility for restoration.

dar

Here we start to get into the more interesting tools. Dar is, in my opinion, one of the best Linux tools that few people know about. Since I first wrote about dar in 2008, it’s added some interesting new features; among them, binary deltas and cloud storage support. So, dar has quite a few interesting features that I make use of in other ways, and could also be quite helpful here:

  • Dar can both read and write files sequentially (streaming, like tar), or with random-access (quick seek to extract a subset without having to read the entire archive)
  • Dar can apply compression to individual files, rather than to the archive as a whole, facilitating both random access and resilience (corruption in one file doesn’t invalidate all subsequent files). Dar also supports numerous compression algorithms including gzip, bzip2, xz, lzo, etc., and can omit compressing already-compressed files.
  • The end of each dar file contains a central directory (dar calls this a catalog). The catalog contains everything necessary to extract individual files from the archive quickly, as well as everything necessary to make a future incremental archive based on this one. Additionally, dar can make and work with “isolated catalogs” — a file containing the catalog only, without data.
  • Dar can split the archive into multiple pieces called slices. This can best be done with fixed-size slices (--slice and --first-slice options), which let the catalog record the slice number and preserve random access capabilities. With the --execute option, dar can easily wait for a given slice to be burned, etc.
  • Dar normally stores an entire new copy of a modified file, but can optionally store an rdiff binary delta instead. This has the potential to be far smaller (think of a case of modifying metadata for a photo, for instance).
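As a rough sketch of how the slicing options above might be put together, here is a small helper that only builds a dar command line; it does not run anything. The helper name, the paths, the slice size, and the burn script are all hypothetical. The --slice, --first-slice, and --execute options are the ones mentioned above; -c (create), -R (filesystem root), -z (compress), and -A (reference archive for an incremental) are given as I understand them from the dar man page, so check them against your installed version before relying on this.

```python
from typing import List, Optional


def dar_create_args(
    basename: str,
    root: str,
    slice_size: str,
    first_slice_size: Optional[str] = None,
    reference: Optional[str] = None,
    execute: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an argument list for creating a sliced dar archive."""
    # -c: archive basename, -R: directory to archive, -z: compress (dar's default algorithm)
    args = ["dar", "-c", basename, "-R", root, "--slice", slice_size, "-z"]
    if first_slice_size is not None:
        # A different size for the first slice, e.g. to leave room for extras on disc 1.
        args += ["--first-slice", first_slice_size]
    if reference is not None:
        # Archive (or isolated catalog) to base an incremental archive on.
        args += ["-A", reference]
    if execute is not None:
        # Command dar runs between slices, e.g. a script that burns the finished slice.
        args += ["--execute", execute]
    return args
```

For example, `dar_create_args("photos-incr", "/srv/photos", "23G", reference="photos-full", execute="./burn-slice.sh")` yields a list that could be handed to `subprocess.run`. Building the list separately makes it easy to print and review the exact command before committing hours of disc burning to it.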

Additionally, dar comes with a dar_manager program. dar_manager makes a database out of dar catalogs (or archives). This can then be used to identify the precise archive containing a particular version of a particular file.
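Conceptually, this kind of database is an inverted index from file path to the archives that hold a version of it. The sketch below illustrates that idea only; it is not dar_manager’s actual format or interface, and it assumes (hypothetically) that you already have a listing of paths per archive and that archive names sort in chronological order.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_location_index(catalogs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each file path to the archives containing a version of it, oldest first.

    catalogs maps an archive name to the paths stored in that archive.
    Assumes archive names sort chronologically (e.g. a date prefix).
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for archive in sorted(catalogs):
        for path in catalogs[archive]:
            index[path].append(archive)
    return dict(index)
```

With such an index, the newest copy of a file is simply the last entry in its list, and every earlier entry is an older version sitting on some other disc or drive.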

All this combines to make a useful system for archiving. Isolated catalogs are tiny, and it would be easy enough to include the isolated catalogs for the entire set of archives that came before (or even the dar_manager database file) with each new incremental archive. This would make restoration of a particular subset easy.

The main thing to address with dar is that you do need dar to extract the archive. Every dar release comes with source code and a win64 build. dar also supports building a statically-linked Linux binary. It would therefore be easy to include a win64 binary, a Linux binary, and the source with every archive run. dar is also a part of multiple Linux and BSD distributions, which are archived around the Internet. I think this provides reasonable future-proofing to make sure dar archives will still be readable in the future.

The other challenge is user ability. While dar is highly portable, it is fundamentally a CLI tool and will require CLI abilities on the part of users. I suspect, though, that I could write up a few pages of instructions to include and make that a reasonably easy process. Not everyone can use a CLI, but I would expect that a person who could follow those instructions could be found readily enough.

One other benefit of dar is that it could easily be used with tapes. The LTO series is liked by various hobbyists, though it could pose formidable obstacles to non-hobbyists trying to access data in future decades. Additionally, since the archive is a big file, it lends itself to working with par2 to provide redundancy for certain amounts of data corruption.

git-annex

git-annex is an interesting program that is designed to facilitate managing large sets of data and moving it between repositories. git-annex has particular support for offline archive drives and tracks which drives contain which files.

The idea would be to store the data to be archived in a git-annex repository. Then git-annex commands could generate filesystem trees on the external drives (or trees to be burned to read-only media).

In a post about using git-annex for blu-ray backups, an earlier thread about DVD-Rs was mentioned.

This has a few interesting properties. For one, with due care, the files can be stored on archival media as regular files. There are some different options for how to generate the archives; some of them would place the entire git-annex metadata on each drive/disc. With that arrangement, one could access the individual files without git-annex. With git-annex, one could reconstruct the final (or any intermediate) state of the archive appropriately, handling deletions, renames, etc. You would also easily be able to know where copies of your files are.

The practice is somewhat more challenging. Hundreds of thousands of files — what I would consider a medium-sized archive — can pose some challenges, running into hours-long execution if used in conjunction with the directory special remote (but only minutes-long with a standard git-annex repo).

Ruling out the directory special remote, I had thought I could maybe just work with my files in git-annex directly. However, I ran into some challenges with that approach as well. I am uncomfortable with git-annex mucking about with hard links in my source data. While it does try to preserve timestamps in the source data, these are lost on the clones. I wrote up my best effort to work around all this.

In a forum post, the author of git-annex comments that “I don’t think that CDs/DVDs are a particularly good fit for git-annex, but it seems a couple of users have gotten something working.” The page he references is Managing a large number of files archived on many pieces of read-only medium. Some of that discussion is a bit dated (for instance, the directory special remote has the importtree feature that implements what was being asked for there), but has some interesting tips.

git-annex supplies win64 binaries, and git-annex is included with many distributions as well. So it should be nearly as accessible as dar in the future. Since git-annex would be required to restore a consistent recovery image, similar caveats as with dar apply; CLI experience would be needed, along with some written instructions.

Bacula and BareOS

Although primarily tape-based archivers, these also nominally support drives and optical media. However, they are much more tailored as backup tools, especially with the ability to pull from multiple machines. They require a database and extensive configuration, making them a poor fit for this project, both for creating the archives and for extracting them in the future.

Conclusions

I’m going to spend some more time with dar and git-annex, testing them out, and hope to write some future posts about my experiences.