Category Archives: Linux

The PC & Internet Revolution in Rural America

Inspired by several others (such as Alex Schroeder’s post and Szczeżuja’s prompt), as well as a desire to get this down for my kids, I figure it’s time to write a bit about living through the PC and Internet revolution where I did: outside a tiny town in rural Kansas. And, as I’ve been back in that same area for the past 15 years, I reflect some on the challenges that continue to play out.

Although the stories from the others were primarily about getting online, I want to start by setting some background. Those of you that didn’t grow up in the same era as I did probably never realized that a typical business PC setup might cost $10,000 in today’s dollars, for instance. So let me start with the background.

Nothing was easy

This story begins in the 1980s. Somewhere around my Kindergarten year of school, around 1985, my parents bought a TRS-80 Color Computer 2 (aka CoCo II). It had 64K of RAM and used a TV for display and sound.

This got you the computer. It didn’t get you any disk drive or anything, no joysticks (required by a number of games). So whenever the system powered down, or it hung and you had to power cycle it – a frequent event – you’d lose whatever you were doing and would have to re-enter the program, literally by typing it in.

The floppy drive for the CoCo II cost more than the computer, and it was quite common for people to buy the computer first and then the floppy drive later when they’d saved up the money for that.

I particularly want to mention that computers then didn’t come with a modem. What would be like buying a laptop or a tablet without wifi today. A modem, which I’ll talk about in a bit, was another expensive accessory. To cobble together a system in the 80s that was capable of talking to others – with persistent storage (floppy, or hard drive), screen, keyboard, and modem – would be quite expensive. Adjusted for inflation, if you’re talking a PC-style device (a clone of the IBM PC that ran DOS), this would easily be more expensive than the Macbook Pros of today.

Few people back in the 80s had a computer at home. And the portion of those that had even the capability to get online in a meaningful way was even smaller.

Eventually my parents bought a PC clone with 640K RAM and dual floppy drives. This was primarily used for my mom’s work, but I did my best to take it over whenever possible. It ran DOS and, despite its monochrome screen, was generally a more capable machine than the CoCo II. For instance, it supported lowercase. (I’m not even kidding; the CoCo II pretty much didn’t.) A while later, they purchased a 32MB hard drive for it – what luxury!

Just getting a machine to work wasn’t easy. Say you’d bought a PC, and then bought a hard drive, and a modem. You didn’t just plug in the hard drive and it would work. You would have to fight it every step of the way. The BIOS and DOS partition tables of the day used a cylinder/head/sector method of addressing the drive, and various parts of that those addresses had too few bits to work with the “big” drives of the day above 20MB. So you would have to lie to the BIOS and fdisk in various ways, and sort of work out how to do it for each drive. For each peripheral – serial port, sound card (in later years), etc., you’d have to set jumpers for DMA and IRQs, hoping not to conflict with anything already in the system. Perhaps you can now start to see why USB and PCI were so welcomed.

Sharing and finding resources

Despite the two computers in our home, it wasn’t as if software written on one machine just ran on another. A lot of software for PC clones assumed a CGA color display. The monochrome HGC in our PC wasn’t particularly compatible. You could find a TSR program to emulate the CGA on the HGC, but it wasn’t particularly stable, and there’s only so much you can do when a program that assumes color displays on a monitor that can only show black, dark amber, or light amber.

So I’d periodically get to use other computers – most commonly at an office in the evening when it wasn’t being used.

There were some local computer clubs that my dad took me to periodically. Software was swapped back then; disks copied, shareware exchanged, and so forth. For me, at least, there was no “online” to download software from, and selling software over the Internet wasn’t a thing at all.

Three Different Worlds

There were sort of three different worlds of computing experience in the 80s:

  1. Home users. Initially using a wide variety of software from Apple, Commodore, Tandy/RadioShack, etc., but eventually coming to be mostly dominated by IBM PC clones
  2. Small and mid-sized business users. Some of them had larger minicomputers or small mainframes, but most that I had contact with by the early 90s were standardized on DOS-based PCs. More advanced ones had a network running Netware, most commonly. Networking hardware and software was generally too expensive for home users to use in the early days.
  3. Universities and large institutions. These are the places that had the mainframes, the earliest implementations of TCP/IP, the earliest users of UUCP, and so forth.

The difference between the home computing experience and the large institution experience were vast. Not only in terms of dollars – the large institution hardware could easily cost anywhere from tens of thousands to millions of dollars – but also in terms of sheer resources required (large rooms, enormous power circuits, support staff, etc). Nothing was in common between them; not operating systems, not software, not experience. I was never much aware of the third category until the differences started to collapse in the mid-90s, and even then I only was exposed to it once the collapse was well underway.

You might say to me, “Well, Google certainly isn’t running what I’m running at home!” And, yes of course, it’s different. But fundamentally, most large datacenters are running on x86_64 hardware, with Linux as the operating system, and a TCP/IP network. It’s a different scale, obviously, but at a fundamental level, the hardware and operating system stack are pretty similar to what you can readily run at home. Back in the 80s and 90s, this wasn’t the case. TCP/IP wasn’t even available for DOS or Windows until much later, and when it was, it was a clunky beast that was difficult.

One of the things Kevin Driscoll highlights in his book called Modem World – see my short post about it – is that the history of the Internet we usually receive is focused on case 3: the large institutions. In reality, the Internet was and is literally a network of networks. Gateways to and from Internet existed from all three kinds of users for years, and while TCP/IP ultimately won the battle of the internetworking protocol, the other two streams of users also shaped the Internet as we now know it. Like many, I had no access to the large institution networks, but as I’ve been reflecting on my experiences, I’ve found a new appreciation for the way that those of us that grew up with primarily home PCs shaped the evolution of today’s online world also.

An Era of Scarcity

I should take a moment to comment about the cost of software back then. A newspaper article from 1985 comments that WordPerfect, then the most powerful word processing program, sold for $495 (or $219 if you could score a mail order discount). That’s $1360/$600 in 2022 money. Other popular software, such as Lotus 1-2-3, was up there as well. If you were to buy a new PC clone in the mid to late 80s, it would often cost $2000 in 1980s dollars. Now add a printer – a low-end dot matrix for $300 or a laser for $1500 or even more. A modem: another $300. So the basic system would be $3600, or $9900 in 2022 dollars. If you wanted a nice printer, you’re now pushing well over $10,000 in 2022 dollars.

You start to see one barrier here, and also why things like shareware and piracy – if it was indeed even recognized as such – were common in those days.

So you can see, from a home computer setup (TRS-80, Commodore C64, Apple ][, etc) to a business-class PC setup was an order of magnitude increase in cost. From there to the high-end minis/mainframes was another order of magnitude (at least!) increase. Eventually there was price pressure on the higher end and things all got better, which is probably why the non-DOS PCs lasted until the early 90s.

Increasing Capabilities

My first exposure to computers in school was in the 4th grade, when I would have been about 9. There was a single Apple ][ machine in that room. I primarily remember playing Oregon Trail on it. The next year, the school added a computer lab. Remember, this is a small rural area, so each graduating class might have about 25 people in it; this lab was shared by everyone in the K-8 building. It was full of some flavor of IBM PS/2 machines running DOS and Netware. There was a dedicated computer teacher too, though I think she was a regular teacher that was given somewhat minimal training on computers. We were going to learn typing that year, but I did so well on the very first typing program that we soon worked out that I could do programming instead. I started going to school early – these machines were far more powerful than the XT at home – and worked on programming projects there.

Eventually my parents bought me a Gateway 486SX/25 with a VGA monitor and hard drive. Wow! This was a whole different world. It may have come with Windows 3.0 or 3.1 on it, but I mainly remember running OS/2 on that machine. More on that below.

Programming

That CoCo II came with a BASIC interpreter in ROM. It came with a large manual, which served as a BASIC tutorial as well. The BASIC interpreter was also the shell, so literally you could not use the computer without at least a bit of BASIC.

Once I had access to a DOS machine, it also had a basic interpreter: GW-BASIC. There was a fair bit of software written in BASIC at the time, but most of the more advanced software wasn’t. I wondered how these .EXE and .COM programs were written. I could find vague references to DEBUG.EXE, assemblers, and such. But it wasn’t until I got a copy of Turbo Pascal that I was able to do that sort of thing myself. Eventually I got Borland C++ and taught myself C as well. A few years later, I wanted to try writing GUI programs for Windows, and bought Watcom C++ – much cheaper than the competition, and it could target Windows, DOS (and I think even OS/2).

Notice that, aside from BASIC, none of this was free, and none of it was bundled. You couldn’t just download a C compiler, or Python interpreter, or whatnot back then. You had to pay for the ability to write any kind of serious code on the computer you already owned.

The Microsoft Domination

Microsoft came to dominate the PC landscape, and then even the computing landscape as a whole. IBM very quickly lost control over the hardware side of PCs as Compaq and others made clones, but Microsoft has managed – in varying degrees even to this day – to keep a stranglehold on the software, and especially the operating system, side. Yes, there was occasional talk of things like DR-DOS, but by and large the dominant platform came to be the PC, and if you had a PC, you ran DOS (and later Windows) from Microsoft.

For awhile, it looked like IBM was going to challenge Microsoft on the operating system front; they had OS/2, and when I switched to it sometime around the version 2.1 era in 1993, it was unquestionably more advanced technically than the consumer-grade Windows from Microsoft at the time. It had Internet support baked in, could run most DOS and Windows programs, and had introduced a replacement for the by-then terrible FAT filesystem: HPFS, in 1988. Microsoft wouldn’t introduce a better filesystem for its consumer operating systems until Windows XP in 2001, 13 years later. But more on that story later.

Free Software, Shareware, and Commercial Software

I’ve covered the high cost of software already. Obviously $500 software wasn’t going to sell in the home market. So what did we have?

Mainly, these things:

  1. Public domain software. It was free to use, and if implemented in BASIC, probably had source code with it too.
  2. Shareware
  3. Commercial software (some of it from small publishers was a lot cheaper than $500)

Let’s talk about shareware. The idea with shareware was that a company would release a useful program, sometimes limited. You were encouraged to “register”, or pay for, it if you liked it and used it. And, regardless of whether you registered it or not, were told “please copy!” Sometimes shareware was fully functional, and registering it got you nothing more than printed manuals and an easy conscience (guilt trips for not registering weren’t necessarily very subtle). Sometimes unregistered shareware would have a “nag screen” – a delay of a few seconds while they told you to register. Sometimes they’d be limited in some way; you’d get more features if you registered. With games, it was popular to have a trilogy, and release the first episode – inevitably ending with a cliffhanger – as shareware, and the subsequent episodes would require registration. In any event, a lot of software people used in the 80s and 90s was shareware. Also pirated commercial software, though in the earlier days of computing, I think some people didn’t even know the difference.

Notice what’s missing: Free Software / FLOSS in the Richard Stallman sense of the word. Stallman lived in the big institution world – after all, he worked at MIT – and what he was doing with the Free Software Foundation and GNU project beginning in 1983 never really filtered into the DOS/Windows world at the time. I had no awareness of it even existing until into the 90s, when I first started getting some hints of it as a port of gcc became available for OS/2. The Internet was what really brought this home, but I’m getting ahead of myself.

I want to say again: FLOSS never really entered the DOS and Windows 3.x ecosystems. You’d see it make a few inroads here and there in later versions of Windows, and moreso now that Microsoft has been sort of forced to accept it, but still, reflect on its legacy. What is the software market like in Windows compared to Linux, even today?

Now it is, finally, time to talk about connectivity!

Getting On-Line

What does it even mean to get on line? Certainly not connecting to a wifi access point. The answer is, unsurprisingly, complex. But for everyone except the large institutional users, it begins with a telephone.

The telephone system

By the 80s, there was one communication network that already reached into nearly every home in America: the phone system. Virtually every household (note I don’t say every person) was uniquely identified by a 10-digit phone number. You could, at least in theory, call up virtually any other phone in the country and be connected in less than a minute.

But I’ve got to talk about cost. The way things worked in the USA, you paid a monthly fee for a phone line. Included in that monthly fee was unlimited “local” calling. What is a local call? That was an extremely complex question. Generally it meant, roughly, calling within your city. But of course, as you deal with things like suburbs and cities growing into each other (eg, the Dallas-Ft. Worth metroplex), things got complicated fast. But let’s just say for simplicity you could call others in your city.

What about calling people not in your city? That was “long distance”, and you paid – often hugely – by the minute for it. Long distance rates were difficult to figure out, but were generally most expensive during business hours and cheapest at night or on weekends. Prices eventually started to come down when competition was introduced for long distance carriers, but even then you often were stuck with a single carrier for long distance calls outside your city but within your state. Anyhow, let’s just leave it at this: local calls were virtually free, and long distance calls were extremely expensive.

Getting a modem

I remember getting a modem that ran at either 1200bps or 2400bps. Either way, quite slow; you could often read even plain text faster than the modem could display it. But what was a modem?

A modem hooked up to a computer with a serial cable, and to the phone system. By the time I got one, modems could automatically dial and answer. You would send a command like ATDT5551212 and it would dial 555-1212. Modems had speakers, because often things wouldn’t work right, and the telephone system was oriented around speech, so you could hear what was happening. You’d hear it wait for dial tone, then dial, then – hopefully – the remote end would ring, a modem there would answer, you’d hear the screeching of a handshake, and eventually your terminal would say CONNECT 2400. Now your computer was bridged to the other; anything going out your serial port was encoded as sound by your modem and decoded at the other end, and vice-versa.

But what, exactly, was “the other end?”

It might have been another person at their computer. Turn on local echo, and you can see what they did. Maybe you’d send files to each other. But in my case, the answer was different: PC Magazine.

PC Magazine and CompuServe

Starting around 1986 (so I would have been about 6 years old), I got to read PC Magazine. My dad would bring copies that were being discarded at his office home for me to read, and I think eventually bought me a subscription directly. This was not just a standard magazine; it ran something like 350-400 pages an issue, and came out every other week. This thing was a monster. It had reviews of hardware and software, descriptions of upcoming technologies, pages and pages of ads (that often had some degree of being informative to them). And they had sections on programming. Many issues would talk about BASIC or Pascal programming, and there’d be a utility in most issues. What do I mean by a “utility in most issues”? Did they include a floppy disk with software?

No, of course not. There was a literal program listing printed in the magazine. If you wanted the utility, you had to type it in. And a lot of them were written in assembler, so you had to have an assembler. An assembler, of course, was not free and I didn’t have one. Or maybe they wrote it in Microsoft C, and I had Borland C, and (of course) they weren’t compatible. Sometimes they would list the program sort of in binary: line after line of a BASIC program, with lines like “64, 193, 253, 0, 53, 0, 87” that you would type in for hours, hopefully correctly. Running the BASIC program would, if you got it correct, emit a .COM file that you could then run. They did have a rudimentary checksum system built in, but it wasn’t even a CRC, so something like swapping two numbers you’d never notice except when the program would mysteriously hang.

Eventually they teamed up with CompuServe to offer a limited slice of CompuServe for the purpose of downloading PC Magazine utilities. This was called PC MagNet. I am foggy on the details, but I believe that for a time you could connect to the limited PC MagNet part of CompuServe “for free” (after the cost of the long-distance call, that is) rather than paying for CompuServe itself (because, OF COURSE, that also charged you per the minute.) So in the early days, I would get special permission from my parents to place a long distance call, and after some nerve-wracking minutes in which we were aware every minute was racking up charges, I could navigate the menus, download what I wanted, and log off immediately.

I still, incidentally, mourn what PC Magazine became. As with computing generally, it followed the mass market. It lost its deep technical chops, cut its programming columns, stopped talking about things like how SCSI worked, and so forth. By the time it stopped printing in 2009, it was no longer a square-bound 400-page beheamoth, but rather looked more like a copy of Newsweek, but with less depth.

Continuing with CompuServe

CompuServe was a much larger service than just PC MagNet. Eventually, our family got a subscription. It was still an expensive and scarce resource; I’d call it only after hours when the long-distance rates were cheapest. Everyone had a numerical username separated by commas; mine was 71510,1421. CompuServe had forums, and files. Eventually I would use TapCIS to queue up things I wanted to do offline, to minimize phone usage online.

CompuServe eventually added a gateway to the Internet. For the sum of somewhere around $1 a message, you could send or receive an email from someone with an Internet email address! I remember the thrill of one time, as a kid of probably 11 years, sending a message to one of the editors of PC Magazine and getting a kind, if brief, reply back!

But inevitably I had…

The Godzilla Phone Bill

Yes, one month I became lax in tracking my time online. I ran up my parents’ phone bill. I don’t remember how high, but I remember it was hundreds of dollars, a hefty sum at the time. As I watched Jason Scott’s BBS Documentary, I realized how common an experience this was. I think this was the end of CompuServe for me for awhile.

Toll-Free Numbers

I lived near a town with a population of 500. Not even IN town, but near town. The calling area included another town with a population of maybe 1500, so all told, there were maybe 2000 people total I could talk to with a local call – though far fewer numbers, because remember, telephones were allocated by the household. There was, as far as I know, zero modems that were a local call (aside from one that belonged to a friend I met in around 1992). So basically everything was long-distance.

But there was a special feature of the telephone network: toll-free numbers. Normally when calling long-distance, you, the caller, paid the bill. But with a toll-free number, beginning with 1-800, the recipient paid the bill. These numbers almost inevitably belonged to corporations that wanted to make it easy for people to call. Sales and ordering lines, for instance. Some of these companies started to set up modems on toll-free numbers. There were few of these, but they existed, so of course I had to try them!

One of them was a company called PennyWise that sold office supplies. They had a toll-free line you could call with a modem to order stuff. Yes, online ordering before the web! I loved office supplies. And, because I lived far from a big city, if the local K-Mart didn’t have it, I probably couldn’t get it. Of course, the interface was entirely text, but you could search for products and place orders with the modem. I had loads of fun exploring the system, and actually ordered things from them – and probably actually saved money doing so. With the first order they shipped a monster full-color catalog. That thing must have been 500 pages, like the Sears catalogs of the day. Every item had a part number, which streamlined ordering through the modem.

Inbound FAXes

By the 90s, a number of modems became able to send and receive FAXes as well. For those that don’t know, a FAX machine was essentially a special modem. It would scan a page and digitally transmit it over the phone system, where it would – at least in the early days – be printed out in real time (because the machines didn’t have the memory to store an entire page as an image). Eventually, PC modems integrated FAX capabilities.

There still wasn’t anything useful I could do locally, but there were ways I could get other companies to FAX something to me. I remember two of them.

One was for US Robotics. They had an “on demand” FAX system. You’d call up a toll-free number, which was an automated IVR system. You could navigate through it and select various documents of interest to you: spec sheets and the like. You’d key in your FAX number, hang up, and US Robotics would call YOU and FAX you the documents you wanted. Yes! I was talking to a computer (of a sorts) at no cost to me!

The New York Times also ran a service for awhile called TimesFax. Every day, they would FAX out a page or two of summaries of the day’s top stories. This was pretty cool in an era in which I had no other way to access anything from the New York Times. I managed to sign up for TimesFax – I have no idea how, anymore – and for awhile I would get a daily FAX of their top stories. When my family got its first laser printer, I could them even print these FAXes complete with the gothic New York Times masthead. Wow! (OK, so technically I could print it on a dot-matrix printer also, but graphics on a 9-pin dot matrix is a kind of pain that is a whole other article.)

My own phone line

Remember how I discussed that phone lines were allocated per household? This was a problem for a lot of reasons:

  1. Anybody that tried to call my family while I was using my modem would get a busy signal (unable to complete the call)
  2. If anybody in the house picked up the phone while I was using it, that would degrade the quality of the ongoing call and either mess up or disconnect the call in progress. In many cases, that could cancel a file transfer (which wasn’t necessarily easy or possible to resume), prompting howls of annoyance from me.
  3. Generally we all had to work around each other

So eventually I found various small jobs and used the money I made to pay for my own phone line and my own long distance costs. Eventually I upgraded to a 28.8Kbps US Robotics Courier modem even! Yes, you heard it right: I got a job and a bank account so I could have a phone line and a faster modem. Uh, isn’t that why every teenager gets a job?

Now my local friend and I could call each other freely – at least on my end (I can’t remember if he had his own phone line too). We could exchange files using HS/Link, which had the added benefit of allowing split-screen chat even while a file transfer is in progress. I’m sure we spent hours chatting to each other keyboard-to-keyboard while sharing files with each other.

Technology in Schools

By this point in the story, we’re in the late 80s and early 90s. I’m still using PC-style OSs at home; OS/2 in the later years of this period, DOS or maybe a bit of Windows in the earlier years. I mentioned that they let me work on programming at school starting in 5th grade. It was soon apparent that I knew more about computers than anybody on staff, and I started getting pulled out of class to help teachers or administrators with vexing school problems. This continued until I graduated from high school, incidentally – often to my enjoyment, and the annoyance of one particular teacher who, I must say, I was fine with annoying in this way.

That’s not to say that there was institutional support for what I was doing. It was, after all, a small school. Larger schools might have introduced BASIC or maybe Logo in high school. But I had already taught myself BASIC, Pascal, and C by the time I was somewhere around 12 years old. So I wouldn’t have had any use for that anyhow.

There were programming contests occasionally held in the area. Schools would send teams. My school didn’t really “send” anybody, but I went as an individual. One of them was run by a local college (but for jr. high or high school students. Years later, I met one of the professors that ran it. He remembered me, and that day, better than I did. The programming contest had problems one could solve in BASIC or Logo. I knew nothing about what to expect going into it, but I had lugged my computer and screen along, and asked him, “Can I write my solutions in C?” He was, apparently, stunned, but said sure, go for it. I took first place that day, leading to some rather confused teams from much larger schools.

The Netware network that the school had was, as these generally were, itself isolated. There was no link to the Internet or anything like it. Several schools across three local counties eventually invested in a fiber-optic network linking them together. This built a larger, but still closed, network. Its primary purpose was to allow students to be exposed to a wider variety of classes at high schools. Participating schools had an “ITV room”, outfitted with cameras and mics. So students at any school could take classes offered over ITV at other schools. For instance, only my school taught German classes, so people at any of those participating schools could take German. It was an early “Zoom room.” But alongside the TV signal, there was enough bandwidth to run some Netware frames. By about 1995 or so, this let one of the schools purchase some CD-ROM software that was made available on a file server and could be accessed by any participating school. Nice! But Netware was mainly about file and printer sharing; there wasn’t even a facility like email, at least not on our deployment.

BBSs

My last hop before the Internet was the BBS. A BBS was a computer program, usually ran by a hobbyist like me, on a computer with a modem connected. Callers would call it up, and they’d interact with the BBS. Most BBSs had discussion groups like forums and file areas. Some also had games. I, of course, continued to have that most vexing of problems: they were all long-distance.

There were some ways to help with that, chiefly QWK and BlueWave. These, somewhat like TapCIS in the CompuServe days, let me download new message posts for reading offline, and queue up my own messages to send later. QWK and BlueWave didn’t help with file downloading, though.

BBSs get networked

BBSs were an interesting thing. You’d call up one, and inevitably somewhere in the file area would be a BBS list. Download the BBS list and you’ve suddenly got a list of phone numbers to try calling. All of them were long distance, of course. You’d try calling them at random and have a success rate of maybe 20%. The other 80% would be defunct; you might get the dreaded “this number is no longer in service” or the even more dreaded angry human answering the phone (and of course a modem can’t talk to a human, so they’d just get silence for probably the nth time that week). The phone company cared nothing about BBSs and recycled their numbers just as fast as any others.

To talk to various people, or participate in certain discussion groups, you’d have to call specific BBSs. That’s annoying enough in the general case, but even more so for someone paying long distance for it all, because it takes a few minutes to establish a connection to a BBS: handshaking, logging in, menu navigation, etc.

But BBSs started talking to each other. The earliest successful such effort was FidoNet, and for the duration of the BBS era, it remained by far the largest. FidoNet was analogous to the UUCP that the institutional users had, but ran on the much cheaper PC hardware. Basically, BBSs that participated in FidoNet would relay email, forum posts, and files between themselves overnight. Eventually, as with UUCP, by hopping through this network, messages could reach around the globe, and forums could have worldwide participation – asynchronously, long before they could link to each other directly via the Internet. It was almost entirely volunteer-run.

Running my own BBS

At age 13, I eventually chose to set up my own BBS. It ran on my single phone line, so of course when I was dialing up something else, nobody could dial up me. Not that this was a huge problem; in my town of 500, I probably had a good 1 or 2 regular callers in the beginning.

In the PC era, there was a big difference between a server and a client. Server-class software was expensive and rare. Maybe in later years you had an email client, but an email server would be completely unavailable to you as a home user. But with a BBS, I could effectively run a server. I even ran serial lines in our house so that the BBS could be connected from other rooms! Since I was running OS/2, the BBS didn’t tie up the computer; I could continue using it for other things.

FidoNet had an Internet email gateway. This one, unlike CompuServe’s, was free. Once I had a BBS on FidoNet, you could reach me from the Internet using the FidoNet address. This didn’t support attachments, but then email of the day didn’t really, either.

Various others outside Kansas ran FidoNet distribution points. I believe one of them was mgmtsys; my memory is quite vague, but I think they offered a direct gateway and I would call them to pick up Internet mail via FidoNet protocols, but I’m not at all certain of this.

Pros and Cons of the Non-Microsoft World

As mentioned, Microsoft was and is the dominant operating system vendor for PCs. But I left that world in 1993, and here, nearly 30 years later, have never really returned. I got an operating system with more technical capabilities than the DOS and Windows of the day, but the tradeoff was a much smaller software ecosystem. OS/2 could run DOS programs, but it ran OS/2 programs a lot better. So if I were to run a BBS, I wanted one that had a native OS/2 version – limiting me to a small fraction of available BBS server software. On the other hand, as a fully 32-bit operating system, there started to be OS/2 ports of certain software with a Unix heritage; most notably for me at the time, gcc. At some point, I eventually came across the RMS essays and started to be hooked.

Internet: The Hunt Begins

I certainly was aware that the Internet was out there and interesting. But the first problem was: how the heck do I get connected to the Internet?

ISPs weren’t really a thing; the first one in my area (though still a long-distance call) started in, I think, 1994. One service that one of my teachers got me hooked up with was Learning Link. Learning Link was a nationwide collaboration of PBS stations and schools, designed to build on the educational mission of PBS. The nearest Learning Link station was more than a 3-hour drive away… but critically, they had a toll-free access number, and my teacher convinced them to let me use it. I connected via a terminal program and a modem, like with most other things. I don’t remember much about it, but I do remember a very important thing it had: Gopher. That was my first experience with Gopher.

Learning Link was hosted by a Unix derivative (Xenix), but it didn’t exactly give everyone a shell. I seem to recall it didn’t have open FTP access either. The Gopher client had FTP access at some point; I don’t recall for sure if it did then. If it did, then when a Gopher server referred to an FTP server, I could get to it. (I am unclear at this point if I could key in an arbitrary FTP location, or knew how, at that time.) I also had email access there, but I don’t recall exactly how; probably Pine. If that’s correct, that would have dated my Learning Link access as no earlier than 1992.

I think my access time to Learning Link was limited. And, since the only way to get out on the Internet from there was Gopher and Pine, I was somewhat limited in terms of technology as well. I believe that telnet services, for instance, weren’t available to me.

Computer labs

There was one place that tended to have Internet access: colleges and universities. In 7th grade, I participated in a program that resulted in me being invited to visit Duke University, and in 8th grade, I participated in National History Day, resulting in a trip to visit the University of Maryland. I probably sought out computer labs at both of those. My most distinct memory was finding my way into a computer lab at one of those universities, and it was full of NeXT workstations. I had never seen or used NeXT before, and had no idea how to operate it. I had brought a box of floppy disks, unaware that the DOS disks probably weren’t compatible with NeXT.

Closer to home, a small college had a computer lab that I could also visit. I would go there in summer or when it wasn’t used with my stack of floppies. I remember downloading disk images of FLOSS operating systems: FreeBSD, Slackware, or Debian, at the time. The hash marks from the DOS-based FTP client would creep across the screen as the 1.44MB disk images would slowly download. telnet was also available on those machines, so I could telnet to things like public-access Archie servers and libraries – though not Gopher. Still, FTP and telnet access opened up a lot, and I learned quite a bit in those years.

Continuing the Journey

At some point, I got a copy of the Whole Internet User’s Guide and Catalog, published in 1994. I still have it. If it hadn’t already figured it out by then, I certainly became aware from it that Unix was the dominant operating system on the Internet. The examples in Whole Internet covered FTP, telnet, gopher – all assuming the user somehow got to a Unix prompt. The web was introduced about 300 pages in; clearly viewed as something that wasn’t page 1 material. And it covered the command-line www client before introducing the graphical Mosaic. Even then, though, the book highlighted Mosaic’s utility as a front-end for Gopher and FTP, and even the ability to launch telnet sessions by clicking on links. But having a copy of the book didn’t equate to having any way to run Mosaic. The machines in the computer lab I mentioned above all ran DOS and were incapable of running a graphical browser. I had no SLIP or PPP (both ways to run Internet traffic over a modem) connectivity at home. In short, the Web was something for the large institutional users at the time.

CD-ROMs

As CD-ROMs came out, with their huge (for the day) 650MB capacity, various companies started collecting software that could be downloaded on the Internet and selling it on CD-ROM. The two most popular ones were Walnut Creek CD-ROM and Infomagic. One could buy extensive Shareware and gaming collections, and then even entire Linux and BSD distributions. Although not exactly an Internet service per se, it was a way of bringing what may ordinarily only be accessible to institutional users into the home computer realm.

Free Software Jumps In

As I mentioned, by the mid 90s, I had come across RMS’s writings about free software – most probably his 1992 essay Why Software Should Be Free. (Please note, this is not a commentary on the more recently-revealed issues surrounding RMS, but rather his writings and work as I encountered them in the 90s.) The notion of a Free operating system – not just in cost but in openness – was incredibly appealing. Not only could I tinker with it to a much greater extent due to having source for everything, but it included so much software that I’d otherwise have to pay for. Compilers! Interpreters! Editors! Terminal emulators! And, especially, server software of all sorts. There’d be no way I could afford or run Netware, but with a Free Unixy operating system, I could do all that. My interest was obviously piqued. Add to that the fact that I could actually participate and contribute – I was about to become hooked on something that I’ve stayed hooked on for decades.

But then the question was: which Free operating system? Eventually I chose FreeBSD to begin with; that would have been sometime in 1995. I don’t recall the exact reasons for that. I remember downloading Slackware install floppies, and probably the fact that Debian wasn’t yet at 1.0 scared me off for a time. FreeBSD’s fantastic Handbook – far better than anything I could find for Linux at the time – was no doubt also a factor.

The de Raadt Factor

Why not NetBSD or OpenBSD? The short answer is Theo de Raadt. Somewhere in this time, when I was somewhere between 14 and 16 years old, I asked some questions comparing NetBSD to the other two free BSDs. This was on a NetBSD mailing list, but for some reason Theo saw it and got a flame war going, which CC’d me. Now keep in mind that even if NetBSD had a web presence at the time, it would have been minimal, and I would have – not all that unusually for the time – had no way to access it. I was certainly not aware of the, shall we say, acrimony between Theo and NetBSD. While I had certainly seen an online flamewar before, this took on a different and more disturbing tone; months later, Theo randomly emailed me under the subject “SLIME” saying that I was, well, “SLIME”. I seem to recall periodic emails from him thereafter reminding me that he hates me and that he had blocked me. (Disclaimer: I have poor email archives from this period, so the full details are lost to me, but I believe I am accurately conveying these events from over 25 years ago)

This was a surprise, and an unpleasant one. I was trying to learn, and while it is possible I didn’t understand some aspect or other of netiquette (or Theo’s personal hatred of NetBSD) at the time, still that is not a reason to flame a 16-year-old (though he would have had no way to know my age). This didn’t leave any kind of scar, but did leave a lasting impression; to this day, I am particularly concerned with how FLOSS projects handle poisonous people. Debian, for instance, has come a long way in this over the years, and even Linus Torvalds has turned over a new leaf. I don’t know if Theo has.

In any case, I didn’t use NetBSD then. I did try it periodically in the years since, but never found it compelling enough to justify a large switch from Debian. I never tried OpenBSD for various reasons, but one of them was that I didn’t want to join a community that tolerates behavior such as Theo’s from its leader.

Moving to FreeBSD

Moving from OS/2 to FreeBSD was final. That is, I didn’t have enough hard drive space to keep both. I also didn’t have the backup capacity to back up OS/2 completely. My BBS, which ran Virtual BBS (and at some point also AdeptXBBS) was deleted and reincarnated in a different form. My BBS was a member of both FidoNet and VirtualNet; the latter was specific to VBBS, and had to be dropped. I believe I may have also had to drop the FidoNet link for a time. This was the biggest change of computing in my life to that point. The earlier experiences hadn’t literally destroyed what came before. OS/2 could still run my DOS programs. Its command shell was quite DOS-like. It ran Windows programs. I was going to throw all that away and leap into the unknown.

I wish I had saved a copy of my BBS; I would love to see the messages I exchanged back then, or see its menu screens again. I have little memory of what it looked like. But other than that, I have no regrets. Pursuing Free, Unixy operating systems brought me a lot of enjoyment and a good career.

That’s not to say it was easy. All the problems of not being in the Microsoft ecosystem were magnified under FreeBSD and Linux. In a day before EDID, monitor timings had to be calculated manually – and you risked destroying your monitor if you got them wrong. Word processing and spreadsheet software was pretty much not there for FreeBSD or Linux at the time; I was therefore forced to learn LaTeX and actually appreciated that. Software like PageMaker or CorelDraw was certainly nowhere to be found for those free operating systems either. But I got a ton of new capabilities.

I mentioned the BBS didn’t shut down, and indeed it didn’t. I ran what was surely a supremely unique oddity: a free, dialin Unix shell server in the middle of a small town in Kansas. I’m sure I provided things such as pine for email and some help text and maybe even printouts for how to use it. The set of callers slowly grew over the time period, in fact.

And then I got UUCP.

Enter UUCP

Even throughout all this, there was no local Internet provider and things were still long distance. I had Internet Email access via assorted strange routes, but they were all… strange. And, I wanted access to Usenet. In 1995, it happened.

The local ISP I mentioned offered UUCP access. Though I couldn’t afford the dialup shell (or later, SLIP/PPP) that they offered due to long-distance costs, UUCP’s very efficient batched processes looked doable. I believe I established that link when I was 15, so in 1995.

I worked to register my domain, complete.org, as well. At the time, the process was a bit lengthy and involved downloading a text file form, filling it out in a precise way, sending it to InterNIC, and probably mailing them a check. Well I did that, and in September of 1995, complete.org became mine. I set up sendmail on my local system, as well as INN to handle the limited Usenet newsfeed I requested from the ISP. I even ran Majordomo to host some mailing lists, including some that were surprisingly high-traffic for a few-times-a-day long-distance modem UUCP link!

The modem client programs for FreeBSD were somewhat less advanced than for OS/2, but I believe I wound up using Minicom or Seyon to continue to dial out to BBSs and, I believe, continue to use Learning Link. So all the while I was setting up my local BBS, I continued to have access to the text Internet, consisting of chiefly Gopher for me.

Switching to Debian

I switched to Debian sometime in 1995 or 1996, and have been using Debian as my primary OS ever since. I continued to offer shell access, but added the WorldVU Atlantis menuing BBS system. This provided a return of a more BBS-like interface (by default; shell was still an uption) as well as some BBS door games such as LoRD and TradeWars 2002, running under DOS emulation.

I also continued to run INN, and ran ifgate to allow FidoNet echomail to be presented into INN Usenet-like newsgroups, and netmail to be gated to Unix email. This worked pretty well. The BBS continued to grow in these days, peaking at about two dozen total user accounts, and maybe a dozen regular users.

Dial-up access availability

I believe it was in 1996 that dial up PPP access finally became available in my small town. What a thrill! FINALLY! I could now FTP, use Gopher, telnet, and the web all from home. Of course, it was at modem speeds, but still.

(Strangely, I have a memory of accessing the Web using WebExplorer from OS/2. I don’t know exactly why; it’s possible that by this time, I had upgraded to a 486 DX2/66 and was able to reinstall OS/2 on the old 25MHz 486, or maybe something was wrong with the timeline from my memories from 25 years ago above. Or perhaps I made the occasional long-distance call somewhere before I ditched OS/2.)

Gopher sites still existed at this point, and I could access them using Netscape Navigator – which likely became my standard Gopher client at that point. I don’t recall using UMN text-mode gopher client locally at that time, though it’s certainly possible I did.

The city

Starting when I was 15, I took computer science classes at Wichita State University. The first one was a class in the summer of 1995 on C++. I remember being worried about being good enough for it – I was, after all, just after my HS freshman year and had never taken the prerequisite C class. I loved it and got an A! By 1996, I was taking more classes.

In 1996 or 1997 I stayed in Wichita during the day due to having more than one class. So, what would I do then but… enjoy the computer lab? The CS dept. had two of them: one that had NCD X terminals connected to a pair of SunOS servers, and another one running Windows. I spent most of the time in the Unix lab with the NCDs; I’d use Netscape or pine, write code, enjoy the University’s fast Internet connection, and so forth.

In 1997 I had graduated high school and that summer I moved to Wichita to attend college. As was so often the case, I shut down the BBS at that time. It would be 5 years until I again dealt with Internet at home in a rural community.

By the time I moved to my apartment in Wichita, I had stopped using OS/2 entirely. I have no memory of ever having OS/2 there. Along the way, I had bought a Pentium 166, and then the most expensive piece of computing equipment I have ever owned: a DEC Alpha, which, of course, ran Linux.

ISDN

I must have used dialup PPP for a time, but I eventually got a job working for the ISP I had used for UUCP, and then PPP. While there, I got a 128Kbps ISDN line installed in my apartment, and they gave me a discount on the service for it. That was around 3x the speed of a modem, and crucially was always on and gave me a public IP. No longer did I have to use UUCP; now I got to host my own things! By at least 1998, I was running a web server on www.complete.org, and I had an FTP server going as well.

Even Bigger Cities

In 1999 I moved to Dallas, and there got my first broadband connection: an ADSL link at, I think, 1.5Mbps! Now that was something! But it had some reliability problems. I eventually put together a server and had it hosted at an acquantaince’s place who had SDSL in his apartment. Within a couple of years, I had switched to various kinds of proper hosting for it, but that is a whole other article.

In Indianapolis, I got a cable modem for the first time, with even tighter speeds but prohibitions on running “servers” on it. Yuck.

Challenges

Being non-Microsoft continued to have challenges. Until the advent of Firefox, a web browser was one of the biggest. While Netscape supported Linux on i386, it didn’t support Linux on Alpha. I hobbled along with various attempts at emulators, old versions of Mosaic, and so forth. And, until StarOffice was open-sourced as Open Office, reading Microsoft file formats was also a challenge, though WordPerfect was briefly available for Linux.

Over the years, I have become used to the Linux ecosystem. Perhaps I use Gimp instead of Photoshop and digikam instead of – well, whatever somebody would use on Windows. But I get ZFS, and containers, and so much that isn’t available there.

Yes, I know Apple never went away and is a thing, but for most of the time period I discuss in this article, at least after the rise of DOS, it was niche compared to the PC market.

Back to Kansas

In 2002, I moved back to Kansas, to a rural home near a different small town in the county next to where I grew up. Over there, it was back to dialup at home, but I had faster access at work. I didn’t much care for this, and thus began a 20+-year effort to get broadband in the country. At first, I got a wireless link, which worked well enough in the winter, but had serious problems in the summer when the trees leafed out. Eventually DSL became available locally – highly unreliable, but still, it was something. Then I moved back to the community I grew up in, a few miles from where I grew up. Again I got DSL – a bit better. But after some years, being at the end of the run of DSL meant I had poor speeds and reliability problems. I eventually switched to various wireless ISPs, which continues to the present day; while people in cities can get Gbps service, I can get, at best, about 50Mbps. Long-distance fees are gone, but the speed disparity remains.

Concluding Reflections

I am glad I grew up where I did; the strong community has a lot of advantages I don’t have room to discuss here. In a number of very real senses, having no local services made things a lot more difficult than they otherwise would have been. However, perhaps I could say that I also learned a lot through the need to come up with inventive solutions to those challenges. To this day, I think a lot about computing in remote environments: partially because I live in one, and partially because I enjoy visiting places that are remote enough that they have no Internet, phone, or cell service whatsoever. I have written articles like Tools for Communicating Offline and in Difficult Circumstances based on my own personal experience. I instinctively think about making protocols robust in the face of various kinds of connectivity failures because I experience various kinds of connectivity failures myself.

(Almost) Everything Lives On

In 2002, Gopher turned 10 years old. It had probably been about 9 or 10 years since I had first used Gopher, which was the first way I got on live Internet from my house. It was hard to believe. By that point, I had an always-on Internet link at home and at work. I had my Alpha, and probably also at least PCMCIA Ethernet for a laptop (many laptops had modems by the 90s also). Despite its popularity in the early 90s, less than 10 years after it came on the scene and started to unify the Internet, it was mostly forgotten.

And it was at that moment that I decided to try to resurrect it. The University of Minnesota finally released it under an Open Source license. I wrote the first new gopher server in years, pygopherd, and introduced gopher to Debian. Gopher lives on; there are now quite a few Gopher clients and servers out there, newly started post-2002. The Gemini protocol can be thought of as something akin to Gopher 2.0, and it too has a small but blossoming ecosystem.

Archie, the old FTP search tool, is dead though. Same for WAIS and a number of the other pre-web search tools. But still, even FTP lives on today.

And BBSs? Well, they didn’t go away either. Jason Scott’s fabulous BBS documentary looks back at the history of the BBS, while Back to the BBS from last year talks about the modern BBS scene. FidoNet somehow is still alive and kicking. UUCP still has its place and has inspired a whole string of successors. Some, like NNCP, are clearly direct descendents of UUCP. Filespooler lives in that ecosystem, and you can even see UUCP concepts in projects as far afield as Syncthing and Meshtastic. Usenet still exists, and you can now run Usenet over NNCP just as I ran Usenet over UUCP back in the day (which you can still do as well). Telnet, of course, has been largely supplanted by ssh, but the concept is more popular now than ever, as Linux has made ssh be available on everything from Raspberry Pi to Android.

And I still run a Gopher server, looking pretty much like it did in 2002.

This post also has a permanent home on my website, where it may be periodically updated.

I Finally Found a Solid Debian Tablet: The Surface Go 2

I have been looking for a good tablet for Debian for… well, years. I want thin, light, portable, excellent battery life, and a servicable keyboard.

For a while, I tried a Lenovo Chromebook Duet. It meets the hardware requirements, well sort of. The problem is with performance and the OS. I can run Debian inside the ChromeOS Linux environment. That works, actually pretty well. But it is slow. Terribly, terribly, terribly slow. Emacs takes minutes to launch. apt-gets also do. It has barely enough RAM to keep its Chrome foundation happy, let alone a Linux environment also. But basically it is too slow to be servicable. Not just that, but I ran into assorted issues with having it tied to a Google account – particularly being unable to login unless I had Internet access after an update. That and my growing concern over Google’s privacy practices led me sort of write it off.

I have a wonderful System76 Lemur Pro that I’m very happy with. Plenty of RAM, a good compromise size between portability and screen size at 14.1″, and so forth. But a 10″ goes-anywhere it’s not.

I spent quite a lot of time looking at thin-and-light convertible laptops of various configurations. Many of them were quite expensive, not as small as I wanted, or had dubious Linux support. To my surprise, I wound up buying a Surface Go 2 from the Microsoft store, along with the Type Cover. They had a pretty good deal on it since the Surface Go 3 is out; the highest-processor model of the Go 2 is roughly similar to the Go 3 in terms of performance.

There is an excellent linux-surface project out there that provides very good support for most Surface devices, including the Go 2 and 3.

I put Debian on it. I had a fair bit of hassle with EFI, and wound up putting rEFInd on it, which mostly solved those problems. (I did keep a Windows partition, and if it comes up for some reason, the easiest way to get it back to Debian is to use the Windows settings tool to reboot into advanced mode, and then select the appropriate EFI entry to boot from there.)

Researching on-screen keyboards, it seemed like Gnome had the most mature. So I wound up with Gnome (my other systems are using KDE with tiling, but I figured I’d try Gnome on it.) Almost everything worked without additional tweaking, the one exception being the cameras. The cameras on the Surfaces are a known point of trouble and I didn’t bother to go to all the effort to get them working.

With 8GB of RAM, I didn’t put ZFS on it like I do on other systems. Performance is quite satisfactory, including for Rust development. Battery life runs about 10 hours with light use; less when running a lot of cargo builds, of course.

The 1920×1280 screen is nice at 10.5″. Gnome with Wayland does a decent job of adjusting to this hi-res configuration.

I took this as my only computer for a trip from the USA to Germany. It was a little small at times; though that was to be expected. It let me take a nicely small bag as a carryon, and being light, it was pleasant to carry around in airports. It served its purpose quite well.

One downside is that it can’t be powered by a phone charger like my Chromebook Duet can. However, I found a nice slim 65W Anker charger that could charge it and phones simultaneously that did the job well enough (I left the Microsoft charger with the proprietary connector at home).

The Surface Go 2 maxes out at a 128GB SSD. That feels a bit constraining, especially since I kept Windows around. However, it also has a micro SD slot, so you can put LUKS and ext4 on that and use it as another filesystem. I popped a micro SD I had lying around into there and that felt a lot better storage-wise. I could also completely zap Windows, but that would leave no way to get firmware updates and I didn’t really want to do that. Still, I don’t use Windows and that could be an option also.

All in all, I’m pretty pleased with it. Around $600 for a fully-functional Debian tablet, with a keyboard is pretty nice.

I had been hoping for months that the Pinetab would come back into stock, because I’d much rather support a Linux hardware vendor, but for now I think the Surface Go series is the most solid option for a Linux tablet.

Pipe Issue Likely a Kernel Bug

Saturday, I wrote in Pipes, deadlocks, and strace annoyingly fixing them about an issue where a certain pipeline seems to have a deadlock. I described tracing it into kernel code. Indeed, it appears to be kernel bug 212295, which has had a patch for over a year that has never been merged.

After continuing to dig into the issue, I eventually reported it as a bug in ZFS. One of the ZFS people connected this to an older issue my searching hadn’t uncovered.

rincebrain summarized:

I believe, if I understand the bug correctly, it only triggers if you F_SETPIPE_SZ when the writer has put nonzero but not a full unit’s worth in yet, which is why the world isn’t on fire screaming about this – you need to either have a very slow but nonzero or otherwise very strange write pattern to hit it, which is why it doesn’t come up in, say, the CI or most of my testbeds, but my poor little SPARC (440 MHz, 1c1t) and Raspberry Pis were not so fortunate.

You might recall in Saturday’s post that I explained that Filespooler reads a few bytes from the gpg/zstdcat pipeline before spawning and connecting it to zfs receive. I think this is the critical piece of the puzzle; it makes it much more likely to encounter the kernel bug. zfs receive is calls F_SETPIPE_SZ when it starts. Let’s look at how this could be triggered:

In the pre-Filespooler days, the gpg|zstdcat|zfs pipeline was all being set up at once. There would be no data sent to zfs receive until gpg had initialized and begun to decrypt the data, and then zstdcat had begun to decompress it. Those things almost certainly took longer than zfs receive’s initialization, meaning that usually F_SETPIPE_SZ would have been invoked before any data entered the pipe.

After switching to Filespooler, the particular situation here has Filespooler reading somewhere around 100 bytes from the gpg|zstdcat part of the pipeline before ever invoking zfs receive. zstdcat generally emits more than 100 bytes at a time. Therefore, when Filespooler invokes zfs receive and hooks the pipeline up to it, it has a very high chance of there already being data in the pipeline when zfs receive uses F_SETPIPE_SZ. This means that the chances of encountering the conditions that trigger the particular kernel bug are also elevated.

ZFS is integrating a patch to no longer use F_SETPIPE_SZ in zfs receive. I have applied that on my local end to see what happens, and hopefully in a day or two will know for sure if it resolves things.

In the meantime, I hope you enjoyed this little exploration. It resulted in a new bug report to Rust as well as digging up an existing kernel bug. And, interestingly, no bugs in filespooler. Sometimes the thing that changed isn’t the source of the bug!

Pipes, deadlocks, and strace annoyingly fixing them

This is a complex tale I will attempt to make simple(ish). I’ve (re)learned more than I cared to about the details of pipes, signals, and certain system calls – and the solution is still elusive.

For some time now, I have been using NNCP to back up my files. These backups are sent to my backup system, which effectively does this to process them (each ZFS send is piped to a shell script that winds up running this):

gpg -q -d | zstdcat -T0 | zfs receive -u -o readonly=on "$STORE/$DEST"

This processes tens of thousands of zfs sends per week. Recently, having written Filespooler, I switched to sending the backups using Filespooler over NNCP. Now fspl (the Filespooler executable) opens the file for each stream and then connects it to what amounts to this pipeline:

bash -c 'gpg -q -d 2>/dev/null | zstdcat -T0' | zfs receive -u -o readonly=on "$STORE/$DEST"

Actually, to be more precise, it spins up the bash part of it, reads a few bytes from it, and then connects it to the zfs receive.

And this works well — almost always. In something like 1/1000 of the cases, it deadlocks, and I still don’t know why. But I can talk about the journey of trying to figure it out (and maybe some of you will have some ideas).

Filespooler is written in Rust, and uses Rust’s Command system. Effectively what happens is this:

  1. The fspl process has a File handle, which after forking but before invoking bash, it dup2’s to stdin.
  2. The connection between bash and zfs receive is a standard Unix pipe.

I cannot get the problem to duplicate when I run the entire thing under strace -f. So I am left trying to peek at it from the outside. What happens if I try to attach to each component with strace -p?

  • bash is blocking in wait4(), which is expected.
  • gpg is blocking in write().
  • If I attach to zstdcat with strace -p, then all of a sudden the deadlock is cleared and everything resumes and completes normally.
  • Attaching to zfs receive with strace -p causes no output at all from strace for a few seconds, then zfs just writes “cannot receive incremental stream: incomplete stream” and exits with error code 1.

So the plot thickens! Why would connecting to zstdcat and zfs receive cause them to actually change behavior? strace works by using the ptrace system call, and ptrace in a number of cases requires sending SIGSTOP to a process. In a complicated set of circumstances, a system call may return EINTR when a SIGSTOP is received, with the idea that the system call should be retried. I can’t see, from either zstdcat or zfs, if this is happening, though.

So I thought, “how about having Filespooler manually copy data from bash to zfs receive in a read/write loop instead of having them connected directly via a pipe?” That is, there would be two pipes going there: one where Filespooler reads from the bash command, and one where it writes to zfs. If nothing else, I could instrument it with debugging.

And so I did, and I found that when it deadlocked, it was deadlocking on write — but with no discernible pattern as to where or when. So I went back to directly connected.

In analyzing straces, I found a Rust bug which I reported in which it is failing to close the read end of a pipe in the parent post-fork. However, having implemented a workaround for this, it doesn’t prevent the deadlock so this is orthogonal to the issue at hand.

Among the two strange things here are things returning to normal when I attach strace to zstdcat, and things crashing when I attach strace to zfs. I decided to investigate the latter.

It turns out that the ZFS code that is reading from stdin during zfs receive is in the kernel module, not userland. Here is the part that is triggering the “imcomplete stream” error:

                int err = zfs_file_read(fp, (char *)buf + done,
                    len - done, &resid);
                if (resid == len - done) {
                        /*
                         * Note: ECKSUM or ZFS_ERR_STREAM_TRUNCATED indicates
                         * that the receive was interrupted and can
                         * potentially be resumed.
                         */
                        err = SET_ERROR(ZFS_ERR_STREAM_TRUNCATED);
                }

resid is an output parameter with the number of bytes remaining from a short read, so in this case, if the read produced zero bytes, then it sets that error. What’s zfs_file_read then?

It boils down to a thin wrapper around kernel_read(). This winds up calling __kernel_read(), which calls read_iter on the pipe, which is pipe_read(). That’s where I don’t have the knowledge to get into the weeds right now.

So it seems likely to me that the problem has something to do with zfs receive. But, what, and why does it only not work in this one very specific situation, and only so rarely? And why does attaching strace to zstdcat make it all work again? I’m indeed puzzled!

Update 2022-06-20: See the followup post which identifies this as likely a kernel bug and explains why this particular use of Filespooler made it easier to trigger.

KDE: A Nice Tiling Envieonment and a Surprisingly Awesome DE

I recently wrote that managing an external display on Linux shouldn’t be this hard. I went down a path of trying out some different options before finally landing at an unexpected place: KDE. I say “unexpected” because I find tiling window managers are just about a necessity.

Background: xmonad

Until a few months ago, I’d been using xmonad for well over a decade. Configurable, minimal, and very nice; it suited me well.

However, xmonad is getting somewhat long in the tooth. xmobar, which is commonly used with it, barely supports many modern desktop environments. I prefer DEs for the useful integrations they bring: everything from handling mount of USB sticks to display auto-switching and sound switching. xmonad itself can’t run with modern Gnome (whether or not it runs well under KDE 5 seems to be a complicated question, according to wikis, but in any case, there is no log applet for KDE 5). So I was left with XFCE and such, but the isues I identified in the “shouldn’t be this hard” article were bad enough that I just could not keep going that way.

An attempt: Gnome and PaperWM

So I tried Gnome under Wayland, reasoning that Wayland might stand a chance of doing things well where X couldn’t. There are several tiling window extensions available for the Gnome 3 shell. Most seemed to be rather low-quality, but an exception was PaperWM and I eventually decided on it. I never quite decided if I liked its horizontal tape of windows or not; it certainly is unique in any case.

I was willing to tolerate my usual list of Gnome problems for the sake of things working. For instance:

  • The Windows-like “settings are spread out in three different programs and some of them require editing the registry[dconf]”. Finding all the options for keybindings and power settings was a real chore, but done.
  • Some file dialog boxes (such as with the screenshot-taking tool) just do not let me type in a path to save a file, insisting that I first navigate to a directory and then type in a name.
  • General lack of available settings or hiding settings from people.
  • True focus-follows-mouse was incompatible with keyboard window switching (PaperWM or no); with any focus-follows-mouse enabled, using Alt-Tab or any other method to switch to other windows would instantly have focus returned to whatever the mouse was over.

Under Wayland, I found a disturbing lack of logs. There was nothing like /var/log/Xorg.0.log, nothing like ~/.xsession-errors, just nothing. Searching for answers on this revealed a lot of Wayland people saying “it’s a Gnome issue” and the trail going cold at that point.

And there was a weird problem that I just could not solve. After the laptop was suspended and we-awakened, I would be at a lock screen. I could type in my password, but when hitting Enter, the thing would then tend to freeze. Why, I don’t know. It seemed related to Gnome shell; when I switched Gnome from Wayland to X11, it would freeze but eventually return to the unlock screen, at which point I’d type in my password and it would freeze again. I spent a long time tracking down logs to see what was happening, but I couldn’t figure it out. All those hard resets were getting annoying.

Enter KDE

So I tried KDE. I had seen mentions of kwin-tiling, a KDE extension for tiling windows. I thought I’d try this setup.

I was really impressed by KDE’s quality. Not only did it handle absolutely every display-related interaction correctly by default, with no hangs ever, all relevant settings were clearly presented in one place. The KDE settings screens were a breath of fresh air – lots of settings available, all at one place, and tons of features I hadn’t seen elsewhere.

Here are some of the things I was pleasantly surprised by with KDE:

  • Applications can declare classes of notifications. These can be managed Android-style in settings. Moreover, you can associate a shell command to run with a notification in any class. People use this to do things like run commands when a display locks and so forth.
  • KDE Connect is a seriously impressive piece of software. It integrates desktops with Android devices in a way that’s reminiscent of non-free operating systems – and with 100% Free Software (the phone app is even in F-Droid!). Notifications from the phone can appear on the desktop, and their state is synchronized; dismiss it on the desktop and it dismisses on the phone, too. Get a SMS or Signal message on the phone? You can reply directly from the desktop. Share files in both directions, mount a directory tree from the phone on the desktop, “find my phone”, use the phone as a presentation remote for the desktop, shared clipboard, sending links between devices, control the phone media player from the desktop… Really, really impressive.
  • The shortcut settings in KDE really work and are impressive. Unlike Gnome, if you try to assign the same shortcut to multiple things, you are warned and prevented from doing this. As with Gnome, you can also bind shorcuts to arbitrary actions.
  • This shouldn’t be exciting, but I was just using Gnome, so… The panel! I can put things wherever I want them! I can put it at the top of the screen, the bottom, or even the sides! It lets all my regular programs (eg, Nextcloud) put their icons up there without having to install two different extensions, each of which handles a different set of apps! I shouldn’t be excited about all this, because Gnome actually used to have these features years ago… [gripe gripe]
  • Initially I was annoyed that Firefox notifications weren’t showing up in the notification history as they did in Gnome… but that was, of course, a setting, easily fixed!
  • There is a Plasma Integration plugin for Firefox (and other browsers including Chrome). It integrates audio and video playback, download status, etc. with the rest of KDE and KDE Connect. Result: if you like, when a call comes in to your phone, Youtube is paused. Or, you can right-click to share a link to your phone via KDE Connect, and so forth. You can right click on a link, and share via Bluetooth, Nextcloud (it must have somehow registered with KDE), KDE Connect, email, etc.

Tiling

So how about the tiling system, kwin-tiling? The out of the box experience is pretty nice. There are fewer built-in layouts than with xmonad, but the ones that are there are doing a decent job for me, and in some cases are more configurable (those that have a large window pane are configurable on its location, not forcing it to be on the left as with many systems.) What’s more, thanks to the flexibility in the KDE shortcut settings, I can configure it to be nearly keystroke-identical to xmonad!

Issues Encountered

I encountered a few minor issues:

  • There appears to be no way to tell it to “power down the display immediately after it is locked, every time” instead of waiting for some timeout to elapse. This is useful when I want to switch monitor inputs to something else.
  • Firefox ESR seems to have some rendering issues under KDE for some reason, but switching to the latest stable release direct from Mozilla seems to fix that.

In short, I’m very impressed.

Managing an External Display on Linux Shouldn’t Be This Hard

I first started using Linux and FreeBSD on laptops in the late 1990s. Back then, there were all sorts of hassles and problems, from hangs on suspend to pure failure to boot. I still worry a bit about suspend on unknown hardware, but by and large, the picture of Linux on laptops has dramatically improved over the last years. So much so that now I can complain about what would once have been a minor nit: dealing with external monitors.

I have a USB-C dock that provides both power and a Thunderbolt display output over the single cable to the laptop. I think I am similar to most people in wanting the following behavior from the laptop:

  • When the lid is closed, suspend if no external monitor is connected. If an external monitor is connected, shut off the built-in display and use the external one exclusively, but do not suspend.
  • Lock the screen automatically after a period of inactivity.
  • While locked, all connected displays should be powered down.
  • When an external display is connected, begin using it automatically.
  • When an external display is disconnected, stop using it. If the lid is closed when the external display is disconnected, go into suspend mode.

This sounds so simple. But somehow on Linux we’ve split up these things into a dozen tiny bits:

  • In /etc/systemd/logind.conf, there are settings about what to do when the lid is opened or closed.
  • Various desktop environments have overlapping settings covering the same things.
  • Then there are the display managers (gdm3, lightdm, etc) that also get in on the act, and frequently have DIFFERENT settings, set in different places, from the desktop environments. And, what’s more, they tend to be involved with locking these days.
  • Then there are screensavers (gnome-screensaver, xscreensaver, etc.) that also enter the picture, and also have settings in these areas.

Problems I’ve Seen

My problems don’t even begin with laptops, but with my desktop, running XFCE with xmonad and lightdm. My desktop is hooked to a display that has multiple inputs. This scenario (reproducible in both buster and bullseye) causes the display to be unusable until a reboot on the desktop:

  1. Be logged in and using the desktop
  2. Without locking the desktop screen, switch the display input to another device
  3. Keep the display input on another device long enough for the desktop screen to auto-lock
  4. At this point, it is impossible to re-awaken the desktop screen.

I should not here that the problems aren’t limited to Debian, but also extend to Ubuntu and various hardware.

Lightdm: which greeter?

At some point while troubleshooting things after upgrading my laptop to bullseye, I noticed that while both were running lightdm, I had different settings and a different appearance between the two. Upon further investigation, I realized that one hat slick-greeter and lightdm-settings installed, while the other had lightdm-gtk-greeter and lightdm-gtk-greeter-settings installed. Very strange.

XFCE: giving up

I eventually gave up on making lightdm work. No combination of settings or greeters would make things work reliably when changing screen configurations. I installed xscreensaver. It doesn’t hang, but it does sometimes take a few tries before it figures out what device to display on.

Worse, since updating from buster to bullseye, XFCE no longer automatically switches audio output when the docking station is plugged in, and there seems to be no easy way to convince Pulseaudio to do this.

X-Based Gnome and derivatives… sigh.

I also tried Gnome, Mate, and Cinnamon, and all of them had various inabilities to configure things to act the way I laid out above.

I’ve long not been a fan of Gnome’s way of hiding things from the user. It now has a Windows-like situation of three distinct settings programs (settings, tweaks, and dconf editor), which overlap in strange ways and interact with systemd in even stranger ways. Gnome 3 make it quite non-intuitive to make app icons from various programs work, and so forth.

Trying Wayland

I recently decided to set up an older laptop that I hadn’t used in awhile. After reading up on Wayland, I decided to try Gnome 3 under Wayland. Both the Debian and Arch wikis note that KDE is buggy on Wayland. Gnome is the only desktop environment that supports it then, unless I want to go with Sway. There’s some appeal to Sway to this xmonad user, but I’ve read of incompatibilities of Wayland software when Gnome’s not available, so I opted to try Gnome.

Well, it’s better. Not perfect, but better. After finding settings buried in a ton of different Settings and Tweaks boxes, I had it mostly working, except gdm3 would never shut off power to the external display. Eventually I found /etc/gdm3/greeter.dconf-defaults, and aadded:

sleep-inactive-ac-timeout=60
sleep-inactive-ac-type='blank'
sleep-inactive-battery-timeout=120
sleep-inactive-battery-type='suspend'

Of course, these overlap with but are distinct from the same kinds of things in Gnome settings.

Sway?

Running without Gnome seems like a challenge; Gnome is switching audio output appropriately, for instance. I am looking at some of the Gnome Shell tiling window manager extensions and hope that some of them may work for me.

Excellent Experience with Debian Bullseye

I’ve appreciated the bullseye upgrade, like most Debian upgrades. I’m not quite sure how, since I was already running a backports kernel, but somehow the entire system is snappier. Maybe newer X or something? I’m really pleased with it. Hardware integration is even nicer now, particularly the automatic driverless support for scanners in addition to the existing support for printers.

All in all, a very nice upgrade, and pretty painless.

I experienced a few odd situations.

For one, I had been using Gnome Flashback. Since xmonad-log-applet didn’t compile there (due to bitrot in the log applet, not flashback), and I had been finding Gnome Flashback to be a rather dusty and forgotten corner of Gnome for a long time, I decided to try Mate.

Mate just seemed utterly unable to handle a situation with a laptop and an external monitor very well. I want to use only the external monitor with the laptop lid is closed, and it just couldn’t remember how to do the right thing – external monitor on, laptop monitor off, laptop not put into suspend. gdm3 also didn’t seem to be able to put the external monitor to sleep, either, causing a few nights of wasted power.

So off I went to XFCE, which I had been using for years on my workstation anyhow. Lots more settings available in XFCE, plus things Just Worked there. Odd that XFCE, the thin and light DE, is now the one that has the most relevant settings. It seems the Gnome “let’s remove a bunch of features” approach has extended to MATE as well.

When I switched to XFCE, I also removed gdm3 from my system, leaving lightdm as the only DM on it. That matched what my desktop machine was using, and also what task-xfce-desktop called for. But strangely, the XFCE settings for lightdm were completely different between the laptop and the desktop. It turns out that with lightdm, you can have the lightdm-gtk-greeter and the accompanying lightdm-gtk-greeter-settings, or slick-greeter and the accompanying lightdm-settings. One machine had one greeter and settings, and the other had the other. Why, I don’t know. But lightdm-gtk-greeter-settings had the necessary options for putting monitors to sleep on the login screen, so I went with it.

This does highlight a bit of a weakness in Debian upgrades. There is SO MUCH choice in Debian, which I highly value. At some point, almost certainly without my conscious choice, one machine got one greeter and another got the other. Despite both having task-xfce-desktop installed, they got different desktop experiences. There isn’t a great way to say “OK, I know I had a bunch of things installed before, but NOW I want the default bullseye experience”.

But overall, it is an absolutely fantastic distribution. It is great to see this nonprofit community distribution continue to have such quality on such an immense scale. And hard to believe I’ve been a Debian developer for 25 years. That seems almost impossible!

Remote Directory Tree Comparison, Optionally Asynchronous and Airgapped

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.

In the previous installment on store-and-forward backups, I mentioned how easy it is to do with ZFS, and some of the tools that can be used to do it without ZFS. A lot of those tools are a bit less robust, so we need some sort of store-and-forward mechanism to verify backups. To be sure, verifying backups is good with ANY scheme, and this could be used with ZFS backups also.

So let’s say you have a shiny new backup scheme in place, and you’d like to verify that it’s working correctly. To do that, you need to compare the source directory tree on machine A with the backed-up directory tree on machine B.

Assuming a conventional setup, here are some ways you might consider to do that:

  • Just copy everything from machine A to machine B and compare locally
  • Or copy everything from machine A to a USB drive, plug that into machine B, and compare locally
  • Use rsync in dry-run mode and see if it complains about anything

The first two options are not particularly practical for large datasets, though I note that the second is compatible with airgapping. Using rsync requires both systems to be online at the same time to perform the comparison.

What would be really nice here is a tool that would write out lots of information about the files on a system: their names, sizes, last modified dates, maybe even sha256sum and other data. This file would be far smaller than the directory tree itself, would compress nicely, and could be easily shipped to an airgapped system via NNCP, UUCP, a USB drive, or something similar.

Tool choices

It turns out there are already quite a few tools in Debian (and other Free operating systems) to do this, and half of them are named mtree (though, of course, not all mtrees are compatible with each other.) We’ll look at some of the options here.

I’ve made a simple test directory for illustration purposes with these commands:

mkdir test
cd test
echo hi > hi
ln -s hi there
ln hi foo
touch empty
mkdir emptydir
mkdir somethingdir
cd somethingdir
ln -s ../there

I then also used touch to set all files to a consistent timestamp for illustration purposes.

Tool option: getfacl (Debian package: acl)

This comes with the acl package, but can be used with other than ACL purposes. Unfortunately, it doesn’t come with a tool to directly compare its output with a filesystem (setfacl, for instance, can apply the permissions listed but won’t compare.) It ignores symlinks and doesn’t show sizes or dates, so is ineffective for our purposes.

Example output:

$ getfacl --numeric -R test
...
# file: test/hi
# owner: 1000
# group: 1000
user::rw-
group::r--
other::r--
...

Tool option: fmtree, the FreeBSD mtree (Debian package: freebsd-buildutils)

fmtree can prepare a “specification” based on a directory tree, and compare a directory tree to that specification. The comparison also is aware of files that exist in a directory tree but not in the specification. The specification format is a bit on the odd side, but works well enough with fmtree. Here’s a sample output with defaults:

$ fmtree -c -p test
...
# .
/set type=file uid=1000 gid=1000 mode=0644 nlink=1
.               type=dir mode=0755 nlink=4 time=1610421833.000000000
    empty       size=0 time=1610421833.000000000
    foo         nlink=2 size=3 time=1610421833.000000000
    hi          nlink=2 size=3 time=1610421833.000000000
    there       type=link mode=0777 time=1610421833.000000000 link=hi

... skipping ...

# ./somethingdir
/set type=file uid=1000 gid=1000 mode=0777 nlink=1
somethingdir    type=dir mode=0755 nlink=2 time=1610421833.000000000
    there       type=link time=1610421833.000000000 link=../there
# ./somethingdir
..

..

You might be wondering here what it does about special characters, and the answer is that it has octal escapes, so it is 8-bit clean.

To compare, you can save the output of fmtree to a file, then run like this:

cd test
fmtree < ../test.fmtree

If there is no output, then the trees are identical. Change something and you get a line of of output explaining each difference. You can also use fmtree -U to change things like modification dates to match the specification.

fmtree also supports quite a few optional keywords you can add with -K. They include things like file flags, user/group names, various tipes of hashes, and so forth. I'll note that none of the options can let you determine which files are hardlinked together.

Here's an excerpt with -K sha256digest added:

    empty       size=0 time=1610421833.000000000 \
                sha256digest=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    foo         nlink=2 size=3 time=1610421833.000000000 \
                sha256digest=98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4

If you include a sha256digest in the spec, then when you verify it with fmtree, the verification will also include the sha256digest. Obviously fmtree -U can't correct a mismatch there, but of course it will detect and report it.

Tool option: mtree, the NetBSD mtree (Debian package: mtree-netbsd)

mtree produces (by default) output very similar to fmtree. With minor differences (such as the name of the sha256digest in the output), the discussion above about fmtree also applies to mtree.

There are some differences, and the most notable is that mtree adds a -C option which reads a spec and converts it to a "format that's easier to parse with various tools." Here's an example:

$ mtree -c -K sha256digest -p test | mtree -C
. type=dir uid=1000 gid=1000 mode=0755 nlink=4 time=1610421833.0 flags=none 
./empty type=file uid=1000 gid=1000 mode=0644 nlink=1 size=0 time=1610421833.0 flags=none 
./foo type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./hi type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=hi time=1610421833.0 flags=none 
./emptydir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir/there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=../there time=1610421833.0 flags=none 

Most definitely an improvement in both space and convenience, while still retaining the relevant information. Note that if you want the sha256digest in the formatted output, you need to pass the -K to both mtree invocations. I could have done that here, but it is easier to read without it.

mtree can verify a specification in either format. Given what I'm about to show you about bsdtar, this should illustrate why I bothered to package mtree-netbsd for Debian.

Unlike fmtree, the mtree -U command will not adjust modification times based on the spec, but it will report on differences.

Tool option: bsdtar (Debian package: libarchive-tools)

bsdtar is a fascinating program that can work with many formats other than just tar files. Among the formats it supports is is the NetBSD mtree "pleasant" format (mtree -C compatible).

bsdtar can also convert between the formats it supports. So, put this together: bsdtar can convert a tar file to an mtree specification without extracting the tar file. bsdtar can also use an mtree specification to override the permissions on files going into tar -c, so it is a way to prepare a tar file with things owned by root without resorting to tools like fakeroot.

Let's look at how this can work:

$ cd test
$ bsdtar --numeric -cf - --format=mtree .

. time=1610472086.318593729 mode=755 gid=1000 uid=1000 type=dir
./empty time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=0
./foo nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./hi nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./ormat\075mtree time=1610472086.318593729 mode=644 gid=1000 uid=1000 type=file size=5632
./there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=hi
./emptydir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir/there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=../there

You can use mtree -U to verify that as before. With the --options mtree: set, you can also add hashes and similar to the bsdtar output. Since bsdtar can use input from tar, pax, cpio, zip, iso9660, 7z, etc., this capability can be used to create verification of the files inside quite a few different formats. You can convert with bsdtar -cf output.mtree --format=mtree @input.tar. There are some foibles with directly using these converted files with mtree -U, but usually minor changes will get it there.

Side mention: stat(1) (Debian package: coreutils)

This tool isn't included because it won't operate recursively, but is a tool in the similar toolbox.

Putting It Together

I will still be developing a complete non-ZFS backup system for NNCP (or UUCP) in a future post. But in the meantime, here are some ideas you can reflect on:

  • Let's say your backup scheme involves sending a full backup every night. On the source system, you could pipe the generated tar file through something like tee >(bsdtar -cf bcakup.mtree @-) to generate an mtree file in-band while generating the tar file. This mtree file could be shipped over for verification.
  • Perhaps your backup scheme involves sending incremental backup data via rdup or even ZFS, but you would like to periodically verify that everything is good -- that an incremental didn't miss something. Something like mtree -K sha256 -c -x -p / | mtree -C -K sha256 would let you accomplish that.

I will further develop at least one of these ideas in a future post.

Bonus: cross-tool comparisons

In my mtree-netbsd packaging, I added tests like this to compare between tools:

fmtree -c -K $(MTREE_KEYWORDS) | mtree
mtree -c -K $(MTREE_KEYWORDS) | sed -e 's/\(md5\|sha1\|sha256\|sha384\|sha512\)=/\1digest=/' -e 's/rmd160=/ripemd160digest=/' | fmtree
bsdtar -cf - --options 'mtree:uname,gname,md5,sha1,sha256,sha384,sha512,device,flags,gid,link,mode,nlink,size,time,uid,type,uname' --format mtree . | mtree

Airgapped / Asynchronous Backups with ZFS over NNCP

In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups laid out the foundations for this. Now let’s dig into the details.

Today’s post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later.

The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I’m going to try to go into more detail here.

Assumptions

I am assuming a setup where:

  • The machines being backed up run ZFS
  • The disk(s) that hold the backups are also running ZFS
  • zfs send / receive is desired as an efficient way to transport the backups
  • The machine that holds the backups may have no network connection whatsoever
  • Backups will be sent encrypted over some sort of network to a spooling machine, which temporarily holds them until they are transported to the destination backup system and ingested there. This system will be unable to decrypt the data streams it temporarily stores.

Hardware

Let’s start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi4 benchmarks as about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don’t need encrypted backups or don’t care that much about performance — may people probably fall into this category — you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server.

I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it’s been quite reliable.

For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 “toaster” (device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive.

Then, there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick – plug it in, all the spooled data is moved over, then plug it in to the backup system and it’s processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup.

Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I’m going to focus on how to integrate ZFS with NNCP.

Operating System

Of course, it should be no surprise that I set this up on Debian.

As an added step, I did all the configuration in Ansible stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for storage and takes the system from freshly-installed to ready-to-use state.

Security

There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, from a privilege separation standpoint, I have chosen to run everything relating to NNCP as a nncp user. NNCP already does encryption, but if you prefer to have zero knowledge of the data even to NNCP, it’s trivial to add gpg to the pipeline as well, and in fact I’ll be demonstrating that in a future post for other reasons.

Software

Besides NNCP, there needs to be a system that generates the zfs send streams. For this project, I looked at quite a few. Most were designed to inspect the list of snapshots on a remote end, compare it to a list on the local end, and calculate a difference from there. This, of course, won’t work for this situation.

I realized my own simplesnap project was very close to being able to do this. It already used an algorithm of using specially-named snapshots on the machine being backed up, so never needed any communication about what snapshots were present where. All it needed was a few more options to permit sending to a stream instead of zfs receive. I made those changes and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well.

Preparing NNCP

I’m going to assume three hosts in this setup:

  • laptop is the machine being backed up. Of course, you may have quite a few of these.
  • spooler holds the backup data until the backup system picks it up
  • backupsvr holds the backups

The basic NNCP workflow documentation covers the basic steps. You’ll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You’ll copy the public key sets to the configurations of the other machines as usual. On the laptop, you’ll add a via line like this:

backupsvr: {
  id: ....
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]

This tells NNCP that data destined for backupsvr should always be sent via spooler first.

You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Or, alternatively, airgapped between the two with nncp-xfer.

Generating Backup Data

Now, on the laptop, install simplesnap (2.0.0 or above). Although you won’t be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepate a dataset for it:

zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap

Then, create a script /usr/local/bin/runsimplesnap like this:

#!/bin/bash

set -e

simplesnap --store tank/simplesnap --setname backups --local --host `hostname` \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap

su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'

if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi

The call to simplesnap sets it up to send the data to simplesnap-queue, which we’ll create in a moment. The –receivmd, plus –noreap, sets it up to run without ZFS on the local system.

The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we’re on the LAN (checking 192.168.65.64), and if so, will establish a connection to the spooler to transmit the data. If course, you could also do this over the Internet, with tor, or whatever, but in my case, I don’t want to automatically do this in case I’m tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options.

Now, here’s what /usr/local/bin/simplesnap-queue looks like:

#!/bin/bash

set -e
set -o pipefail

DEST="`echo $1 | sed 's,^tank/simplesnap/,,'`"

echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2

This is a pretty simple script. simplesnap will call it with a path based on the –store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it just pipes it to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), then passes it to whatever the backupsvr defines as zfsreceive.

Receiving ZFS backups

In the NNCP configuration on the recipient’s side, in the laptop section, we define what command it’s allowed to run as zfsreceive:

      exec: {
        zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
      }

We authorize the nncp user to run this under sudo in /etc/sudoers.d/local–nncp:

Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive

The NNCP_SENDER is the public key ID of the sending node when nncp-toss processes the incoming data. We can use that for sanity checking later.

Now, here’s a basic nncp-zfs-receive script:

#!/bin/bash
set -e
set -o pipefail

STORE=backups/simplesnap
DEST="$1"

# now process stdin
runcommand zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"

And there you have it — all the basics are in place.

Update 2020-12-30: An earlier version of this article had “zfs receive -F” instead of “zfs receive -o readonly=on -x mountpoint”. These changed arguments are more robust.
Update 2021-01-04: I am now recommending “zfs receive -u -o readonly=on”; see my successor article for more.

Enhancements

You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:

#!/bin/bash

set -e
set -o pipefail

STORE=backups/simplesnap
# $1 will be the host/dataset

DEST="$1"
HOST="`echo "$1" | sed 's,/.*,,g'`"
if [ -z "$HOST" ]; then
   echo "Malformed command line"
   exit 5
fi

# Log a message
logit () {
   logger -p info -t "`basename "$0"`[$$]" "$1"
}

# Log an error message
logerror () {
   logger -p err -t "`basename "$0"`[$$]" "$1"
}

# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "`basename "$0"`[$$/$1]"
}

# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}
exiterror () {
   logerror "$1"
   echo "$1" 1>&2
   exit 10
}

# Sanity check

if [ "$HOST" = "laptop" ]; then
  if [ "$NNCP_SENDER" != "12345678" ]; then
    exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
  fi
else
  exiterror "Unknown host $HOST"
fi

runcommand zfs receive -F "$STORE/$DEST"

Now you’ll capture the ZFS receive output in syslog in a friendly way, so you can look back later why things failed if they did.

Further notes on NNCP

nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the dependent snapshots have been received. For non-ZFS backups, a simple sequence number can handle this issue.

Long-Range Radios: A Perfect Match for Unix Protocols From The 70s

It seems I’ve been on a bit of a vintage computing kick lately. After connecting an original DEC vt420 to Linux and resurrecting some old operating systems, I dove into UUCP.

In fact, it so happened that earlier in the week, my used copy of Managing UUCP & Usenet (its author list includes none other than Tim O’Reilly) arrived. I was reading about the challenges of networking in the 70s: half-duplex lines, slow transmission rates, and modems that had separate dialers. And then I stumbled upon long-distance radio. It turns out that a lot of modern long-distance radio has much in common with the challenges of communication in the 1970s – 1990s, and some of our old protocols might be particularly well-suited for it. Let me explain — I’ll start with the old software, and then talk about the really cool stuff going on in hardware (some radios that can send a signal for 10-20km or more with very little power!), and finally discuss how to bring it all together.

UUCP

UUCP, for those of you that may literally have been born after it faded in popularity, is a batch system for exchanging files and doing remote execution. For users, the uucp command copies files to or from a remote system, and uux executes commands on a remote system. In practical terms, the most popular use of this was to use uux to execute rmail on the remote system, which would receive an email message on stdin and inject it into the system’s mail queue. All UUCP commands are queued up and transmitted when a “call” occurs — over a modem, TCP, ssh pipe, whatever.

UUCP had to deal with all sorts of line conditions: very slow lines (300bps), half-duplex lines, noisy and error-prone communication, poor or nonexistent flow control, even 7-bit communication. It supports a number of different transport protocols that can accommodate these varying conditions. It turns out that these mesh fairly perfectly with some properties of modern long-distance radio.

AX.25

The AX.25 stack is a frame-based protocol used by amateur radio folks. Its air speed is 300bps, 1200bps, or (rarely) 9600bps. The Linux kernel has support for the AX.25 protocol and it is quite possible to run TCP/IP atop it. I have personally used AX.25 to telnet to a Linux box 15 miles away over a 1200bps air speed, and have also connected all the way from Kansas to Texas and Indiana using 300bps AX.25 using atmospheric skip. AX.25 has “connected” packets (as TCP) and unconnected/broadcast ones (similar to UDP) and is a error-detected protocol with retransmit. The radios generally used with AX.25 are always half-duplex and some of them have iffy carrier detection (which means collision is frequent). Although the whole AX.25 stack has grown rare in recent years, a subset of it is still in wide use as the basis for APRS.

A lot of this is achieved using equipment that’s not particularly portable: antennas on poles, radios that transmit with anywhere from 1W to 100W of power (even 1W is far more than small portable devices normally use), etc. Also, under the regulations of the amateur radio service, transmitters must be managed by a licensed operator and cannot be encrypted.

Nevertheless, AX.25 is just a protocol and it could, of course, run on other kinds of carriers than traditional amateur radios.

Long-range low-power radios

There is a lot being done with radios these days, much of which I’m not going to discuss. I’m not covering very short-range links such as Bluetooth, ZigBee, etc. Nor am I covering longer-range links that require large and highly-directional antennas (such as some are doing in the 2.4GHz and 5GHz bands). What I’m covering is long-range links that can be used by portable devices.

There is always a compromise in radios, and if we are going to achieve long-range links with poor antennas and low power, the compromise is going to be in bitrate. These technologies may scale down to as low at 300bps or up to around 115200bps. They can, as a side bonus, often be quite cheap.

HC-12 radios

HC-12 is a radio board, commonly used with Arduino, that sports 500bps to 115200bps communication. According to the vendor, in 500bps mode, the range is 1800m or 0.9mi, while at 115200bps, the range is 100m or 328ft. They’re very cheap, at around $5 each.

There are a few downsides to HC-12. One is that the lowest air bitrate is 500bps, but the lowest UART bitrate is 1200bps, and they have no flow control. So, if you are running in long-range mode, “only small packets can be sent: max 60 bytes with the interval of 2 seconds.” This would pose a challenge in many scenarios: though not much for UUCP, which can be perfectly well configured to have a 60-byte packet size and a window size of 1, which would wait for a remote ACK before proceeding.

Also, they operate over 433.4-473.0 MHz which appears to fall outside the license-free bands. It seems that many people using HC-12 are doing so illegally. With care, it would be possible to operate it under amateur radio rules, since this range is mostly within the 70cm allocation, but then it must follow amateur radio restrictions.

LoRa radios

LoRa is a set of standards for long range radios, which are advertised as having a range of 15km (9mi) or more in rural areas, and several km in cities.

LoRa can be done in several ways: the main LoRa protocol, and LoRaWAN. LoRaWAN expects to use an Internet gateway, which will tell each node what frequency to use, how much power to use, etc. LoRa is such that a commercial operator could set up roughly one LoRaWAN gateway per city due to the large coverage area, and some areas have good LoRa coverage due to just such operators. The difference between the two is roughly analogous to the difference between connecting two machines with an Ethernet crossover cable, and a connection over the Internet; LoRaWAN includes more protocol layers atop the basic LoRa. I have yet to learn much about LoRaWAN; I’ll follow up later on that point.

The speed of LoRa ranges from (and different people will say different things here) about 500bps to about 20000bps. LoRa is a packetized protocol, and the maximum packet size depends

LoRa sensors often advertise battery life in the months or years, and can be quite small. The protocol makes an excellent choice for sensors in remote or widely dispersed areas. LoRa transceiver boards for Arduino can be found for under $15 from places like Mouser.

I wound up purchasing two LoStik USB LoRa radios from Amazon. With some experimentation, with even very bad RF conditions (tiny antennas, one of them in the house, the other in a car), I was able to successfully decode LoRa packets from 2 miles away! And these aren’t even the most powerful transmitters available.

Talking UUCP over LoRa

In order to make this all work, I needed to write interface software; the LoRa radios don’t just transmit things straight out. So I wrote lorapipe. I have successfully transmitted files across this UUCP link!

Developing lorapipe was somewhat more challenging than I expected. For one, the LoRa modem raw protocol isn’t well-suited to rapid fire packet transmission; after receiving each packet, the modem exits receive mode and must be told to receive again. Collisions with protocols that ACKd data and had a receive window — which are many — were a problem so bad that it rendered some of the protocols unusable. I wound up adding a “expect more data after this packet” byte to every transmission, and have the receiver not transmit until it believes the sender is finished. This dramatically improved things. There’s more detail on this in my lorapipe documentation.

So far, I have successfully communicated over LoRa using UUCP, kermit, and YMODEM. KISS support will be coming next.

I am also hoping to discover the range I can get from this thing if I use more proper antennas (outdoor) and transmitters capable of transmitting with more power.

All in all, a fun project so far.