All posts by John Goerzen

In Which COVID-19 Misinformation Leads To A Bunch of Graphs Made With Rust

August 13, 2020UncategorizedJohn Goerzen

A funny — and by funny, I mean sad — thing has happened. Recently the Kansas Department of Health and Environment (KDHE) has been analyzing data from the patchwork implementation of mask requirements in Kansas. They came to a conclusion that shouldn’t be surprising to anyone: masks help. They published a chart showing this. A right-wing propaganda publication got ahold of this, and claimed the numbers were “doctored” because there were two-different Y-axes.

I set about to analyze the data myself from public sources, and produced graphs of various kinds using a single Y-axis and supporting the idea that the graphs were not, in fact, doctored. Here’s one graph that’s showing that:

In order to do that, I had imported COVID-19 data from various public sources. Many states in the US are large enough to have significant variation in COVID-19 conditions, and many of the source people look at don’t show county-level data over time. I wanted to do that.

Eventually, I wrote covid19db, which ingests data from a number of public sources and generates a SQLite database file. Using Github Actions, this file is automatically updated every morning and available for download. Or, you can download the code and generate a database yourself locally.

Then, I wrote covid19ks, which generates various pretty graphs covering the data. These graphs, incidentally, turn out to highlight just how poorly the United States is doing compared to the rest of the industrialized world.

I hope that these resources, and especially covid19db, might be useful to others that would like to analyze the data. The code isn’t the prettiest since it was done in a hurry, but I think that functionally this is useful.

COVID-19 is serious for all ages. Treat it like WWII

March 19, 2020Uncategorizedcovid19, healthJohn Goerzen

Today I’d like to post a few updates about COVID-19 which I have gathered from credible sources, as well as some advice also gathered from credible sources.

Summary

Coronavirus causes health impacts requiring hospitalization in a significant percentage of all adult age groups.
Coronavirus also can cause no symptoms at all in many, especially children.
Be serious about social distancing.

COVID-19 is serious for young adults too

According to this report based on a CDC analysis, between 14% and 20% of people aged 20 to 44 require hospitalization due to COVID-19. That’s enough to be taken seriously. See also this CNN story.

Act as if you are a carrier because you may be infected and not even know it, even children

Information on this is somewhat preliminary, but it is certainly known that a certain set of cases is asymptomatic. This article discusses manifestations in children, while this summary of a summary (note: not original research) suggests that 17.9% of people may not even know they are infected.

How serious is this? Serious.

This excellent article by Daniel W. Johnson, MD, is a very good read. Among the points it makes:

Anyone that says it’s no big deal is wrong.
If we treat this like WWI or WWII and everyone does the right things, we will be harmed but OK. If many but not all people do the right things, we’ll be like Italy. If we blow it off, our health care system and life as we know it will be crippled.
If we don’t seriously work to flatten the curve, many lives will be needlessly lost

Advice

I’m going to just copy Dr. Johnson’s advice here:

You and your kids should stay home. This includes not going to church, not going to the gym, not going anywhere.

Do not travel for enjoyment until this is done. Do not travel for work unless your work truly requires it.

Avoid groups of people. Not just crowds, groups. Just be around your immediate family. I think kids should just play with siblings at this point – no play dates, etc.

When you must leave your home (to get groceries, to go to work), maintain a distance of six feet from people. REALLY stay away from people with a cough or who look sick.

When you do get groceries, etc., buy twice as much as you normally do so that you can go to the store half as often. Use hand sanitizer immediately after your transaction, and immediately after you unload the groceries.

I’m not saying people should not go to work. Just don’t leave the house for anything unnecessary, and if you can work from home, do it.

Everyone on this email, besides Mom and Dad, are at low risk for severe disease if/when they contract COVID-19. While this is great, that is not the main point. When young, well people fail to do social distancing and hygiene, they pick up the virus and transmit it to older people who are at higher risk for critical illness or death. So everyone needs to stay home. Even young people.

Tell every person over 60, and every person with significant medical conditions, to avoid being around people. Please do not have your kids visit their grandparents if you can avoid it. FaceTime them.

Our nation is the strongest one in the world. We have been through other extreme challenges and succeeded many times before. We WILL return to normal life. Please take these measures now to flatten the curve, so that we can avoid catastrophe.

I’d also add that many supermarkets offer delivery or pickup options that allow you to get your groceries without entering the store. Some are also offering to let older people shop an hour before the store opens to the general public. These could help you minimize your exposure.

Other helpful links

Here is a Reddit megathread with state-specific unemployment resources.

Scammers are already trying to prey on people. Here are some important tips to avoid being a victim.

Although there are varying opinions, some are recommending avoiding ibuprofen when treating COVID-19.

Bill Gates had some useful advice. Here’s a summary emphasizing the need for good testing.

It Doesn’t Take Much to Make Someone’s Day

March 14, 2020UncategorizedJohn Goerzen

In times like these, it is natural to fear. Viruses, incompetent leadership, economic hardship, even death. But remember this:

When I was a boy and I would see scary things in the news, my mother would say to me, “Look for the helpers. You will always find people who are helping.” To this day, especially in times of “disaster,” I remember my mother’s words, and I am always comforted by realizing that there are still so many helpers – so many caring people in this world.

– Fred Rogers

This is so true. The examples are everywhere. Here in the United States, our federal government has been weak responding to COVID-19 — but others have stepped up. Institutions big and small across the country are following the science and closing or taking other steps to slow the spread of coronavirus, even in areas it hasn’t yet been detected, because this is the right thing to do. People are helping their neighbors, or giving up their favorite activities to do their part to slow the spread of COVID-19. I work for a company that’s publicly-traded on the NYSE, and it shut down all its offices globally. And kept paying the janitors and other office staff.

Some people are in a vulnerable place today. To them: remember the helpers. There are doctors and nurses, officials, neighbors the care, everywhere.

To those that are able: be a helper. It doesn’t take much to brighten someone’s day. Maybe a phone call or video call. Maybe delivering groceries to a neighbor that’s quarantined. Maybe acts of grace and understanding to the stressed people around you, trying their best to get by in the face of a lack of information and certainty. Maybe giving up some activities you enjoy, in order to help slow the spread of COVID-19, even if you personally aren’t especially vulnerable.

I am reminded of this quote, part of a story about a dying cancer patient:

“Don’t forget that it doesn’t take much to make someone’s day.”

The Fundamental Problem in Python 3

November 22, 2019TechnologypythonJohn Goerzen

“In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.”

– Douglas Adams

This expands on my recent post The Incredible Disaster of Python 3. I seem to have annoyed the Internet…

Back in the mists of time, Unix was invented. Today the descendants of Unix, whether literal or in spirit, power the majority of the world’s cell phones, most of the most popular sites on the Internet, etc. And among this very popular architecture, there lies something that has made people very angry at times: on a Unix filesystem, 254 bytes are valid in filenames. The two that are not are 0x00 and the slash character. Otherwise, they are valid in virtually any combination (the special entries “.” and “..” being the exception).

This property has led to a whole host of bugs, particularly in shell scripts. A filename with a leading dash might look like a parameter to a tool. Filenames can contain newline characters, space characters, control characters, and so forth; running ls in a directory with maliciously-named files could certainly scramble one’s terminal. These bugs continue to persist, though modern shells offer techniques that — while optional — can be used to avoid most of these classes of bugs.

It should be noted here that not every valid stream of bytes constitutes a stream of bytes that can be decoded as UTF-8. This is a departure from earlier encoding schemes such as iso-8859-1 and cp437; you might get gibberish, but “garbage in, garbage out” was a thing and if your channel was 8-bit clean, your gibberish would survive unmodified.

Unicode brings many advantages, and has rightly become the predominant standard for text encoding. But the previous paragraph highlights one of the challenges, and this challenge, along with some others, are at the heart of the problem with Python 3. That is because, at a fundamental level, Python 3’s notion of a filename is based on a fiction. Not only that, but it tries to introduce strongly-typed strings into what is fundamentally a weakly-typed, dynamic language.

A quick diversion: The Rust Perspective

The Unicode problem is a problematic one, and it may be impossible to deal with it with complete elegance. Many approaches exist; here I will describe Rust’s string-related types, of which there are three for our purposes:

The String (and related &str) is used for textual data and contains bytes that are guaranteed to be valid UTF-8 at all times
The Vec<u8> (and related [u8]) is a representation of pure binary bytes, of which all 256 possible characters are valid in any combination, whether or not it forms valid UTF-8
And the Path, which represents a path name on the system.

The Path uses the underlying operating system’s appropriate data type (here I acknowledge that Windows is very different from POSIX in this regard, though I don’t go into that here). Compile-time errors are generated when these types are mixed without proper safe conversion.

The Python Fiction

Python, in contrast, has only two types; roughly analogous to the String and the Vec<u8> in Rust. Critically, most of the Python standard library treats a filename as a String – that is, a sequence of valid Unicode code points, which is a subset of the valid POSIX filenames.

Do you see what we just did here? We’ve set up another shell-like situation in which filenames that are valid on the system create unexpected behaviors in a language. Only this time, it’s not \n, it’s things like \xF7.

From a POSIX standpoint, the correct action would have been to use the bytes type for filenames; this would mandate proper encode/decode calls by the user, but it would have been quite clear. It should be noted that some of the most core calls in Python, such as open(), do accept both bytes and strings, but this behavior is by no means consistent in the standard library, and some parts of the library that process filenames (for instance, listdir in its most common usage) return strings.

The Plot Thickens

At some point, it was clearly realized that this behavior was leading to a lot of trouble on POSIX systems. Having a listdir() function be unable (in its common usage; see below) to handle certain filenames was clearly not going to work. So Python introduced its surrogate escape. When using surrogate escapes, when attempting to decode a binary byte that is not valid in UTF-8, it is replaced with a multibyte UTF-8 sequence from Unicode code space that is otherwise rarely used. Then, when converted back to a binary sequence, this Unicode code point is converted to the same original byte. However, this is not a systemwide default and in many cases must be specifically requested.

And now you see this is both an ugly kludge and a violation of the promise of what a string is supposed to be in Python 3, since this doesn’t represent a valid Unicode character at all, but rather a token for the notion that “there was a byte here that we couldn’t convert to Unicode.” Now you have a string that the system thinks is Unicode, that looks like Unicode, that you can process as Unicode — substituting, searching, appending, etc — but which is actually at least partially representing things that should rightly be unrepresentable in Unicode.

And, of course, surrogate escapes are not universally used by even the Python standard library either. So we are back to the problem we had in Python 2: what the heck is a string, anyway? It might be all valid Unicode, it might have surrogate escapes in it, it might have been decoded from the wrong locale (because life isn’t perfect), and so forth.

Unicode Realities

The article pragmatic Unicode highlights these facts:

Computers are built on bytes
The world needs more than 256 symbols
You cannot infer the encoding of bytes — you must be told, or have to guess
Sometimes you are told wrong

I have no reason to quibble with this. How, then, does that stack up with this code from Python? (From zipfile.py, distributed as part of Python)

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                filename = filename.decode('cp437')

There is a reason that Python can’t extract a simple ZIP file properly. The snippet above violated the third rule by inferring a cp437 encoding when it shouldn’t. But it’s worse; the combination of factors leads extracall() to essentially convert a single byte from CP437 to a multibyte Unicode code point on extraction, rather than simply faithfully reproducing the bytestream that was the filename. Oh, and it doesn’t use surrogate escapes. Good luck with that one.

It gets even worse

Let’s dissect Python’s disastrous documentation on Unicode filenames.

First, we begin with the premise that there is no filename encoding in POSIX. Filenames are just blobs of bytes. There is no filename encoding!

What about $LANG and friends? They give hints about the environment, languages for interfaces, and terminal encoding. They can often be the best HINT as to how we should render characters and interpret filenames. But they do not subvert the fundamental truth, which is that POSIX filenames do not have to conform to UTF-8.

So, back to the Python documentation. Here are the problems with it:

It says that there will be a filesystem encoding if you set LANG or LC_CTYPE, falling back to UTF-8 if not specified. As we have already established, UTF-8 can’t handle POSIX filenames.
It gets worse: “The os.listdir() function returns filenames, which raises an issue: should it return the Unicode version of filenames, or should it return bytes containing the encoded versions? os.listdir() can do both”. So we are somewhat tacitly admitting here that str was a poor choice for filenames, but now we try to have it every which way. This is going to end badly.
And then there’s this gem: “Note that on most occasions, you should can just stick with using Unicode with these APIs. The bytes APIs should only be used on systems where undecodable file names can be present; that’s pretty much only Unix systems now.” Translation: Our default advice is to pretend the problem doesn’t exist, and will cause your programs to be broken or crash on POSIX.

Am I just living in the past?

This was the most common objection raised to my prior post. “Get over it, the world’s moved on.” Sorry, no. I laid out the case for proper handling of this in my previous post. But let’s assume that your filesystems are all new, with shiny UTF-8 characters. It’s STILL a problem. Why? Because it is likely that an errant or malicious non-UTF-8 sequence will cause a lot of programs to crash or malfunction.

We know how this story goes. All the shell scripts that do the wrong thing when “; rm” is in a filename, for instance. Now, Python is not a shell interpreter, but if you have a program that crashes on a valid filename, you have — at LEAST — a vector for denial of service. Depending on the circumstances, it could turn into more.

Conclusion

Some Python 3 code is going to crash or be unable to process certain valid POSIX filenames.
Some Python 3 code might use surrogate escapes to handle them.
Some Python 3 code — part of Python itself even — just assumes it’s all from cp437 (DOS) and converts it that way.
Some people recommend using latin-1 instead of surrogate escapes – even official Python documentation covers this.

The fact is: A Python string is the WRONG data type for a POSIX filename, and so numerous, incompatible kludges have been devised to work around this problem. There is no consensus on which kludge to use, or even whether or not to use one, even within Python itself, let alone the wider community. We are going to continue having these problems as long as Python continues to use a String as the fundamental type of a filename.

Doing the right thing in Python 3 is extremely hard, not obvious, and rarely taught. This is a recipe for a generation of buggy code. Easy things should be easy; hard things should be possible. Opening a file correctly should be easy. Sadly I fear we are in for many years of filename bugs in Python, because this would be hard to fix now.

Resources

(For even more fun, consider command line parameters and environment variables! I’m annoyed enough with filenames to leave those alone for now.)

The Incredible Disaster of Python 3

November 20, 2019SoftwarepythonJohn Goerzen

Update 2019-11-22: A successor article to this one dives into some of the underlying complaints.

I have long noted issues with Python 3’s bytes/str separation, which is designed to have a type “bytes” that is a simple list of 8-bit characters, and “str” which is a Unicode string. After apps started using Python 3, I started noticing issues: they couldn’t open filenames that were in ISO-8859-1, gpodder couldn’t download podcasts with 8-bit characters in their title, etc. I have files on my system dating back to well before widespread Unicode support in Linux.

Due to both upstream and Debian deprecation of Python 2, I have been working to port pygopherd to Python 3. I was not looking forward to this task. It turns out that the string/byte types in Python 3 are even more of a disaster than I had at first realized.

Background: POSIX filenames

On POSIX platforms such as Unix, a filename consists of one or more 8-bit bytes, which may be any 8-bit value other than 0x00 or 0x2F (‘/’). So a file named “test\xf7.txt” is perfectly acceptable on a Linux system, and in ISO-8859-1, that filename would contain the division sign ÷. Any language that can’t process valid filenames has serious bugs – and Python is littered with these bugs.

Inconsistencies in Types

Before we get to those bugs, let’s look at this:

>>> "/foo"[0]
'/'
>>> "/foo"[0] == '/'
True
>>> b"/foo"[0]
47
>>> b"/foo"[0] == '/'     # this will fail anyhow because bytes never equals str
False
>>> b"/foo"[0] == b'/'
False
>>> b"/foo"[0] == b'/'[0]
True

Look at those last two items. With the bytes type, you can’t compare a single element of a list to a single character, even though you still can with a str. I have no explanation for this mysterious behavior, though thankfully the extensive tests I wrote in 2003 for pygopherd did cover it.

Bugs in the standard library

A whole class of bugs arise because parts of the standard library will accept str or bytes for filenames, while other parts accept only str. Here are the particularly egregious examples I ran into.

Python 3’s zipfile module is full of absolutely terrible code. As I reported in Python bug 38861, even a simple zipfile.extractall() fails to faithfully reproduce filenames contained in a ZIP file. Not only that, but there is egregious code like this in zipfile.py:

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                filename = filename.decode('cp437')

I can assure you that zip on Unix was not mystically converting filenames from iso-8859-* to cp437 (which was from DOS, and almost unheard-of on Unix). Or how about this gem:

    def _encodeFilenameFlags(self):
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800

This combines to a situation where perfectly valid filenames cannot be processed by the zipfile module, valid filenames are mangled on extraction, and unwanted and incorrect character set conversions are performed. zipfile has no mechanism to access ZIP filenames as bytes.

How about the dbm module? It simply has no way to specify a filename as bytes, and absolutely can’t open a file named “text\x7f”. There is simply no way to make that happen. I reported this in Python bug 38864.

Update 2019-11-20: As is pointed out in the comments, there is a way to encode this byte in a Unicode string in Python, so “absolutely can’t open” was incorrect. However, I strongly suspect that little code uses that approach and it remains a problem.

I should note that a simple open(b"foo\x7f.txt", "w") works. The lowest-level calls are smart enough to handle this, but the ecosystem built atop them is uneven at best. It certainly doesn’t help that things like b"foo" + "/" are runtime crashers.

Larger Consequences of These Issues

I am absolutely convinced that these are not the only two modules distributed with Python itself that are incapable of opening or processing valid files on a Unix system. I fully expect that these issues are littered throughout the library. Nobody appears to be testing for them. Nobody appears to care about them.

It is part of a worrying trend I have been seeing lately of people cutting corners and failing to handle valid things that have been part of the system for years. We are, by example and implementation, teaching programmers that these shortcuts are fine, that it’s fine to use something that is required to be utf-8 to refer to filenames on Linux, etc. A generation of programmers will grow up writing code that is incapable of processing files with perfectly valid names. I am thankful that grep, etc. aren’t written in Python, because if they were, they’d crash all the time.

Here are some other examples:

When running “git status” on my IBM3151 terminal connected to Linux, I found it would clear the screen each time. Huh. Apparently git assumes that if you’re using it from a terminal, the terminal supports color, and it doesn’t bother using terminfo; it just sends ANSI sequences assuming that everything uses them. The IBM3151 doesn’t by default. (GNU tools like ls get this right) This is but one egregious example of a whole suite of tools that fail to use the ncurses/terminfo libraries that we’ve had for years to properly abstract these things.
A whole suite of tools, including ssh, tmux, and so forth, blindly disable handling of XON/XOFF on the terminal, neglecting the fact that this is actually quite important for some serial lines. Thankfully I can at least wrap things in GNU Screen to get proper XON/XOFF handling.
The Linux Keyspan USB serial driver doesn’t even implement XON/XOFF handling at all.

Now, you might make an argument “Well, ISO-8859-* is deprecated. We’ve all moved on to Unicode!” And you would be, of course, wrong. Unix had roughly 30 years of history before xterm supported UTF-8. It would be quite a few more years until UTF-8 reached the status of default for many systems; it wasn’t until Debian etch in 2007 that Debian used utf-8 by default. Files with contents or names in other encoding schemes exist and people find value in old files. “Just rename them all!” you might say. In some situations, that might work, but consider — how many symlinks would it break? How many scripts that refer to things by filenames would it break? The answer is most certainly nonzero. There is no harm in having files laying about the system in other encoding schemes — except to buggy software that can’t cope. And this post doesn’t even concern the content of files, which is a whole additional problem, though thankfully the situation there is generally at least somewhat better.

There are also still plenty of systems that can’t handle multibyte characters (and in various embedded or mainframe contexts, can’t even handle 8-bit characters). Not all terminals support ANSI. It requires only correct thinking (“What is a valid POSIX filename? OK, our datatypes better support that then”) to do the right thing.

Update 1, 2019-11-21: Here is an article dating back to 2014 about the Unicode issues in Python 3, which goes into quite a bit of detail about it. It lays out a compelling case for the issues with its attempt to implement a replacement for cat in python 2 and 3. The Practical Python porting for systems programmers is also relevant and, like me, highlights many of these same issues. Finally, this is not the first time I raised issues; I wrote The Python Unicode Mess more than a year ago. Unfortunately, as I am now working to port a larger codebase, the issues I raised before are more acute, and I have discovered more. At this point, I am extremely unlikely to use Python for any new project due to these issues.

A Mystery of Unix History

November 19, 2019UncategorizedJohn Goerzen

I wrote recently about buying a Digital (DEC) vt420 and hooking it up to Linux. Among my observations on the vt420, which apparently were among the most popular to use with Unix systems, are these:

DEC keyboards had no Esc key
DEC keyboards had no key that could be used as Alt or Meta

The two most popular historic editors on Unix, vi and emacs, both make heavy use of these features (Emacs using Esc when Alt or Meta is unavailable). Some of the later entries in the DEC terminal line, especially the vt510, supported key remapping or alternative keyboards, which can address the Esc issue, but not entirely.

According to the EmacsOnTerminal page and other research, at least the vt100 through the vt420 lacked Esc by default. Ctrl-3 and Ctrl-[ could send the character. However, this is downright terrible for both vi and Emacs (as this is the only way to trigger meta commands in Emacs).

What’s more, it seems almost none of these old serial terminal support hardware flow control, and flow control is an absolute necessity on many. That implies XON/XOFF, which use Ctrl-S and Ctrl-Q — both of which are commonly used in Emacs.

Both vi and Emacs trace their roots back to the 1970s and were widely used in the serial terminal era, running on hardware dominated by DEC and its serial terminals.

So my question is: why would both of these editors be developed in such a way that they are downright inconvenient to use on the hardware on which they most frequently ran?

Update 2019-11-20: It appears that the vt100 did have the Esc key, but it was dropped with the vt220. At least the vt420 and later, and possibly as far back as the vt220, let you map one of a few other keys to be Esc. This still leaves the Ctrl-S mystery in Emacs though.

TCP/IP over LoRa radios

October 31, 2019Technologyamateur radio, lora, radioJohn Goerzen

As I wrote yesterday, I have been experimenting with LoRa radios. Today, I got TCP/IP working over them!

The AX.25 protocol did indeed turn out to be well-suited to this. It’s simple and works. The performance is, predictably, terrible; ping times around 500-600ms, but it does work. I fired up ssh, ran emacs, did a bit with bash, and — yep! Very cool. I tried mosh as well, thinking it would be great for this, but for some reason it just flooded the link with endless packets and was actually rather terrible.

I wrote up how to use it. It’s not even all that hard!

Pretty satisfying seeing this work.

Long-Range Radios: A Perfect Match for Unix Protocols From The 70s

October 30, 2019Linux, Programmingamateur radio, lora, radioJohn Goerzen

It seems I’ve been on a bit of a vintage computing kick lately. After connecting an original DEC vt420 to Linux and resurrecting some old operating systems, I dove into UUCP.

In fact, it so happened that earlier in the week, my used copy of Managing UUCP & Usenet (its author list includes none other than Tim O’Reilly) arrived. I was reading about the challenges of networking in the 70s: half-duplex lines, slow transmission rates, and modems that had separate dialers. And then I stumbled upon long-distance radio. It turns out that a lot of modern long-distance radio has much in common with the challenges of communication in the 1970s – 1990s, and some of our old protocols might be particularly well-suited for it. Let me explain — I’ll start with the old software, and then talk about the really cool stuff going on in hardware (some radios that can send a signal for 10-20km or more with very little power!), and finally discuss how to bring it all together.

UUCP

UUCP, for those of you that may literally have been born after it faded in popularity, is a batch system for exchanging files and doing remote execution. For users, the uucp command copies files to or from a remote system, and uux executes commands on a remote system. In practical terms, the most popular use of this was to use uux to execute rmail on the remote system, which would receive an email message on stdin and inject it into the system’s mail queue. All UUCP commands are queued up and transmitted when a “call” occurs — over a modem, TCP, ssh pipe, whatever.

UUCP had to deal with all sorts of line conditions: very slow lines (300bps), half-duplex lines, noisy and error-prone communication, poor or nonexistent flow control, even 7-bit communication. It supports a number of different transport protocols that can accommodate these varying conditions. It turns out that these mesh fairly perfectly with some properties of modern long-distance radio.

AX.25

The AX.25 stack is a frame-based protocol used by amateur radio folks. Its air speed is 300bps, 1200bps, or (rarely) 9600bps. The Linux kernel has support for the AX.25 protocol and it is quite possible to run TCP/IP atop it. I have personally used AX.25 to telnet to a Linux box 15 miles away over a 1200bps air speed, and have also connected all the way from Kansas to Texas and Indiana using 300bps AX.25 using atmospheric skip. AX.25 has “connected” packets (as TCP) and unconnected/broadcast ones (similar to UDP) and is a error-detected protocol with retransmit. The radios generally used with AX.25 are always half-duplex and some of them have iffy carrier detection (which means collision is frequent). Although the whole AX.25 stack has grown rare in recent years, a subset of it is still in wide use as the basis for APRS.

A lot of this is achieved using equipment that’s not particularly portable: antennas on poles, radios that transmit with anywhere from 1W to 100W of power (even 1W is far more than small portable devices normally use), etc. Also, under the regulations of the amateur radio service, transmitters must be managed by a licensed operator and cannot be encrypted.

Nevertheless, AX.25 is just a protocol and it could, of course, run on other kinds of carriers than traditional amateur radios.

Long-range low-power radios

There is a lot being done with radios these days, much of which I’m not going to discuss. I’m not covering very short-range links such as Bluetooth, ZigBee, etc. Nor am I covering longer-range links that require large and highly-directional antennas (such as some are doing in the 2.4GHz and 5GHz bands). What I’m covering is long-range links that can be used by portable devices.

There is always a compromise in radios, and if we are going to achieve long-range links with poor antennas and low power, the compromise is going to be in bitrate. These technologies may scale down to as low at 300bps or up to around 115200bps. They can, as a side bonus, often be quite cheap.

HC-12 radios

HC-12 is a radio board, commonly used with Arduino, that sports 500bps to 115200bps communication. According to the vendor, in 500bps mode, the range is 1800m or 0.9mi, while at 115200bps, the range is 100m or 328ft. They’re very cheap, at around $5 each.

There are a few downsides to HC-12. One is that the lowest air bitrate is 500bps, but the lowest UART bitrate is 1200bps, and they have no flow control. So, if you are running in long-range mode, “only small packets can be sent: max 60 bytes with the interval of 2 seconds.” This would pose a challenge in many scenarios: though not much for UUCP, which can be perfectly well configured to have a 60-byte packet size and a window size of 1, which would wait for a remote ACK before proceeding.

Also, they operate over 433.4-473.0 MHz which appears to fall outside the license-free bands. It seems that many people using HC-12 are doing so illegally. With care, it would be possible to operate it under amateur radio rules, since this range is mostly within the 70cm allocation, but then it must follow amateur radio restrictions.

LoRa radios

LoRa is a set of standards for long range radios, which are advertised as having a range of 15km (9mi) or more in rural areas, and several km in cities.

LoRa can be done in several ways: the main LoRa protocol, and LoRaWAN. LoRaWAN expects to use an Internet gateway, which will tell each node what frequency to use, how much power to use, etc. LoRa is such that a commercial operator could set up roughly one LoRaWAN gateway per city due to the large coverage area, and some areas have good LoRa coverage due to just such operators. The difference between the two is roughly analogous to the difference between connecting two machines with an Ethernet crossover cable, and a connection over the Internet; LoRaWAN includes more protocol layers atop the basic LoRa. I have yet to learn much about LoRaWAN; I’ll follow up later on that point.

The speed of LoRa ranges from (and different people will say different things here) about 500bps to about 20000bps. LoRa is a packetized protocol, and the maximum packet size depends

LoRa sensors often advertise battery life in the months or years, and can be quite small. The protocol makes an excellent choice for sensors in remote or widely dispersed areas. LoRa transceiver boards for Arduino can be found for under $15 from places like Mouser.

I wound up purchasing two LoStik USB LoRa radios from Amazon. With some experimentation, with even very bad RF conditions (tiny antennas, one of them in the house, the other in a car), I was able to successfully decode LoRa packets from 2 miles away! And these aren’t even the most powerful transmitters available.

Talking UUCP over LoRa

In order to make this all work, I needed to write interface software; the LoRa radios don’t just transmit things straight out. So I wrote lorapipe. I have successfully transmitted files across this UUCP link!

Developing lorapipe was somewhat more challenging than I expected. For one, the LoRa modem raw protocol isn’t well-suited to rapid fire packet transmission; after receiving each packet, the modem exits receive mode and must be told to receive again. Collisions with protocols that ACKd data and had a receive window — which are many — were a problem so bad that it rendered some of the protocols unusable. I wound up adding a “expect more data after this packet” byte to every transmission, and have the receiver not transmit until it believes the sender is finished. This dramatically improved things. There’s more detail on this in my lorapipe documentation.

So far, I have successfully communicated over LoRa using UUCP, kermit, and YMODEM. KISS support will be coming next.

I am also hoping to discover the range I can get from this thing if I use more proper antennas (outdoor) and transmitters capable of transmitting with more power.

All in all, a fun project so far.

Resurrecting Ancient Operating Systems on Debian, Raspberry Pi, and Docker

October 4, 2019TechnologyJohn Goerzen

I wrote recently about my son playing Zork on a serial terminal hooked up to a PDP-11, and how I eventually bought a vt420 (ok, some vt420s and vt510s, I couldn’t stop at one) and hooked it up to a Raspberry Pi.

This led me down another path: there is a whole set of hardware and software that I’ve never used. For some, it fell out of favor before I could read (and for others, before I was even born).

The thing is – so many of these old systems have a legacy that we live in today. So much so, in fact, that we are now seeing articles about how modern CPUs are fast PDP-11 emulators in a sense. The PDP-11, and its close association with early Unix, lives on in the sense that its design influenced microprocessors and operating systems to this day. The DEC vt100 terminal is, nowadays, known far better as that thing that is emulated, but it was, in fact, a physical thing. Some goes back into even mistier times; Emacs, for instance, grew out of the MIT ITS project but was later ported to TOPS-20 before being associated with Unix. vi grew up in 2BSD, and according to wikipedia, was so large it could barely fit in the memory of a PDP-11/70. Also in 2BSD, a buggy version of Zork appeared — so buggy, in fact, that the save game option was broken. All of this happened in the late 70s.

When we think about the major developments in computing, we often hear of companies like IBM, Microsoft, and Apple. Of course their contributions are undeniable, and emulators for old versions of DOS are easily available for every major operating system, plus many phones and tablets. But as the world is moving heavily towards Unix-based systems, the Unix heritage is far more difficult to access.

My plan with purchasing and setting up an old vt420 wasn’t just to do that and then leave. It was to make it useful for modern software, and also to run some of these old systems under emulation.

To that end, I have released my vintage computing collection – both a script for setting up on a system, and a docker image. You can run Emacs and TECO on TOPS-20, zork and vi on 2BSD, even Unix versions 5, 6, and 7 on a PDP-11. And for something particularly rare, RDOS on a Data General Nova. I threw in some old software compiled for modern systems: Zork, Colossal Cave, and Gopher among them. The bsdgames collection and some others are included as well.

I hope you enjoy playing with the emulated big-iron systems of the 70s and 80s. And in a dramatic turnabout of scale and cost, those machines which used to cost hundreds of thousands of dollars can now be run far faster under emulation on a $35 Raspberry Pi.

Connecting A Physical DEC vt420 to Linux

September 30, 2019Hardware, LinuxJohn Goerzen

John and Oliver trip to Vintage Computer Festival Midwest 2019. Oliver playing Zork on the Micro PDP-11

Inspired by a weekend visit to Vintage Computer Festival Midwest at which my son got to play Zork on an amber console hooked up to a MicroPDP-11 running 2BSD, I decided it was time to act on my long-held plan to get a real old serial console hooked up to Linux.

Not being satisfied with just doing it for the kicks, I wanted to make it actually usable. 30-year-old DEC hardware meets Raspberry Pi. I thought this would be pretty easy, but it turns out is was a lot more complicated than I realized, involving everything from nonstandard serial connectors to long-standing kernel bugs!

Selecting a Terminal — And Finding Parts

I wanted something in amber for that old-school feel. Sadly I didn’t have the forethought to save any back in the 90s when they were all being thrown out, because now they’re rare and can be expensive. Search eBay and pretty soon you find a scattering of DEC terminals, the odd Bull or Honeywell, some Sperrys, and assorted oddballs that don’t speak any kind of standard protocol. I figured, might as well get a vt, since we’re still all emulating them now, 40+ years later. Plus, my old boss from my university days always had stories about DEC. I wish he were still around to see this.

I selected the vt420 because I was able to find them, and it has several options for font size, letting more than 24 lines fit on a screen.

Now comes the challenge: most of the vt420s never had a DB25 RS-232 port. The VT420-J, an apparently-rare international model, did, but it is exceptionally rare. The rest use a DEC-specific port called the MMJ. Thankfully, it is electrically compatible with RS-232, and I managed to find the DEC H8571-J adapter as well as a BC16E MMJ cable that I need.

I also found a vt510 (with “paperwhite” instead of amber) in unknown condition. I purchased it, and thankfully it is also working. The vt510 is an interesting device; for that model, they switched to using a PS/2 keyboard connector, and it can accept either a DEC VT keyboard or a PC keyboard. It also supports full key remapping, so Control can be left of A as nature intended. However, there’s something about amber that is just so amazing to use again.

Preparing the Linux System

I thought I would use a Raspberry Pi as a gateway for this. With built-in wifi, that would let me ssh to other machines in my house without needing to plug in a serial cable – I could put the terminal wherever. Alternatively, I can plug in a USB-to-serial adapter to my laptop and just plug the terminal into it when I want. I wound up with a Raspberry Pi 4 kit that included some heatsinks.

I had two USB-to-serial adapters laying around: a Keyspan USA-19HS and a Digi I/O Edgeport/1. I started with the Keyspan on a Raspberry Pi 4 on the grounds that I didn’t have the needed Edgeport/1 firmware file laying about already. The Raspberry Pi does have serial capability integrated, but it doesn’t use RS-232 voltages and there have been reports of it dropping characters sometimes, so I figured the easy path would be a USB adapter. That turned out to be only partially right.

Serial Terminals with systemd

I have never set up a serial getty with systemd — it has, in fact, been quite a long while since I’ve done anything involving serial other than the occasional serial console (which is a bit different purpose).

It would have taken a LONG time to figure this out, but thanks to an article about the topic, it was actually pretty easy in the end. I didn’t set it up as a serial console, but spawning a serial getty did the trick. I wound up modifying the command like this:

ExecStart=-/sbin/agetty -8 -o '-p -- \\u' %I 19200 vt420

The vt420 supports speeds up to 38400 and the vt510 supports up to 115200bps. However, neither can process plain text at faster than 19200 so there is no point to higher speeds. And, as you are about to see, they can’t necessarily even muster 19200 all the time.

Flow Control: Oh My

The unfortunate reality with these old terminals is that the processor in them isn’t actually able to keep up with line speeds. Any speed above 4800bps can exceed processor capabilities when “expensive” escape sequences are sent. That means that proper flow control is a must. Unfortunately, the vt420 doesn’t support any form of hardware flow control. XON/XOFF is all it’ll do. Yeah, that stinks.

So I hooked the thing up to my desktop PC with a null-modem cable, and started to tinker. I should be able to send a Ctrl-S down the line and the output from the pi should immediately stop. It didn’t. Huh. I verified it was indeed seeing the Ctrl-S (open emacs, send Ctrl-S, and it goes into search mode). So something, somehow, was interfering.

After a considerable amount of head scratching, I finally busted out the kernel source. I discovered that the XON/XOFF support is part of the serial driver in Linux, and that — ugh — the keyspan serial driver never actually got around to implementing it. Oops. That’s a wee bit of a bug. I plugged in the Edgeport/1 instead of the Keyspan and magically XON/XOFF started working.

Well, for a bit.

You see, flow control is a property of the terminal that can be altered by programs on a running system. It turns out that a lot of programs have opinions about it, and those opinions generally run along the lines of “nobody could possibly be using XON/XOFF, so I’m going to turn it off.” Emacs is an offender here, but it can be configured. Unfortunately, the most nasty offender here is ssh, which contains this code that is ALWAYS run when using a pty to connect to a remote system (which is for every interactive session):

tio.c_iflag &= ~(ISTRIP | INLCR | IGNCR | ICRNL | IXON | IXANY | IXOFF);

Yes, so when you use ssh, your local terminal no longer does flow control. If you are particularly lucky, the remote end may recognize your XON/XOFF characters and process them. Unfortunately, the added latency and buffering in going through ssh and the network is likely to cause bursts of text to exceed the vt420’s measly 100-ish-byte buffer. You just can’t let the remote end handle flow control with ssh. I managed to solve this via GNU Screen; more on that later.

The vt510 supports hardware flow control! Unfortunately, it doesn’t use CTS/RTS pins, but rather DTR/DSR. This was a reasonably common method in the day, but appears to be totally unsupported in Linux. Bother. I see some mentions that FreeBSD supports DTR/DSR flow (dtrflow and dsrflow in stty outputs). It definitely looks like the Linux kernel has never plumbed out the reaches of RS-232 very well. It should be possible to build a cable to swap DTR/DSR over to CTS/RTS, but since the vt420 doesn’t support any of this anyhow, I haven’t bothered.

Character Sets

Back when the vt420 was made, it was pretty hot stuff that it was one of the first systems to support the new ISO-8859-1 standard. DEC was rather proud of this. It goes without saying that the terminal knows nothing of UTF-8.

Nowadays, of course, we live in a Unicode world. A lot of software crashes on ISO-8859-1 input (I’m looking at you, Python 3). Although I have old files from old systems that have ISO-8859-1 encoding, they are few and far between, and UTF-8 rules the roost now.

I can, of course, just set LANG=en_US and that will do — well, something. man, for instance, renders using ISO-8859-1 characters. But that setting doesn’t imply that any layer of the tty system actually converts output from UTF-8 to ISO-8859-1. For instance, if I have a file with a German character in it and use ls, nothing is going to convert it from UTF-8 to ISO-8859-1.

GNU Screen also, as it happens, mostly solves this.

GNU Screen to the rescue, somewhat

It turns out that GNU Screen has features that can address both of these issues. Here’s how I used it.

First, in my .bashrc, I set this:

if [ `tty` = "/dev/ttyUSB0" ]; then stty -iutf8 export LANG=en_US export MANOPT="-E ascii" fi

Then, in my .screenrc, I put this:

defflow on defencoding UTF-8

This tells screen that the default flow control mode is on, and that the default encoding for the pty that screen creates is UTF-8. It determines the encoding for the physical terminal for the environment, and correctly figures it to be ISO-8859-1. It then maps between the two! Yes!

My little ssh connecting script then does just this:

exec screen ssh "$@"

Which nicely takes care of the flow control issue and (most of) the encoding issue. I say “most” because now things like man will try to render with fancy em-dashes and the like, which have no representation in iso8859-1, so they come out as question marks. (Setting MANOPT=”-E ascii” fixes this) But no matter, it works to ssh to my workstation and read my email! (mu4e in emacs)

What screen doesn’t help with are things that have no ISO-8859-1 versions; em-dashes are the most frequent problems, and are replaced with unsightly question marks.

termcaps, terminfos, and weird things

So pretty soon you start diving down the terminal rabbit hole, and you realize there’s a lot of weird stuff out there. For instance, one solution to the problem of slow processors in terminals was padding: ncurses would know how long it would take the terminal to execute some commands, and would send it NULLs for that amount of time. That calculation, of course, requires knowledge of line speed, which one wouldn’t have in this era of ssh. Thankfully the vt420 doesn’t fall into that category.

But it does have a ton of modes. The Emacs On Terminal page discusses some of the interesting bits: 7-bit or 8-bit control characters, no ESC key, Alt key not working, etc, etc. I believe some of these are addressed by the vt510 (at least in PC mode). I wonder whether Emacs or vim keybindings would be best here…

Helpful Resources

Terminals Wiki
vt100.net – mainly about DEC terminals vt50 and up, but some about others as well.
Wyse Terminal Knowledge Base (via archive.org)
Televideo terminals (via archive.org)
Wikipedia on terminals
Richard Shuford’s fantastic archive of Video Terminal Information – now sadly offline and only available frmo archive.org.
Text-Terminal HOWTO
Using an ASCII terminal for Debian
Back to the future: Using a vt220
Text-mode browser resources:
- Browsh
- Text-based web browser list

The Changelog

Comments on family, technology, and society