Alas, Poor PGP

July 20, 2019UncategorizedJohn Goerzen

Over in The PGP Problem, there’s an extended critique of PGP (and also specifics of the GnuPG implementation) in a modern context. Robert J. Hansen, one of the core GnuPG developers, has an interesting response:

First, RFC4880bis06 (the latest version) does a pretty good job of bringing the crypto angle to a more modern level. There’s a massive installed base of clients that aren’t aware of bis06, and if you have to interoperate with them you’re kind of screwed: but there’s also absolutely nothing prohibiting you from saying “I’m going to only implement a subset of bis06, the good modern subset, and if you need older stuff then I’m just not going to comply.” Sequoia is more or less taking this route — more power to them.

Second, the author makes a couple of mistakes about the default ciphers. GnuPG has defaulted to AES for many years now: CAST5 is supported for legacy reasons (and I’d like to see it dropped entirely: see above, etc.).

Third, a couple of times the author conflates what the OpenPGP spec requires with what it permits, and with how GnuPG implements it. Cleaner delineation would’ve made the criticisms better, I think.

But all in all? It’s a good criticism.

The problem is, where does that leave us? I found the suggestions in the original author’s article (mainly around using IM apps such as Signal) to be unworkable in a number of situations.

The Problems With PGP

Before moving on, let’s tackle some of the problems identified.

The first is an assertion that email is inherently insecure and can’t be made secure. There are some fairly convincing arguments to be made on that score; as it currently stands, there is little ability to hide metadata from prying eyes. And any format that is capable of talking on the network — as HTML is — is just begging for vulnerabilities like EFAIL.

But PGP isn’t used just for this. In fact, one could argue that sending a binary PGP message as an attachment gets around a lot of that email clunkiness — and would be right, at the expense of potentially more clunkiness (and forgetfulness).

What about the web-of-trust issues? I’m in agreement. I have never really used WoT to authenticate a key, only in rare instances trusting an introducer I know personally and from personal experience understand how stringent they are in signing keys. But this is hardly a problem for PGP alone. Every encryption tool mentioned has the problem of validating keys. The author suggests Signal. Signal has some very strong encryption, but you have to have a phone number and a smartphone to use it. Signal’s strength when setting up a remote contact is as strong as SMS. Let that disheartening reality sink in for a bit. (A little social engineering could probably get many contacts to accept a hijacked SIM in Signal as well.)

How about forward secrecy? This is protection against a private key that gets compromised in the future, because an ephemeral session key (or more than one) is negotiated on each communication, and the secret key is never stored. This is a great plan, but it really requires synchronous communication (or something approaching it) between the sender and the recipient. It can’t be used if I want to, for instance, burn a backup onto a Bluray and give it to a friend for offsite storage without giving the friend access to its contents. There are many, many situations where synchronous key negotiation is impossible, so although forward secrecy is great and a nice enhancement, we should assume it to be always applicable.

The saltpack folks have a more targeted list of PGP message format problems. Both they, and the article I link above, complain about the gpg implementation of PGP. There is no doubt truth to these. Among them is a complaint that gpg can emit unverified data. Well sure, because it has a streaming mode. It exits with a proper error code and warnings if a verification fails at the end — just as gzcat does. This is a part of the API that the caller needs to be aware of. It sounds like some callers weren’t handling this properly, but it’s just a function of a streaming tool.

Additional Discussions

The Desktop Security Nightmare

July 18, 2019LinuxJohn Goerzen

Back in 1995 or so, pretty much everyone with a PC did all their work as root. We ran graphics editors, word processors, everything as root. Well, not literally an account named “root”, but the most common DOS, Windows, and Mac operating systems of the day had no effective reduced privilege account.

It was that year that I tried my first Unix. “Wow!” A virus can’t take over my system. My programs are safe!

That turned out to be a little short-sighted.

The fundamental problem we have is that we’d like to give users of a computer more access than we would like to give the computer itself.

Many of us have extremely sensitive data on our systems. Emails to family, medical or bank records, Bitcoin wallets, browsing history, the list goes on. Although we have isolation between our user account and root, we have no isolation between applications that run as our user account. We still, in effect, have to be careful about what attachments we open in email.

Only now it’s worse. You might “npm install hello-world”, and audit hello-world itself, but get some totally malicious code as well. How many times do we see instructions to gem install this, pip install that, go get the other, and even curl | sh? Nowadays our risky click isn’t an email attachment. It’s hosted on Github with a README.md.

Not only that, but my /usr/bin has over 4000 binaries. Have every one been carefully audited? Certainly not, and this is from a distro with some of the highest quality control around. What about the PPAs that people add? The debs or rpms that are installed from the Internet? Are you sure that the postinst scripts — which run as root — aren’t doing anything malicious when you install Oracle Virtualbox?

Wouldn’t it be nice if we could, say, deny access to everything in ~/.ssh or ~/bankstatements except for trusted programs when we want it? On mobile, this happens, to an extent. But we have both a legacy of a different API on desktop, and a much more demanding set of requirements.

It feels like our ecosystem is on the cusp of being able to do this, but none of the options I’ve looked at quite get us there. Let’s take a look at some.

AppArmor

AppArmor falls into the “first line of defense — better than nothing” category. It works by imposing mandatory access controls on a per-executable basis. This starts out as a pretty good idea: we can go after some high-risk targets (Firefox, Chromium, etc) and lock them down. Great! Although it’s not exactly intuitive, with a little configuration, you can prevent them from accessing sensitive areas on disk.

But there’s a problem. Actually, several. To start with, AppArmor does nothing by default. On my system, aa-unconfined --paranoid lists 171 processes that have no policies on them. Among them are Firefox, Apache, ssh, a ton of Pythons, and some stuff I don’t even recognize (/usr/lib/geoclue-2.0/demos/agent? What’s this craziness?)

Worse, since AppArmor matches on executable, all shell scripts would match the /bin/bash profile, all Python programs the Python profile, etc. It’s not so useful for them. While AppArmor does technically have a way to set a default profile, it’s not as useful as you might think.

Then you’re still left with problems like: a PDF viewer should not ordinarily have access to my sensitive files — except when I want to see an old bank statement. This can’t really be expressed in AppArmor.

SELinux

From its documentation, it sounds like SELinux might fit the bill well. It allows transitions into different roles after logging in, which is nice. The problem is complexity. The “notebook” for SELinux is 395 pages. The SELinux homepage has a wiki, which says it’s outdated and replaced by a github link with substantially less information. The Debian wiki page on it is enough to be scary in itself: you need to have various filesystem support, even backups are complicated. Ted T’so had a famous comment about never getting some of his life back, and the Debian wiki also warns that it’s not really tested on desktop systems.

We have certainly learned that complexity is an enemy of good security, leading users to just bypass it. I’m not sure we can rely on it.

Mount Tricks

One thing a person could do would be to keep the sensitive data on a separate, ideally encrypted, filesystem. (Maybe even a fuse one such as gocryptfs.) Then, at least, it could be unavailable for most of the time the system is on.

Of course, the downside here is that it’s still going to be available to everything when it is mounted, and there’s the hassle of mounting, remembering to unmount, password typing, etc. Not exactly transparent.

I wondered if mount namespaces might be an answer here. A filesystem could be mounted but left pretty much unavailable to processes unless a proper mount namespace is joined. Indeed that might be a solution. It is somewhat complicated, though, since nsenter requires root to work. Enter sudo, and dropping privileges back to a particular user — a not particularly ideal situation, and complex as well.

Still, it might well have some promise for some of these things.

Firejail

Firejail is a great idea, but suffers from a lot of the problems that AppArmor does: things must explicitly be selected to run within it.

AppImage and related tools

So now there’s your host distro and your bundled distro, each with libraries that may or may not be secure, both with general access to your home directory. I think this is a recipe for worse security, not better. Add to that the difficulty of making those things work; I know that the Digikam people have been working for months to get sound to work reliably in their AppImage.

Others?

What other ideas are out there? I’ve occasionally created a separate user on the system for running suspicious-ish code, or even a VM or container. That’s a fair bit of work, and provides incomplete protection, but has some benefits. Still, it’s again not going to work for everything.

I hope to play around with many of these tools, especially SELinux, before too long and report back how I’ve found them to be.

Finally, I would like to be really clear that I don’t believe this issue is limited to Debian, or even to Linux. It impacts every desktop platform in wide use today. Actually, I think we’re in a better position to address it than some, but it won’t be easy for anyone.

Tips for Upgrading to, And Securing, Debian Buster

July 17, 2019DebianJohn Goerzen

Wow. Once again, a Debian release impresses me — a guy that’s been using Debian for more than 20 years. For the first time I can ever recall, buster not only supported suspend-to-disk out of the box on my laptop, but it did so on an encrypted volume atop LVM. Very impressive!

For those upgrading from previous releases, I have a few tips to enhance the experience with buster.

AppArmor

AppArmor is a new line of defense against malicious software. The release notes indicate it’s now enabled by default in buster. For desktops, I recommend installing apparmor-profiles-extra apparmor-notify. The latter will provide an immediate GUI indication when something is blocked by AppArmor, so you can diagnose strange behavior. You may also need to add userself to the adm group with adduser username adm.

Security

I recommend installing these packages and taking note of these items, some of which are different in buster:

unattended-upgrades will automatically install security updates for you. New in buster, the default config file will also apply stable updates in addition to security updates.
needrestart will detect what processes need a restart after a library update and, optionally, restart them. Beginning in buster, it will not automatically restart them when in noninteractive (unattended-upgrades) mode. This can be changed by editing /etc/needrestart/needrestart.conf (or, better, putting a .conf file in /etc/needrestart/conf.d) and setting $nrconf{restart} = 'a'. Edit: If you have an Intel CPU, installing iucode-tool intel-microcode will let needrestart also check on your CPU microcode.
debian-security-support will warn you of gaps in security support for packages you are installing or running.
package-update-indicator is useful for desktops that won’t be running unattended-upgrades. I believe Gnome 3 has this built in, but for other desktops, this adds an icon when updates are available.
You can harden apt with seccomp.
You can enable UEFI secure boot.

Tuning

Unless you are sticking with sysvinit, you can purge some packages with dpkg --purge initscripts sysv-rc insserv startpar.
With the advent of driverless printing, you may now be able to remove custom printer definitions for certain LAN printers. See that section for details.
Follow the instructions in preparing for the next release.

If you hadn’t noticed, many of these items are links into the buster release notes. It’s a good document to read over, even for a new buster install.

Treasuring Moments

July 1, 2019Aviation, Family, MusicmarthaJohn Goerzen

“Treasure the moments you have. Savor them for as long as you can, for they will never come back again.”

– J. Michael Straczynski

This quote sits on a post-it note on my desk. Here are some moments of our fast-changing little girl that I’m remembering today — she’s almost 2!

Brothers & Sister

Martha loves to play with her siblings. She has names for them — Jacob is “beedoh” and Oliver is “ah-wah”. When she sees them come home, she gets a huge smile and will screech with excitement. Then she will ask them to play with her.

She loves to go down the slide with Jacob. “Beedoh sigh?” (Jacob slide) — that’s her request. He helps her up, then they go down together. She likes to swing side-by-side with Oliver. “Ahwah sing” (Oliver swing) when she wants him to get on the swing next to her. The boys enjoy teaching her new words and games.

[Video: Martha and Jacob on the slide]

Music

Martha loves music! To her, “sing” is a generic word for music. If we’re near a blue speaker, she’ll say “boo sing” (blue sing) and ask for it to play music.

But her favorite request is “daddy sing.” It doesn’t mean she wants me to sing. No, she wants me to play my xaphoon (a sax-like instrument). She’ll start jumping, clapping, and bopping her head to the music. Her favorite spot to do this is a set of soft climbing steps by the piano.

But that’s not enough — next she pulls out our hymnbooks and music books and pretends to sing along. “Wawawawawawa the end!”

If I decide to stop playing, that is most definitely not allowed. “Daddy sing!” And if I don’t comply, she gets louder and more insistent: “DADDY SING.”

[Videos: Martha singing and reading from hymn books, singing her ABCs]

Airplanes

Martha loves airplanes. She started to be able to say “airplane” — first “peen”, then “airpeen”, and now “airpane!” When we’re outside and she hears any kind of buzzing that might possibly be a plane, I’m supposed to instantly pick her up and carry her past our trees so we can look for it. “AIRPANE! AIRPANE! Ho me?” (hold me) Then when we actually see a plane, it’s “Airpane! Hi airpane!” And as it flies off, “Bye-bye airpane. Bye-bye. [sadly] Airpane all done.”

One day, Martha was trying to see airplanes, but it was cloudy. I bundled her up and we went to our local GA airport and stood in the grass watching planes. Now that was a hit! Now anytime Martha sees warehouse-type buildings, she thinks they are hangars, and begs to go to the airport. She loves to touch the airplane, climb inside it, and look at the airport beacon — even if we won’t be flying that day.

[Video: Hi big plane!]

This year, for Mother’s Day, we were going to fly to a nearby airport with a restaurant on the field. I took a photo of our family by the plane before we left. All were excited!

Mornings

We generally don’t let Martha watch TV, but make a few exceptions for watching a few videos and looking at family pictures. Awhile back, Martha made asked to play with me while I was getting ready for the day. “Martha, I have to get dressed first. Then I’ll play with you.” “OK,” she said.

She ran off into the closet, and came back with what she could reach of my clothing – a dirty shirt, and handed it up to me to wear. I now make sure to give her the chance to bring me socks, shirts, etc. And especially shoes. She really likes to bring me shoes.

Then we go downstairs. Sometimes she sits on my lap in the office and we watch Youtube videos of owls or fish. Or sometimes we go downstairs and start watching One Six Right, a wonderful aviation documentary. She and I jabber about what we see — she can identify the beacon (“bee”), big hangar door (“bih doh”), airplanes of different colors (“yellow one”), etc. She loves to see a little Piper Cub fly over some cows, and her favorite shot is a plane that flies behind the control tower at sunset. She’ll lean over and look for it as if it’s going around a corner.

Sometimes we look at family pictures and videos. Her favorite is a video of herself in a plane, jabbering and smiling. She’ll ask to watch it again and again.

Bedtime

Part of our bedtime routine is that I read a story to Martha. For a long time, I read her The Very Hungry Caterpillar by Eric Carle. She loved that book, and one night said “geecko” for pickle. She noticed I clapped for it, and so after that she always got excited for the geeckos and would clap for them.

Lately, though, she wants the “airpane book” – Clair Bear’s First Solo. We read through that book, she looks at the airplanes that fly, and always has an eye out for the “yellow one” and “boo one” (blue plane). At the end, she requests “more pane? More pane?”

After that, I wave goodnight to her. She used to wave back, but now she says “Goodnight, daddy!” and heads on up the stairs.

The Rightward, Establishment Bias of Lazy Journalism

March 15, 2019Politics, SocietyJohn Goerzen

Note: I also posted this post on medium.

I remember clearly the moment I’d had enough of NPR for the day. It was early morning January 25 of this year, still pretty dark outside. An NPR anchor was interviewing an NPR reporter — they seem to do that a lot these days — and asked the following simple but important question:

“So if we know that Roger Stone was in communications with WikiLeaks and we know U.S. intelligence agencies have said WikiLeaks was operating at the behest of Russia, does that mean that Roger Stone has been now connected directly to Russia’s efforts to interfere in the U.S. election?”

The factual answer, based on both data and logic, would have been “yes”. NPR, in fact, had spent much airtime covering this; for instance, a June 2018 story goes into detail about Stone’s interactions with WikiLeaks, and less than a week before Stone’s arrest, NPR referred to “internal emails stolen by Russian hackers and posted to Wikileaks.” In November of 2018, The Atlantic wrote, “Russia used WikiLeaks as a conduit — witting or unwitting — and WikiLeaks, in turn, appears to have been in touch with Trump allies.”

Why, then, did the NPR reporter begin her answer with “well,” proceed to hedge, repeat denials from Stone and WikiLeaks, and then wind up saying “authorities seem to have some evidence” without directly answering the question? And what does this mean for bias in the media?

Let us begin with a simple principle: facts do not have a political bias. Telling me that “the sky is blue” no more reflects a Democratic bias than saying “3+3=6” reflects a Republican bias. In an ideal world, politics would shape themselves around facts; ideas most in agreement with the data would win. There are not two equally-legitimate sides to questions of fact. There is no credible argument against “the earth is round”, “climate change is real,” or “Donald Trump is an unindicted co-conspirator in crimes for which jail sentences have been given.” These are factual, not political, statements. If you feel, as I do, a bit of a quickening pulse and escalating tension as you read through these examples, then you too have felt the forces that wish you to be uncomfortable with unarguable reality.

That we perceive some factual questions as political is a sign of a deep dysfunction in our society. It’s a sign that our policies are not always guided by fact, but that a sustained effort exists to cause our facts to be guided by policy.

Facts do not have a political bias. There are not two equally-legitimate sides to questions of fact. “Climate change is real” is a factual, not a political, statement. Our policies are not always guided by fact; a sustained effort exists to cause our facts to be guided by policy.

Why did I say right-wing bias, then? Because at this particular moment in time, it is the political right that is more engaged in the effort to shape facts to policy. Whether it is denying the obvious lies of the President, the clear consensus on climate change, or the contours of various investigations, it is clear that they desire to confuse and mislead in order to shape facts to their whim.

It’s not always so consequential when the media gets it wrong. When CNN breathlessly highlights its developing story — that an airplane “will struggle to maintain altitude once the fuel tanks are empty” —it gives us room to critique the utility of 24/7 media, but not necessarily a political angle.

Hey thanks CNN for making sure we all knew the crucial fact that planes cannot fly on empty fuel tanks pic.twitter.com/P6uXuTCUgI
— Katie Pavlich (@KatiePavlich) March 31, 2014

But ask yourself: who benefits when the media is afraid to report a simple fact about an investigation with political connotations? The obvious answer, in the NPR example I gave, is that Republicans benefit. They want the President to appear innocent, so every hedge on known facts about illegal activities of those in Trump’s orbit is a gift to the right. Every time a reporter gives equal time to climate change deniers is a gift to the right and a blow to informed discussion in a democracy.

Not only is there a rightward bias, but there is also an establishment bias that goes hand-in-hand. Consider this CNN report about Facebook’s “pivot to privacy”, in which CEO Zuckerberg is credited with “changing his tune somewhat”. To the extent to which that article highlights “problems” with this, they take Zuckerberg at face-value and start to wonder if it will be harder to clamp down on fake news in the news feed if there’s more privacy. That is a total misunderstanding of what was being proposed; a more careful reading of the situation was done by numerous outlets, resulting in headlines such as this one in The Intercept: “Mark Zuckerberg Is Trying to Play You — Again.” They correctly point out the only change actually mentioned pertained only to instant messages, not to the news feed that CNN was talking about, and even that had a vague promise to happen “over the next few years.” Who benefited from CNN’s failure to read a press release closely? The established powers — Facebook.

Pay attention to the media and you’ll notice that journalists trip all over themselves to report a new dot in a story, but they run away scared from being the first to connect the dots. Much has been written about the “media narrative,” often critical, with good reason. Back in November of 2018, an excellent article on “The Ubearable Rightness of Seth Abramson” covered one particular case in delightful detail.

Journalists trip all over themselves to report a new dot in a story, but they run away scared from being the first to connect the dots.

Seth Abramson himself wrote, “Trump-Russia is too complex to report. We need a new kind of journalism.” He argues the culprit is not laziness, but rather that “archive of prior relevant reporting that any reporter could review before they publish their own research is now so large and far-flung that more and more articles are frustratingly incomplete or even accidentally erroneous than was the case when there were fewer media outlets, a smaller and more readily navigable archive of past reporting for reporters to sift through, and a less internationalized media landscape.” Whether laziness or not, the effect is the same: a failure to properly contextualize facts leading to misrepresented or outright wrong outcomes that, at present, have a distinct bias towards right-wing and establishment interests.

Yes, the many scandals in Trumpland are extraordinarily complex, and in this age of shrinking newsroom budgets, it’s no wonder that reporters have trouble keeping up. Highly-paid executives like Zuckerberg and politicians in Congress have years of practice with obfuscation, and it takes skill to find the truth (if there even is any) behind a corporate press release or political talking point. One would hope, though, that reporters would be less quick to opine if they lack those skills or the necessary time to dig in.

There’s not just laziness; there’s also, no doubt, a confusion about what it means to be a balanced journalist. It is clear that there are two sides to a debate over, say, whether to give a state’s lottery money to the elementary schools or the universities. When there is the appearance of a political debate over facts, shouldn’t that also receive equal time for each side? I argue no. In fact, politicians making claims that contradict establish fact should be exposed by journalists, not covered by them.

And some of it is, no doubt, fear. Fear that if they come out and say “yes, this implicates Stone with Russian hacking” that the Fox News crowd will attack them as biased. Of course this will happen, but that attack will be wrong. The right has done an excellent job of convincing both reporters and the public that there’s a big left-leaning bias that needs to be corrected, by yelling about it every time a fact is mentioned that they don’t like. The unfortunate result is that the fact-leaning bias in the media is being whittled away.

Politicians making claims that contradict establish fact should be exposed by journalists, not covered by them. The fact-leaning bias in the media is being whittled away.

Regardless of the cause, media organizations and their reporters need to be cognizant of the biases actors of all stripes wish them to display, and refuse to play along. They need to be cognizant of the demands they put on their own reporters, and give them space to understand the context of a story before explaining it. They need to stand up to those that try to diminish facts, to those that would like them to be uninformed.

A world in which reporters know the context of their stories and boldly state facts as facts, come what may, is a world in which reporters strengthen the earth’s democracies. And, by extension, its people.

Goodbye to a 15-year-old Debian server

March 12, 2019DebianJohn Goerzen

It was October of 2003 that the server I’ve called “glockenspiel” was born. It was the early days of Linux-based VM hosting, using a VPS provider called memset, running under, of all things, User Mode Linux. Over the years, it has been migrated around, sometimes running on the metal and sometimes in a VM. The operating system has been upgraded in-place using standard Debian upgrades over the years, and is now happily current on stretch (albeit with a 32-bit userland). But it has never been reinstalled. When I’d migrate hosting providers, I’d use tar or rsync to stream glockenspiel across the Internet to its new home.

A lot of people reinstall an OS when a new version comes out. I’ve been doing Debian upgrades with apt for ages, and this one is a case in point. It lingers.

Root’s .profile was last modified in November 2004, and its .bashrc was last modified in December 2004. My own home directory still has a .pinerc, .gopherrc, and .arch-params file. I last edited my .vimrc in 2003 and my .emacs dates back to 2002 (having been copied over from a pre-glockenspiel FreeBSD server).

drwxr-xr-x  3 jgoerzen jgoerzen      4096 Dec  3  2003 irclogs
-rw-r--r--  1 jgoerzen jgoerzen       373 Dec  3  2003 .vimrc
-rw-r--r--  1 jgoerzen jgoerzen       651 Nov 27  2003 .reportbugrc
drwx------  3 jgoerzen jgoerzen      4096 Sep  2  2003 .arch-params
-rw-r--r--  1 jgoerzen jgoerzen      1115 Aug 23  2003 .gopherrc
drwxr-xr-x  3 jgoerzen jgoerzen      4096 Jul 18  2003 .subversion
-rw-r--r--  1 jgoerzen jgoerzen     15317 Jun 21  2003 .pinerc

Poking around /etc on glockenspiel is like a trip back in time. Various apache sites still have configuration files around, but have long since been disabled. Over the years, glockenspiel has hosted source code repositories using Subversion, arch, tla, darcs, mercurial and git. It’s hosted websites using Drupal, WordPress, Serendipity, and so forth. It’s hosted gopher sites, websites or mailing lists for various Free Software projects (such as Freeciv), and any number of local charitable organizations. Remnants of an FTP configuration still exist, when people used web design software to build websites for those organizations on their PCs and then upload them to glockenspiel.

-rw-r--r--   1 root  root                      268 Dec 25  2005 libnet.cfg
-rw-r-----   1 root  root                     1305 Nov 11  2004 mrtg.cfg
-rw-r--r--   1 root  root                      552 Jul 31  2004 pam.conf

All this has been replaced by a set of Docker containers running my docker-debian-base software. They’re all in git, I can rebuild one of the containers in a few seconds or a few minutes by typing “make”, and there is no cruft from 2002. There are a lot of benefits to this.

And yet, there is a part of me that feels it’s all so… cold. Servers having “personalities” was always a distinctly dubious thing, but these days as we work through more and more layers of virtualization and indirection and become more distant from the hardware, we lose an appreciation for what we have and the many shoulders of giants upon which we stand.

And, so with that, the final farewell to this server that’s been running since 2003:

glockenspiel:/etc# shutdown -P now
Shared connection to glockenspiel.complete.org closed.

A (Partial) Defense of Debian

March 11, 2019DebianJohn Goerzen

I was sad to read on his blog that Michael Stapelberg is winding down his Debian involvement. In his post, he outlined some critiques of Debian. In his post, I want to acknowledge that he is on point with some of them, but also push back on others. Some of this is also a response to some of the comments on Hacker News.

I’d first like to discuss some of the assumptions I believe his post rests on: namely that “we’ve always done it this way” isn’t a good reason to keep doing something. I completely agree. However, I would also say that “this thing is newer, so it’s better and we should use it” is also poor reasoning. Newer is not always better. Sometimes it is, sometimes it’s not, but deeper thought is generally required.

Also, when thinking about why things are a certain way or why people prefer certain approaches, we must often ask “why does that make sense to them?” So let’s dive in.

Debian’s Perspective: Stability

Stability, of course, can mean software that tends not to crash. That’s important, but there’s another aspect of it that is also important: software continuing to act the same over time. For instance, if you wrote a C program in 1985, will that program still compile and run today? Granted, that’s a bit of an extreme example, but the point is: to what extent can you count on software you need continuing to operate without forced change?

People that have been sysadmins for a long period of time will instantly recognize the value of this kind of stability. Change is expensive and difficult, and often causes outages and incidents as bugs are discovered when software is adapted to a new environment. Being able to keep up-to-date with security patches while also expecting little or no breaking changes is a huge win. Maintaining backwards compatibility for old software is also important.

Even from a developer’s perspective, lack of this kind of stability is why I have handed over maintainership of most of my Haskell software to others. Some of my Haskell projects were basically “done”, and every so often I’d get bug reports that it no longer compiles due to some change in the base library. Occasionally I’d have patches with those bug reports, but they would inevitably break compatibility with older versions (even though the language has plenty good support for something akin to a better version of #ifdefs to easily deal with this.) The culture of stability was not there.

This is not to say that this kind of stability is always good or always bad. In the Haskell case, there is value to be had in fixing designs that are realized to be poor and removing cruft. Some would say that strcpy() should be removed from libc for this reason. People that want the latest versions of gimp or whatever are probably not going to be running Debian stable. People that want to install a machine and not really be burdened by it for a couple of years are.

Debian has, for pretty much its entire life, had a large proportion of veteran sysadmins and programmers as part of the organization. Many of us have learned the value of this kind of stability from the school of hard knocks – over and over again. We recognize the value of something that just works, that is so stable that things like unattended-upgrades are safe and reliable. With many other distros, something like this isn’t even possible; when your answer to a security bug is to “just upgrade to the latest version”, just trusting a cron job to do it isn’t going to work because of the higher risk.

Recognizing Personal Preference

Writing about Debian’s bug-tracking tool, Michael says “It is great to have a paper-trail and artifacts of the process in the form of a bug report, but the primary interface should be more convenient (e.g. a web form).” This is representative of a personal preference. A web form might be more convenient for Michael — I have no reason to doubt this — but is it more convenient for everyone? I’d say no.

In his linked post, Michael also writes: “Recently, I was wondering why I was pushing off accepting contributions in Debian for longer than in other projects. It occurred to me that the effort to accept a contribution in Debian is way higher than in other FOSS projects. My remaining FOSS projects are on GitHub, where I can just click the “Merge” button after deciding a contribution looks good. In Debian, merging is actually a lot of work: I need to clone the repository, configure it, merge the patch, update the changelog, build and upload. “

I think that’s fair for someone that wants a web-based workflow. Allow me to present the opposite: for me, I tend to push off contributions that only come through Github, and the reason is that, for me, they’re less convenient. It’s also harder for me to contribute to Github projects than Debian ones. Let’s look at this – say I want to send in a small patch for something. If it’s Github, it’s going to look like this:

Go to the website for the thing, click fork
Now clone that fork or add it to my .git/config, hack, and commit
Push the commit, go back to the website, and submit a PR
Github’s email integration is so poor that I basically have to go back to the website for most parts of the conversation. I can do little from the comfort of mu4e.
Remember to clean up my fork after the patch is accepted or rejected.

Compare that to how I’d contribute with Debian:

Hack (and commit if I feel like it)
Type “reportbug foo”, attach my patch
Followup conversation happens directly in email where it’s convenient to reply

How about as the developer? Github constantly forces me to their website. I can’t very well work on bug reports, etc. without a strong Internet connection. And it’s designed to push people into using their tools and their interface, which is inferior in a lot of ways to a local interface – but then the process to pull down someone else’s set of patches involves a lot of typing and clicking, much more that would be involved from a simple git format-patch. In short, I don’t have my shortcut keys, my environment, etc. for reviewing things – the roadblocks are there to make me use theirs.

If I get a contribution from someone in debbugs, it’s so much easier. It’s usually just git apply or patch -p1 and boom, I can see exactly what’s changed and review it. A review comment is just a reply to an email. I don’t have to ever fire up a web browser. So much more convenient.

I don’t write this to say Michael is wrong about what’s more convenient for him. I write it to say he’s wrong about what’s more convenient for me (or others). It may well be the case that debbugs is so inconvenient that it pushes him to leave while github is so inconvenient for others that it pushes them to avoid it.

I will note before leaving this conversation that there are some command-line tools available for Github and a web interface to debbugs, but it is still clear that debbugs is a lot easier to work with from within my own mail reader and tooling, and Github is a lot easier to work with from within a web browser.

The case for reportbug

I remember the days before we had reportbug. Over and over and over again, I would get bug reports from users that wouldn’t have the basic information needed to investigate. reportbug gathers information from the system: package versions, configurations, versions of dependencies, etc. A simple web form can’t do this because it doesn’t have a local agent. From a developer’s perspective, trying to educate users on how to do this over and over as an unending, frustrating, and counter-productive task. Even if it’s clearly documented, the battle will be fought over and over. From a user’s perspective, having your bug report ignored or told you’re doing it wrong is frustrating too.

So I think reportbug is much nicer than having some github-esque web-based submission form. Could it be better? Sure. I think a mode to submit the reportbug report via HTTPS instead of email would make sense, since a lot of machines no longer have local email configured.

Where Debian Should Improve

I agree that there are areas where Debian should improve.

Michael rightly identifies the “strong maintainer” concept as a source of trouble. I agree. Though we’ve been making slow progress over time with things like low-threshold NMU and maintainer teams, the core assumption that a maintainer has a lot of power over particular packages is one that needs to be thrown out.

Michael, and commentators on HN, both identify things that boil down to documentation problems. I have heard so many times that it’s terribly hard to package something up for Debian. That’s really not the case for most things; dh_make and similar tools will do the right thing for many packages, and all you have to do is add some package descriptions and such. I wrote a “concise guide” to packaging for my workplace that ran to only about 2 pages. But it is true that the documentation on debian.org doesn’t clearly offer this simple path, so people are put off and not aware of it. Then there were the comments about how hard it is to become a Debian developer, and how easy it is to submit a patch to NixOS or some such. The fact is, these are different things; one does not need to be a Debian Developer to contribute to Debian. A DD is effectively the same as a patch approver elsewhere; these are the people that can ultimately approve software for insertion into the OS, and you DO want an element of trust there. Debian could do more to offer concise guides for drive-by contributions and the building of packages that follow standard language community patterns, both of which can be done without much knowledge of packaging tools and inner workings of the project.

Finally, I have distanced myself from conversations in Debian for some time, due to lack of time to participate in what I would call excessive bikeshedding. This is hardly unique to Debian, but I am glad to see the project putting more effort into expecting good behavior from conversations of late.

Review of Secure, Privacy-Respecting Email Services

February 27, 2019Freedom, Online Life, Technologyemail, InternetJohn Goerzen

I’ve been hosting my own email for several decades now. Even before I had access to a dedicated Internet link, I had email via dialup UUCP (and, before that, a FidoNet gateway).

But self-hosting email is becoming increasingly difficult. The time required to maintain spam and virus filters, SPF/DKIM settings, etc. just grows. The importance of email also is increasing. Although my own email has been extremely reliable, it is still running on a single server somewhere and therefore I could stand to have a lot of trouble if it went down while I was unable to fix it

Email with Pretty Good Privacy & Security

(Yes, this heading is a pun.)

There’s a lot of important stuff linked to emails. Family photos. Password resets for banks, social media sites, chat sites, photo storage sites, etc. Shopping histories. In a lot of cases, if your email was compromised, it wouldn’t be all that hard to next compromise your bank account, buy stuff with your Amazon account, hijack your Netflix, etc. There are lots of good resources about why privacy matters; here’s one informative video even if you think you have “nothing to hide”.

There is often a tradeoff between security and usability. A very secure system would be airgapped; you’d always compose your messages and use your secret keys on a system that has no Internet access and never will. Such a system would be quite secure, but not particularly usable.

On the other end of the spectrum are services such as Gmail, which not only make your email available to you, but also to all sorts of other systems within the service that aim to learn about your habits so they can sell this information to advertisers.

This post is about the services in the middle – ones that are usable, can be easily used on mobile devices, and yet make a serious and credible effort to provide better security and privacy than the “big services” run by Google, Yahoo, and Microsoft. Some elements of trust are inherent here; for instance, that the description of the technical nuances of the provider’s services are accurate. (Elements of trust are present in any system; whether your firmware, binaries, etc. are trustworthy.) I used the list at Privacy Tools as a guide to what providers to investigate, supplemented by searches and NoMoreGoogle.

It so happens that most of these services integrate PGP in some way. PGP has long been one of the better ways to have secure communication via email, but it is not always easy for beginners to use. These services make it transparent to a certain degree. None of them are as good as a dedicated client on an airgapped machine, but then again, such a setup isn’t very practical for everyday use. These services give you something better — pretty good, even — but of course not perfect. All of these pay at least lip service to Open Source, some of them actually publishing source for some of their components, but none are fully open.

I pay particular attention to how they handle exchanges with people that do not have PGP, as this kind of communication constitutes the vast majority of my email.

A final comment – if what you really need is an easy and secure way to communicate with one or two people, email itself may not be the right option. Consider Signal.

Protonmail

Protonmail is, in many ways, the gold standard of privacy-respecting email. Every email is stored encrypted in a way that even they can’t see, being decrypted on the client side (using a Javascript PGP implementation or other clients). They definitely seem to be pushing the envelope for security and privacy; ~~they keep no IP logs~~, don’t require any personal information to set up an account, and go into quite a lot of detail about how your keys are protected.

A side effect of this is that you can’t just access your email with any mail reader. Since the decryption is done on the client side, you pretty much have to use a Protonmail client. They provide clients for iOS and Android, the Web interface, and a “bridge” that exposes IMAP and SMTP ports to localhost and lets you connect a traditional mail client to the system. The bridge, in this case, handles the decryption for you. The bridge works really well and supports Windows, Mac, and Linux, though it is closed source. (The source for the Linux bridge has been “coming soon” for awhile now.) Protonmail provides very good support for bringing your own domain, and in my testing this worked flawlessly. It supports Sieve-based filters, which can also act on envelope recipients (yes!) The web interface is sleek, very well done, tightly integrated, and just generally exceptionally easy to use and just works.

Unfortunately, the mobile clients get the job done for only light use. My opinion: they’re bad. Really bad. For instance:

There’s no way to change the sort order on a mail folder
The Android client has an option to automatically download all message bodies. The iOS client lacks this option, but no matter; it doesn’t work on Android anyhow.
They’re almost completely unusable offline. You can compose a brand new message but that’s it.

There are some other drawbacks. For one, they don’t actually encrypt mail metadata, headers, or subject lines (though this is common to all of the solutions here, Protonmail’s marketing glosses over this). They also seem to have a lot of problems with overly-aggressive systems blocking people’s accounts: here’s a report from 2017, and I’ve seen more recent ones from people that had paid, but then had the account disabled. Apparently protonmail is used by scammers a fair bit and this is a side-effect of offering free, highly secure accounts – some of their deactivations have been legitimate. Nevertheless, it makes me nervous, especially given the high number of reports of this on reddit.

Unfortunately, Proton seems more focused on new products than on fixing these issues. They’ve been long-simmering in the community but what they talk about is more about their upcoming new products.

Protonmail’s terms of service include both a disclaimer that it’s as-is and an SLA, as well as an indemnification clause. Update 2019-03-04: Protonmail’s privacy policy states they use Matomo analytics, that they don’t record your login IP address by default (but IP logs might be kept if you enable it or if they suspect spamming, etc), collect mobile app analytics, IP addresses on incoming messages, etc. Data is retained “indefinitely” for active accounts and for 14 days after account deletion for closed accounts.

Support: email ticket only

Pricing: $4/mo if paid annually; includes 5GB storage, 5 aliases, and 1 custom domain

Location: Switzerland

MFA: TOTP only

Plus address extensions: yes

Transparency report: yes

Mailfence

Mailfence is often mentioned in the same sentence as ProtonMail. They also aim to be a privacy-respecting, secure email solution.

While it is quite possible they use something like LUKS to encrypt data at rest (safeguarding it from a stolen hard drive), unlike ProtonMail, Mailfence does have access to the full content of any plaintext messages sent or received by your account. Mailfence integrates PGP into the Web interface, claiming end-to-end encryption with a “zero-knowledge environment” using, of all things, the same openpgpjs library that is maintained by ProtonMail. While ProtonMail offers a detailed description of key management, ~~I haven’t been able to find this with Mailfence~~ – other than that the private key is stored encrypted on their servers and is protected by a separate passphrase from the login. If we assume the private key is decrypted on the client side, then for PGP-protected communications, the level of security is similar to ProtonMail. With Mailfence, decrypting these messages is a separate operation, while with ProtonMail it happens automatically once logged in. (Update 2019-03-01: Mailfence emailed me, pointing to their document on key storage – it is AES-256 encrypted by the client and stored on the server. They also passed along a link describing their PGP keystore. They also said they plan to work on a feature th encrypt plain text messages.)

While technical measures are part of the story, business policies are another, and Mailfence does seem to have some pretty good policies in place.

In experimenting with it, I found that Mailfence’s filters don’t support filtering based on the envelope recipient, which limits the utility of its aliases since BCC and the like won’t filter properly. A workaround might be possible via the IMAP connection filtering based on Received: headers, but that is somewhat ugly.

Mailfence also supports “secure” documents (word processor, spreadsheet, etc), WebDAV file storage, contacts, and calendars. There is no detail on what makes it “secure” – is it just that it uses TLS or is there something more? I note that the online document editor goes to a URL under writer.zoho.com, so this implies some sort of leakage to me and a possible violation of their “no third-party access to your data” claim. (Update 2019-03-01: Mailfence emailed me to point out that, while it’s not disclosed on the page I liked to, it is disclosed on their blog, and that since I evaluated it, they added a popup warning in the application before sending the documents to Zoho.)

Mailfence supports POP, IMAP, SMTP, and — interestingly — Exchange ActiveSync access to their services. I tested ActiveSync on my Android device, and it appeared to work exactly as planned. This gives a lot of client flexibility and very nice options for calendar and contact sync (*DAV is also supported).

Mailfence’s terms of use is fairly reasonable, though it also includes an indemnification clause. It makes no particular uptime promises. Update 2019-03-04: Per their privacy policy, Mailfence logs IP addresses and use Matomo analytics on the website but not within the application. Deleted messages and documents are retained for 45 days. The policy does not specify retention for logs.

Support: email ticket or business-hours phone support for paying customers

Pricing: EUR 2.50/mo paid annually, includes 5GB storage, 10 aliases, and 1 custom domain

Location: Belgium

MFA: TOTP only

Plus address extensions: Yes

Transparency report: Yes

Mailbox.org

Mailbox.org has been in the hosting business for a long time, and also has a privacy emphasis. Their security is conceptually similar to that of Mailfence. They offer two web-based ways of dealing with PGP: OX Guard and Mailvelope. Mailvelope is a browser extension that does all encryption and decryption on the client side, similar to Mailfence and ProtonMail. OX Guard is part of the Open-Xchange package which mailbox.org uses. It stores the encryption keys on the server, protected by a separate key passphrase, but all encryption and decryption is done server-side. Mailbox’s KB articles on this makes it quite clear and spell out the tradeoffs. The basic upshot is that messages you receive in plaintext will still be theoretically visible to the service itself.

Mailbox.org offers another interesting feature: automatic PGP-encryption of any incoming email that isn’t already encrypted. This encrypts everything inbound. If accessed using Mailvelope or some other external client, it provides equivalent security to ProtonMail. (OX Guard is a little different since the decryption happens server-side.)

They also offer you an @secure.mailbox.org email address that will reject any incoming mail that isn’t properly secured by TLS. You can also send from that address, which will fail to send unless the outgoing connection is properly secured as well. This is one of the more interesting approaches to dealing with the non-PGP-using public. Even if you don’t use that, if you compose in their web interface, you get immediate feedback about the TLS that will be used. It’s not end-to-end, but it’s better than nothing. Mailfence and Protonmail both offer an “secure email” that basically emails a link to a recipient, that links back to their server and requires the recipient to enter a password that was presumably exchanged out of band. Mailbox Guard will automatically go this route when you attempt to send email to someone for whom the PGP keys weren’t known, but goes a step further and invites them to reply there or set up their PGP keys.

Mailbox.org runs Open-Xchange, a semi-Open Source web-based office suite. As such, it also offers calendar, contacts, documents, task lists, IMAP/SMTP/POP, ActiveSync, and so forth. Their KB specifically spells out that things like the calendar are not encrypted with PGP. The filtering does the right thing with envelope recipients.

Mailbox.org has an amazingly comprehensive set of options, a massive knowledge base, even a user forum. Some of the settings I found to be interesting, besides the ones already mentioned, include:

Spam settings: greylisting on or off, RBL use, executable file attachment blocking, etc.
Restoring email from a backup
Disposable addresses (automatically deleted after 30 days)
A “catch-all” alias, that just counts as one of your regular aliases, and applies to all usernames under a domain not otherwise aliased.

I know Protonmail has frequent third-party security audits; I haven’t seen any mention of this on the mailbox.org site. However, it looks probable that less of their code was written in house, and it may have been audited without a mention.

Overall, I’ve been pretty impressed with them. They give details on EVERYTHING. It’s the geeky sort of comprehensive, professional solution I’d like. I wish it would have full end-to-end transparent encryption like ProtonMail, but honestly what they’re doing is more practical and useful to a lot of folks.

Mailbox has a reasonable T&C (though it does include an indemnity clause as many others do) and a thorough data protection and privacy policy. ~~Some providers don’t log IP addresses at all~~; mailbox.org does, but destroys them after 4 days. (Update 2019-03-04: Discovered that all of the providers reviewed may do this at times; updated the other reviews and removed incorrect text; mailbox.org’s is actually one of the better policies) mailbox.org goes into a lot more detail than others, and also explicitly supports things such as Tor for greater anonymity.

Support: email ticket (phone for business-level customers)

Pricing: EUR 1/mo for 2GB storage and 3 aliases; EUR 2.50/mo for 5GB storage and 25 aliases. Expansions possible (for instance, 25GB storage costs a total of EUR 3.50/mo)

Location: Germany

MFA: Yubikey, OATH, TOTP, HOTP, MOTP (web interface only)

Plus address extensions: Yes

Transparency Report: Yes

Startmail

Startmail is a service from the people behind the privacy-respecting search engine Startpage. There is not a lot of information about the technical implementation of Startmail, with the exception of a technical white paper from 2016. It is unclear if this white paper remains accurate, but this review will assume it is. There are also some articles in the knowledge base.

I was unable to fully review Startmail, because the free trial is quite limited (doesn’t even support IMAP) and anything past that level requires an up-front payment of $60. While I paid a few dollars for a month’s real account elsewhere, this was rather too much for a few paragraphs’ review.

However, from the trial, it appears to have a feature set roughly akin to Mailfence. Its mail filters are actually more limited, and it’s mail only: no documents, calendars, etc.

Startmail a somewhat unique setup, in which a person’s mail, PGP keys, etc. are stored in a “vault” which turns out to be a LUKS-encrypted volume. This vault is opened when a person logs in and closed when they log out, and controlled by a derivative of their password. On the one hand, this provides an even stronger level of security than Protonmail (since headers are also encrypted). On the other hand, when the vault is “open” – when one must presume it is quite frequently for an account being polled by IMAP – it is no better than anything else.

They explicitly state that they have not had a third-party audit.

Support: ticket only

Pricing: $60/yr ($5/mo), must be paid as an entire year up-front

Location: Netherlands

MFA: TOTP only

Plus address extensions: unknown

Transparency report: no

Not Reviewed

Some other frequently-used providers I didn’t review carefully:

posteo.de: encrypts your mail using a dovecot extension that decrypts it using a derivation of your password when you connect. Something better than nothing but less than Protonmail. Didn’t evaluate because it didn’t support my own domain.
Tutanota: Seems to have a security posture similar to ProtonMail, but has no IMAP support at all. If I can’t use emacs to read my mail, I’m not going to bother.

Conclusions

The level of security represented by Protonmail was quite appealing to me. I wish that the service itself was more usable. It looks like an excellent special-needs service, but just isn’t quite there yet as a main mail account for people that have a lot of mail.

I am likely to pursue mailbox.org some more, as although it isn’t as strong as Protonmail when it comes to privacy, it is still pretty good and is amazing on usability and flexibility.

A Final Word on Trust

Trust is a big part of everything going on here. For instance, if you use ProtonMail, where does trust come into play? Well, you trust that they aren’t serving you malicious JavaScript that captures your password and sends it to them out of band. You trust that your browser provides a secure environment for JavaScript and doesn’t have leakage. Or if you use mailbox.org, you trust that the server is providing a secure environment and that when you supply your password for the PGP key, it’s used only for that. ProtonMail will tell you how great it is to have this code client-side. Startmail will tell you how bad Javascript in a browser is for doing things related to security. Both make good, valid points.

To be absolutely sure, it is not possible or practical for any person to verify every component in their stack on every use. Different approaches have different trust models. The very best is still standalone applications.

The providers reviewed here raise the average level of privacy and security on the Internet, and do it by making it easier for the average user. That alone is a good thing and worthy of support. None of them can solve every problem, but all of them are a step up from the standard, which is almost no security at all.

Beauty Breaks Through

November 7, 2018UncategorizedJohn Goerzen

Two years ago, I was in the middle of the forest in rural southern Indiana. It was a time of hope – of defeating racism, sexism, xenophobia. Hope for affordable health care, for peace, for care for the young and the old. Then I woke up, in that beautiful place, to the news that Donald Trump would be president. Trump. President.

A few days later, I wrote Morning In The Skies, which included, in part:

Not long after the election, I got in a plane, pushed in the throttle, and started the takeoff roll down a runway in the midst of an Indiana forest. The skies were the best kind of clear blue, and pretty soon I lifted off and could see for miles. Off in the distance, I could see the last cottony remnants of the morning’s fog, lying still in the valleys, surrounding the little farms and houses as if to give them a loving hug. Wow.

Sometimes the flight is bumpy. Sometimes the weather doesn’t cooperate, and it doesn’t happen at all. Sometimes you can fly across four large states and it feels as smooth as glass the whole way.

Whatever happens, at the end of the day, the magic flying carpet machine gets locked up again. We go home, rest our heads on our soft pillows, and if we so choose, remember the beauty we experienced that day.

Really, this post is not about being a pilot. This post is a reminder to pay attention to all that is beautiful in this world. It surrounds us; the smell of pine trees in the forest, the delight in the faces of children, the gentle breeze in our hair, the kind word from a stranger, the very sunrise.

I hope that more of us will pay attention to the moments of clear skies and wind at our back. Even at those moments when we pull the hangar door shut.

For two years, I have often reflected on the bittersweet memories of that trip to Indiana. But for some reason, I hadn’t shared that photo until today. That beautiful valley-hugging fog is what you see above.

These last two years have been — well, full. Full of hate, even of death in the wake of several racist murders. But that’s not all. These years have also been full of an awakening, a swelling of people that care. People that care enough to do something. All across the country, people have risen up to send the message: “Trumpism is not American.” My own family did something we never had before: joined a protest, against families being separated. I and many others knocked on doors and made phone calls for the first time. Millions of Americans care and are doing something. We have seen the true colors of what the GOP has become, and it’s ugly, but people care. What’s more, we’ve won the first battle. Here in what the media often calls “deep-red Kansas”, we will have a Democratic governor. Racism and vote suppression has been sent packing, here in Kansas.

We have a powerful reminder that part of what makes this world beautiful is its people. People that go knock on doors in the cold. People that drive people to voting places. People that care about health care for others, about food for others, about education, intact families, refugees, and the earth itself. People that know the fight has just begun and are going to be there fighting for what is right and just for years to come. People that make the world beautiful.

The Python Unicode Mess

October 5, 2018ProgrammingJohn Goerzen

Unicode has solved a lot of problems. Anyone that remembers the mess of ISO-8859-* vs. CP437 (and of course it’s even worse for non-Western languages) can attest to that. And of course, these days they’re doing the useful work of…. codifying emojis.

Emojis aside, things aren’t all so easy. Today’s cause of pain: Python 3. So much pain.

Python decided to fully integrate Unicode into the language. Nice idea, right?

But here come the problems. And they are numerous.

gpodder, for instance, frequently exits with tracebacks due to Python errors converting podcast titles with smartquotes into ASCII. Then you have the case where the pexpect docs say to use logfile = sys.stdout to show the interaction with the virtual terminal. Only that causes an error these days.

But processing of filenames takes the cake. I was recently dealing with data from 20 years ago, before UTF-8 was a filename standard. These filenames are still valid on Unix. tar unpacks them, and they work fine. But you start getting encoding errors from Python trying to do things like store filenames in strings. For a Python program to properly support all valid Unix filenames, it must use “bytes” instead of strings, which has all sorts of annoying implications. What’s the chances that all Python programs do this correctly? Yeah. Not high, I bet.

I recently was processing data generated by mtree, which uses octal escapes for special characters in filenames. I thought this should be easy in Python, eh?

Many wrong answers – you’ll get an encoding error for certain values. Even the regular expression solution on that page doesn’t work.
Even more wrong answers exist.

That second link had a mention of an undocumented function, codecs.escape_decode, which does it right. I finally had to do this:

    if line.startswith(b'#'):
        continue
    fields = line.split()
    filename = codecs.escape_decode(fields[0])[0]
    filetype = getfield(b"type", fields[1:])
    if filetype == b"file":

And, whatever you do, don’t accidentally write if filetype == "file" — that will silently always evaluate to False, because "file" tests different than b"file". Not that I, uhm, wrote that and didn’t notice it at first…

So if you want to actually handle Unix filenames properly in Python, you:

Must have a processing path that fully avoids Python strings.
Must use sys.{stdin,stdout}.buffer instead of just sys.stdin/stdout
Must supply filenames as bytes to various functions. See PEP 0471 for this comment: “Like the other functions in the os module, scandir() accepts either a bytes or str object for the path parameter, and returns the DirEntry.name and DirEntry.path attributes with the same type as path. However, it is strongly recommended to use the str type, as this ensures cross-platform support for Unicode filenames. (On Windows, bytes filenames have been deprecated since Python 3.3).” So if you want to be cross-platform, it’s even worse, because you can’t use str on Unix nor bytes on Windows.

Update: Would you like to receive filenames on the command line? I’ll hand you this fine mess. And the environment? it’s not even clear.

The Changelog

Comments on family, technology, and society

Alas, Poor PGP

The Problems With PGP

Suggested Solutions

Additional Discussions

The Desktop Security Nightmare

AppArmor

SELinux

Mount Tricks

Firejail

AppImage and related tools

Others?

Tips for Upgrading to, And Securing, Debian Buster

Treasuring Moments

Brothers & Sister

Music

Airplanes

Mornings

Bedtime

The Rightward, Establishment Bias of Lazy Journalism

Goodbye to a 15-year-old Debian server

A (Partial) Defense of Debian

Debian’s Perspective: Stability

Recognizing Personal Preference

The case for reportbug

Where Debian Should Improve

Review of Secure, Privacy-Respecting Email Services

Email with Pretty Good Privacy & Security

Protonmail

Mailfence

Mailbox.org

Startmail

Not Reviewed

Conclusions

A Final Word on Trust

Beauty Breaks Through

The Python Unicode Mess