Category Archives: Debian

“Has Linux lost its way?” comments prompt a Debian developer to revisit FreeBSD after 20 years

I’ll admit it. I have a soft spot for FreeBSD. FreeBSD was the first Unix I ran, and it was somewhere around 20 years ago that I did so, before I switched to Debian. Even then, I still used some of the FreeBSD Handbook to learn Linux, because Debian didn’t have the great Reference that it does now.

Anyhow, some comments in my recent posts (“Has modern Linux lost its way?” and Reactions to that, and the value of simplicity), plus a latent desire to see how ZFS fares in FreeBSD, caused me to try it out. I installed it both in VirtualBox under Debian, and in an old 64-bit Thinkpad sitting in my basement that previously ran Debian.

The results? A mixture of amazing and disappointing. I will say that I am quite glad that both exist; there is plenty of innovation happening everywhere and neat features exist everywhere, too. But I can also come right out and say that the statement that FreeBSD doesn’t have issues like Linux does is false and misleading. In many cases, it’s running the exact same stack. In others, it’s better, but there are also others where it’s worse. Perhaps this article might dispell a bit of the FUD surrounding jessie, while also showing off some of the nice things FreeBSD does. My conclusion: Both jessie and FreeBSD 10.1 are awesome Free operating systems, but both have their warts. This article is more about FreeBSD than Debian, but it will discuss a few of Debian’s warts as well.

The experience

My initial reaction to FreeBSD was: wow, this feels so familiar. It reminds me of a commercial Unix, or maybe of Linux from a few years ago. A minimal, well-documented base system, everything pretty much in logical places in the filesystem, and solid memory management. I felt right at home. It was almost reassuring, even.

Putting together a FreeBSD box is a lot of package installing and config file editing. The FreeBSD Handbook, describing how to install X, talks about editing this or that file for this or that feature. I like being able to learn directly how things fit together by doing this.

But then you start remembering the reasons you didn’t like Linux a few years ago, or the commercial Unixes: maybe it’s that programs like apache are still not as well supported, or maybe it’s that the default vi has this tendency to corrupt the terminal periodically, or perhaps it’s that root’s default shell is csh. Or perhaps it’s that I have to do a lot of package installing and config file editing. It is not quite the learning experience it once was, either; now there are things like “paste this XML file into some obscure polkit location to make your mouse work” or something.

Overall, there are some areas where FreeBSD kills it in a way no other OS does. It is unquestionably awesome in several areas. But there are a whole bunch of areas where it’s about 80% as good as Linux, a number of areas (even polkit, dbus, and hal) where it’s using the exact same stack Linux is (so all these comments about FreeBSD being so differently put together strike me as hollow), and frankly some areas that need a lot of work and make it hard to manage systems in a secure and stable way.

The amazing

Let’s get this out there: I’ve used ZFS too much to use any OS that doesn’t support it or something like it. Right now, I’m not aware of anything like ZFS that is generally stable and doesn’t cost a fortune, so pretty much: if your Unix doesn’t do ZFS, I’m not interested. (btrfs isn’t there yet, but will be awesome when it is.) That’s why I picked FreeBSD for this, rather than NetBSD or OpenBSD.

ZFS on FreeBSD is simply awesome. They have integreated it extremely well. The installer supports root on zfs, even encrypted root on zfs (though neither is a default). top on a FreeBSD system shows a line of ZFS ARC (cache) stats right alongside everything else. The ZFS defaults for maximum cache size, readahead, etc. auto-tune themselves at boot (unless overridden) based on the amount of RAM in a system and the system type. Seriously, these folks have thought of everything and it just reeks of solid. I haven’t seen ZFS this well integrated outside the Solaris-type OSs.

I have been using ZFSOnLinux for some time now, but it is just not as mature as ZFS on FreeBSD. ZoL, for instance, still has some memory tuning issues, and is not really suggested for 32-bit machines. FreeBSD just nails it. ZFS on FreeBSD even supports TRIM, which is not available in ZoL and I think fairly unique even among OpenZFS platforms. It also supports delegated administration of the filesystem, both to users and to jails on the system, seemingly very similar to Solaris zones.

FreeBSD also supports beadm, which is like a similar tool on Solaris. This lets you basically use ZFS snapshots to make lightweight “boot environments”, so you can select which to boot into. This is useful, say, before doing upgrades.

Then there are jails. Linux has tried so hard to get this right, and fallen on its face so many times, a person just wants to take pity sometimes. We’ve had linux-vserver, openvz, lxc, and still none of them match what FreeBSD jails have done for a long time. Linux’s current jail-du-jour is LXC, though it is extremely difficult to configure in a secure way. Even its author comments that “you won’t hear any of the LXC maintainers tell you that LXC is secure” and that it pretty much requires AppArmor profiles to achieve reasonable security. These are still rather in flux, as I found out last time I tried LXC a few months ago. My confidence in LXC being as secure as, say, KVM or FreeBSD is simply very low.

FreeBSD’s jails are simple and well-documented where LXC is complex and hard to figure out. Its security is fairly transparent and easy to control and they just work well. I do think LXC is moving in the right direction and might even get there in a couple years, but I am quite skeptical that even Docker is getting the security completely right.

The simply different

People have been throwing around the word “distribution” with respect to FreeBSD, PC-BSD, etc. in recent years. There is an analogy there, but it’s not perfect. In the Linux ecosystem, there is a kernel project, a libc project, a coreutils project, a udev project, a systemd/sysvinit/whatever project, etc. You get the idea. In FreeBSD, there is a “base system” project. This one project covers the kernel and the base userland. Some of what they use in the base system is code pulled in from elsewhere but maintained in their tree (ssh), some is completely homegrown (kernel), etc. But in the end, they have a nicely-integrated base system that always gets upgraded in sync.

In the Linux world, the distribution makers are responsible for integrating the bits from everywhere into a coherent whole.

FreeBSD is something of a toolkit to build up your system. Gentoo might be an analogy in the Linux side. On the other end of the spectrum, Ubuntu is a “just install it and it works, tweak later” sort of setup. Debian straddles the middle ground, offering both approaches in many cases.

There are pros and cons to each approach. Generally, I don’t think either one is better. They are just different.

The not-quite-there

I said that there are a lot of things in FreeBSD that are about 80% of where Linux is. Let me touch on them here.

Its laptop support leaves something to be desired. I installed it on a few-years-old Thinkpad — basically the best possible platform for working suspend in a Free OS. It has worked perfectly out of the box in Debian for years. In FreeBSD, suspend only works if it’s in text mode. If X is running, the video gets corrupted and the system hangs. I have not tried to debug it further, but would also note that suspend on closed lid is not automatic in FreeBSD; the somewhat obscure instuctions tell you what policykit pkla file to edit to make suspend work in XFCE. (Incidentally, it also says what policykit file to edit to make the shutdown/restart options work).

Its storage subsystem also has some surprising misses. Its rough version of LVM, LUKS, and md-raid is called GEOM. GEOM, however, supports only RAID0, RAID1, and RAID3. It does not support RAID5 or RAID6 in software RAID configurations! Linux’s md-raid, by comparison, supports RAID0, RAID1, RAID4, RAID5, RAID6, etc. There seems to be a highly experimental RAID5 patchset floating around for many years, but it is certainly not integrated into the latest release kernel. The current documentation makes no mention of RAID5, although it seems that a dated logical volume manager supported it. In any case, RAID5 does not seem to be well-supported in software like it is in Linux.

ZFS does have its raidz1 level, which is roughly the same as RAID5. However, that requires full use of ZFS. ZFS also does not support some common operations, like adding a single disk to an existing RAID5 group (which is possible with md-raid and many other implementations.) This is a ZFS limitation on all platforms.

FreeBSD’s filesystem support is rather a miss. They once had support for Linux ext* filesystems using the actual Linux code, but ripped it out because it was in GPL and rewrote it so it had a BSD license. The resulting driver really only works with ext2 filesystems, as it doesn’t work with ext3/ext4 in many situations. Frankly I don’t see why they bothered; they now have something that is BSD-licensed but only works with a filesystem so old nobody uses it anymore. There are only two FreeBSD filesystems that are really useable: UFS2 and ZFS.

Virtualization under FreeBSD is also not all that present. Although it does support the VirtualBox Open Source Edition, this is not really a full-featured or fast enough virtualization environment for a server. Its other option is bhyve, which looks to be something of a Xen clone. bhyve, however, does not support Windows guests, and requires some hoops to even boot Linux guest installers. It will be several years at least before it reaches feature-parity with where KVM is today, I suspect.

One can run FreeBSD as a guest under a number of different virtualization systems, but their instructions for making the mouse work best under VirtualBox did not work. There may have been some X.Org reshuffle in FreeBSD that wasn’t taken into account.

The installer can be nice and fast in some situations, but one wonders a little bit about QA. I had it lock up on my twice. Turns out this is a known bug reported 2 months ago with no activity, in which the installer attempts to use a package manger that it hasn’t set up yet to install optional docs. I guess the devs aren’t installing the docs in testing.

There is nothing like Dropbox for FreeBSD. Apparently this is because FreeBSD has nothing like Linux’s inotify. The Linux Dropbox does not work in FreeBSD’s Linux mode. There are sketchy reports of people getting an OwnCloud client to work, but in something more akin to rsync rather than instant-sync mode, if they get it working at all. Some run Dropbox under wine, apparently.

The desktop environments tend to need a lot more configuration work to get them going than on Linux. There’s a lot of editing of polkit, hal, dbus, etc. config files mentioned in various places. So, not only does FreeBSD use a lot of the same components that cause confusion in Linux, it doesn’t really configure them for you as much out of the box.

FreeBSD doesn’t support as many platforms as Linux. FreeBSD has only two platforms that are fully supported: i386 and amd64. But you’ll see people refer to a list of other platforms that are “supported”, but they don’t have security support, official releases, or even built packages. They includ arm, ia64, powerpc, and sparc64.

The bad: package management

Roughly 20 years ago, this was one of the things that pulled me to Debian. Perhaps I am spolied from running the distribution that has been the gold standard for package management for so long, but I find FreeBSD’s package management — even “pkg-ng” in 10.1-RELEASE — to be lacking in a number of important ways.

To start with, FreeBSD actually has two different package management systems: one for the base system, and one for what they call the ports/packages collection (“ports” being the way to install from source, and “packages” being the way to install from binaries, but both related to the same tree.) For the base system, there is freebsd-update which can install patches and major upgrades. It also has a “cron” option to automate this. Sadly, it has no way of automatically indicating to a calling script whether a reboot is necessary.

freebsd-update really manages less than a dozen packages though. The rest are managed by pkg. And pkg, it turns out, has a number of issues.

The biggest: it can take a week to get security updates. The FreeBSD handbook explains pkg audit -F which will look at your installed packages (but NOT the ones in the base system) and alert you to packages that need to be updates, similar to a stripped-down version of Debian’s debsecan. I discovered this myself, when pkg audit -F showed a vulnerability in xorg, but pkg upgrade showed my system was up-to-date. It is not documented in the Handbook, but people on the mailing list explained it to me. There are workarounds, but they can be laborious.

If that’s not bad enough, FreeBSD has no way to automatically install security patches for things in the packages collection. Debian has several (unattended-upgrades, cron-apt, etc.) There is “pkg upgrade”, but it upgrades everything on the system, which may be quite a bit more than you want to be upgraded. So: if you want to run Apache with PHP, and want it to just always apply security patches, FreeBSD packages are not up to the job like Debian’s are.

The pkg tool doesn’t have very good error-handling. In fact, its error handling seems to be nonexistent at times. I noticed that some packages had failures during install time, but pkg ignored them and marked the package as correctly installed. I only noticed there was a problem because I happened to glance at the screen at the right moment during messages about hundreds of packages. In Debian, by contrast, if there are any failures, at the end of the run, you get a nice report of which packages failed, and an exit status to use in scripts.

It also has another issue that Debian resolved about a decade ago: package scripts displaying messages that are important for the administrator, but showing so many of them that they scroll off the screen and are never seen. I submitted a bug report for this one also.

Some of these things just make me question the design of pkg. If I can’t trust it to accurately report if the installation succeeded, or show me the important info I need to see, then to what extent can I trust it?

Then there is the question of testing of the ports/packages. It seems that, automated tests aside, basically everyone is running off the “master” branch of the ports/packages. That’s like running Debian unstable on your servers. I am distinctly uncomfortable with this notion, though it seems FreeBSD people report it mostly works well.

There are some other issues, too: FreeBSD ports make no distinction between development and runtime files like Debian’s packages do. So, just by virtue of wanting to run a graphical desktop, you get all of the static libraries, include files, build scripts, etc for XOrg installed.

For a package as concerned about licensing as FreeBSD, the packages collection does not have separate sections like Debian’s main, contrib, and non-free. It’s all in one big pot: BSD-license, GPL-license, proprietary without source license. There is /usr/local/share/licenses where you can look up a license for each package, but there is no way with FreeBSD, like there is with Debian, to say “never even show me packages that aren’t DFSG-free.” This is useful, for instance, when running in a company to make sure you never install packages that are for personal use only or something.

The bad: ABI stability

I’m used to being able to run binaries I compiled years ago on a modern system. This is generally possible in Linux, assuming you have the correct shared libraries available. In FreeBSD, this is explicitly NOT possible. After every major version upgrade, you must reinstall or recompile every binary on your system.

This is not necessarily a showstopper for me, but it is a hassle for a lot of people.

Update 2015-02-17: Some people in the comments are pointing out compat packages in the ports that may help with this situation. My comment was based on advice in the FreeBSD Handbook stating “After a major version upgrade, all installed packages and ports need to be upgraded”. I have not directly tried this, so if the Handbook is overstating the need, then this point may be in error.

Conclusions

As I said above, I found little validation to the comments that the Debian ecosystem is noticeably worse than the FreeBSD one. Debian has its warts too — particularly with keeping software up-to-date. You can see that the two projects are designed around a different passion: FreeBSD’s around the base system, and Debian’s around an integrated whole system. It would be wrong to say that either of those is always better. FreeBSD’s approach clearly produces some leading features, especially jails and ZFS integration. Yet Debian’s approach also produces some leading features in the way of package management and security maintainability beyond the small base.

My criticism of excessive complexity in the polkit/cgmanager/dbus area still stands. But to those people commenting that FreeBSD hasn’t “lost its way” like Linux has, I would point out that FreeBSD mostly uses these same components also, and FreeBSD has excessive complexity in its ports/package system and system management tools. I think it’s a draw. You pick the best for your use case. If you’re looking for a platform to run a single custom app then perhaps all of the Debian package management benefits don’t apply to you (you may not even need FreeBSD’s packages, or just a few). The FreeBSD ZFS support or jails may well appeal. If you’re looking to run a desktop environment, or a server with some application that needs a ton of PHP, Python, Perl, or C libraries, then Debian’s package management and security handling may well be attractive.

I am disappointed that Debian GNU/kFreeBSD will not be a release architecture in jessie. That project had the promise to provide a best of both worlds for those that want jails or tight ZFS integration.

Has modern Linux lost its way? (Some thoughts on jessie)

For years, I used to run Debian sid (unstable) on all my personal machines. Laptops, workstations, sometimes even my personal servers years ago ran sid. Sid was, as its name implies, unstable. Sometimes things broke. But it wasn’t a big deal, because I could always get in there and fix it fairly quickly, whatever it was. It was the price I paid for the latest and greatest.

For the last number of months, I’ve dealt with a small but annoying issue in jessie: None of Nautilus, Thunar, or digikam (yes, that represents Gnome, XFCE, and KDE) can mount USB drives I plug in anymore. I just get “Not authorized to perform operation.” I can, of course, still mount -o uid=1000 /dev/sdc1 /mnt, but I miss the convenience of doing it this way.

One jessie system I switched to systemd specifically to get around this problem. It worked, but I don’t know why. I haven’t had the time to switch my workstation, and frankly I am concerned about it.

Here’s the crux of the issue: I don’t even know where to start looking. I’ve googled this issue, and found all sorts of answers pointing to polkit, or dbus, or systemd-shim, or cgmanager, or lightdm, or XFCE, or… I found a bug report of this exact problem — Debian #760281, but it’s marked fixed, and nobody replied to my comment that I’m still seeing it.

Nowhere is it documented that a Digikam mounting issue should have me looking in polkit, let alone cgmanager. And even once I find those packages, their documentation suffers from Bad Unix Documentation Disease: talking about the nitty-gritty weeds view of something, without bothering to put it in context. Here is the mystifying heading for the cgmanager(8) manpage:

cgmanager is a daemon to manage cgroups. Programs and users can make D-Bus requests to administer cgroups over which they have privilege. To ensure that users may not exceed their privilege in manipulating cgroups, the cgroup manager accepts regular D-Bus requests only from tasks within its own process-id and user namespaces. For tasks in private namespaces (such as containers), SCM-enhanced D-Bus calls are available. Using these manually is not recommended. Rather, each container is advised to run a cgproxy, which will forward plain D-Bus requests as SCM-enhanced D-Bus requests to the host cgmanager.

That’s about as comprehensible as Vogon poetry to me. How is cgmanager started? What does “SCM-enhanced” mean? And I even know what a cgroup is.

This has been going on for months, which has me also wondering: is it only me? (Google certainly suggests it’s not, and there are plenty of hits for this exact problem with many distros, and some truly terrible advice out there to boot.) And if not, why is something so basic and obvious festering for so long? Have we built something that’s too complex to understand and debug?

This is, in my mind, orthogonal to the systemd question. I used to be able to say Linux was clean, logical, well put-together, and organized. I can’t really say this anymore. Users and groups are not really determinitive for permissions, now that we have things like polkit running around. (Yes, by the way, I am a member of plugdev.) Error messages are unhelpful (WHY was I not authorized?) and logs are nowhere to be found. Traditionally, one could twiddle who could mount devices via /etc/fstab lines and perhaps some sudo rules. Granted, you had to know where to look, but when you did, it was simple; only two pieces to fit together. I’ve even spent time figuring out where to look and STILL have no idea what to do.

systemd may help with some of this, and may hurt with some of it; but I see the problem more of an attitude of desktop environments to add features fast without really thinking of the implications. There is something to be said for slower progress if the result is higher quality.

Then as I was writing this, of course, my laptop started insisting that it needed the root password to suspend when I close the lid. And it’s running systemd. There’s another quagmire…

Update: Part 2 with some reactions to this and further thoughts is now available.

Debian – A plea to worry about what matters, and not take ourselves too seriously

I posted this on debian-devel today. I am also posting it here, because I believe it is important to more than just Debian developers.

Good afternoon,

This message comes on the heels of Sam Hartman’s wonderful plea for compassion [1] and the sad news of Joey Hess’s resignation from Debian [2].

I no longer frequently post to this list, but when you’ve been a Debian developer for 18 years, and still care deeply about the community and the project, perhaps you have a bit of perspective to share.

Let me start with this:

Debian is not a Free Software project.

Debian is a making-the-world-better project, a caring for people project, a freedom-spreading project. Free Software is our tool.

As many of you, hopefully all of you, I joined Debian because I enjoyed working on this project. We all did, didn’t we? We joined Debian because it was fun, because we were passionate about it, because we wanted to make the world a better place and have fun doing it.

In short, Debian is life-giving, both to its developers and its users.

As volunteers, it is healthy to step back every so often, and ask ourselves two questions: 1) Is this activity still life-giving for me? 2) Is it life-giving for others?

I have my opinions about init. Strong ones, in fact. [3] They’re not terribly relevant to this post. Because I can see that they are not really all that relevant.

14 years ago, I proposed what was, until now anyhow, one of the most controversial GRs in Debian history. It didn’t go the way I hoped. I cared about it deeply then, and still care about the principles.

I had two choices: I could be angry and let that process ruin my enjoyment of Debian. Or I could let it pass, and continue to have fun working on a project that I love. I am glad I chose the latter.

Remember, for today, one way or another, jessie will still boot.

18 years ago when I joined Debian, our major concerns were helping newbies figure out how to compile their kernels, finding manuals for monitors so we could set the X modelines properly, finding some sort of Free web browser, finding some acceptable Office-type software.

Wow. We WON, didn’t we? Not just Debian, but everyone. Freedom won.

I promise you – 18 years from now, it will not matter what init Debian chose in 2014. It will probably barely matter in 3 years. This is not key to our goals of making the world a better place. Jessie will still boot. I say that even though my system runs out of memory every few days because systemd-logind has a mysterious bug [4]. It will be fixed. I say that even though I don’t know what init system it will use, or how much choice there will be. I say that because it is simply true. We are Debian. We will make it work, one way or another.

I don’t post much on this list anymore because my personal passion isn’t with posting on this list anymore. I make liberal use of my Delete Thread keybinding on -vote these days, because although I care about the GR, I don’t care about it enough to read all the messages about it. I have not yet decided if I will spend the time researching it in order to vote. Instead of debating the init GR, sometimes I sit on the sofa with my wife. Sometimes I go out and fly the remote-control airplane I’m learning to fly. Sometimes I repair my plane after a flight that was shorter than planned. Sometimes I play games with my boys, or help them with homework, or share my 8-year-old’s delight as a text file full of facts about the Titanic that he wrote in Emacs comes spitting out of the printer. Sometimes I write code or play with the latest Linux filesystems or build a new server for my basement.

All these things matter more to me than init. I have been using Debian at home for almost 20 years, at various workplaces for almost that long, and it is not going to stop being a part of my life any time soon. Perhaps I will have to learn how to administer a new init system. Well, so be it; I enjoy learning new things. Or perhaps I will have to learn to live with some desktop limitations with an old init system. Well, so be it; it won’t bother me much anyhow. Either way, I’m still going to be using what is, to me, the best operating system in the world, made by one of the world’s foremost Freedom projects.

My hope is that all of you may also have the sense of peace I do, that you may have your strong convictions, but may put them all in perspective. That we as a project realize that the enemy isn’t the lovers of the other init, but the people that would use law and technology to repress people all over the world. We are but one shining beacon on a hill, but the world will be worse off if our beacon winked out.

My plea is that we each may get angry at what matters, and let go of the smaller frustrations in life; that we may each find something more important than init/systemd to derive enjoyment and meaning from. [5] May you each find that airplane to soar freely in the skies, to lift your soul so that the joy of using Free Software to make the world a better place may still be here, regardless of what /sbin/init is.

[1] https://lists.debian.org/debian-project/2014/11/msg00002.html

[2] https://lists.debian.org/debian-devel/2014/11/msg00174.html

[3] A hint might be that in my more grumpy moments, I realize I haven’t ever quite figured out why the heck this dbus thing is on so many of my systems, or why I have to edit XML to configure it… ;-)

[4] #765870

[5] No disrespect meant to the init/systemd maintainers. Keep enjoying what you do, too!

Update on the systemd issue

The other day, I wrote about my poor first impressions of systemd in jessie. Here’s an update.

I’d like to start with the things that are good. I found the systemd community to be one of the most helpful in Debian, and #debian-systemd IRC channel to be especially helpful. I was in there for quite some time yesterday, and appreciated the help from many people, especially Michael. This is a nontechnical factor, but is extremely important; this has significantly allayed my concerns about systemd right there.

There are things about the systemd design that impress. The dependency system and configuration system is a lot more flexible than sysvinit. It is also a lot more complicated, and difficult to figure out what’s happening. I am unconvinced of the utility of parallelization of boot to begin with; I rarely reboot any of my Linux systems, desktops or servers, and it seems to introduce needless complexity.

Anyhow, on to the filesystem problem, and a bit of a background. My laptop runs ZFS, which is somewhat similar to btrfs in that it’s a volume manager (like LVM), RAID manager (like md), and filesystem in one. My system runs LVM, and inside LVM, I have two ZFS “pools” (volume groups): one, called rpool, that is unencrypted and holds mainly the operating system; and the other, called crypt, that is stacked atop LUKS. ZFS on Linux doesn’t yet have built-in crypto, which is why LVM is even in the picture here (to separate out the SSD at a level above ZFS to permit parts of it to be encrypted). This is a bit of an antiquated setup for me; as more systems have AES-NI, I’m going to everything except /boot being encrypted.

Anyhow, inside rpool is the / filesystem, /var, and /usr. Inside /crypt is /tmp and /home.

Initially, I tried to just boot it, knowing that systemd is supposed to work with LSB init scripts, and ZFS has init scripts with carefully-planned dependencies. This was evidently not working, perhaps because /lib/systemd/systemd/ It turns out that systemd has a few assumptions that turn out to be less true with ZFS than otherwise. ZFS filesystems are normally not mounted via /etc/fstab; a ZFS pool has internal properties about which dataset gets mounted where (similar to LVM’s actions after a vgscan and vgchange -ay). Even though there are ordering constraints in the units, systemd is writing files to /var before /var gets mounted, resulting in the mount failing (unlike ext4, ZFS by default will reject an attempt to mount over a non-empty directory). Partly this due to the debian-fixup.service, and partly it is due to systemd reacting to udev items like backlight.

This problem was eventually worked around by doing zfs set mountpoint=legacy rpool/var, and then adding a line to fstab (“rpool/var /var zfs defaults 0 2”) for /var and its descendent filesystems.

This left the problem of /tmp; again, it wasn’t getting mounted soon enough. In this case, it required crypttab to be processed first, and there seem to be a lot of bugs in the crypttab processing in systemd (more on that below). I eventually worked around that by adding After=cryptsetup.target to the zfs-import-cache.service file. For /tmp, it did NOT work to put it in /etc/fstab, because then it tried to mount it before starting cryptsetup for some reason. It probably didn’t help that the system’s cryptdisks.service is a symlink to /dev/null, a fact I didn’t realize until after a lot of needless reboots.

Anyhow, one thing I stumbled across was poor console control with systemd. On numerous occasions, I had things like two cryptsetup processes trying to read a password, plus an emergency mode console trying to do so. I had this memorable line of text at one point:

(or type Control-D to continue): Please enter passphrase for disk athena-crypttank (crypt)! [ OK ] Stopped Emergency Shell.

And here we venture into unsatisfying territory with systemd. One answer to this in IRC was to install plymouth, which apparently serializes console I/O. However, plymouth is “an attractive boot animation in place of the text messages that normally get shown.” I don’t want an “attractive boot animation”. Nevertheless, neither systemd-sysv nor cryptsetup depends on plymouth, so by default, the prompt for a password at boot is obscured by various other text.

Worse, plymouth doesn’t support serial consoles, so at the moment booting a system that uses LUKS with systemd over a serial console is a matter of blind luck of typing the right password at the right time.

In the end, though, the system booted and after a few more tweaks, the backlight buttons do their thing again. Whew!

Update 2014-10-13: uau pointed out that Plymouth is more than a bootsplash, and can work with serial consoles, despite the description of the package. I stand corrected on that. (It is still the case, however, that packages don’t depend on it where they should, and the default experience for people using cryptsetup is not very good.)

First impressions of systemd, and they’re not good

Well, I finally bit the bullet. My laptop, which runs jessie, got dist-upgraded for the first time in a few months. My brightness keys stopped working, and it no longer would suspend to RAM when the lid was closed, and upon chasing things down from XFCE to policykit, eventually it appears that suddenly major parts of the desktop breaks without systemd in jessie. Sigh.

So apt-get install systemd-sysv (and watch sysvinit-core get uninstalled) and reboot.

Only, my system doesn’t come back up. In fact, over several hours of trying to make it boot with systemd, it failed in numerous spectacular and hilarious (or, would be hilarious if my laptop would boot) ways. I had text obliterating the cryptsetup password prompt almost every time. Sometimes there were two processes trying to read a cryptsetup password at once. Sometimes a process was trying to read that while another one was trying to read an emergency shell password. Many times it tried to write to /var and /tmp before they were mounted, meaning they *wouldn’t* mount because there was stuff there.

I noticed it not doing much with ZFS, complaining of a dependency loop between zfs-mount and $local-fs. I fixed that, but it still wouldn’t boot. In fact, it simply hung after writing something about wall passwords.

I’ve dug into systemd, finding a “unit generator for fstab” (whatever the hack that is, it’s not at all made clear by systemd-fstab-generator(8)).

In some cases, there’s info in journalctl, but if I can’t even get to an emergency mode prompt, the practice of hiding all stdout and stderr output is not all that pleasant.

I remember thinking “what’s all the flaming about?” systemd wasn’t my first choice (I always thought “if it ain’t broke, don’t fix it” about sysvinit), but basically ignored the thousands of messages, thinking whatever happens, jessie will still boot.

Now I’m not so sure. Even if the thing boots out of the box, it seems like the boot process with systemd is colossally fragile.

For now, at least zfs rollback can undo upgrades to 800 packages in about 2 seconds. But I can’t stay at some early jessie checkpoint forever.

Have we made a vast mistake that can’t be undone? (If things like even *brightness keys* now require systemd…)

Why and how to run ZFS on Linux

I’m writing a bit about ZFS these days, and I thought I’d write a bit about why I am using it, why it might or might not be interesting for you, and what you might do about it.

ZFS Features and Background

ZFS is not just a filesystem in the traditional sense, though you can use it that way. It is an integrated storage stack, which can completely replace the need for LVM, md-raid, and even hardware RAID controllers. This permits quite a bit of flexibility and optimization not present when building a stack involving those components. For instance, if a drive in a RAID fails, it needs only rebuild the parts that have actual data stored on them.

Let’s look at some of the features of ZFS:

  • Full checksumming of all data and metadata, providing protection against silent data corruption. The only other Linux filesystem to offer this is btrfs.
  • ZFS is a transactional filesystem that ensures consistent data and metadata.
  • ZFS is copy-on-write, with snapshots that are cheap to create and impose virtually undetectable performance hits. Compare to LVM snapshots, which make writes notoriously slow and require an fsck and mount to get to a readable point.
  • ZFS supports easy rollback to previous snapshots.
  • ZFS send/receive can perform incremental backups much faster than rsync, particularly on systems with many unmodified files. Since it works from snapshots, it guarantees a consistent point-in-time image as well.
  • Snapshots can be turned into writeable “clones”, which simply use copy-on-write semantics. It’s like a cp -r that completes almost instantly and takes no space until you change it.
  • The datasets (“filesystems” or “logical volumes” in LVM terms) in a zpool (“volume group”, to use LVM terms) can shrink or grow dynamically. They can have individual maximum and minimum sizes set, but unlike LVM, where if, say, /usr gets bigger than you thought, you have to manually allocate more space to it, ZFS datasets can use any space available in the pool.
  • ZFS is designed to run well in big iron, and scales to massive amounts of storage. It supports SSDs as L2 cache and ZIL (intent log) devices.
  • ZFS has some built-in compression methods that are quite CPU-efficient and can yield not just space but performance benefits in almost all cases involving compressible data.
  • ZFS pools can host zvols, a block device under /dev that stores its data in the zpool. zvols support TRIM/DISCARD, so are ideal for storing VM images, as they can instantly release space released by the guest OS. They can also be snapshotted and backed up like the rest of ZFS.

Although it is often considered a server filesystem, ZFS has been used in plenty of other situations for some time now, with ports to FreeBSD, Linux, and MacOS. I find it particularly useful:

  • To have faith that my photos, backups, and paperwork archives are intact. zpool scrub at any time will read the entire dataset and verify the integrity of every bit.
  • I can create snapshots of my system before running apt-get dist-upgrade, making it easy to track down issues or roll back to a known-good configuration. Ideal for people tracking sid or testing. One can also easily simply boot from a previous snapshot.
  • Many scripts exist that make frequent snapshots, and retain the for a period of time as a way of protecting work in progress against an accidental rm. There is no reason not to snapshot /home every 5 minutes, for instance. It’s almost as good as storing / in git.

The added level of security in having cheap snapshots available is almost worth it by itself.

ZFS drawbacks

Compared to other Linux filesystems, there are a few drawbacks of ZFS:

  • CDDL will prevent it from ever being part of the Linus kernel tree
  • It is more RAM-hungry than most, although with tuning it can even run on the Raspberry Pi.
  • A 64-bit kernel is strongly preferred, even in low-memory situations.
  • Performance on many small files may be less than ext4
  • The ZFS cache does not shrink and expand in response to changing RAM usage conditions on the system as well as the normal Linux cache does.
  • Compared to btrfs, ZFS lacks some features of btrfs, such as being able to shrink an existing pool or easily change storage allocation on the fly. On the other hand, the features in ZFS have never caused me a kernel panic, and half the things I liked about btrfs seem to have.
  • ZFS is already quite stable on Linux. However, the GRUB, init, and initramfs code supporting booting from a ZFS root and /boot is less stable. If you want to go 100% ZFS, be prepared to tweak your system to get it to boot properly. Once done, however, it is quite stable.

Converting to ZFS

I have written up an extensive HOWTO on converting an existing system to use ZFS. It covers workarounds for all the boot-time bugs I have encountered as well as documenting all steps needed to make it happen. It works quite well.

Additional Hints

If setting up zvols to be used by VirtualBox or some such system, you might be interested in managing zvol ownership and permissions with udev.

Voice Keying with bash, sox, and aplay

There are plenty of times where it is nice to have Linux transmit things out a radio. One obvious example is the digital communication modes, where software acts as a sort of modem. A prominent example of this in Debian is fldigi.

Sometimes, it is nice to transmit voice instead of a digital signal. This is called voice keying. When operating a contest, for instance, a person might call CQ over and over, with just some brief gaps.

Most people that interface a radio with a computer use a sound card interface of some sort. The more modern of these have a simple USB cable that connects to the computer and acts as a USB sound card. So, at a certain level, all that you have to do is play sound out a specific device.

But it’s not quite so easy, because there is one other wrinkle: you have to engage the radio’s transmitter. This is obviously not something that is part of typical sound card APIs. There are all sorts of ways to do it, ranging from dedicated serial or parallel port circuits involving asserting voltage on certain pins, to voice-activated (VOX) circuits.

I have used two of these interfaces: the basic Signalink USB and the more powerful RigExpert TI-5. The Signalink USB integrates a VOX circuit and provides cabling to engage the transmitter when VOX is tripped. The TI-5, on the other hand, emulates three USB serial ports, and if you raise RTS on one of them, it will keep the transmitter engaged as long as RTS is high. This is a more accurate and precise approach.

VOX-based voice keying with the Signalink USB

But let’s first look at the Signalink USB case. The problem here is that its VOX circuit is really tuned for digital transmissions, which tend to be either really loud or completely silent. Human speech rises and falls in volume, and it tends to rapidly assert and drop PTT (Push-To-Talk, the name for the control that engages the radio’s transmitter) when used with VOX.

The solution I hit on was to add a constant, loud tone to the transmitted audio, but one which is outside the range of frequencies that the radio will transmit (which is usually no higher than 3kHz). This can be done using sox and aplay, the ALSA player. Here’s my script to call cq with Signalink USB:

#!/bin/bash
# NOTE: use alsamixer and set playback gain to 99
set -e

playcmd () {
        sox -V0 -m "$1" \
           "| sox -V0 -r 44100 $1 -t wav -c 1 -   synth sine 20000 gain -1" \
            -t wav - | \
           aplay -q  -D default:CARD=CODEC
}

DELAY=${1:-1.5}

echo -n "Started at: "
date

STARTTIME=`date +%s`
while true; do
        printf "\r"
        echo -n $(( (`date +%s`-$STARTTIME) / 60))
        printf "m/${DELAY}s: TRANSMIT"
        playcmd ~/audio/cq/cq.wav
        printf "\r"
        echo -n $(( (`date +%s`-$STARTTIME) / 60))
        printf "m/${DELAY}s: off         "
        sleep $DELAY
done

Run this, and it will continuously play your message, with a 1.5s gap in between during which the transmitter is not keyed.

The screen will look like this:

Started at: Fri Aug 24 21:17:47 CDT 2012
2m/1.5s: off

The 2m is how long it’s been going this time, and the 1.5s shows the configured gap.

The sox commands are really two nested ones. The -m causes sox to merge the .wav file in $1 with the 20kHz sine wave being generated, and the entire thing is piped to the ALSA player.

Tweaks for RigExpert TI-5

This is actually a much simpler case. We just replace playcmd as follows:

playcmd () {
        ~/bin/raiserts /dev/ttyUSB1 'aplay -q -D default:CARD=CODEC' < "$1"
}

Where raiserts is a program that simply keeps RTS asserted on the serial port while the given command executes. Here's its source, which I modified a bit from a program I found online:

/* modified from
 * https://www.linuxquestions.org/questions/programming-9/manually-controlling-rts-cts-326590/
 * */
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


static struct termios oldterminfo;


void closeserial(int fd)
{
    tcsetattr(fd, TCSANOW, &oldterminfo);
    if (close(fd) < 0)
        perror("closeserial()");
}


int openserial(char *devicename)
{
    int fd;
    struct termios attr;

    if ((fd = open(devicename, O_RDWR)) == -1) {
        perror("openserial(): open()");
        return 0;
    }
    if (tcgetattr(fd, &oldterminfo) == -1) {
        perror("openserial(): tcgetattr()");
        return 0;
    }
    attr = oldterminfo;
    attr.c_cflag |= CRTSCTS | CLOCAL;
    attr.c_oflag = 0;
    if (tcflush(fd, TCIOFLUSH) == -1) {
        perror("openserial(): tcflush()");
        return 0;
    }
    if (tcsetattr(fd, TCSANOW, &attr) == -1) {
        perror("initserial(): tcsetattr()");
        return 0;
    }
    return fd;
}


int setRTS(int fd, int level)
{
    int status;

    if (ioctl(fd, TIOCMGET, &status) == -1) {
        perror("setRTS(): TIOCMGET");
        return 0;
    }
    status &= ~TIOCM_DTR;   /* ALWAYS clear DTR */
    if (level)
        status |= TIOCM_RTS;
    else
        status &= ~TIOCM_RTS;
    if (ioctl(fd, TIOCMSET, &status) == -1) {
        perror("setRTS(): TIOCMSET");
        return 0;
    }
    return 1;
}


int main(int argc, char *argv[])
{
    int fd, retval;
    char *serialdev;

    if (argc < 3) {
        printf("Syntax: raiserts /dev/ttyname 'command to run while RTS held'\n");
        return 5;
    }
    serialdev = argv[1];
    fd = openserial(serialdev);
    if (!fd) {
        fprintf(stderr, "Error while initializing %s.\n", serialdev);
        return 1;
    }

    setRTS(fd, 1);
    retval = system(argv[2]);
    setRTS(fd, 0);

    closeserial(fd);
    return retval;
}

This compiles to an executable less than 10K in size. I love it when that happens.

So these examples support voice keying both with VOX circuits and with serial-controlled PTT. raiserts.c could be trivially modified to control other serial pins as well, should you have an interface which uses different ones.

Windows & a dying hard disk: Solving with Linux

Today, my workstation sent me this email:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

and then a little later, this one:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Offline uncorrectable sectors

From the hard disk’s SMART data, this is a clue that the drive is failing or will soon. Sigh. Incidentally, if smartmontools isn’t installed on your machine, whether it’s a laptop, desktop, or server, it should be.

Although most of you know I run Linux on the metal on my machines almost exclusively, I do maintain a small drive with a Windows installation that I boot into every few months for various reasons. This is that drive.

The drive is non-redundant (no RAID), and although it is backed up, the backup is made via backuppc from the NTFS filesystem mounted on Linux, and is a partial backup – backing up certain data, not the OS. There are, of course, bare metal Windows backup solutions, but I generally don’t want to back up Windows from within Windows on this machine. Restoring Windows isn’t quite as simple as an mkfs, an untar, and a grub-install, either.

So my first thought is: immediately save whatever of the drive I can. So I ran apt-get install gddrescue to install the GNU ddrescue tool. ddrescue is somewhat similar to dd, but deals much more intelligently with bad blocks on the drive. It will try to read them repeatedly, with decreasing block sizes, in an effort to get every last good byte off the disk that it can. If it ultimately fails to get certain bytes read, it will write placeholder data to the output file in place of the missing data, so that the output file maintains proper size and alignment. It also saves a log file that notes what it found (see info ddrescue for more on that.)

So I created an LVM volume for the purpose (not enough free space on /home, and didn’t want to have to shrink it somehow later), and ran:

ddrescue /dev/sda /mnt/sdasave.ddrescue /mnt/sdasave.logfile

Then I went to dinner.

When I got back, I discovered there were 1 or 2 bad sectors, about halfway through the disk, but everything else was fine. So now, the question became: did I lose any data? If so, what? I needed to know if I had to revert to a backup for anything or not.

To answer THAT question, first I had to figure out the offset of the bad spots on the disk. That’s not too hard; the logfile gives it to me:

# Rescue Logfile. Created by GNU ddrescue version 1.15
# Command line: ddrescue /dev/sda /mnt/sdasave.ddrescue sdasave.logfile
# current_pos  current_status
0x3BBB8BFC00     +
#      pos        size  status
0x00000000  0x3BBB8BF000  +
0x3BBB8BF000  0x00001000  -
0x3BBB8C0000  0x38B5346000  +

what we see is that the bad sector starts at byte 0x3BBB8BF000 (256549580800 decimal) and extends for 0x1000 bytes (4096 decimal). Both the drive and NTFS use 512-byte sectors. So dividing by 512, we get sector 501073400 – 501073407 (4096 bytes is 8 sectors).

As a check, I ran grep sector /var/log/kern.log and turned up a bunch of lines like this:

Jun 14 21:39:11 hephaestus kernel: [35346.929957] end_request: I/O error, dev sda, sector 501073404

Which is within my calculated range.

But this is an absolute sector on the disk. We need the sector within the partition, so for that, we have to enlist fdisk to make that calculation.

fdisk shows, among other things:

Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   976771071   488384512    7  HPFS/NTFS/exFAT

So the Windows partition starts at disk sector 2048.

Let’s just confirm that. If I use dd if=/dev/sda1 bs=512 count=1 | hd | head, I see a line beginning with “.R.NTFS”. Exactly the same as with dd if=/dev/sda bs=512 count=1 skip=2048 | hd | head, so I read the partition table information correctly.

Subtract offset of 2048 from the earlier values, and I get relative sectors 501071352-501071359.

That’s enough to get some solid info from the filesystem via ntfscluster, part of Debian’s ntfs-3g package. I pass -s to it, and ignoring some irrelevant stuff, get my answer:

ntfscluster -s 501071352-501071359 /dev/sda1
Inode 190604 /System Volume Information/{b4816feb-b609-11e1-a908-50e549b934f7}{3808876b-c176-4e48-b7ae-04046e6cc752}/$DATA

I even reran it with a much larger sector range, just to be absolutely sure I had wiggle room in case calculations had an off-by-one error or something somewhere.

This is really great news, because the file in question is pretty much useless – I believe it’s a system restore point, which I won’t be needing anyhow.

So at this point, all that remains is to reinstall this on a different drive. For that, I could just use my ddrescue image. I thought I would take a second image, just to be very extra careful, and use that; I used:

partclone.ntfs --rescue -c -s /dev/sda1 -o sda1.partclone

although ntfsclone would work just as well. This captures only the partition; I’ll need the partition table as well, and perhaps also the space between the partition table and the first partition. I could capture it separately with dd, but it’s already in the ddrescue image, so there’s no need. (GRUB is installed on this drive, but there is no Linux filesystem on it, so it may well exceed the size of the MBR).

Note that for Linux ext[234] filesystems, debugfs can provide the same (and more) info as I got from ntfscluster.

I happen to have a drive of the right size sitting here, which I was about to install in a different machine. So a wipe and a swap and a restore later, and I should be good to go.

This scenario is commonplace enough that I thought I’d post how I dealt with it, in case anyone else ever has hard drive issues.

How to debugging Linux failure to resume from suspend?

I’m running a computer with a Gigabyte Z68A-D3H-B3 motherboard, and have never been able to get it to properly resume from suspend to RAM in Linux. It has worked fine on the rare occasion I’ve tried it in Windows 7.

My somewhat limited usual for debugging aren’t particularly helpful. The system appears to suspend perfectly fine. It just doesn’t resume. To be more precise, when I push the button to resume, the power comes up (fans whir, HDD spins up, etc.) but nothing happens. The USB keyboard and mouse don’t respond, Caps Lock doesn’t toggle any LEDs, it doesn’t respond on the wired LAN, and the display stays off.

Although it’s a desktop, I’d really like to save power on this thing by suspending it when it’s not in use. There’s no sense in wasting power I don’t need to be consuming.

I’ve tried what I used to try on laptops. I tried running in single-user mode, without X, or even the kernel modules for video acceleration loaded. I tried unloading whatever hardware modules I thought I could without completely destabilizing the system. I updated the BIOS to the latest release. I tried various combinations of video tweaks. I tried using s2ram from uswsusp instead of pm-suspend. Nothing made any difference. They all behaved exactly the same.

Googling showed a lot of resources for people that had trouble getting their machines to go to sleep. And also for people whose machines would wake up but just wouldn’t re-activate the display. But precious little for people with my particular symptoms.

What’s a good place to start looking to fix something like this?

Some details…

CPU is Core i5-2400. Kernel is wheezy’s 3.2.0-2-amd64, though this problem has persisted as long as I’ve had this machine, which was running squeeze at install time. Video is NVidia GeForce GTX 560 (GF114). Hard drives are SATA, Ethernet is integrated RTL8111/8168B. Userland is up-to-date amd64 wheezy.

Shell Scripts For Preschoolers

It probably comes as no surprise to anybody that Jacob has had a computer since he was 3. Jacob and I built it from spare parts, together.

It may come as something of a surprise that it has no graphical interface, and Jacob uses the command line and loves it — and did even before he could really read.

A few months ago, I wrote about the fun Jacob had with speakers and a microphone, and posted a copy of the cheat sheet he has with his computer. Lately, Jacob has really enjoyed playing with the speech synthesizer — both trying to make it say real words and nonsense words. Sometimes he does that for an hour.

I was asked for a copy of the scripts I wrote. They are really simple. I gave them names that would be easy for a preschooler to remember and spell, even if they conflicted with existing Unix/Linux commands. I put them in /usr/local/bin, which occurs first on the PATH, so it doesn’t matter if they conflict.

First, for speech systhesis, /usr/local/bin/talk:


#!/bin/bash
echo "Press Ctrl-C to stop."
espeak -v en-us -s 150

espeak comes from the espeak package. It seemed to give the most consistenly useful response.

Now, on to the sound-related programs. Here’s /usr/local/bin/ssl, the “sound steam locomotive”. It starts playing a train sound if one isn’t already playing:


#!/bin/bash
pgrep mpg321 > /dev/null || mpg321 -q /usr/local/trainsounds/main.mp3 &
sl "$@"

And then there’s /usr/local/bin/record:


#!/bin/bash
cd $HOME/recordings
echo "Now recording. Press Ctrl-C to stop."
DATE=`date +%Y-%m-%dT%H-%M-%S`
FILENAME="$DATE-$$.wav"
chmod a-w *.wav
exec arecord -c 1 -f S16_LE -c 1 -r 44100 "$FILENAME"

This simply records in a timestamped file. Then, its companion, /usr/local/bin/play. Sorry about the indentation; for whatever reason, it is being destroyed by the blog, but you get the idea.


#!/bin/bash
case "$1" in
train)
mpg321 /usr/local/trainsounds/main.mp3
;;
song)
/usr/bin/play /usr/local/trainsounds/traindreams.flac
;;
*)
cd $HOME/recordings
exec aplay `ls -tr| tail -n 1`
;;
esac

So, Jacob can run just “play”, which will play back his most recent recording. As something of a bonus, the history of recordings is saved for us to listen to later. If he types “play train”, there is the sound of a train passing. And, finally, “play song” plays Always a Train in My Dreams by Steve Gillette (I heard it on the radio once and bought the CD).

Some of these commands kick off sound playing in the background, so here is /usr/local/bin/bequiet:


#!/bin/bash
killall mpg321 &> /dev/null
killall play &> /dev/null
killall aplay &> /dev/null
killall cw &> /dev/null