Category Archives: Technology

Home Automation, part 2: Z-Wave and ISY programming

In my part 1 post yesterday, I wrote about the start of the home automation project. I mentioned that I was using Insteon switches, and they mostly were working well (I forgot to mention an annoyance: you can dim, but not totally shut off, their LED status light.) Anyhow, the Insteon battery-operated sensors seem to not be as good as their Z-Wave competition.

Setting up Z-Wave

Z-Wave devices get joined to the Z-Wave network in much the same way Bluetooth devices get paired with each other. The first time you use a Z-Wave device, you join it to the controller. The controller assigns it an ID on your network, and both devices discover the best route to communicate with each other.

I discovered that the Z-Wave module on the ISY-994i has particularly poor reception. Combined with the generally short range of battery-operated Z-wave devices, this meant that my sensors didn’t work reliably. As with Insteon, AC-powered Z-wave devices tend to be repeaters, but I didn’t have any AC-powered Z-wave devices. I went looking for Z-wave repeaters, and found some. But then I discovered that a Z-wave relay-based appliance switch was actually $5 cheaper, acted as a repeater, and could be used as a switch down the road if needed. A couple of those solved my communications issues. You join them to the network as usual, but then either re-join the battery-powered sensors (so they see the new route) or do a “network heal” (every device on the network re-learns about its neighbors and routes to the other devices) so they see the new route.

Z-Wave Motion / Occupancy Sensors

If you have been in new buildings, chances are you have seen light switches with built-in occupancy sensors. My doctor’s office has these. They are typically the same size as a regular light switch, but with a passive infrared (PIR) motion sensor used to automatically turn the lights off after a timeout (or even on, when someone walks into the room.) These work fine for small rooms, but if you’ve ever been in a bathroom with a PIR lightswitch that goes off while you’re still in the room, you are aware of their faults for larger rooms!

Most of the time, when you read about “occupancy sensors”, it’s really a PIR sensor, and the term is used interchangeably with “motion detector.” Motion detectors in home automation systems are commonly used for a number of purposes: detecting occuption of rooms for automatic control of lighting or HVAC systems, triggering alerts, or even locking or unlocking doors.

My friend told me he had poor luck with the Insteon PIR sensors, so I tried two different Z-Wave models: the $48 Aeon/Aeotec multi-sensor and the $30 Ecolink PIRZWAVE2-ECO. The Aeon device is clearly the more capable; it can be used indoors or out, and has a lot of options that can be configured by Z-Wave configuration parameters. It also has a ball/socket mount, so it can be easily aimed in different directions. It can draw power from 4 AAA batteries, or a mini USB cable (cable, but not power supply, included). The Aeon multi-sensor also has a temperature, humidity, and illumination (lux) sensor – but, as you’ll see, they have some drawbacks.

The Ecolink device is more basic; it has a few settings that can be altered via jumpers, but none that can be altered via Z-Wave configuration commands. That makes it a bit of a hassle, since you have to open it up to change, and when you do, it triggers a “tamper alarm” that is both undocumented and never seems to go away. It is powered by a single CR123A 3V Lithium battery, which are about $2 each on Amazon.

Both units have a default PIR timeout of 4 minutes. That means that after sensing motion, they will transmit the “on” signal (meaning motion detected) and then transmit no further signals for at least 4 minutes. This is because operating the radio consumes far more battery power than simply monitoring the PIR sensor, and it cuts down on repeated on/off transmissions. In this configuration, both units should have battery life of 6 months to a year, I figure, with an edge for the Ecolink, perhaps.

Both can also be configured to transmit more frequently; the Ecolink has a jumper that can change its timeout to 10 seconds, whereas the Aeon can be configured over Z-Wave for any timeout between 10 seconds and 65535 seconds (values above 4 minutes are rounded to the nearest minute, for some reason.)

Both also have adjustable sensitivity; on the Aeon this is via an adjustment knob, and on the Ecolink it’s via jumpers. And both can report their battery level to enable software alerts when it’s getting low.

The Aeon ships with its temperature, humidity, battery, and lux sensors disabled. This appears to be the cause of much confusion online, as they send one transmission and then no more. Sadly, one has to resort to the hard-to-find but very helpful MultiSensor engineering spec document to figure out the way to enable those sensors and set their interval. (It must be said, however, that I doubt most consumers will understand how to set bitfields, and this is either not covered or covered incorrectly in their other manuals.) This can be a real power drain, so I just set it to report battery level every 6 hours (so I can alert myself if it gets low) and leave it at that.

The Ecolink manual claims a detection radius of 39 feet and a total angle of 90deg (45deg left or right from center) and a 3-year battery life (I’m skeptical). It also is rated for indoor use only. Aeotec claims a detection radius of 16 feet and a total angle of 120 deg. Be skeptical of all these figures.

Once set up with good reception, both devices have been working fine.

Programming the controller

The ISY-994i-Zw Pro controller I mentioned in Part 1 has its own sort of programming language. It’s a lot better than the sort of GUI clicky mess that is found in most of these things, but it still has a sort of annoying Java-based editor where you select keywords from a menu and such. Although you can backup and restore the device, and import and export the programs, the file formats are XML and not really suitable for hand-editing. Sigh.

The language is limited, but gets the job done. Here is a simple program that turns off the fan in a bathroom 30 minutes after the light was turned off:

        Status  '1st Floor / Bathroom + Laundry / Main Bath Fan' is On
    And Status  '1st Floor / Bathroom + Laundry / Main Bath Light' is Off
        Wait  30 minutes 
        Set '1st Floor / Bathroom + Laundry / Main Bath Fan' Off
   - No Actions - (To add one, press 'Action')

The if/then/else clauses should probably be more properly called when/do/finally. The “if” clauses are evaluated whenever a relevant event occurs. If the If evaluates to true, if executes the Then section. The program in “Then” is uninterruptible except for “wait” and “repeat” statements. So in this case, the program begins, and if the light stays off and the fan stays on, 30 minutes later it turns off the fan. But if the light comes back on, the program aborts (since there is nothing in “else”). Similarly, if the fan goes off, the program aborts.

A more complicated example: motion-activated lamp

There is a table lamp in our living room controlled by Insteon. By adding a motion sensor in that room, it can be automatically turned on by someone walking through the room. Now, to be useful, I don’t want it to turn on during the day. I also don’t want it to turn on or adjust itself if other lights are on in the room, or if I turned it on myself; that could cause it to go on and off while I’m watching TV, for instance. It’s just to be helpful at night. Because the ISY-994i is pretty limited, having almost no control flow operations, this — as with many tasks — requires several “programs”.

First, here’s my “main”:

        Status  '1st Floor / Living + Dining Room / LR Table Lamp' is Off
    And $LR_Lamp_Lockout is 0
    And Status  '1st Floor / Living + Dining Room / ZW 005 Binary Sensor' is On
    And From    Sunset 
        To      Sunrise (next day)
    And $LR_Lamp_Motion_Working is not 0
    And Status  '1st Floor / Living + Dining Room / LR Floor Lamp' is Off
    And Status  '1st Floor / Living + Dining Room / Dining Light' is Off
    And Status  '1st Floor / Living + Dining Room / LR Light' is Off
        Set '1st Floor / Living + Dining Room / LR Table Lamp' 22%
        Set '1st Floor / Living + Dining Room / LR S Remote (Table Lamp)' 22%
   - No Actions - (To add one, press 'Action')

So, let’s look at the conditions. This program triggers if the lamp is off, the sensor is on, the time is between sunset and sunrise, and three other lights are off. (I will explain the lockout and motion_working variables later). If this is the case, it sets the lamp to 22% (and also informs the wall switch for it that the lamp is at 22%). I pick this precise value because it is unlikely I would manually set it via the wall switch, and therefore “lamp is 22% bright” doubles as a “lamp was turned on by this program” flag.

So this is the turning the lamp on bit. Let’s look at the program that turns it back off later:

        Status  '1st Floor / Living + Dining Room / LR Table Lamp' is 22%
    And (
             Status  '1st Floor / Living + Dining Room / ZW 005 Binary Sensor' is Off
          Or $LR_Lamp_Motion_Working is 0
          Or Status  '1st Floor / Living + Dining Room / LR Light' is not Off
          Or Status  '1st Floor / Living + Dining Room / LR Floor Lamp' is not Off
          Or Status  '1st Floor / Living + Dining Room / Dining Light' is not Off
        Wait  15 seconds
        Set '1st Floor / Living + Dining Room / LR S Remote (Table Lamp)' Off
        Set '1st Floor / Living + Dining Room / LR Table Lamp' Off
   - No Actions - (To add one, press 'Action')

This is using a sensor with a 4-minute timeout, so the lamp will always be on for at least 4 minutes and 15 seconds. This program runs if the lamp is still at the program-set level (so if, for instance, I turned the lamp full on, the program does nothing to override my setting.) Then, it looks for a condition to trigger turning the lamp off, which could be any of the sensor indicating no more motion, another program detecting it’s lost communication with the sensor, or somebody turning on one of the bigger lights in the room. Then it simply sets the light (and the wall switch controlling it) to off.

There are a couple more bits to this puzzle. What if the system turned the lamp on, but I really want it off? If I walked up to the wall switch and pushed “off”, the lamp would go off. And then, a couple seconds later, come back on, since the state of the system met the conditions for the lamp-on program. So we need a lockout that prevents this from happening. Here’s my “trigger lockout” program:

        Control '1st Floor / Living + Dining Room / LR Table Lamp' is switched Off
     Or Control '1st Floor / Living + Dining Room / LR Table Lamp' is switched Fast Off
     Or Control '1st Floor / Living + Dining Room / LR S Remote (Table Lamp)' is switched Fast Off
     Or Control '1st Floor / Living + Dining Room / LR S Remote (Table Lamp)' is switched Off
        $LR_Lamp_Lockout += 1
        Run Program 'LR Lamp Clear Lockout' (If)
   - No Actions - (To add one, press 'Action') 

So if someone pushes the “off” button at the lamp switch box at the outlet it’s plugged into (unlikely), or at the wall switch, it increments the lockout variable and runs another program. This other program is unique in that it is flagged “disabled”, meaning it is never run automatically by the system, only when called by another program. Here’s the clear lockout program:

        $LR_Lamp_Lockout > 0
        Wait  5 minutes 
        $LR_Lamp_Lockout  = 0
   - No Actions - (To add one, press 'Action')

Thus by pushing the “off” button on the switch, the motion-triggered program won’t turn the lamp back on for at least 5 minutes.

Before I had reliable Z-Wave communication to the device, I had some times where it would simply drop off the Z-Wave network until a reboot. This was particularly annoying if it occured after having detected motion, since the state of the sensor in the ISY controller is simply whatever state it last received. Therefore, I wrote this program to check if it believes the motion sensor is working:

        Status  '1st Floor / Living + Dining Room / ZW 005 Binary Sensor' is On
        $LR_Lamp_Motion_Working  = 1
        Wait  1 hour 
        $LR_Lamp_Motion_Working  = 0
        $LR_Lamp_Motion_Working  = 1

We don’t really care if the motion sensor is broken when the status is off; all that happens then is no lamp turns on. So this program activates when the status is set to on. It flips the working flag to true, then waits for an hour. If the sensor shows no motion within that hour, the program skips to the else (keeping the flag true). But if it is still on after an hour, it decides “it must not be working” and sets the working flag to false. You can see that flag used in the other programs’ logic. But because of the Else, which is run whenever the conditions that caused the Then clause to run become false, as soon as the system receives “no motion”, it will flag the sensor as working again.

The final piece to this puzzle is a program flagged to run at boot time of the controller:

   - No Conditions - (To add one, press 'Schedule' or 'Condition')
        $LR_Lamp_Motion_Working  = 1
   - No Actions - (To add one, press 'Action')

This simply initializes the motion working variable to a known state.


The Insteon KeypadLinc is a nice device. It can control a single load directly, but all 8 buttons are fully responsive in the Insteon system. They each also have individually-addressible LED backlights. They are commonly used to do things like “ALL OFF”, “TV” (to set lights for watching TV), “AWAY” (to set the system for everyone being out of the house for awhile), etc. They are the size of a regular Decora switch, and I have installed one already, but haven’t programmed much of it yet.


The ISY has an extensive REST API, which I’ve used to integrate it a bit with my Debian systems. More on that another time.

Mobile app

Mobile apps are a common thing people look for in these systems. You can’t use the Insteon app with the ISY, but they recommend Mobilinc Pro. It does the job. Mobilinc tries to sell a $10/mo connection service, which is totally unnecessary if you can figure out SSL, and has on-screen buttons to bypass, but judging by the Google Play reviews, a lot of people thought they had to pay for that and uninstalled it afterwards.

Future directions

Many people put in electronic door locks. I don’t plan to do that. I do plan to have the house systems be aware of the general state of things (is the house empty? is everyone asleep?) and do appropriate things with lighting and HVAC. I don’t really expect the savings in power for lighting control to pay for the system anytime soon. However, if it can achieve some savings in heating and cooling, it may well be able to pay for itself in a few years. So my big next step is thermostats that can integrate with all this.

I have had a water mess in my basement before, and water leak sensors are a very common item people deploy in these setups. I certainly plan to add a few of them.

Door-open sensors are also useful; they trigger more instantly and reliably than motion detectors and can be used in some nice ways (is it after dark and the door is opening when the house is vacant? If so, turn on the light nearby in case their hands are full.)


Some issues I ran into so far are already discussed above. One other major one involves SSL on the ISY-994i Pro. The method for adding SSL keys is cumbersome, but the processor on the device — which appears to run some sort of Java — is just not up to working with SSL. Apparently they only recently got it fast enough to work with 2048-bit keys. This is rather undocumented, though, so I obtained a cert for a 4096-bit key, my usual. Attempting to connect to the box with SSL appeared to hang not just that but confuse a lot of other things on it as well. Turned out it wasn’t hung; it just too a minute and 45 seconds to complete the SSL handshake. Moral of the story: use 2048-bit keys, or stunnel4 or some such to re-wrap the SSL communications with a stronger key.

The KeypadLinc backlights can be completely shut off, and both their on and off levels can be customized. I have it set to shut off the backlight during the day and turn it on at night. The wall switches, however, can’t have their brightness status LED bar entirely shut off. They can be made dim, but don’t ever go away. That’s rather annoying.

Also annoying is that Insteon doesn’t make switches in the traditional toggle switch style in colors other than white. As our house had mostly black switches, I was forced into the Decora style.

Overall thoughts

This has been a great learning experience for me in a number of ways. I have only begun to tap what the system can do, and the real benefits will probably come once I get the heating and cooling into the mix. It’s quite a nice way for a geek to go, and the improvements in lighting have also been popular with everyone else in the house.

First Steps with Home Automation and LED Lighting

I’ve been thinking about home automation — automating lights, switches, thermostats, etc. — for years. Literally decades, in fact. When I was a child, my parents had a RadioShack X10 control module and one or two target devices. I think I had fun giving people a “light show” turning on or off one switch and one outlet remotely.

But I was stuck — there are a daunting number of standards for home automation these days. Zigbee, UPB, Z-Wave, Insteon, and all sorts of Wifi-enabled things that aren’t really compatible with each other (hellooooo, Nest) or have their own “ecosystem” that isn’t all that open (helloooo, Apple). Frankly I don’t think that Wifi is a great home automation protocol; its power drain completely prohibits it being used in a lot of ways.

Earlier this month, my awesome employer had our annual meeting and as part of that our technical teams had some time for anyone to talk about anything geeky. I used my time to talk about flying quadcopters, but two of my colleagues talked about home automation. I had enough to have a place to start, and was hooked.

People use these systems to do all sorts of things: intelligently turn off lights when rooms aren’t occupied, provide electronic door locks (unlockable via keypad, remote, or software), remote control lighting and heating/cooling via smartphone apps, detect water leakage, control switches with awkward wiring environments, buttons to instantly set multiple switches to certain levels for TV watching, turning off lights left on, etc. I even heard examples of monitoring a swamp cooler to make sure it is being used correctly. The possibilities are endless, and my geeky side was intrigued.

Insteon and Z-Wave

Based on what I heard from my colleagues, I decided to adopt a hybrid network consisting of Insteon and Z-Wave.

Both are reliable protocols (with ACKs and retransmit), so they work far better than X10 did. Both have all sorts of controls and sensors available (browse around on for some ideas).

Insteon is a particularly interesting system — an integrated dual-mesh network. It has both powerline and RF signaling, and most hardwared Insteon devices act as repeaters for both the wired and RF network simultaneously. Insteon packets contain a maximum hop count that is decremented after each relay, and the packets repeat in such as way that they collide and strengthen one another. There is no need to maintain routing tables or anything like that; it simply scales nicely.

This system addresses all sorts of potential complexities. It addresses the split-phase problem of powerline-only systems, but using an RF bridge. It addresses long distances and outbuildings by using the powerline signaling. I found it to work quite well.

The downside to Insteon is that all the equipment comes from one vendor: Insteon. If you don’t like their thermostat or motion sensor, you don’t have any choice.

Insteon devices can be used entirely without a central controller. Light switches can talk to each other directly, and you can even set them up so that one switch controls dozens of others, if you have enough patience to go around your house pressing tiny “set” buttons.

Enter Z-Wave. Z-Wave is RF-only, and while it is also a mesh network, it is source-routed, meaning that if you move devices around, you have to “heal” your network as all your nodes have to re-learn the path to each other. It also doesn’t have the easy distance traversal of Insteon, of course. On the other hand, hundreds of vendors make Z-Wave products, and they mostly interoperate well. Z-Wave is said to scale practically to maybe two or three dozen devices, which would have been an issue for me, buut with Insteon doing the heavy lifting and Z-Wave filling in the gaps, it’s worked out well.

Controlling it all

While both Insteon (and, to a certain extent, Z-Wave) devices can control each other, to really spread your wings, you need more centralized control. This lets you have programs that do things like “if there’s motion in the room on a weekday and it’s dark outside, then turn on a light, and turn it back off 5 minutes later.”

Insteon has several options. One, you can buy their “power line modem” (PLM). This can be hooked up to a PC to run either Insteon’s proprietary software, or something open-source like MisterHouse, written in Perl. Or you can hook it up to a controller I’ll mention in a minute. Those looking for a fairly simpe controller might get the Insteon 2242-222 Hub, which has the obligatory smartphone app and basic schedules.

For more sophisticated control, my friend recommended the ISY-994i controller. Not only does it have a lot more capable programming language (though still frustrating), it supports both Insteon and Z-Wave in an integrated box, and has a comprehensive REST API for integrating with other things. I went this route.

First step: LED lighting

I began my project by replacing my light bulbs with LEDs. I found that I could get Cree 4-Flow 60W equivs for $10 at Home Depot. They are dimmable, a key advantage over CFL, and also maintain their brightness throughout their life far better. As I wanted to install dimmer switches, I got a combination of Cree 60W bulbs, Cree TW bulbs (which have a better color spectrum coverage for more true colors), and Cree 100W equiv bulbs for places I needed more coverage. I even put in a few LED flood lights to replace my can lights.

Overall I was happy with the LEDs. They are a significant improvement over the CFLs I had been using, and use even less power to boot. I have had issues with three Cree bulbs, though: one arrived broken, and two others have had issues such as being quite dim. They have a good warranty, but it seems their QA could be better. Also, they can have a tendency to flickr when dimmed, though this plagues virtually all LED bulbs.

I had done quite a bit of research. CNET has some helpful brightness guides, and Insteon has a color temperature chart. CNET also had a nifty in-depth test of LED bulbs.

Second step: switches

Once the LED bulbs were in place, I was then able to start installing smart switches. I picked up Insteon’s basic switch, the SwitchLinc 2477D at Menard’s. This is a dimmable switch and requires a neutral wire in the box, but acts as a dual-band repeater for the system as well.

The way Insteon switches work, they can be standalone, or controllers, responders, or both in a “scene”. A scene is where multiple devices act together. You can create virtual 3-way switches in a scene, or more complicated things where different lights are turned on at different levels.

Anyhow, these switches worked out quite well. I have a few boxes where there is no neutral wire, so I had to use the Insteon SwitchLinc 2474D in them. That switch is RF-only and is supposed to have a minimum load of 20W, though they seemed to work OK — albeit with limited range and the occasional glitch — with my LEDs. There is also the relay-based SwitchLinc 2477S for use with non-dimmable lights, fans, etc. You can also get plug-in modules for controlling lamps and such.

I found the Insteon devices mostly lived up to their billing. Occasionally I could provoke a glitch by changing from dimming to brightening in rapid succession on a remote switch controlling a load on a distant one. Twice I had to power cycle an Insteon switch that got confused (rather annoying when they’re hardwared). Other than that, though, it’s been solid. From what I gather, this stuff isn’t ever quite as reliable as a 1950s mechanical switch, but at least in this price range, this is about as good as it gets these days.

Well, this post got quite long, so I will have to follow up with part 2 in a little while. I intend to write about sensors and the Z-Wave network (which didn’t work quite as easily as Insteon), as well as programming the ISY and my lessons learned along the way.

My boys love 1986 computing

Yesterday, Jacob (age 8) asked to help me put together a 30-year-old computer from parts in my basement. Meanwhile, Oliver (age 5) asked Laura to help him learn cursive. Somehow, this doesn’t seem odd for a Saturday at our place.

2014-11-22 18.58.36

Let me tell you how this came about.

I’ve had a project going on for a while now to load data from old floppies. It’s been fun, and had a surprise twist the other day: my parents gave me an old TRS-80 Color Computer II (aka “CoCo 2”). It was, in fact, my first computer, one they got for me when I was in Kindergarten. It is nearly 30 years old.

I have been musing lately about the great disservice Apple did the world by making computers easy to learn — namely the fact that few people ever bother to learn about them. Who bothers to learn about them when, on the iPhone for instance, the case is sealed shut, the lifespan is 1 or 2 years for many purchasers, and the platform is closed in lots of ways?

I had forgotten how finicky computers used to be. But after some days struggling with IDE incompatibilities, booting issues, etc., when I actually managed to get data off a machine that had last booted in 1999, I had quite the sense of accomplishment, which I rarely have lately. I did something that was hard to do in a world where most of the interfaces don’t work with equipment that old (even if nominally they are supposed to.)

The CoCo is one of those computers normally used with a floppy drive or cassette recorder to store programs. You type DIR, and you feel the clack of the drive heads through the desk. You type CLOAD and you hear the relay click closed to turn on the tape motor. You wiggle cables around until they make contact just right. You power-cycle for the times when the reset button doesn’t quite do the job. The details of how it works aren’t abstracted away by innumerable layers of controllers, interfaces, operating system modules, etc. It’s all right there, literally vibrating your desk.

So I thought this could be a great opportunity for Jacob to learn a few more computing concepts, such as the difference between mass storage and RAM, plus a great way to encourage him to practice critical thinking. So we trekked down to the basement and came up with handfulls of parts. We brought up the computer, some joysticks, all sorts of tangled cables. We needed adapters, an old TV. Jacob helped me hook everything up, and then the moment of truth: success! A green BASIC screen!

I added more parts, but struck out when I tried to connect the floppy drive. The thing just wouldn’t start up right whenever the floppy controller cartridge was installed. I cleaned the cartridge. I took it apart, scrubbed the contacts, even did a re-seat of the chips. No dice.

So I fired up my CoCo emulator (xroar), and virtually “saved” some programs to cassette (a .wav file). I then burned those .wav files to an audio CD, brought up an old CD player from the basement, connected the “cassette in” plug to the CD player’s headphone jack, and presto — instant programs. (Well, almost. It takes a couple of minutes to load a program from audio codes.)

The picture above is Oliver cackling at one of the very simplest BASIC programs there is: “number find.” The computer picks a random number between 1 and 2000, and asks the user to guess it, giving a “too low” or “too high” clue with each incorrect guess. Oliver delighted in giving invalid input (way too high numbers, or things that weren’t numbers at all) and cackled at the sarcastic error messages built into the program. During Jacob’s turn, he got very serious about it, and is probably going to be learning about how to calculate halfway points before too long.

But imagine my pride when this morning, Jacob found the new CD I had made last night (correcting a couple recordings), found my one-line instruction on just part of how to load a program, and correctly figured out by himself all the steps to do in order (type CLOAD on the CoCo, advance the CD to the proper track, press play on the player, wait for it to load on the CoCo, then type RUN).

I ordered a replacement floppy controller off eBay tonight, and paid $5 for a coax adapter that should fix some video quality issues. I rescued some 5.25″ floppies from my trash can from another project, so they should have plenty of tools for exploration.

It is so much easier for them to learn how a disk drive works, and even what the heck a track is, when you can look at a floppy drive with the cover off and see the heads move. There are other things we can do with more modern equipment — Jacob has shown a lot of interest in Arduino projects — but I have so far drawn a blank on ways to really let kids discover how a modern PC (let alone a modern phone or tablet) works.

Update Nov. 24: Every so often, the world surprises me by deciding to, well, read one of my random blog posts. For the benefit of those of you that don’t already know my boys, you might want to know that among their common play activites are turning trees into pretend trains, typing at a manual typewriter, reading, writing their own books, using a cassette recorder, building a PC and learning to use bash or xmonad, making long paper tapes with an adding machine, playing records on a record player, building electric gizmos, and even making mud balls.

I am often asked about the role of the computer in the lives, given that my hobby and profession involves computers. The answer: less than that of most of their peers. I look for opportunities for them to learn by doing, discovering, playing, or imagining. I make no presumption that they will develop the passion for computers that I did. What I want is for them to have the curiosity and drive to learn everything there is to know about whatever they do develop a passion for, so they will be great at it.

Debian – A plea to worry about what matters, and not take ourselves too seriously

I posted this on debian-devel today. I am also posting it here, because I believe it is important to more than just Debian developers.

Good afternoon,

This message comes on the heels of Sam Hartman’s wonderful plea for compassion [1] and the sad news of Joey Hess’s resignation from Debian [2].

I no longer frequently post to this list, but when you’ve been a Debian developer for 18 years, and still care deeply about the community and the project, perhaps you have a bit of perspective to share.

Let me start with this:

Debian is not a Free Software project.

Debian is a making-the-world-better project, a caring for people project, a freedom-spreading project. Free Software is our tool.

As many of you, hopefully all of you, I joined Debian because I enjoyed working on this project. We all did, didn’t we? We joined Debian because it was fun, because we were passionate about it, because we wanted to make the world a better place and have fun doing it.

In short, Debian is life-giving, both to its developers and its users.

As volunteers, it is healthy to step back every so often, and ask ourselves two questions: 1) Is this activity still life-giving for me? 2) Is it life-giving for others?

I have my opinions about init. Strong ones, in fact. [3] They’re not terribly relevant to this post. Because I can see that they are not really all that relevant.

14 years ago, I proposed what was, until now anyhow, one of the most controversial GRs in Debian history. It didn’t go the way I hoped. I cared about it deeply then, and still care about the principles.

I had two choices: I could be angry and let that process ruin my enjoyment of Debian. Or I could let it pass, and continue to have fun working on a project that I love. I am glad I chose the latter.

Remember, for today, one way or another, jessie will still boot.

18 years ago when I joined Debian, our major concerns were helping newbies figure out how to compile their kernels, finding manuals for monitors so we could set the X modelines properly, finding some sort of Free web browser, finding some acceptable Office-type software.

Wow. We WON, didn’t we? Not just Debian, but everyone. Freedom won.

I promise you – 18 years from now, it will not matter what init Debian chose in 2014. It will probably barely matter in 3 years. This is not key to our goals of making the world a better place. Jessie will still boot. I say that even though my system runs out of memory every few days because systemd-logind has a mysterious bug [4]. It will be fixed. I say that even though I don’t know what init system it will use, or how much choice there will be. I say that because it is simply true. We are Debian. We will make it work, one way or another.

I don’t post much on this list anymore because my personal passion isn’t with posting on this list anymore. I make liberal use of my Delete Thread keybinding on -vote these days, because although I care about the GR, I don’t care about it enough to read all the messages about it. I have not yet decided if I will spend the time researching it in order to vote. Instead of debating the init GR, sometimes I sit on the sofa with my wife. Sometimes I go out and fly the remote-control airplane I’m learning to fly. Sometimes I repair my plane after a flight that was shorter than planned. Sometimes I play games with my boys, or help them with homework, or share my 8-year-old’s delight as a text file full of facts about the Titanic that he wrote in Emacs comes spitting out of the printer. Sometimes I write code or play with the latest Linux filesystems or build a new server for my basement.

All these things matter more to me than init. I have been using Debian at home for almost 20 years, at various workplaces for almost that long, and it is not going to stop being a part of my life any time soon. Perhaps I will have to learn how to administer a new init system. Well, so be it; I enjoy learning new things. Or perhaps I will have to learn to live with some desktop limitations with an old init system. Well, so be it; it won’t bother me much anyhow. Either way, I’m still going to be using what is, to me, the best operating system in the world, made by one of the world’s foremost Freedom projects.

My hope is that all of you may also have the sense of peace I do, that you may have your strong convictions, but may put them all in perspective. That we as a project realize that the enemy isn’t the lovers of the other init, but the people that would use law and technology to repress people all over the world. We are but one shining beacon on a hill, but the world will be worse off if our beacon winked out.

My plea is that we each may get angry at what matters, and let go of the smaller frustrations in life; that we may each find something more important than init/systemd to derive enjoyment and meaning from. [5] May you each find that airplane to soar freely in the skies, to lift your soul so that the joy of using Free Software to make the world a better place may still be here, regardless of what /sbin/init is.



[3] A hint might be that in my more grumpy moments, I realize I haven’t ever quite figured out why the heck this dbus thing is on so many of my systems, or why I have to edit XML to configure it… ;-)

[4] #765870

[5] No disrespect meant to the init/systemd maintainers. Keep enjoying what you do, too!

Update on the systemd issue

The other day, I wrote about my poor first impressions of systemd in jessie. Here’s an update.

I’d like to start with the things that are good. I found the systemd community to be one of the most helpful in Debian, and #debian-systemd IRC channel to be especially helpful. I was in there for quite some time yesterday, and appreciated the help from many people, especially Michael. This is a nontechnical factor, but is extremely important; this has significantly allayed my concerns about systemd right there.

There are things about the systemd design that impress. The dependency system and configuration system is a lot more flexible than sysvinit. It is also a lot more complicated, and difficult to figure out what’s happening. I am unconvinced of the utility of parallelization of boot to begin with; I rarely reboot any of my Linux systems, desktops or servers, and it seems to introduce needless complexity.

Anyhow, on to the filesystem problem, and a bit of a background. My laptop runs ZFS, which is somewhat similar to btrfs in that it’s a volume manager (like LVM), RAID manager (like md), and filesystem in one. My system runs LVM, and inside LVM, I have two ZFS “pools” (volume groups): one, called rpool, that is unencrypted and holds mainly the operating system; and the other, called crypt, that is stacked atop LUKS. ZFS on Linux doesn’t yet have built-in crypto, which is why LVM is even in the picture here (to separate out the SSD at a level above ZFS to permit parts of it to be encrypted). This is a bit of an antiquated setup for me; as more systems have AES-NI, I’m going to everything except /boot being encrypted.

Anyhow, inside rpool is the / filesystem, /var, and /usr. Inside /crypt is /tmp and /home.

Initially, I tried to just boot it, knowing that systemd is supposed to work with LSB init scripts, and ZFS has init scripts with carefully-planned dependencies. This was evidently not working, perhaps because /lib/systemd/systemd/ It turns out that systemd has a few assumptions that turn out to be less true with ZFS than otherwise. ZFS filesystems are normally not mounted via /etc/fstab; a ZFS pool has internal properties about which dataset gets mounted where (similar to LVM’s actions after a vgscan and vgchange -ay). Even though there are ordering constraints in the units, systemd is writing files to /var before /var gets mounted, resulting in the mount failing (unlike ext4, ZFS by default will reject an attempt to mount over a non-empty directory). Partly this due to the debian-fixup.service, and partly it is due to systemd reacting to udev items like backlight.

This problem was eventually worked around by doing zfs set mountpoint=legacy rpool/var, and then adding a line to fstab (“rpool/var /var zfs defaults 0 2”) for /var and its descendent filesystems.

This left the problem of /tmp; again, it wasn’t getting mounted soon enough. In this case, it required crypttab to be processed first, and there seem to be a lot of bugs in the crypttab processing in systemd (more on that below). I eventually worked around that by adding to the zfs-import-cache.service file. For /tmp, it did NOT work to put it in /etc/fstab, because then it tried to mount it before starting cryptsetup for some reason. It probably didn’t help that the system’s cryptdisks.service is a symlink to /dev/null, a fact I didn’t realize until after a lot of needless reboots.

Anyhow, one thing I stumbled across was poor console control with systemd. On numerous occasions, I had things like two cryptsetup processes trying to read a password, plus an emergency mode console trying to do so. I had this memorable line of text at one point:

(or type Control-D to continue): Please enter passphrase for disk athena-crypttank (crypt)! [ OK ] Stopped Emergency Shell.

And here we venture into unsatisfying territory with systemd. One answer to this in IRC was to install plymouth, which apparently serializes console I/O. However, plymouth is “an attractive boot animation in place of the text messages that normally get shown.” I don’t want an “attractive boot animation”. Nevertheless, neither systemd-sysv nor cryptsetup depends on plymouth, so by default, the prompt for a password at boot is obscured by various other text.

Worse, plymouth doesn’t support serial consoles, so at the moment booting a system that uses LUKS with systemd over a serial console is a matter of blind luck of typing the right password at the right time.

In the end, though, the system booted and after a few more tweaks, the backlight buttons do their thing again. Whew!

Update 2014-10-13: uau pointed out that Plymouth is more than a bootsplash, and can work with serial consoles, despite the description of the package. I stand corrected on that. (It is still the case, however, that packages don’t depend on it where they should, and the default experience for people using cryptsetup is not very good.)

First impressions of systemd, and they’re not good

Well, I finally bit the bullet. My laptop, which runs jessie, got dist-upgraded for the first time in a few months. My brightness keys stopped working, and it no longer would suspend to RAM when the lid was closed, and upon chasing things down from XFCE to policykit, eventually it appears that suddenly major parts of the desktop breaks without systemd in jessie. Sigh.

So apt-get install systemd-sysv (and watch sysvinit-core get uninstalled) and reboot.

Only, my system doesn’t come back up. In fact, over several hours of trying to make it boot with systemd, it failed in numerous spectacular and hilarious (or, would be hilarious if my laptop would boot) ways. I had text obliterating the cryptsetup password prompt almost every time. Sometimes there were two processes trying to read a cryptsetup password at once. Sometimes a process was trying to read that while another one was trying to read an emergency shell password. Many times it tried to write to /var and /tmp before they were mounted, meaning they *wouldn’t* mount because there was stuff there.

I noticed it not doing much with ZFS, complaining of a dependency loop between zfs-mount and $local-fs. I fixed that, but it still wouldn’t boot. In fact, it simply hung after writing something about wall passwords.

I’ve dug into systemd, finding a “unit generator for fstab” (whatever the hack that is, it’s not at all made clear by systemd-fstab-generator(8)).

In some cases, there’s info in journalctl, but if I can’t even get to an emergency mode prompt, the practice of hiding all stdout and stderr output is not all that pleasant.

I remember thinking “what’s all the flaming about?” systemd wasn’t my first choice (I always thought “if it ain’t broke, don’t fix it” about sysvinit), but basically ignored the thousands of messages, thinking whatever happens, jessie will still boot.

Now I’m not so sure. Even if the thing boots out of the box, it seems like the boot process with systemd is colossally fragile.

For now, at least zfs rollback can undo upgrades to 800 packages in about 2 seconds. But I can’t stay at some early jessie checkpoint forever.

Have we made a vast mistake that can’t be undone? (If things like even *brightness keys* now require systemd…)

Agile Is Dead (Long Live Agility)

In an intriguing post, PragDave laments how empty the word “agile” has become. To paraphrase, I might say he’s put words to a nagging feeling I’ve had: that there are entire books about agile, conferences about agile, hallway conversations I’ve heard about whether somebody is doing this-or-that agile practice correctly.

Which, when it comes down to it, means that they’re not being agile. If process and tools, even if they’re labeled as “agile” processes and tools, are king, then we’ve simply replaced one productivity-impairing dictator with another.

And he makes this bold statement:

Here is how to do something in an agile fashion:

What to do:

  • Find out where you are
  • Take a small step towards your goal
  • Adjust your understanding based on what you learned
  • Repeat

How to do it:

When faced with two or more alternatives that deliver roughly the same value, take the path that makes future change easier.

Those four lines and one practice encompass everything there is to know about effective software development.

He goes on to dive into that a bit, of course, but I think this man has a rare gift of expressing something complicated so succinctly. I am inclined to believe he is right.

Backing up every few minutes with simplesnap

I’ve written a lot lately about ZFS, and one of its very nice features is the ability to make snapshots that are lightweight, space-efficient, and don’t hurt performance (unlike, say, LVM snapshots).

ZFS also has “zfs send” and “zfs receive” commands that can send the content of the snapshot, or a delta between two snapshots, as a data stream – similar in concept to an amped-up tar file. These can be used to, for instance, very efficiently send backups to another machine. Rather than having to stat() every single file on a filesystem as rsync has to, it sends effectively an intelligent binary delta — which is also intelligent about operations such as renames.

Since my last search for backup tools, I’d been using BackupPC for my personal systems. But since I switched them to ZFS on Linux, I’ve been wanting to try something better.

There are a lot of tools out there to take ZFS snapshots and send them to another machine, and I summarized them on my wiki. I found zfSnap to work well for taking and rotating snapshots, but I didn’t find anything that matched my criteria for sending them across the network. It seemed par for the course for these tools to think nothing of opening up full root access to a machine from others, whereas I would much rather lock it down with command= in authorized_keys.

So I wrote my own, called simplesnap. As usual, I wrote extensive documentation for it as well, even though it is very simple to set up and use.

So, with BackupPC, a backup of my workstation took almost 8 hours. (Its “incremental” might take as few as 3 hours) With ZFS snapshots and simplesnap, it takes 25 seconds. 25 seconds!

So right now, instead of backing up once a day, I back up once an hour. There’s no reason I couldn’t back up every 5 minutes, in fact. The data consumes less space, is far faster to manage, and doesn’t require a nightly hours-long cleanup process like BackupPC does — zfs destroy on a snapshot just takes a few seconds.

I use a pair of USB disks for backups, and rotate them to offsite storage periodically. They simply run ZFS atop dm-crypt (for security) and it works quite well even on those slow devices.

Although ZFS doesn’t do file-level dedup like BackupPC does, and the lz4 compression I’ve set ZFS to use is less efficient than the gzip-like compression BackupPC uses, still the backups are more space-efficient. I am not quite sure why, but I suspect it’s because there is a lot less metadata to keep track of, and perhaps also because BackupPC has to store a new copy of a file if even a byte changes, whereas ZFS can store just the changed blocks.

Incidentally, I’ve packaged both zfSnap and simplesnap for Debian and both are waiting in NEW.

Migrated from Hetzner to OVH hosting

Since August 2011, my sites such as have been running on a Xen-backed virtual private server (VPS) at Hetzner Online, based in Germany. I had what they called their VQ19 package, which included 2GB RAM, 80GB HDD, 100Mb NIC and 4TB transfer.

Unlike many other VPS hosts, I never had performance problems. However, I did sometimes have hardware problems with the host, and it could take hours to resolve. Their tech support only works business hours German time, which was also a problem.

Meanwhile, OVH, a large European hosting company, recently opened a datacenter in Canada. Although they no longer offer their value-line Kimsufi dedicated servers there — starting at $11.50/mo — they do offer their midrange SoYouStart servers there. $50/mo gets a person a 4-core 3.2GHz Xeon server with 32GB RAM, 2x2TB SATA HDD, 200Mbps bandwidth. Not bad at all! The Kimsufi options are still good for lower-end needs as well.

I signed up for one of the SoYouStart servers. I’ve been pleased with my choice to migrate, and at the possibilities that having hardware like that at my disposal open up, but it is not without its downside.

The primary downside is lack of any kind of KVM console. If the server doesn’t boot, I can’t see the Grub error message (or whatever) behind it. They do provide hardware support and automatic technician dispatching when the server isn’t pingable, but… they state they have no KVM access at all. They support many OS flavors, and have a premade image for them, but there is no using a custom ISO to install; if you want ZFS on Linux, for instance, you can’t very easily build it into root.

My server was promised within 72 hours, but delivered much quicker: within about 1. I had two times they said they had to replace a motherboard within the first day; once they did it in 30 minutes, and the other took them 2.5 hours for some reason. They do have phone support, which answers almost immediately, but the people there are not the people actually in the datacenter. It was frustrating with a server down for hours and nobody really commenting on what was going on.

The server performs quite well, and after the initial issues, I’ve been happy.

I was initially planning an all-ZFS installation. SoYouStart does offer a rescue environment, but it doesn’t support ZFS, so I figured I better stick with an ext4 root at least. The default Debian install uses RAID1 on md-raid, with a 20GB root partition and the rest of the 2TB drive in /home, and then a swap partition on each drive (mysteriously NOT in the RAID!) So I broke the mirror on /home and converted those into the two legs of a mirrored vdev for a zpool.

I run all of the real work inside KVM VMs, so that should minimize the number of times I have to do anything to the root filesystem that could cause trouble.

SoYouStart includes 100GB of space on a separate FTP server for backup purposes. I have scripts that upload nightly tarballs of the root filesystem, plus full “zfs send” streams of everything else. Every hour, it uploads an incremental “zfs send” stream as well. This all works quite nicely; even if the machine is a complete loss, I’d never lose more than an hour’s work, and could restore it completely from a rescue environment. Very nice!

I’ll write more in a few days about the ZFS setup I’m using, and some KVM discoveries as well.

VirtFS isn’t quite ready

Despite claims to the contrary [PDF], VirtFS — the 9P-based virtio KVM/QEMU layer designed to pass through a host’s filesystem to the guest — is quite slow. I have yet to get it to perform at even 1/10 the speed of the virtual block device (VBD). That’s unfortunate, because in theory it should be significantly faster. At this rate, I suspect even NFS will be significantly faster.

Beyond that, it seems impossible to use VirtFS as the root filesystem in a VM, at least with Debian; initramfs-tools doesn’t know how to build an initrd in that situation, and the support is just not there.

It would make a great combination with btrfs or zfs, but unfortunately looks to be just not ready yet.