Category Archives: Linux

Why and how to run ZFS on Linux

I’m writing a bit about ZFS these days, and I thought I’d write a bit about why I am using it, why it might or might not be interesting for you, and what you might do about it.

ZFS Features and Background

ZFS is not just a filesystem in the traditional sense, though you can use it that way. It is an integrated storage stack, which can completely replace the need for LVM, md-raid, and even hardware RAID controllers. This permits quite a bit of flexibility and optimization not present when building a stack involving those components. For instance, if a drive in a RAID fails, it needs only rebuild the parts that have actual data stored on them.

Let’s look at some of the features of ZFS:

  • Full checksumming of all data and metadata, providing protection against silent data corruption. The only other Linux filesystem to offer this is btrfs.
  • ZFS is a transactional filesystem that ensures consistent data and metadata.
  • ZFS is copy-on-write, with snapshots that are cheap to create and impose virtually undetectable performance hits. Compare to LVM snapshots, which make writes notoriously slow and require an fsck and mount to get to a readable point.
  • ZFS supports easy rollback to previous snapshots.
  • ZFS send/receive can perform incremental backups much faster than rsync, particularly on systems with many unmodified files. Since it works from snapshots, it guarantees a consistent point-in-time image as well.
  • Snapshots can be turned into writeable “clones”, which simply use copy-on-write semantics. It’s like a cp -r that completes almost instantly and takes no space until you change it.
  • The datasets (“filesystems” or “logical volumes” in LVM terms) in a zpool (“volume group”, to use LVM terms) can shrink or grow dynamically. They can have individual maximum and minimum sizes set, but unlike LVM, where if, say, /usr gets bigger than you thought, you have to manually allocate more space to it, ZFS datasets can use any space available in the pool.
  • ZFS is designed to run well in big iron, and scales to massive amounts of storage. It supports SSDs as L2 cache and ZIL (intent log) devices.
  • ZFS has some built-in compression methods that are quite CPU-efficient and can yield not just space but performance benefits in almost all cases involving compressible data.
  • ZFS pools can host zvols, a block device under /dev that stores its data in the zpool. zvols support TRIM/DISCARD, so are ideal for storing VM images, as they can instantly release space released by the guest OS. They can also be snapshotted and backed up like the rest of ZFS.

Although it is often considered a server filesystem, ZFS has been used in plenty of other situations for some time now, with ports to FreeBSD, Linux, and MacOS. I find it particularly useful:

  • To have faith that my photos, backups, and paperwork archives are intact. zpool scrub at any time will read the entire dataset and verify the integrity of every bit.
  • I can create snapshots of my system before running apt-get dist-upgrade, making it easy to track down issues or roll back to a known-good configuration. Ideal for people tracking sid or testing. One can also easily simply boot from a previous snapshot.
  • Many scripts exist that make frequent snapshots, and retain the for a period of time as a way of protecting work in progress against an accidental rm. There is no reason not to snapshot /home every 5 minutes, for instance. It’s almost as good as storing / in git.

The added level of security in having cheap snapshots available is almost worth it by itself.

ZFS drawbacks

Compared to other Linux filesystems, there are a few drawbacks of ZFS:

  • CDDL will prevent it from ever being part of the Linus kernel tree
  • It is more RAM-hungry than most, although with tuning it can even run on the Raspberry Pi.
  • A 64-bit kernel is strongly preferred, even in low-memory situations.
  • Performance on many small files may be less than ext4
  • The ZFS cache does not shrink and expand in response to changing RAM usage conditions on the system as well as the normal Linux cache does.
  • Compared to btrfs, ZFS lacks some features of btrfs, such as being able to shrink an existing pool or easily change storage allocation on the fly. On the other hand, the features in ZFS have never caused me a kernel panic, and half the things I liked about btrfs seem to have.
  • ZFS is already quite stable on Linux. However, the GRUB, init, and initramfs code supporting booting from a ZFS root and /boot is less stable. If you want to go 100% ZFS, be prepared to tweak your system to get it to boot properly. Once done, however, it is quite stable.

Converting to ZFS

I have written up an extensive HOWTO on converting an existing system to use ZFS. It covers workarounds for all the boot-time bugs I have encountered as well as documenting all steps needed to make it happen. It works quite well.

Additional Hints

If setting up zvols to be used by VirtualBox or some such system, you might be interested in managing zvol ownership and permissions with udev.

Debian-Live Rescue image with ZFS On Linux; Ditched btrfs

I’m a geek. I enjoy playing with different filesystems, version control systems, and, well, for that matter, radios.

I have lately started to worry about the risks of silent data corruption, and as such, looked to switch my personal systems to either ZFS or btrfs, both of which offer built-in checksumming of all data and metadata. I initially opted for btrfs, because of its tighter integration into the Linux kernel and ability to shrink an existing btrfs filesystem.

However, as I wrote last month, that experiment was not a success. I had too many serious performance regressions and one too many kernel panics and decided it wasn’t worth it. And that the SuSE people got it wrong, deeply wrong, when they declared btrfs ready for production. I never lost any data, to its credit. But it simply reduces uptime too much.

That left ZFS. Before I build a system, I always want to make sure I can repair it. So I started with the Debian Live rescue image, and added the zfsonlinux.org repository to it, along with some key packages to enable the ZFS kernel modules, GRUB support, and initramfs support. The resulting image is described, and can be downloaded from, my ZFS Rescue Disc wiki page, which also has a link to my source tree on github.

In future blog posts in the series, I will describe the process of converting existing Debian installations to use ZFS, of getting them to boot from ZFS, some bugs I encountered along the way, and some surprising performance regressions in ZFS compared to ext4 and btrfs.

Results with btrfs and zfs

The recent news that openSUSE considers btrfs safe for users prompted me to consider using it. And indeed I did. I was already familiar with zfs, so considered this a good opportunity to experiment with btrfs.

btrfs makes an intriguing filesystem for all sorts of workloads. The benefits of btrfs and zfs are well-documented elsewhere. There are a number of features btrfs has that zfs lacks. For instance:

  • The ability to shrink a device that’s a member of a filesystem/pool
  • The ability to remove a device from a filesystem/pool entirely, assuming enough free space exists elsewhere for its data to be moved over.
  • Asynchronous deduplication that imposes neither a synchronous performance hit nor a heavy RAM burden
  • Copy-on-write copies down to the individual file level with cp --reflink
  • Live conversion of data between different profiles (single, dup, RAID0, RAID1, etc)
  • Live conversion between on-the-fly compression methods, including none at all
  • Numerous SSD optimizations, including alignment and both synchronous and asynchronous TRIM options
  • Proper integration with the VM subsystem
  • Proper support across the many Linux architectures, including 32-bit ones (zfs is currently only flagged stable on amd64)
  • Does not require excessive amounts of RAM

The feature set of ZFS that btrfs lacks is well-documented elsewhere, but there are a few odd btrfs missteps:

  • There is no way to see how much space subvolume/filesystem is using without turning on quotas. Even then, it is cumbersome and not reported with df like it should be.
  • When a maxmium size for a subvolume is set via a quota, it is not reported via df; applications have no idea when they are about to hit the maximum size of a filesystem.

btrfs would be fine if it worked reliably. I should say at the outset that I have never lost any data due to it, but it has caused enough kernel panics that I’ve lost count. I several times had a file that produced a panic when I tried to delete it, several times when it took more than 12 hours to unmount a btrfs filesystem, behaviors where hardlink-heavy workloads take days longer to complete than on zfs or ext4, and that’s just the ones I wrote about. I tried to use btrfs balance to change the metadata allocation on the filesystem, and never did get it to complete; it seemed to go into an endless I/O pattern after the first 1GB of metadata and never got past that. I didn’t bother trying the live migration of data from one disk to another on this filesystem.

I wanted btrfs to work. I really, really did. But I just can’t see it working. I tried it on my laptop, but had to turn of CoW on my virtual machine’s disk because of the rm bug. I tried it on my backup devices, but it was unusable there due to being so slow. (Also, the hardlink behavior is broken by default and requires btrfstune -r. Yipe.)

At this point, I don’t think it is really all that worth bothering with. I think the SuSE decision is misguided and ill-informed. btrfs will be an awesome filesystem. I am quite sure it will, and will in time probably displace zfs as the most advanced filesystem out there. But that time is not yet here.

In the meantime, I’m going to build a Debian Live Rescue CD with zfsonlinux on it. Because I don’t ever set up a system I can’t repair.

Why are we still backing up to hardlink farms?

A person can find all sorts of implementations of backups using hardlink trees to save space for incrementals. Some of them are fairly rudimentary, using rsync --link-dest. Others, like BackupPC, are more sophisticated, doing file-level dedup to a storage pool indexed by a hash.

While these are fairly space-efficient, they are really inefficient in other ways, because they create tons of directory entries. It would not be surprising to find millions of directory entries consumed very quickly. And while any given backup set can be deleted without impact on the others, the act of doing so can be very time-intensive, since often a full directory tree is populated with every day’s backup.

Much better is possible on modern filesystems. ZFS has been around for quite awhile now, and is stable on Solaris, FreeBSD and derivatives, and Linux. btrfs is also being used for real workloads and is considered stable on Linux.

Both have cheap copy-on-write snapshot operations that would work well with a simple rsync --inplace to achieve the same effect ad hardlink farms, but without all the performance penalties. When creating and destroying snapshots is a virtually instantaneous operation, and the snapshots work at a file block level instead of an entire file level, and preserve changing permissions and such as well (which rsync --link-dest can have issues with), why are we not using it more?

BackupPC has a very nice scheduler, a helpful web interface, and a backend that doesn’t have a mode to take advantage of these more modern filesystems. The only tool I see like this is dirvish, which someone made patches for btrfs snapshots three years ago that never, as far as I can tell, got integrated.

A lot of folks are rolling a homegrown solution involving rsync and snapshots. Some are using zfs send / btrfs send, but those mechanisms require the same kind of FS on the machine being backed up as on the destination, and do not permit excluding files from the backup set.

Is this an area that needs work, or am I overlooking something?

Incidentally, hats off to liw’s obnam. It doesn’t exactly do this, but sort of implements its own filesystem with CoW semantics.

Voice Keying with bash, sox, and aplay

There are plenty of times where it is nice to have Linux transmit things out a radio. One obvious example is the digital communication modes, where software acts as a sort of modem. A prominent example of this in Debian is fldigi.

Sometimes, it is nice to transmit voice instead of a digital signal. This is called voice keying. When operating a contest, for instance, a person might call CQ over and over, with just some brief gaps.

Most people that interface a radio with a computer use a sound card interface of some sort. The more modern of these have a simple USB cable that connects to the computer and acts as a USB sound card. So, at a certain level, all that you have to do is play sound out a specific device.

But it’s not quite so easy, because there is one other wrinkle: you have to engage the radio’s transmitter. This is obviously not something that is part of typical sound card APIs. There are all sorts of ways to do it, ranging from dedicated serial or parallel port circuits involving asserting voltage on certain pins, to voice-activated (VOX) circuits.

I have used two of these interfaces: the basic Signalink USB and the more powerful RigExpert TI-5. The Signalink USB integrates a VOX circuit and provides cabling to engage the transmitter when VOX is tripped. The TI-5, on the other hand, emulates three USB serial ports, and if you raise RTS on one of them, it will keep the transmitter engaged as long as RTS is high. This is a more accurate and precise approach.

VOX-based voice keying with the Signalink USB

But let’s first look at the Signalink USB case. The problem here is that its VOX circuit is really tuned for digital transmissions, which tend to be either really loud or completely silent. Human speech rises and falls in volume, and it tends to rapidly assert and drop PTT (Push-To-Talk, the name for the control that engages the radio’s transmitter) when used with VOX.

The solution I hit on was to add a constant, loud tone to the transmitted audio, but one which is outside the range of frequencies that the radio will transmit (which is usually no higher than 3kHz). This can be done using sox and aplay, the ALSA player. Here’s my script to call cq with Signalink USB:

#!/bin/bash
# NOTE: use alsamixer and set playback gain to 99
set -e

playcmd () {
        sox -V0 -m "$1" \
           "| sox -V0 -r 44100 $1 -t wav -c 1 -   synth sine 20000 gain -1" \
            -t wav - | \
           aplay -q  -D default:CARD=CODEC
}

DELAY=${1:-1.5}

echo -n "Started at: "
date

STARTTIME=`date +%s`
while true; do
        printf "\r"
        echo -n $(( (`date +%s`-$STARTTIME) / 60))
        printf "m/${DELAY}s: TRANSMIT"
        playcmd ~/audio/cq/cq.wav
        printf "\r"
        echo -n $(( (`date +%s`-$STARTTIME) / 60))
        printf "m/${DELAY}s: off         "
        sleep $DELAY
done

Run this, and it will continuously play your message, with a 1.5s gap in between during which the transmitter is not keyed.

The screen will look like this:

Started at: Fri Aug 24 21:17:47 CDT 2012
2m/1.5s: off

The 2m is how long it’s been going this time, and the 1.5s shows the configured gap.

The sox commands are really two nested ones. The -m causes sox to merge the .wav file in $1 with the 20kHz sine wave being generated, and the entire thing is piped to the ALSA player.

Tweaks for RigExpert TI-5

This is actually a much simpler case. We just replace playcmd as follows:

playcmd () {
        ~/bin/raiserts /dev/ttyUSB1 'aplay -q -D default:CARD=CODEC' < "$1"
}

Where raiserts is a program that simply keeps RTS asserted on the serial port while the given command executes. Here's its source, which I modified a bit from a program I found online:

/* modified from
 * https://www.linuxquestions.org/questions/programming-9/manually-controlling-rts-cts-326590/
 * */
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


static struct termios oldterminfo;


void closeserial(int fd)
{
    tcsetattr(fd, TCSANOW, &oldterminfo);
    if (close(fd) < 0)
        perror("closeserial()");
}


int openserial(char *devicename)
{
    int fd;
    struct termios attr;

    if ((fd = open(devicename, O_RDWR)) == -1) {
        perror("openserial(): open()");
        return 0;
    }
    if (tcgetattr(fd, &oldterminfo) == -1) {
        perror("openserial(): tcgetattr()");
        return 0;
    }
    attr = oldterminfo;
    attr.c_cflag |= CRTSCTS | CLOCAL;
    attr.c_oflag = 0;
    if (tcflush(fd, TCIOFLUSH) == -1) {
        perror("openserial(): tcflush()");
        return 0;
    }
    if (tcsetattr(fd, TCSANOW, &attr) == -1) {
        perror("initserial(): tcsetattr()");
        return 0;
    }
    return fd;
}


int setRTS(int fd, int level)
{
    int status;

    if (ioctl(fd, TIOCMGET, &status) == -1) {
        perror("setRTS(): TIOCMGET");
        return 0;
    }
    status &= ~TIOCM_DTR;   /* ALWAYS clear DTR */
    if (level)
        status |= TIOCM_RTS;
    else
        status &= ~TIOCM_RTS;
    if (ioctl(fd, TIOCMSET, &status) == -1) {
        perror("setRTS(): TIOCMSET");
        return 0;
    }
    return 1;
}


int main(int argc, char *argv[])
{
    int fd, retval;
    char *serialdev;

    if (argc < 3) {
        printf("Syntax: raiserts /dev/ttyname 'command to run while RTS held'\n");
        return 5;
    }
    serialdev = argv[1];
    fd = openserial(serialdev);
    if (!fd) {
        fprintf(stderr, "Error while initializing %s.\n", serialdev);
        return 1;
    }

    setRTS(fd, 1);
    retval = system(argv[2]);
    setRTS(fd, 0);

    closeserial(fd);
    return retval;
}

This compiles to an executable less than 10K in size. I love it when that happens.

So these examples support voice keying both with VOX circuits and with serial-controlled PTT. raiserts.c could be trivially modified to control other serial pins as well, should you have an interface which uses different ones.

Windows & a dying hard disk: Solving with Linux

Today, my workstation sent me this email:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

and then a little later, this one:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Offline uncorrectable sectors

From the hard disk’s SMART data, this is a clue that the drive is failing or will soon. Sigh. Incidentally, if smartmontools isn’t installed on your machine, whether it’s a laptop, desktop, or server, it should be.

Although most of you know I run Linux on the metal on my machines almost exclusively, I do maintain a small drive with a Windows installation that I boot into every few months for various reasons. This is that drive.

The drive is non-redundant (no RAID), and although it is backed up, the backup is made via backuppc from the NTFS filesystem mounted on Linux, and is a partial backup – backing up certain data, not the OS. There are, of course, bare metal Windows backup solutions, but I generally don’t want to back up Windows from within Windows on this machine. Restoring Windows isn’t quite as simple as an mkfs, an untar, and a grub-install, either.

So my first thought is: immediately save whatever of the drive I can. So I ran apt-get install gddrescue to install the GNU ddrescue tool. ddrescue is somewhat similar to dd, but deals much more intelligently with bad blocks on the drive. It will try to read them repeatedly, with decreasing block sizes, in an effort to get every last good byte off the disk that it can. If it ultimately fails to get certain bytes read, it will write placeholder data to the output file in place of the missing data, so that the output file maintains proper size and alignment. It also saves a log file that notes what it found (see info ddrescue for more on that.)

So I created an LVM volume for the purpose (not enough free space on /home, and didn’t want to have to shrink it somehow later), and ran:

ddrescue /dev/sda /mnt/sdasave.ddrescue /mnt/sdasave.logfile

Then I went to dinner.

When I got back, I discovered there were 1 or 2 bad sectors, about halfway through the disk, but everything else was fine. So now, the question became: did I lose any data? If so, what? I needed to know if I had to revert to a backup for anything or not.

To answer THAT question, first I had to figure out the offset of the bad spots on the disk. That’s not too hard; the logfile gives it to me:

# Rescue Logfile. Created by GNU ddrescue version 1.15
# Command line: ddrescue /dev/sda /mnt/sdasave.ddrescue sdasave.logfile
# current_pos  current_status
0x3BBB8BFC00     +
#      pos        size  status
0x00000000  0x3BBB8BF000  +
0x3BBB8BF000  0x00001000  -
0x3BBB8C0000  0x38B5346000  +

what we see is that the bad sector starts at byte 0x3BBB8BF000 (256549580800 decimal) and extends for 0x1000 bytes (4096 decimal). Both the drive and NTFS use 512-byte sectors. So dividing by 512, we get sector 501073400 – 501073407 (4096 bytes is 8 sectors).

As a check, I ran grep sector /var/log/kern.log and turned up a bunch of lines like this:

Jun 14 21:39:11 hephaestus kernel: [35346.929957] end_request: I/O error, dev sda, sector 501073404

Which is within my calculated range.

But this is an absolute sector on the disk. We need the sector within the partition, so for that, we have to enlist fdisk to make that calculation.

fdisk shows, among other things:

Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   976771071   488384512    7  HPFS/NTFS/exFAT

So the Windows partition starts at disk sector 2048.

Let’s just confirm that. If I use dd if=/dev/sda1 bs=512 count=1 | hd | head, I see a line beginning with “.R.NTFS”. Exactly the same as with dd if=/dev/sda bs=512 count=1 skip=2048 | hd | head, so I read the partition table information correctly.

Subtract offset of 2048 from the earlier values, and I get relative sectors 501071352-501071359.

That’s enough to get some solid info from the filesystem via ntfscluster, part of Debian’s ntfs-3g package. I pass -s to it, and ignoring some irrelevant stuff, get my answer:

ntfscluster -s 501071352-501071359 /dev/sda1
Inode 190604 /System Volume Information/{b4816feb-b609-11e1-a908-50e549b934f7}{3808876b-c176-4e48-b7ae-04046e6cc752}/$DATA

I even reran it with a much larger sector range, just to be absolutely sure I had wiggle room in case calculations had an off-by-one error or something somewhere.

This is really great news, because the file in question is pretty much useless – I believe it’s a system restore point, which I won’t be needing anyhow.

So at this point, all that remains is to reinstall this on a different drive. For that, I could just use my ddrescue image. I thought I would take a second image, just to be very extra careful, and use that; I used:

partclone.ntfs --rescue -c -s /dev/sda1 -o sda1.partclone

although ntfsclone would work just as well. This captures only the partition; I’ll need the partition table as well, and perhaps also the space between the partition table and the first partition. I could capture it separately with dd, but it’s already in the ddrescue image, so there’s no need. (GRUB is installed on this drive, but there is no Linux filesystem on it, so it may well exceed the size of the MBR).

Note that for Linux ext[234] filesystems, debugfs can provide the same (and more) info as I got from ntfscluster.

I happen to have a drive of the right size sitting here, which I was about to install in a different machine. So a wipe and a swap and a restore later, and I should be good to go.

This scenario is commonplace enough that I thought I’d post how I dealt with it, in case anyone else ever has hard drive issues.

How to debugging Linux failure to resume from suspend?

I’m running a computer with a Gigabyte Z68A-D3H-B3 motherboard, and have never been able to get it to properly resume from suspend to RAM in Linux. It has worked fine on the rare occasion I’ve tried it in Windows 7.

My somewhat limited usual for debugging aren’t particularly helpful. The system appears to suspend perfectly fine. It just doesn’t resume. To be more precise, when I push the button to resume, the power comes up (fans whir, HDD spins up, etc.) but nothing happens. The USB keyboard and mouse don’t respond, Caps Lock doesn’t toggle any LEDs, it doesn’t respond on the wired LAN, and the display stays off.

Although it’s a desktop, I’d really like to save power on this thing by suspending it when it’s not in use. There’s no sense in wasting power I don’t need to be consuming.

I’ve tried what I used to try on laptops. I tried running in single-user mode, without X, or even the kernel modules for video acceleration loaded. I tried unloading whatever hardware modules I thought I could without completely destabilizing the system. I updated the BIOS to the latest release. I tried various combinations of video tweaks. I tried using s2ram from uswsusp instead of pm-suspend. Nothing made any difference. They all behaved exactly the same.

Googling showed a lot of resources for people that had trouble getting their machines to go to sleep. And also for people whose machines would wake up but just wouldn’t re-activate the display. But precious little for people with my particular symptoms.

What’s a good place to start looking to fix something like this?

Some details…

CPU is Core i5-2400. Kernel is wheezy’s 3.2.0-2-amd64, though this problem has persisted as long as I’ve had this machine, which was running squeeze at install time. Video is NVidia GeForce GTX 560 (GF114). Hard drives are SATA, Ethernet is integrated RTL8111/8168B. Userland is up-to-date amd64 wheezy.

A 4-year-old, Linux command line, and microphone

There are certain times when I’m really glad that we have Linux on the house for our boys to play with. I’ve already written how our 4-year-old Jacob has fun with bash and can chain together commands to draw ASCII animated steam locomotives. Today I thought it might be fun to install cw, a program that can take text on standard input and play it on the console speaker or sound card as Morse code. Just the sort of thing that I could see Jacob eventually getting a kick out of.

But his PC was mute. We opened it up and discovered it didn’t have a console speaker. So we traipsed downstairs, dug out an external speaker, and I figured out how to enable the on-board audio chipset in the BIOS. So now the cw command worked, but also there were a lot of other possibilities. We also brought up a microphone.

While Jacob was busy with other things, I set to work getting things hooked up, volume levels adjusted, and wrote some shell scripts for him. I also printed out this reference sheet for Jacob:

He is good at reading but not so good at spelling. I intentionally didn’t write down what the commands do, hoping that this would provide some avenue for exploration for him. He already is generally familiar with the ones under the quiet category.

I wrote a shell script called “record”. It simply records from the microphone and drops a timestamped WAV file in a holding directory. He can then type “play” to simply play back whatever he recorded most recently. Easy enough.

But what he really wanted was sound for his ASCII steam locomotive. So with the help of a Google search for “steam train mp3”, I wrote a script “ssl” (sound steam locomotive) that starts playing the sound in the background if it isn’t already going, and then runs sl to show the animation. This was a big hit.

I also set it up so he can type “play train” to hear that audio, or “play song” to play our favorite train song (Always a Train in My Dreams by Steve Gillette). Jacob typed that in and sat still for the entire 3 minutes listening to it.

I had to hook up an Ethernet cable to his machine to do all this, and he was very interested that I was hooking his computer up to mine in some way. He thought all the stuff about cables in the walls was quite exciting.

The last thing I did was install flite, a speech synthesis program. I wrote a small shell script called “talk” which reads a line at a time from stdin and invokes flite for each one (to give more instant feedback rather than not starting playback until after having read a large block from stdin). He had some fun hearing it say his name and other favorite words, but predictably the most fun was when he typed gibberish at it, and heard it try to pronounce or spell nonsense words.

In all, he was so excited about this new world of computer sound opened up to him. I’m sure there will be lots of happy experimentation and discovery going on.

Update Feb 10, 2012: I have posted the shell scripts behind this.

A Proud Dad

I saw this on my computer screen the other day, and I’ve got to say it really warmed my heart. I’ll explain below if it doesn’t provoke that reaction for you.

Evidence a 4-year-old has been using my computer

So here’s why that made me happy. Well for one, it was the first time Jacob had left stuff on my computer that I found later. And of course he left his name there.

But moreover, he’s learning a bit about the Unix shell. sl is a command that displays an animated steam locomotive. I taught him how to use the semicolon to combine commands. So he has realized that he can combine calls to sl with the semicolon to get a series of a LOT of steam trains all at once. And was very excited about this discovery.

Also he likes how error messages start with the word “bash”.

Unix Password and Authority Management

One of the things that everyone seems to do different is managing passwords. We haven’t looked at that in quite some time, despite growth both of the company and the IT department.

As I look to us moving some things to the cloud, and shifting offsite backups from carrying tapes to a bank to backups via the Internet, I’m aware that the potential for mischief — whether intentional or not — is magnified. With cloud hosting, a person could, with the press of a button, wipe out the equivalent of racks of machines in a few seconds. With disk-based local and Internet-based offsite backups, the potential for malicious behavior may be magnified; someone could pretty quickly wipe out local and remote backups.

Add to that the mysterious fact that many enterprise-targeted services allow only a single username/password for an account, and make no provision for ACLs to delegate permissions to others. Even Rackspace Cloud has this problem, as do their JungleDisk backup product, and many, many other offsite backup products. Amazon AWS seems to be the only real exception to this rule, and their ACL support is more than a little complicated.

So one of the questions we will have to address is the balance of who has these passwords. Too many people and the probability of trouble, intentional or not, rises. Too few and productivity is harmed, and potentially also the ability to restore. (If only one person has the password, and that person is unavailable, company data may be as well.) The company does have some storage locations, including locked vaults and safe deposit boxes, that no IT people have access to. I am thinking that putting a record of passwords in those locations may be a good first step, as putting the passwords in the control of those that can’t use them seems a reasonable step.

But we’ve been thinking of this as it pertains to our local systems as well. We have, for a number of years now, assigned a unique root password to every server. These passwords are then stored in a password-management tool, encrypted with a master password, and stored on a shared filesystem. Everyone in the department therefore can access every password.

Many places where I worked used this scheme, or some variant of it. The general idea was that if root on one machine was compromised and the attacker got root’s password, it would prevent the person from being able to just try that password on the other servers on the network and achieve a greater level of intrusion.

However, the drawback is that we now have more servers than anyone can really remember the passwords for. So many people are just leaving the password tool running. Moreover, while the attack described above is still possible, these days I worry more about automated intrusion attempts that most likely won’t try that attack vector.

A couple of ways we could go may include using a single root password everywhere, or a small set of root passwords. Another option may be to not log in to root accounts at all — possibly even disabling their password — and requiring the use of user accounts plus sudo. This hasn’t been practical to date. We don’t want to make a dependency on LDAP from a bunch of machines just to be able to use root, and we haven’t been using a tool such as puppet or cfengine to manage this stuff. Using such a tool is on our roadmap and could let us manage that approach more easily. But this approach has risks too. One is that if user accounts can get to root on many machines, then we’re not really more secure than a standard root password. Second is that it makes it more difficult to detect and enforce password expiration and systematic password changes.

I’m curious what approaches other people are taking on this.