Category Archives: Software

Decisions on Listening to Music

A couple of days ago, I commented about my thoughts on finding a better way to listen to music, and asked for suggestions. I’ve checked out the avenues suggested, and here are my thoughts so far.

Streaming service: Spotify

Of the streaming service offerings, Spotify seems the most compelling. It has both Linux and Android clients. The Linux client is unofficial and a bit buggy, but serviceable. The service is fairly nice, and the serendipity of listening to “radio” of tracks like any track or artist I may pick is an incredible feature. It will mean a different mindset; rather than carefully choosing what I buy, I have a vast catalog at my fingertips. I’d have to give up some control, but might be better off for it.

Spotify can play local files, but sadly lacks support for FLAC that most of my collection is in. A little work with the shell, and I had a quick multi-core FLAC to MP3 conversion done. It can also identify local files and match them up with its catalog, but it does a somewhat poor job of it, despite the fact that my files are tagged by Musicbrainz. There are also a fair number of albums in my collection (from Magnatune and from local artists) that aren’t in Spotify’s catalog and probably never will be. I may wind up using git-annex or something like it to sync them between PCs.

Surprisingly, it doesn’t have any kind of web-based player available. It has a lot of features, but many of them are under-documented.

Subsonic / Supersonic

The idea of being able to stream music anywhere certainly has appeal, and the idea of streaming that music from my own machine — my own collection — is quite the appealing one. Subsonic seems to be the standard that people recommend for this. It has a web-based player that works well, as well as mobile clients for various platforms. The Android mobile client is particularly well-regarded and is probably the best one here, aside from Spotify. The playlists work well, as expected, and generally it has the appearance of a well-rounded system. There is a lot of flexibility in bitrates; maximum bitrates can be set per-user and per-client. It, of course, transcodes on the fly.

My chief gripe about Subsonic is that it has no true album/artist/genre browser. Its main browser is mostly a filesystem browser. I have used clients that show album/artist/genre panes or sortable lists, based on tags, for so long that my filesystem organization is rather different – centered around bitrate of rip, origin, etc. It is not very easy to navigate my collection in that way. The search feature can search artist and album, and this can partially make up for it. But there is simply no way to see a list of all artists in the collection.

The server is written in Java, and takes up 362MB at startup. That’s, I guess, decent for a Java server, but doesn’t make me pleased as I’d be running it on lower-end hardware.

Supersonic is a git-maintained actively synced subsonic fork which removes the license cost (the source is GPL3 anyhow so that wasn’t a big deal).

Ampache

Often mentioned in the same breath as Subsonic, this is yet another frequent suggestion for streaming your own collection. Its primary player is web-based, like Subsonic. It has full artist/genre browsers, and adds tag cloud and various other options as well – by default, it loaded the tag cloud with my genre tags, which was a nice touch. The search feature is a little less robust, as it simply searches anywhere and returns a list of all matching tracks, rather than listing matching albums and artists before going into the list of all tracks. But that is somewhat of a minor nit.

Ampache’s web-based player, liks Subsonic’s, uses Flash, but the implementation is really quite awkward. You can’t simply click a play button and have playback start. Rather, you must add things to the “playlist” (queue), a frame on the web interface. Then, you must click play in THAT frame, which launches a new window for the player and transfers its contents to it. You thus have two separate running playlists. And if you think you can just add to the playlist of the player window, think again; you can only replace it. It is simply not possible to add a track to the end of that list without losing your existing place. This makes the usefulness of the web-based player rather, well, low.

Playlist functionality in Ampache is rather odd. You can have playlists, but if you want to add a track to a saved playlist, first you have to clear you current playlist, then add the track to it, then save it to the existing playlist.

Frontends for Ampache exist in banshee and in amarok. Banshee’s is broken, and simply silently fails to do anything. Unless you look in the console, in which case you’ll see a .NET backtrace. Banshee has had these mysterious errors for me for years. The Amarok plugin is so basic that you can browse by artist but not by album, and is almost unusable. There is also a rhythmbox plugin in Debian, but it is apparently for an old version of Rhythmbox and no longer works.

The best frontend is a dedicated Ampache frontend called Viridian. It has a nice interface, provides features such as taskbar integration, and generally works pretty well. The only thing it’s bad at is playlists, and here it’s even worse than the web interface. It doesn’t support modifying the server-side playlists at all. And accessing one is an annoying “load playlist” operation, that replaces your current play queue (also called the “playlist”) with its contents by default.

Several Android clients exist for Ampache, though it seems only one has seen any updates since 2010.

mpd

This is an interesting program, as it is designed more to provide multiple points of control for a single player than it is to stream a single collection to multiple devices. Nevertheless, I tested its HTTP and icecast stream modules. (It can also stream with PulseAudio, but PulseAudio causes devious breakage on my workstation, so I refuse to bother.) This could be made to work reasonably well on my LAN using the gmpc client. There are several Android clients as well, though none of them integrate as well as Subsonic or Spotify. Most seem focused on controlling a remote player. The ones that do stream do so with some hiccups, and don’t integrate controls with the lock screen.

gmpc is a reasonable control program and, combined with mplayer, makes a decent player. It can browse the database every way I’d want to. It doesn’t let me right-click on tracks to add them to playlists, but I can use copy/paste to do so, or add them to the play queue and from there to a playlist. It’s annoying, but not completely unworkable.

But here’s the real concern. mpd has no provision whatsoever for any sort of encryption such as SSL. If I want to be listening to my music collection from on the road, I don’t really wish to open up a streaming music service for the entire Internet. Nor do I really want to mess with SSH tunnels and the like, which are cumbersome on Android (and can be in general). This is pretty much a fatal flaw to me.

Google Music

An interesting service, but ultimately one I don’t see the point for in my situation – where I already have my music on a PC that is always on. It would probably accommodate my entire collection free, but how many weeks would it take me to get it there? And how would it stay in sync? It simply doesn’t offer me any compelling advantage over streaming direct from my PC, so I don’t see much need to bother with it.

Others

I briefly considered Streeme (poor mobile support), Audiogalaxy (no Linux support), Jinzora (appears dead), and Sockso (no mobile support). I also briefly considered other streaming services such as Pandora, Slacker, rdio, etc., but based on reviews didn’t think that they were worth bothering with.

Conclusions

I’m left mainly considering Spotify and Subsonic. For right now, I’m going to try Spotify and see where that leads me, but may wind up getting back to Subsonic if I can’t find tracks I want with Spotify.

How to listen to music?

This seems like a simple question: how do you listen to music?

But in a way, I think I’ve fallen behind the times.

I’ve enjoyed listening to digital music for years, since well before MP3 even existed (MIDI files could be comfortably downloaded over dialup; some of you may even remember MOD files).

Anyhow, a few years ago, I digitized my entire CD collection, ripping it to FLAC and encoding to high-bitrate MP3 for storage on my 160GB iPod. Since, I’ve purchased some music from DRM-free MP3 retailers, and even FLAC files from Magnatune, but at this point I basically say that I have enough music. I have only rarely bought anything new in the last couple of years. I have used Musicbrainz Picard to enforce consistent tagging across almost my entire collection.

I am usually in front of a Linux machine. I run mt-daapd on my fileserver, so I can stream music from it to any other machine in the house. However, this has a lot of drawbacks. It is generally impossible to create or modify persistent playlists with this protocol. I can’t modify metadata. I can’t assign ratings. Though perhaps I don’t need to.

There are a lot of online services I could think of: Pandora, Spotify, Rdio, Slacker. Plus, of course, the upload your music to the cloud services such as Google and Amazon. Have people tried these? I’m hesitant to use a service where I pay a monthly fee, unless I wind up owning the music in a DRM-free fashion. And I’m a little nervous about the privacy implications of uploading a vast amount of music to the cloud.

There are some items in my collection that I am certain are not available on any of those others, but for the most part I’m willing to take what is out there if it meets my tastes. My tastes, by the way, typically run to classical and opera but also can go towards middle eastern or certain new age music (though I’m very picky about that, since I don’t care for most new age stuff I hear.) But these are huge categories; sometimes I’d like to hear some load and noisy classical (say, Beethoven’s Ninth conclusion, some Wagner, Verdi arias, etc.) And other times, something more contemplative (a Mozert concerto) or even quiet (a nice sonata) meet my mood. I have playlists for these on my iPod Classic – a bit of a problem, since I don’t use that device much anymore except in the car, and it takes a LOT of time to categorize things into these playlists. (An opera or a symphony could have parts in various styles and various levels of energy, for instance.) I keep a small manually-managed subset on my Android phone, which works fine. A recommendation engine for things like this could be useful, if it works well.

My requirements, basically: I want something that will play the same music on my Linux machines. I want to be able to have playlists kept in sync across my machines. And I don’t want a monthly fee. Ideally, the software will work on Android as well, and provide similar features there. Anything that requires Windows is a non-starter. I’ve considered NFS mounts, but that’s a bit annoying when the laptop disconnects from the network or when it’s not at home.

What are your thoughts? Ideas?

Windows & a dying hard disk: Solving with Linux

Today, my workstation sent me this email:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

and then a little later, this one:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Offline uncorrectable sectors

From the hard disk’s SMART data, this is a clue that the drive is failing or will soon. Sigh. Incidentally, if smartmontools isn’t installed on your machine, whether it’s a laptop, desktop, or server, it should be.

Although most of you know I run Linux on the metal on my machines almost exclusively, I do maintain a small drive with a Windows installation that I boot into every few months for various reasons. This is that drive.

The drive is non-redundant (no RAID), and although it is backed up, the backup is made via backuppc from the NTFS filesystem mounted on Linux, and is a partial backup – backing up certain data, not the OS. There are, of course, bare metal Windows backup solutions, but I generally don’t want to back up Windows from within Windows on this machine. Restoring Windows isn’t quite as simple as an mkfs, an untar, and a grub-install, either.

So my first thought is: immediately save whatever of the drive I can. So I ran apt-get install gddrescue to install the GNU ddrescue tool. ddrescue is somewhat similar to dd, but deals much more intelligently with bad blocks on the drive. It will try to read them repeatedly, with decreasing block sizes, in an effort to get every last good byte off the disk that it can. If it ultimately fails to get certain bytes read, it will write placeholder data to the output file in place of the missing data, so that the output file maintains proper size and alignment. It also saves a log file that notes what it found (see info ddrescue for more on that.)

So I created an LVM volume for the purpose (not enough free space on /home, and didn’t want to have to shrink it somehow later), and ran:

ddrescue /dev/sda /mnt/sdasave.ddrescue /mnt/sdasave.logfile

Then I went to dinner.

When I got back, I discovered there were 1 or 2 bad sectors, about halfway through the disk, but everything else was fine. So now, the question became: did I lose any data? If so, what? I needed to know if I had to revert to a backup for anything or not.

To answer THAT question, first I had to figure out the offset of the bad spots on the disk. That’s not too hard; the logfile gives it to me:

# Rescue Logfile. Created by GNU ddrescue version 1.15
# Command line: ddrescue /dev/sda /mnt/sdasave.ddrescue sdasave.logfile
# current_pos  current_status
0x3BBB8BFC00     +
#      pos        size  status
0x00000000  0x3BBB8BF000  +
0x3BBB8BF000  0x00001000  -
0x3BBB8C0000  0x38B5346000  +

what we see is that the bad sector starts at byte 0x3BBB8BF000 (256549580800 decimal) and extends for 0x1000 bytes (4096 decimal). Both the drive and NTFS use 512-byte sectors. So dividing by 512, we get sector 501073400 – 501073407 (4096 bytes is 8 sectors).

As a check, I ran grep sector /var/log/kern.log and turned up a bunch of lines like this:

Jun 14 21:39:11 hephaestus kernel: [35346.929957] end_request: I/O error, dev sda, sector 501073404

Which is within my calculated range.

But this is an absolute sector on the disk. We need the sector within the partition, so for that, we have to enlist fdisk to make that calculation.

fdisk shows, among other things:

Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   976771071   488384512    7  HPFS/NTFS/exFAT

So the Windows partition starts at disk sector 2048.

Let’s just confirm that. If I use dd if=/dev/sda1 bs=512 count=1 | hd | head, I see a line beginning with “.R.NTFS”. Exactly the same as with dd if=/dev/sda bs=512 count=1 skip=2048 | hd | head, so I read the partition table information correctly.

Subtract offset of 2048 from the earlier values, and I get relative sectors 501071352-501071359.

That’s enough to get some solid info from the filesystem via ntfscluster, part of Debian’s ntfs-3g package. I pass -s to it, and ignoring some irrelevant stuff, get my answer:

ntfscluster -s 501071352-501071359 /dev/sda1
Inode 190604 /System Volume Information/{b4816feb-b609-11e1-a908-50e549b934f7}{3808876b-c176-4e48-b7ae-04046e6cc752}/$DATA

I even reran it with a much larger sector range, just to be absolutely sure I had wiggle room in case calculations had an off-by-one error or something somewhere.

This is really great news, because the file in question is pretty much useless – I believe it’s a system restore point, which I won’t be needing anyhow.

So at this point, all that remains is to reinstall this on a different drive. For that, I could just use my ddrescue image. I thought I would take a second image, just to be very extra careful, and use that; I used:

partclone.ntfs --rescue -c -s /dev/sda1 -o sda1.partclone

although ntfsclone would work just as well. This captures only the partition; I’ll need the partition table as well, and perhaps also the space between the partition table and the first partition. I could capture it separately with dd, but it’s already in the ddrescue image, so there’s no need. (GRUB is installed on this drive, but there is no Linux filesystem on it, so it may well exceed the size of the MBR).

Note that for Linux ext[234] filesystems, debugfs can provide the same (and more) info as I got from ntfscluster.

I happen to have a drive of the right size sitting here, which I was about to install in a different machine. So a wipe and a swap and a restore later, and I should be good to go.

This scenario is commonplace enough that I thought I’d post how I dealt with it, in case anyone else ever has hard drive issues.

How to debugging Linux failure to resume from suspend?

I’m running a computer with a Gigabyte Z68A-D3H-B3 motherboard, and have never been able to get it to properly resume from suspend to RAM in Linux. It has worked fine on the rare occasion I’ve tried it in Windows 7.

My somewhat limited usual for debugging aren’t particularly helpful. The system appears to suspend perfectly fine. It just doesn’t resume. To be more precise, when I push the button to resume, the power comes up (fans whir, HDD spins up, etc.) but nothing happens. The USB keyboard and mouse don’t respond, Caps Lock doesn’t toggle any LEDs, it doesn’t respond on the wired LAN, and the display stays off.

Although it’s a desktop, I’d really like to save power on this thing by suspending it when it’s not in use. There’s no sense in wasting power I don’t need to be consuming.

I’ve tried what I used to try on laptops. I tried running in single-user mode, without X, or even the kernel modules for video acceleration loaded. I tried unloading whatever hardware modules I thought I could without completely destabilizing the system. I updated the BIOS to the latest release. I tried various combinations of video tweaks. I tried using s2ram from uswsusp instead of pm-suspend. Nothing made any difference. They all behaved exactly the same.

Googling showed a lot of resources for people that had trouble getting their machines to go to sleep. And also for people whose machines would wake up but just wouldn’t re-activate the display. But precious little for people with my particular symptoms.

What’s a good place to start looking to fix something like this?

Some details…

CPU is Core i5-2400. Kernel is wheezy’s 3.2.0-2-amd64, though this problem has persisted as long as I’ve had this machine, which was running squeeze at install time. Video is NVidia GeForce GTX 560 (GF114). Hard drives are SATA, Ethernet is integrated RTL8111/8168B. Userland is up-to-date amd64 wheezy.

Shell Scripts For Preschoolers

It probably comes as no surprise to anybody that Jacob has had a computer since he was 3. Jacob and I built it from spare parts, together.

It may come as something of a surprise that it has no graphical interface, and Jacob uses the command line and loves it — and did even before he could really read.

A few months ago, I wrote about the fun Jacob had with speakers and a microphone, and posted a copy of the cheat sheet he has with his computer. Lately, Jacob has really enjoyed playing with the speech synthesizer — both trying to make it say real words and nonsense words. Sometimes he does that for an hour.

I was asked for a copy of the scripts I wrote. They are really simple. I gave them names that would be easy for a preschooler to remember and spell, even if they conflicted with existing Unix/Linux commands. I put them in /usr/local/bin, which occurs first on the PATH, so it doesn’t matter if they conflict.

First, for speech systhesis, /usr/local/bin/talk:


#!/bin/bash
echo "Press Ctrl-C to stop."
espeak -v en-us -s 150

espeak comes from the espeak package. It seemed to give the most consistenly useful response.

Now, on to the sound-related programs. Here’s /usr/local/bin/ssl, the “sound steam locomotive”. It starts playing a train sound if one isn’t already playing:


#!/bin/bash
pgrep mpg321 > /dev/null || mpg321 -q /usr/local/trainsounds/main.mp3 &
sl "$@"

And then there’s /usr/local/bin/record:


#!/bin/bash
cd $HOME/recordings
echo "Now recording. Press Ctrl-C to stop."
DATE=`date +%Y-%m-%dT%H-%M-%S`
FILENAME="$DATE-$$.wav"
chmod a-w *.wav
exec arecord -c 1 -f S16_LE -c 1 -r 44100 "$FILENAME"

This simply records in a timestamped file. Then, its companion, /usr/local/bin/play. Sorry about the indentation; for whatever reason, it is being destroyed by the blog, but you get the idea.


#!/bin/bash
case "$1" in
train)
mpg321 /usr/local/trainsounds/main.mp3
;;
song)
/usr/bin/play /usr/local/trainsounds/traindreams.flac
;;
*)
cd $HOME/recordings
exec aplay `ls -tr| tail -n 1`
;;
esac

So, Jacob can run just “play”, which will play back his most recent recording. As something of a bonus, the history of recordings is saved for us to listen to later. If he types “play train”, there is the sound of a train passing. And, finally, “play song” plays Always a Train in My Dreams by Steve Gillette (I heard it on the radio once and bought the CD).

Some of these commands kick off sound playing in the background, so here is /usr/local/bin/bequiet:


#!/bin/bash
killall mpg321 &> /dev/null
killall play &> /dev/null
killall aplay &> /dev/null
killall cw &> /dev/null

A 4-year-old, Linux command line, and microphone

There are certain times when I’m really glad that we have Linux on the house for our boys to play with. I’ve already written how our 4-year-old Jacob has fun with bash and can chain together commands to draw ASCII animated steam locomotives. Today I thought it might be fun to install cw, a program that can take text on standard input and play it on the console speaker or sound card as Morse code. Just the sort of thing that I could see Jacob eventually getting a kick out of.

But his PC was mute. We opened it up and discovered it didn’t have a console speaker. So we traipsed downstairs, dug out an external speaker, and I figured out how to enable the on-board audio chipset in the BIOS. So now the cw command worked, but also there were a lot of other possibilities. We also brought up a microphone.

While Jacob was busy with other things, I set to work getting things hooked up, volume levels adjusted, and wrote some shell scripts for him. I also printed out this reference sheet for Jacob:

He is good at reading but not so good at spelling. I intentionally didn’t write down what the commands do, hoping that this would provide some avenue for exploration for him. He already is generally familiar with the ones under the quiet category.

I wrote a shell script called “record”. It simply records from the microphone and drops a timestamped WAV file in a holding directory. He can then type “play” to simply play back whatever he recorded most recently. Easy enough.

But what he really wanted was sound for his ASCII steam locomotive. So with the help of a Google search for “steam train mp3”, I wrote a script “ssl” (sound steam locomotive) that starts playing the sound in the background if it isn’t already going, and then runs sl to show the animation. This was a big hit.

I also set it up so he can type “play train” to hear that audio, or “play song” to play our favorite train song (Always a Train in My Dreams by Steve Gillette). Jacob typed that in and sat still for the entire 3 minutes listening to it.

I had to hook up an Ethernet cable to his machine to do all this, and he was very interested that I was hooking his computer up to mine in some way. He thought all the stuff about cables in the walls was quite exciting.

The last thing I did was install flite, a speech synthesis program. I wrote a small shell script called “talk” which reads a line at a time from stdin and invokes flite for each one (to give more instant feedback rather than not starting playback until after having read a large block from stdin). He had some fun hearing it say his name and other favorite words, but predictably the most fun was when he typed gibberish at it, and heard it try to pronounce or spell nonsense words.

In all, he was so excited about this new world of computer sound opened up to him. I’m sure there will be lots of happy experimentation and discovery going on.

Update Feb 10, 2012: I have posted the shell scripts behind this.

Geeks, Hobbies, and Free/Open Source: Feedback Wanted

I’ve been thinking lately about ways to improve ways in which I interact with Free Software projects, and ways in which they interact with me. Before I proceed to take steps or make suggestions, I’d like to see if others share my traits and observations.

Here are some questions I have been thinking of. If you’d like to help give me anecdotal evidence, please post a comment below this post. Identify the question numbers you are answering. It helps me if you can give specific examples, but if you don’t have the time or memory for that, no problem.

I will post my own answers in a day or two, but the point of this post is listening, not talking, so I’ll not post them immediately.

Hobbies (General – any geeks)

  • H1: To what degree do you like your hobbies to be challenging vs. easy? If something isn’t challenging, does that make it a good, bad, or indifferent candidate for a hobby
  • H2: To what degree do you like your hobbies to be educational or enlightening?
  • H3: How do you pick up new hobbies? Do you go looking for them? Do you stumble upon them? What excites you to commit time and/or money to them at the beginning?
  • H4: How does your interest wane? What causes you to lose interest in hobbies?
  • H5: For how long do you tend to maintain hobbies? Sub-hobbies?
  • H6: Are your hobbies or sub-hobbies cyclical? In other words, do you lose interest in a hobby for a time, then regain interest for a time, then lose it again? What is the length of time of these cycles, if any?
  • H7: Do you prefer social hobbies or solitary hobbies? (Note that many hobbies, including programming, video gaming, reading, knitting, etc. could be either social or solitary, depending on the inclination of individuals.)
  • H8: Have you ever felt guilt about wanting to stop a hobby or sub-hobby? (For instance, from stopping supporting users of your software project, readers of your e-zine, etc) Did the guilt keep you going? Was that a good thing?

Examples: video games might be a challenging hobby (depending on the person) but in most cases aren’t educational.

A hobby might be “video game playing” or “being a Debian developer.” A sub-hobby might be “playing GTA IV”, “playing RPGs”, or “maintaining mutt”.

Free/Open Source Hobbies

  • F1: Considering your answers above, do your FLOSS activities follow the same general pattern as your other hobbies/interests, or are there differences? If there are differences, what are they?
  • F2: Has concern for being expected to support software longer than you will have an interest in it ever been a factor in a decision whether to release source code publicly, or how public to make a release?
  • F3: Has concern over the long-term interest of a submitter in maintaining their patch/contribution ever caused you to consider rejecting it? (Or caused you to avoid using software over the same concern about its author)
  • F4: In general, do you find requirements FLOSS projects place on first-time contributors to be too stringent, not stringent enough, or about right?
  • F5: Have you ever continued contributing to a project past the point where your interest would otherwise motivate you to do so? If so, what caused you to do this? Do you believe that cause is a general positive or negative force for members of the FLOSS community?
  • F6: Have there ever been factors that caused you to stop contributing to a project even though you still had an active interest in doing so? What were they?
  • F7: Have you ever wanted to be able to take a break as a contributor or maintainer of a project, and be able to return to contributing to it later? If so, have you found it easy to do so?
  • F8: What is your typical length of engagement with FLOSS projects (such as Debian) and sub-projects (such as maintaining a particular package)?
  • F9: Does a change in social group ever encourage or discourage you from changing hobbies or sub-hobbies?
  • F10: Have you ever wanted to stop working on a project/sub-project because the problems involved were no longer challenging or educational to you?
  • F11: Have you ever wanted to stop working on a project/sub-project because of issues with the people involved?

Examples on F9: If, say, you are a long-time Perl user and have gone to Perl conferences, but now you are interested in Ruby, would your involvement with the Perl community cause you to avoid taking up the Ruby programming hobby? Or would it cause you to cut your ties with Perl less quickly than your changing interest might dictate? (This is a completely arbitrary example and isn’t meant to start a $LANGUAGE thread.)

Changes over time

  • C1: Do you believe that your answers to any of the above questions have changed over time? If yes, then:
  • C2: What kinds of changes have happened?
  • C3: What caused the change?
  • C4: Do you believe the changes produced positive results for you? For the community?

A Proud Dad

I saw this on my computer screen the other day, and I’ve got to say it really warmed my heart. I’ll explain below if it doesn’t provoke that reaction for you.

Evidence a 4-year-old has been using my computer

So here’s why that made me happy. Well for one, it was the first time Jacob had left stuff on my computer that I found later. And of course he left his name there.

But moreover, he’s learning a bit about the Unix shell. sl is a command that displays an animated steam locomotive. I taught him how to use the semicolon to combine commands. So he has realized that he can combine calls to sl with the semicolon to get a series of a LOT of steam trains all at once. And was very excited about this discovery.

Also he likes how error messages start with the word “bash”.

Unix Password and Authority Management

One of the things that everyone seems to do different is managing passwords. We haven’t looked at that in quite some time, despite growth both of the company and the IT department.

As I look to us moving some things to the cloud, and shifting offsite backups from carrying tapes to a bank to backups via the Internet, I’m aware that the potential for mischief — whether intentional or not — is magnified. With cloud hosting, a person could, with the press of a button, wipe out the equivalent of racks of machines in a few seconds. With disk-based local and Internet-based offsite backups, the potential for malicious behavior may be magnified; someone could pretty quickly wipe out local and remote backups.

Add to that the mysterious fact that many enterprise-targeted services allow only a single username/password for an account, and make no provision for ACLs to delegate permissions to others. Even Rackspace Cloud has this problem, as do their JungleDisk backup product, and many, many other offsite backup products. Amazon AWS seems to be the only real exception to this rule, and their ACL support is more than a little complicated.

So one of the questions we will have to address is the balance of who has these passwords. Too many people and the probability of trouble, intentional or not, rises. Too few and productivity is harmed, and potentially also the ability to restore. (If only one person has the password, and that person is unavailable, company data may be as well.) The company does have some storage locations, including locked vaults and safe deposit boxes, that no IT people have access to. I am thinking that putting a record of passwords in those locations may be a good first step, as putting the passwords in the control of those that can’t use them seems a reasonable step.

But we’ve been thinking of this as it pertains to our local systems as well. We have, for a number of years now, assigned a unique root password to every server. These passwords are then stored in a password-management tool, encrypted with a master password, and stored on a shared filesystem. Everyone in the department therefore can access every password.

Many places where I worked used this scheme, or some variant of it. The general idea was that if root on one machine was compromised and the attacker got root’s password, it would prevent the person from being able to just try that password on the other servers on the network and achieve a greater level of intrusion.

However, the drawback is that we now have more servers than anyone can really remember the passwords for. So many people are just leaving the password tool running. Moreover, while the attack described above is still possible, these days I worry more about automated intrusion attempts that most likely won’t try that attack vector.

A couple of ways we could go may include using a single root password everywhere, or a small set of root passwords. Another option may be to not log in to root accounts at all — possibly even disabling their password — and requiring the use of user accounts plus sudo. This hasn’t been practical to date. We don’t want to make a dependency on LDAP from a bunch of machines just to be able to use root, and we haven’t been using a tool such as puppet or cfengine to manage this stuff. Using such a tool is on our roadmap and could let us manage that approach more easily. But this approach has risks too. One is that if user accounts can get to root on many machines, then we’re not really more secure than a standard root password. Second is that it makes it more difficult to detect and enforce password expiration and systematic password changes.

I’m curious what approaches other people are taking on this.

rdiff-backup, ZFS, and rsync scripts

rdiff-backup vs. ZFS

As I’ve been writing about backups, I’ve gone ahead and run some tests with rdiff-backup. I have been using rdiff-backup personally for many years now — probably since 2002, when I packaged it up for Debian. It’s a nice, stable system, but I always like to look at other options for things every so often.

rdiff-backup stores an uncompressed current mirror of the filesystem, similar to rsync. History is achieved by the use of compressed backwards binary deltas generated by rdiff (using the rsync algorithm). So, you can restore the current copy very easily — a simple cp will do if you don’t need to preserve permissions. rdiff-backup restores previous copies by applying all necessary binary deltas to generate the previous version.

Things I like about rdiff-backup:

  1. Bandwidth-efficient
  2. Reasonably space-efficient, especially where history is concerned
  3. Easily scriptable and nice CLI
  4. Unlike tools such as duplicity, there is no need to periodically run full backups — old backups can be deleted without impacting the ability to restore more current backups

Things I don’t like about it:

  1. Speed. It can be really slow. Deleting 3 months’ worth of old history takes hours. It has to unlink vast numbers of files — and that’s pretty much it, but it does it really slowly. Restores, backups, etc. are all slow as well. Even just getting a list of your increment sizes so you’d know how much space would be saved can take a very long time.
  2. The current backup copy is stored without any kind of compression, which is not at all space-efficient
  3. It creates vast numbers of little files that take forever to delete or summarize

So I thought I would examine how efficient ZFS would be. I wrote a script that would replay the rdiff-backup history — first it would rsync the current copy onto the ZFS filesystem and make a ZFS snapshot. Then each previous version was processed by my script (rdiff-backup’s files are sufficiently standard that a shell script can process them), and a ZFS snapshot created after each. This lets me directly compare the space used by rdiff-backup to that used by ZFS using actual history.

I enabled gzip-3 compression and block dedup in ZFS.

My backups were nearly 1TB in size and the amount of space I had available for ZFS was roughly 600GB, so I couldn’t test all of them. As it happened, I tested the ones that were the worst-case scenario for ZFS: my photos, music collection, etc. These files had very little duplication and very little compressibility. Plus a backup of my regular server that was reasonably compressible.

The total size of the data backed up with rdiff-backup was 583 GB. With ZFS, this came to 498GB. My dedup ratio on this was only 1.05 (meaning 5% or 25GB saved). The compression ratio was 1.12 (60GB saved). The combined ratio was 1.17 (85GB saved). Interestingly 498 + 85 = 583.

Remember that the data under test here was mostly a worst-case scenario for ZFS. It would probably have done better had I had the time to throw the rest of my dataset at it (such as the 60GB backup of my iPod, which would have mostly deduplicated with the backup of my music server).

One problem with ZFS is that dedup is very memory-hungry. This is common knowledge and it is advertised that you need to use roughly 2GB of RAM per TB of disk when using dedup. I don’t have quite that much to dedicate to it, so ZFS got VERY slow and thrashed the disk a lot after the ARC grew to about 300MB. I found some tweakables in zfsrc and the zfs command that let me tweak the ARC cache to grow bigger. But the machine in question only has 2GB RAM, and is doing lots of other things as well, so this barely improved anything. Note that this dedup RAM requirement is not out of line with what is expected from these sorts of solutions.

Even if I got absolutely stellar dedup ratio of 2:1, that would get me at most 1TB. The cost of buying a 1TB disk is less than the cost of upgrading my system to 4GB RAM, so dedup isn’t worth it here.

I think the lesson is: think carefully about where dedup makes sense. If you’re storing a bunch of nearly-identical virtual machine images — the sort of canonical use case for this — go for it. A general fileserver — well, maybe you should just add more disk instead of more RAM.

Then that raises the question: if I don’t need dedup from ZFS, do I bother with it at all, or just use ext4 and LVM snapshots? I think ZFS still makes sense, given its built-in support for compression and very fast snapshots — LVM snapshots are known to cause serious degradation to write performance once enabled, which ZFS doesn’t.

So I plan to switch my backups to use ZFS. A few observations on this:

  1. Some testing suggests that the time to delete a few months of old snapshots will be a minute or two with ZFS compared to hours with rdiff-backup.
  2. ZFS has shown itself to be more space-efficient than rdiff-backup, even without dedup enabled.
  3. There are clear performance and convenience wins with ZFS.
  4. Backup Scripts

    So now comes the question of backup scripts. rsync is obviously a pretty nice choice here — and if used with –inplace perhaps even will play friendly with ZFS snapshots even if dedup is off. But let’s say I’m backing up a few machines at home, or perhaps dozens at work. There is a need to automate all of this. Specifically, there’s a need to:

    1. Provide scheduling, making sure that we don’t hammer the server with 30 clients all at once
    2. Provide for “run before” jobs to do things like snapshot databases
    3. Be silent on success and scream loudly via emails to administrators on any kind of error… and keep backing up other systems when there is an error
    4. Create snapshots and provide an automated way to remove old snapshots (or mount them for reading, as ZFS-fuse doesn’t support the .zfs snapshot directory yet)

    To date I haven’t found anything that looks suitable. I found a shell script system called rsbackup that does a large part of this, but something about using a script whose homepage is a forum makes me less than 100% confident.

    On the securing the backups front, rsync comes with a good-looking rrsync script (inexplicably installed under /usr/share/doc/rsync/scripts instead of /usr/bin on Debian) that can help secure the SSH authorization. GNU rush also looks like a useful restricted shell.