
Bicycling to Work

We hear a lot these days about the price of gas, energy efficiency, and the like. But in the United States, outside of a few progressive cities, not many people are using the ultimate zero-emissions transportation technology: the bicycle.

That’s really too bad, because bicycles are a lot cheaper to operate than cars even before you consider gas prices. They also are great exercise and are probably faster, safer, and more convenient than you think.

I live about 10 miles (16 km) from work, which includes several miles on sand roads. I hadn’t bicycled in about 6 years. Last week, I got my bicycle out, touched it up a bit, and started riding. Sunday I rode in to work and back as a test. As soon as I get a bit of gear (hopefully by the middle of next week), I plan to start riding my bike to work at least 3 days a week.

I’ve picked up some tips along the way. Let’s talk about a few of them.

Safety

Many people think bicycling is dangerous. In fact, bicycling is about as safe as driving an SUV. Not only that, but only 10% of bicycling accidents occur when you are hit from behind (and 90% of those produce only minor injuries). It turns out that the vast majority of bicycling accidents occur because people are not riding on the road with traffic, or are acting unpredictably. Following some basic safety advice can make you safer on a bicycle than in an SUV. Oh, and don’t drink and ride; 24% of fatal bicycle accidents involve an intoxicated rider.

Distance

Think it’s too far? Think again. It’s fairly easy for an untrained, unfit person to ride a bicycle up to 10 miles without working hard at it. That can probably be done in about an hour. As you get more fit and used to the bike, you may be able to go that distance in half that time. Also, get pannier bags for your bicycle. They attach in back and let you carry work clothes, laptops, etc. without having to use a backpack.

Smell

Many people with office jobs are concerned about this. Not everywhere has a convenient shower. Check out these tips from the Tips and Tricks for Biking to Work manual.

I’m excited about it, and will be sure to post more here on how it goes.

Backup Software

I think most people reading my blog would agree that backups are extremely important. So much important data is on computers these days: family photos, emails, financial records. So I take backups seriously.

A little while back, I purchased two identical 400GB external hard disks. One is kept at home, and the other at a safe deposit box in a bank in a different town. Every week or two, I swap drives, so that neither one ever becomes too dated. This process is relatively inexpensive (safe deposit boxes big enough to hold the drive go for $25/year), and works well.

I have been using rdiff-backup to make these backups for several years now. (Since at least 2004, when I submitted a patch to make it record all metadata on MacOS X). rdiff-backup is quite nice. It is designed for storage to a hard disk. It stores on the disk a current filesystem mirror along with some metadata files that include permissions information. History is achieved by storing compressed rdiff (rsync) deltas going backwards in time. So restoring “most recent” files is a simple copy plus application of metadata, and restoring older files means reversing history. rdiff-backup does both automatically.
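In case it’s useful to anyone, day-to-day use is quite simple. Something like the following (paths are just placeholders) mirrors /home into the backup area, lists the increments that have accumulated, and restores a file as it was 10 days ago:

# rdiff-backup /home /mnt/backup/home
# rdiff-backup --list-increments /mnt/backup/home
# rdiff-backup -r 10D /mnt/backup/home/somefile /tmp/somefile.restored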

This is a nice system and has served me well for quite some time. But it has its drawbacks. One is that you always have to have the current image, uncompressed, which uses up lots of space. Another is that you can’t encrypt these backups with something like gpg for storage on a potentially untrusted hosting service (say, rsync.net). Also, when your backup disk fills up, it takes forever to figure out what to delete, since rdiff-backup --list-increment-sizes must stat tens of thousands of files. So I went looking for alternatives.

The author of rdiff-backup actually wrote one, called duplicity. Duplicity works by, essentially, storing a tarball full backup with its rdiff signature, then storing tarballs of rdiff deltas going forward in time. The reason rdiff-backup must have the full mirror is that it must generate rdiff deltas “backwards”, which requires the full prior file available. Duplicity works around this.
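Basic duplicity usage, for comparison, looks roughly like this (it encrypts with gpg by default, which is exactly what I’d want for an untrusted host; add --no-encryption to skip that):

# duplicity full /home file:///mnt/backup/home
# duplicity incremental /home file:///mnt/backup/home
# duplicity restore file:///mnt/backup/home /tmp/home-restored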

However, the problem with duplicity is that if the full backup gets lost or corrupted, nothing newer than it can be restored. You must make new full backups periodically so that you can remove the old history. The other big problem with duplicity is that it doesn’t grok hard links at all. That makes it unsuitable for backing up /sbin, /bin, /usr, and my /home, in which I frequently use hard links for preparing CD images, linking DVCS branches, etc.

So I went off searching out other projects and thinking about the problem myself.

One potential solution is to simply store tarballs and rdiff deltas going forward. That would require performing an entire full backup every day, which probably isn’t a problem for me now, but I worry about the load that will place on my hard disks and the additional power it would consume to process all that data.
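If I were to cobble that together myself, it might look something like this, using tar plus the rdiff tool from librsync (filenames invented for illustration). Take one full tarball and keep its signature; then each day build a fresh tarball, store only its forward delta against that signature, and throw the day’s tarball away. Restoring a given day is an rdiff patch against the stored full:

# tar -cf /backup/full.tar /home
# rdiff signature /backup/full.tar /backup/full.sig
# tar -cf /tmp/today.tar /home
# rdiff delta /backup/full.sig /tmp/today.tar /backup/today.delta
# rdiff patch /backup/full.tar /backup/today.delta /tmp/today-restored.tar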

So what other projects are out there? Two caught my attention. The first is Box Backup. It is similar in concept to rdiff-backup, but uses its own archive format. It stores the most recent data in that format, compressed, along with its signatures, then generates reverse deltas much like rdiff-backup does. It supports encryption out of the box, too. It sounded like a perfect solution. Then I realized it doesn’t store hard links, device entries, etc., and has a design flaw that causes it to miss some changes to config files in /etc on Gentoo. That’s a real bummer, because it sounded so nice otherwise. But I just can’t trust my system to a program where I have to be careful not to use certain OS features because they won’t be backed up right.

The other interesting one is dar, the Disk ARchive tool, described by its author as the great grandson of tar — and a pretty legitimate claim at that. Traditionally, if you are going to back up a Unix box, you have to choose between two not-quite-perfect options. You could use something like tar, which backs up all your permissions, special files, hard links, etc, but doesn’t support random access. So to extract just one file, tar will read through the 5GB before it in the archive. Or you could use zip, which doesn’t handle all the special stuff, but does support random access. Over the years, many backup systems have improved upon this in various ways. Bacula, for instance, is incredibly fast for tapes as it creates new tape “files” every so often and stores the precise tape location of each file in its database.

But none seem quite as nice as dar for disk backups. In addition to supporting all the special stuff out there, dar sports built-in compression and encryption. Unlike tar, compression is applied per-file, and encryption is applied per 10K block, which is really slick. This allows you to extract one file without having to decrypt and decompress the entire archive. dar also maintains a catalog which permits random access, has built-in support for splitting archives across removable media like CD-Rs, has a nice incremental backup feature, and sports a host of tools for tweaking archives — removing files from them, changing compression schemes, etc.
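To give a flavor of it, a dar run looks roughly like this (I’m showing plain -z gzip compression here; for my test I used bzip2, and the exact flags for bzip2 and encryption vary by dar version, so check its manpage). The first command makes a full archive of /usr, the second makes a differential archive using the full one’s catalog as its reference, and the third restores a single file from an archive into /tmp/restore:

# dar -c /backup/usr-full -R /usr -z
# dar -c /backup/usr-diff1 -R /usr -z -A /backup/usr-full
# dar -x /backup/usr-diff1 -R /tmp/restore -g bin/ls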

But dar does not use binary deltas. I thought this would be quite space-inefficient, so I decided to put it to the test against a real-world scenario that would probably be close to a worst case for dar and a best case for rdiff-backup.

I track Debian sid and haven’t updated my home box in quite some time. I have over 1GB of .debs downloaded which represent updates. Many of these updates are going to touch tons of files in /usr, though often making small changes, or even none at all. Sounds like rdiff-backup heaven, right?

I ran rdiff-backup to a clean area before applying any updates, and used dar to create a full backup file of the same data. Then I ran apt-get upgrade, and made incrementals with both rdiff-backup and dar. Finally I ran apt-get dist-upgrade, and did the same thing. So I have three backups with each system.

Let’s look at how rdiff-backup did first.

According to rdiff-backup --list-increment-sizes, my /usr backup looks like this:

        Time                       Size        Cumulative size
-----------------------------------------------------------------------------
Sun Apr 13 18:37:56 2008         5.15 GB           5.15 GB   (current mirror)
Sun Apr 13 08:51:30 2008          405 MB           5.54 GB
Sun Apr 13 03:08:07 2008          471 MB           6.00 GB

So what we see here is that we’re using 5.15GB for the mirror of the current state of /usr. The delta between the old state of /usr and the state after apt-get upgrade was 471MB, and the delta representing dist-upgrade was 405MB, for total disk consumption of 6GB.

But if I run du -s over the rdiff-backup storage area for /usr, it says that 7.0GB was used. du -s --apparent-size shows 6.1GB. The difference is that all the tens of thousands of files each waste some space at the end of their blocks, and that adds up to an entire gigabyte. rdiff-backup effectively consumed 7.0GB of space.

Now, for dar:

-rw-r--r-- 1 root root 2.3G Apr 12 22:47 usr-l00.1.dar
-rw-r--r-- 1 root root 826M Apr 13 11:34 usr-l01.1.dar
-rw-r--r-- 1 root root 411M Apr 13 19:05 usr-l02.1.dar

This was using bzip2 compression, and backed up the exact same files and data that rdiff-backup did. The initial mirror was 2.3GB, much smaller than the 5.1GB that rdiff-backup consumes. The apt-get upgrade differential was 826MB compared to the 471MB in rdiff-backup — not really a surprise. But the dist-upgrade differential — still a pathologically bad case for dar, but less so — was only 6MB larger than the 405MB rdiff-backup case. And the total actual disk consumption of dar was only 3.5GB — half the 7.0GB rdiff-backup claimed!

I still expect that, over an extended time, rdiff-backup could chip away at dar’s lead… or maybe not, if lots of small files change.

But this was a completely unexpected result. I am definitely going to give dar a closer look.

Also, before I started all this, I converted my external hard disk from ext3 to XFS because of ext3’s terrible performance with rdiff-backup.

Why Are We So Afraid of Socialized Medicine?

I’ve been thinking about this for a while, so it’s time to put down some thoughts.

First, what is socialized medicine? If we listen to the politicians that label health care as “socialized”, it seems to be “anything that is paid for by taxes and delivered free or cheaply to citizens.” Putting aside the question of whether that meets the academic definition of socialism for the moment, let’s look at things in the United States that are already socialized:

  • K-12 education
  • Police
  • Fire fighters
  • Public Libraries
  • Roads
  • Airports and air traffic control
  • Military defense and offense

That’s right. We trust the government with our children all day long for 13 years. For free!

Yet this is a country in which hospitals dump the homeless in the gutter for being unable to pay their bills. Even insured Americans find claims turned down for arbitrary reasons. People are afraid to change jobs for fear of losing health insurance.

Why is it bad to have the government pay for health care?

Here in the United States, our health care system is far from the best in the world. It’s not even top 10. Or 20. Our system encourages minimizing health care, and doesn’t encourage preventative care.

I’d suggest that, in a democracy, it’s best to have the government pay for health care. That’s because, in a democracy, we are in control of the bureaucrats. If we wish to exercise common sense and pound into their heads that paying for preventative care makes good long-term sense, then we can do so at the ballot box.

So why the scare tactics about government being involved in health care?

Perhaps our real problem is that we have let government get out of our control? Perhaps we are too frightened of change to vote. Perhaps we’ve given up on a responsive government. Perhaps we think that the insurance companies and drug companies will never let us have a good health care system.

Yes, the lobbyists have a lot of power. But we have the power to remove it, and it’s high time we used it.

The audacity of Obama, to have hope. To say that we can do better. When Hillary Clinton falls in line with the Republicans and accuses him of having “false hope”, effectively saying that we can’t do any better, is she, or any Republican, really a candidate of change? I think all these accusations from conservatives and from Hillary that Obama has “false hope” have finally convinced me that he’s the one to vote for. If everyone else claims that his ideas are too good and his dreams too big, then I like him.

Oh, and you could substitute “college education” for “socialized medicine” everywhere in this article and get equally valid arguments.

DjVu: Almost Awesome

Earlier today, I started reading about the DjVu family of document formats. It really sounds slick: file sizes much smaller than PNG (and incredibly smaller than TIFF or PDF) for lossless data with the DjVuText format, file sizes much smaller than JPEG with equivalent quality for the DjVuPhoto format, and an advanced DjVuDocument format that separates the background photo from the foreground text and produces a quite nice output. There are wonderful plugins for browsers on all platforms, and server-side support already in Debian for sending pages incrementally as needed by clients.

I tried this out a bit and indeed it looks great on monochrome scans, and I made a quick try of DjVuPhoto as well. That part looks great.
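For anyone who wants to try the same thing, it amounts to something like this with the djvulibre command-line tools (filenames are placeholders): cjb2 encodes a bitonal PBM scan, c44 does the wavelet encoding for photos, and djvm bundles pages into a single document.

$ cjb2 scan-page1.pbm page1.djvu
$ c44 photo.ppm photo.djvu
$ djvm -c book.djvu page1.djvu photo.djvu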

So here’s the bad news.

Debian has no nice way to generate DjVuDocument files. There is a PS/PDF-to-DjVu converter that uses a djvu driver for Ghostscript. But Debian does not include that driver. Though, strangely, the program that depends on this driver is actually in Debian main. (Bug filed.) That program actually will make background-separated images, but only if they are separate objects in the input.

All Debian has is a program called csepdjvu, which requires you to somehow separate the foreground and background images manually. Ugh.

So there is no way using software in Debian to produce DjVuDocument files with automatic separation, either from scans or from a digital source. It appears that there may not be Free Software to do this from scans either. This fact is not made clear at all in the DjVu documentation that is around.

The Sky Is Falling!

A very sad day approaches.

Those of you old enough to remember Gopher may proceed to shed a quiet, ASCII-art tear (served up as Gopher document type 0, naturally).

For those of you that don’t know what Gopher is, here’s my quick summary:

* It existed before the web.
* It is an extremely simple protocol designed to be an Internet-wide filesystem, though the bit that would let you mount the Internet like a disk never quite happened. It still could, though; I think I even saw a FUSE gopher implementation recently. (See the example just after this list for how simple the protocol really is.)
* It does pretty much everything WAP does, but about 50 times more simply. Why the mobile phone world invented WAP instead of just using Gopher is still a mystery to me.
* XML-RPC is usually extreme overkill when you could use a simple protocol like Gopher.
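To see how simple it is, here’s a complete Gopher “session” using nothing but netcat (assuming the public Floodgap server is still around): send an empty selector followed by CRLF, and the server answers with its root menu, one tab-separated item per line.

$ printf '\r\n' | nc gopher.floodgap.com 70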

Now how many of you remember Veronica and Archie?

Spineless Democrats

The Democrats ran in 2006 on the platform of ending the Iraq war, and won largely on that platform. Now they are failing to deliver on it.

It is true that they have a thin majority in the Senate and a not much wider one in the House. It is also true that it takes 60 votes to pass legislation in the Senate, which they don’t have.

But here’s the thing. It takes 60 votes to pass legislation. That means that they can easily defeat any massive Republican war spending bill.

I think they are worried about the Republicans painting them as being against the troops. So what? If the Republicans vote against a Democratic funding bill that provides adequate funds for an orderly withdrawal, aren’t they doing the same? “No” votes on both are votes to prevent the funding from passing.

They easily have the votes to defeat massive Republican spending bills. So why not advance a spending bill like they campaigned for, and watch all the Republicans vote against it? If no funding at all passes, they achieve their objective, just not as cleanly, and the Republicans would be the ones voting against funding. Make the Republicans take some heat for a change, and give them no choice but to compromise.

Time: Failing Our Geniuses

An interesting article in Time today, Failing Our Geniuses, about how the most talented students are being sidelined by current education policy. Some choice bits:

Since well before the Bush Administration began using the impossibly sunny term “no child left behind,” those who write education policy in the U.S. have worried most about kids at the bottom, stragglers of impoverished means or IQs. But surprisingly, gifted students drop out at the same rates as nongifted kids–about 5% of both populations leave school early. Later in life, according to the scholarly Handbook of Gifted Education, up to one-fifth of dropouts test in the gifted range.

It can’t make sense to spend 10 times as much to try to bring low-achieving students to mere proficiency as we do to nurture those with the greatest potential.

We take for granted that those with IQs at least three standard deviations below the mean (those who score 55 or lower on IQ tests) require “special” education. But students with IQs that are at least three standard deviations above the mean (145 or higher) often have just as much trouble interacting with average kids and learning at an average pace. Shouldn’t we do something special for them as well?

In a no-child-left-behind conception of public education, lifting everyone up to a minimum level is more important than allowing students to excel to their limit. It has become more important for schools to identify deficiencies than to cultivate gifts. Odd though it seems for a law written and enacted during a Republican Administration, the social impulse behind No Child Left Behind is radically egalitarian. It has forced schools to deeply subsidize the education of the least gifted, and gifted programs have suffered. The year after the President signed the law in 2002, Illinois cut $16 million from gifted education; Michigan cut funding from $5 million to $500,000. Federal spending declined from $11.3 million in 2002 to $7.6 million this year.

I suppose this means I’m a geek

I work in an open-plan office. Normally I like to listen to some of my iPod’s music, or NPR or something, at some point during the day. It helps me tune out distractions when I’m coding or concentrating on something. My iPod, and my nice Etymotic headphones, get transported to and from work each day in my laptop bag. Today I forgot the laptop bag at home.

What to do? I could just work without headphones. I’d be fine, but you know, I’ve got standards here. My job involves working with computers, so I ought to be able to come up with a workaround, right?

So let’s see… what do I have? One binaural telephone headset (mono sound, but a speaker for each ear). One Polycom SIP phone, connected to our corporate Asterisk system. One workstation with sound capabilities. One installation of Asterisk on this workstation for testing purposes. And a pre-existing path from the corporate system to the workstation system for testing Asterisk. (Very handy, that, and used a lot when we were doing active Asterisk work.)

So in less than five minutes I had music going via my telephone headset. Lo-fi, and not noise-dampening like the Etymotics, but I enjoyed it for the simple fact that it was being played *over the phone* at no cost to anyone. My desk phone supports multiple “lines”, so I could still place and receive calls just fine.

Should anyone care to look, they’d find a 5-hour call from me to myself deep in the Asterisk logs. My own workstation logs will show that I put myself on hold for 5 hours (since I used Asterisk’s music-on-hold feature to play my own selections).
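For the curious, the guts of the trick are nothing fancier than a dialplan extension that answers and sits in music-on-hold; whatever files the default hold-music class points at get streamed to the phone. This is only a sketch of the idea (the extension number and music directory are made up), not our actual configuration:

; extensions.conf: dial 7700 from the desk phone to get your own music
exten => 7700,1,Answer()
exten => 7700,n,MusicOnHold()

; musiconhold.conf: point the default class at a directory of files
[default]
mode=files
directory=/home/me/music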

IP telephony is fun. So is Asterisk.

This week’s discovery: mpix.com

Back in 2000, I started to get back into photography. I bought a Canon Elan IIe 35mm SLR camera, some lenses, and a flash. I took color photos various places, and then bought some standard Kodak T-Max black and white film. I shot some photos, and then tried to get this film processed.

Turns out it’s not terribly easy to pay someone to process black and white film. Very few of the local photo places will do it. The place I usually used was willing to send it off to Kodak for me. Lacking a darkroom at home, and any interest in doing darkroom work myself, I sent it off to Kodak.

Kodak processed my film just fine, but the prints they made were terrible. It looked as if their enlarger was seriously out of focus. Everything was fuzzy on the prints. The local shop agreed to send them back for re-printing. On try 2, they were somewhat better but not much.

Now, B&W photos should normally be sharper than color photos, so this was annoying.

I kept looking, and nobody locally could print B&W photos. I even tried one roll of B&W C-41 film (that’s color-process film that takes photos in black and white). It stank about as much as I thought it would. I did eventually find one local lab that could take B&W negatives and, via a digital process, print the photos on color paper. They came out far more crisp than Kodak’s processing on real B&W paper!

Now, 7 years later, I’ve been shooting some photos with my Canon Digital Rebel XTi digital SLR camera that I want to print in black and white. There are any number of color labs that I can send them off to, and get results as sharp as one would expect from a color photo. I’d been doing that with reasonable results. But I wasn’t satisfied with “reasonable”, so I searched some B&W groups on Flickr to see what people were doing to make B&W prints from a digital source.

It was there that I learned of mpix.com, the online service of the USA’s largest pro photo lab. They offer printing on true B&W photo paper from digital (or film) sources. They’re a pro lab, so they ought to do this really well. They of course also do color printing. Plus all the other things you’d expect from a pro lab, such as red-eye removal, glasses glare removal, color retouching, choice of photo paper, etc.

I sent off my first order for B&W prints to them yesterday. I can’t wait to see how they turn out. I’m excited — I think they’ll be great.

Saving Power with CPU Frequency Scaling

Yesterday I wrote about the climate crisis. Today, let’s start doing something about it.

Electricity, especially in the United States and China, turns out to be a pretty dirty energy source. Most of our electricity is generated using coal, which despite promises of “clean coal” to come, burns dirty. Not only does it contribute to global warming, but it also has been shown to have an adverse impact on health.

So let’s start simple: reduce the amount of electricity our computers consume. Even for an individual person, this can add up to quite a bit of energy (and money) savings in a year. When you think about multiplying this over companies, server rooms, etc., it adds up fast. This works on desktops, servers, laptops, whatever.

The easiest way to save power is with CPU frequency scaling. This is a technology that lets you adjust a CPU’s clock speed on the fly, while the system is running. When CPUs run at slower speeds, they consume less power. Most CPUs are set to their maximum speed all the time, even when the system isn’t using them. Linux has support for keeping the CPU at maximum speed unless it is idle. By turning on this feature, we can save power at virtually no cost to performance. The Linux feature that handles CPU frequency scaling is called cpufreq.

Set up modules

Let’s start by checking to see whether cpufreq support is already enabled in your kernel. These commands will need to be run as root.

# cd /sys/devices/system/cpu/cpu0
# ls -l

If you see an entry called cpufreq, you are good and can skip to the governor selection below.

If not, you’ll need to load cpufreq support into your kernel. Let’s get a list of available drivers:

# ls /lib/modules/`uname -r`/kernel/arch/*/kernel/cpu/cpufreq

Now it’s guess time. It doesn’t really hurt if you guess wrong; you’ll just get a harmless error message. One hint, though: try acpi-cpufreq last; it’s the option of last resort.

On my system, I see:

acpi-cpufreq.ko     longrun.ko      powernow-k8.ko         speedstep-smi.ko
cpufreq-nforce2.ko  p4-clockmod.ko  speedstep-centrino.ko
gx-suspmod.ko       powernow-k6.ko  speedstep-ich.ko
longhaul.ko         powernow-k7.ko  speedstep-lib.ko

For each guess, you’ll run modprobe with the driver name. I have an Athlon64, which is a K8 machine, so I run:

# modprobe powernow-k8

Note that you leave off the “.ko” bit. If you don’t get any error message, it worked.

Once you find a working module, edit /etc/modules and add the module name there (again without the “.ko”) so it will be loaded for you on boot.

Governor Selection

Next, we need to load the module for the governor we want to use. The governor is the piece that monitors the system and adjusts the CPU speed accordingly.

I’m going to suggest the ondemand governor. This governor keeps the system’s speed at maximum unless it is pretty sure that the system is idle. So this will be the one that will let you save power with the least performance impact.

Let’s load the module now:

# modprobe cpufreq_ondemand

You should also edit /etc/modules and add a line that says simply cpufreq_ondemand to the end of the file so that the ondemand governor loads at next boot.
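If you’ve followed both steps, the end of /etc/modules will look something like this (with powernow-k8 replaced by whichever driver worked on your hardware):

powernow-k8
cpufreq_ondemand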

Turning It On

Now, back under /sys/devices/system/cpu/cpu0, you should see a cpufreq directory. cd into it.

To turn on the ondemand governor, run this:

# echo ondemand > scaling_governor

That’s it, your governor is enabled. You can see what it’s doing like this:

# cat cpuinfo_min_freq
800000
# cat cpuinfo_max_freq
2200000
# cat cpuinfo_cur_freq
800000

That shows that my CPU can go as low as 800MHz, as high as 2.2GHz, and that it’s presently running at 800MHz.

Now, check your scaling governor settings:

# cat scaling_min_freq
800000
# cat scaling_max_freq
800000

This is showing that the system is constraining the governor to only ever operate on an 800MHz to 800MHz range. That’s not what I want; I want it to scale over the entire range of the CPU. Since my cpuinfo_max_freq was 2200000, I want to write that out to scaling_max_freq as well:

# echo 2200000 > scaling_max_freq

Making This The Default

The last step is to make this happen on each boot. Open up your /etc/sysfs.conf file. If you don’t have one, you will want to run a command such as apt-get install sysfsutils (or the appropriate one for your distribution).

Add lines like these:

devices/system/cpu/cpu0/cpufreq/scaling_governor = ondemand
devices/system/cpu/cpu0/cpufreq/scaling_max_freq = 2200000

Remember to replace the 2200000 with your own cpuinfo_max_freq value.

IMPORTANT NOTE: If you have a dual-core CPU, or more than one CPU, you’ll need to add a line for each CPU. For instance:

devices/system/cpu/cpu1/cpufreq/scaling_governor = ondemand
devices/system/cpu/cpu1/cpufreq/scaling_max_freq = 2200000

You can see what all CPU devices you have with ls /sys/devices/system/cpu.

Now, save this file, and you’ll have CPU frequency scaling saving you money, and helping the environment, every time you boot. And with the ondemand governor, chances are you’ll never notice any performance loss.
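As an alternative to listing every CPU in sysfs.conf by hand, a small loop run at boot (from /etc/rc.local, for example) does the same job; as before, replace 2200000 with your own cpuinfo_max_freq value:

for cpu in /sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
  echo ondemand > $cpu/scaling_governor
  echo 2200000 > $cpu/scaling_max_freq
done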

This article showed you how to save power using CPU frequency scaling on Linux. I have no idea if it’s possible to do the same on Windows, Mac, or the various BSDs, but it would be great if someone would leave comments with links to resources for doing that if so.

Updated: added scaling_max_freq info