Monthly Archives: June 2012

I introduced my 5-year-old and 2-year-old to startx and xmonad. They’re DELIGHTED!

Two years ago, Jacob (then 3) and I built his first computer together. I installed Debian on it, but never put a GUI on the thing. It’s command-line, and has provided lots of enjoyment off and on over the last couple of years. I’ve written extensively about what our boys like to do, and the delight they have at learning things on the command line.

The looks of shock I get from people when I explain, as if it’s perfectly natural, that my child has been able to log in by himself to a Linux shell since age 3, are amusing and astounding. Especially considering that it is really not that hard. Instead of learning how to run an Xbox, he’s learned how to run bash. I like that.

Lately, Jacob (now 5) hasn’t been spending much time with it. He isn’t really at a stage where he wants to push his limits too far, I think, yet he also gets bored with the familiar. So I thought it was time to introduce a GUI in a limited fashion, perhaps to let him download photos and video from his Vtech toy camera (which takes real low-res photos and videos that can be downloaded over a USB1 link). He’s familiar with the concept, at least somewhat, having seen GUIs on Terah’s computer (Gnome 2) and mine (xfce4 + xmonad).

So last night, Oliver (age 2) and I went down to the basement on a mouse-finding expedition. Sure enough, I had an old PS/2 mouse down there that would work fine. The boys both helped string it through the desk up in our play room, and were tremendously excited to see the red light underneath it when the computer came on. Barely able to contain the excitement, really. A bit like I remember being when I got my first mouse (at a bit of an older age, I suppose).

I helped them log in as root for the very first time. (Jacob typed “root”, and I typed the password, and provided the explanation for why we were telling the computer we were “root”.) Jacob and Oliver alternated typing bits of some apt-get command lines. Then while we waited for software to download, I had to answer repeated questions of “how soon will the mouse work?” and “what does ‘install’ mean?”

Finally it was there, and I told Jacob to type startx. I intentionally did not install a display manager; more on that later. He pressed Enter, the screen went blank for about 5 seconds, and then X appeared. “Excited” can’t begin to describe how they acted. They took turns playing with the mouse. They loved how the trash can icon (I started with XFCE) showed trash IN the trash can.

But they are just learning the mouse, and there’s a lot about a typical GUI that is unfriendly to someone who isn’t yet proficient with a mouse. The close buttons are disappointingly small, and things can be too easily dragged on and off the panel and menus. When I sat down to think about it, the typical GUI design does not present a very good “it always works the same” interface that would be good for a child.

And then it occurred to me: the perfect GUI for a child would be simply xmonad (a tiling window manager that can be controlled almost entirely by keyboard and has no need for mouse movements in most cases). No desktop environment, no file manager in the root window. Just a window manager in the classic X way. Of course!

So after the boys were in bed, I installed xmonad. I gave Jacob’s account a simple .xsession that starts a terminal and xmonad.

Today, Jacob informed me that he wanted his computer to look “just like yours.” Playing right into my hands, that was! But when he excitedly typed startx, he said it wasn’t just like mine. Uh oh. Turns out he wanted the same wallpaper as my computer uses. Whew. We found it, I figured out that xli(1) loads it in the root window, and so I added a third line to .xsession. More delight unlocked!
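A minimal sketch of such a three-line .xsession, assuming xterm as the terminal and a made-up wallpaper path (substitute your own). It is written to a scratch file here so it can be looked over before being copied into place:

```shell
# Sketch only: writes a sample .xsession to a scratch file for review.
# Assumptions: xterm is the terminal; the wallpaper path is hypothetical.
cat > xsession.sample <<'EOF'
# Paint the wallpaper onto the root window, then return.
xli -onroot "$HOME/wallpaper.jpg"
# Start one terminal in the background.
xterm &
# Hand the session over to the window manager; quitting xmonad ends X.
exec xmonad
EOF
```

Once it looks right, copy xsession.sample to ~/.xsession in the account that will run startx.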

Jacob mastered the basics of xmonad really quickly. Alt-Shift-C to close a window. Alt-Shift-Q to quit back to the “big black screen”. Alt-Shift-Enter to get a terminal window.

We launched thunar (the XFCE file manager) and plugged in his camera. He had a good deal of fun looking at photos and videos from it. But then I dropped the true highlight of the day for him: I offered to install Tuxpaint for him. That’s probably his favorite program of all time.

He watched impatiently as apt-get counted down 1m30s for tuxpaint and its libraries. Then we launched it, and he wanted to skip supper so he could keep playing Tuxpaint on “my VERY OWN COMPUTER!”

I’d been debating how to introduce GUIs for a very long time. It has not escaped my attention that children that used Commodores or TRS-80s or DOS knew a lot more about how their computers worked, on average, than those of the same age that use Windows or MacOS. I didn’t want our boys to skip an entire phase of learning how their technology works. I am pleased with this solution; they still run commands to launch things, yet get to play with more than text-based programs.

At bedtime, Jacob asked me, very seriously:

“Dad, how do I start tuxpaint again?”

“First you log in and type startx. Then you can use the mouse.”

Jacob nods, a contemplative look on his face.

“Then,” I continue, “you type tuxpaint in the terminal, and it comes right up.”

Jacob nodded very seriously a second time, as if committing this very important information to long-term memory. Then he gave a single excited clap, yelled “Great!”, and dashed off.

Windows & a dying hard disk: Solving with Linux

Today, my workstation sent me this email:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

and then a little later, this one:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Offline uncorrectable sectors

From the hard disk’s SMART data, this is a clue that the drive is failing or will soon. Sigh. Incidentally, if smartmontools isn’t installed on your machine, whether it’s a laptop, desktop, or server, it should be.

Although most of you know I run Linux on the metal on my machines almost exclusively, I do maintain a small drive with a Windows installation that I boot into every few months for various reasons. This is that drive.

The drive is non-redundant (no RAID), and although it is backed up, the backup is made via backuppc from the NTFS filesystem mounted on Linux, and is a partial backup – backing up certain data, not the OS. There are, of course, bare metal Windows backup solutions, but I generally don’t want to back up Windows from within Windows on this machine. Restoring Windows isn’t quite as simple as an mkfs, an untar, and a grub-install, either.

So my first thought is: immediately save whatever of the drive I can. So I ran apt-get install gddrescue to install the GNU ddrescue tool. ddrescue is somewhat similar to dd, but deals much more intelligently with bad blocks on the drive. It will try to read them repeatedly, with decreasing block sizes, in an effort to get every last good byte off the disk that it can. If it ultimately fails to get certain bytes read, it will write placeholder data to the output file in place of the missing data, so that the output file maintains proper size and alignment. It also saves a log file that notes what it found (see info ddrescue for more on that.)

So I created an LVM volume for the purpose (not enough free space on /home, and didn’t want to have to shrink it somehow later), and ran:

ddrescue /dev/sda /mnt/sdasave.ddrescue /mnt/sdasave.logfile

Then I went to dinner.

When I got back, I discovered there were 1 or 2 bad sectors, about halfway through the disk, but everything else was fine. So now, the question became: did I lose any data? If so, what? I needed to know if I had to revert to a backup for anything or not.

To answer THAT question, first I had to figure out the offset of the bad spots on the disk. That’s not too hard; the logfile gives it to me:

# Rescue Logfile. Created by GNU ddrescue version 1.15
# Command line: ddrescue /dev/sda /mnt/sdasave.ddrescue sdasave.logfile
# current_pos  current_status
0x3BBB8BFC00     +
#      pos        size  status
0x00000000  0x3BBB8BF000  +
0x3BBB8BF000  0x00001000  -
0x3BBB8C0000  0x38B5346000  +

What we see is that the bad region starts at byte 0x3BBB8BF000 (256549580800 decimal) and extends for 0x1000 bytes (4096 decimal). Both the drive and NTFS use 512-byte sectors. So dividing by 512, we get sectors 501073400-501073407 (4096 bytes is 8 sectors).
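The same arithmetic can be scripted. This is a minimal sketch that reads a ddrescue logfile on standard input and prints the sector range of each region marked bad (status "-"). The function name is made up for illustration, and the 512-byte sector size is the one that applies to this drive:

```shell
# Print "first last" (inclusive, in 512-byte sectors) for every region a
# ddrescue logfile marks as bad (status "-"). Reads the logfile on stdin.
# bad_sector_ranges is a hypothetical helper name, not part of ddrescue.
bad_sector_ranges() {
    sector_size=512
    while read -r pos size status; do
        # Comment lines start with "#"; the current_pos line has only two
        # fields, so its status is empty. Both are skipped here.
        case "$pos" in "#"*) continue ;; esac
        [ "$status" = "-" ] || continue
        # The shell's arithmetic accepts the 0x-prefixed hex values as-is.
        first=$(( $pos / $sector_size ))
        last=$(( ($pos + $size - 1) / $sector_size ))
        echo "$first $last"
    done
}
```

Run it as bad_sector_ranges < /mnt/sdasave.logfile. For the logfile shown above it prints 501073400 501073407.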

As a check, I ran grep sector /var/log/kern.log and turned up a bunch of lines like this:

Jun 14 21:39:11 hephaestus kernel: [35346.929957] end_request: I/O error, dev sda, sector 501073404

Which is within my calculated range.
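That cross-check can be scripted too. The sketch below pulls the sector numbers out of kernel I/O error lines for sda and reports whether each one falls inside the calculated range. The function name is made up, and the sed pattern assumes the exact message format shown above:

```shell
# Read kernel log lines on stdin; for each sda I/O error, report whether
# the logged sector lies within the inclusive range [first, last].
# check_logged_sectors is a hypothetical helper name.
check_logged_sectors() {
    first=$1
    last=$2
    # Keep only the sector number from matching lines; drop everything else.
    sed -n 's|.*I/O error, dev sda, sector \([0-9][0-9]*\).*|\1|p' |
    while read -r sector; do
        if [ "$sector" -ge "$first" ] && [ "$sector" -le "$last" ]; then
            echo "$sector inside"
        else
            echo "$sector OUTSIDE"
        fi
    done
}
```

Usage would be along the lines of grep sector /var/log/kern.log | check_logged_sectors 501073400 501073407. Any line reported as OUTSIDE would point to a second bad spot or a mistake in the arithmetic.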

But this is an absolute sector on the disk. We need the sector within the partition, so for that, we have to enlist fdisk to make that calculation.

fdisk shows, among other things:

Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   976771071   488384512    7  HPFS/NTFS/exFAT

So the Windows partition starts at disk sector 2048.

Let’s just confirm that. If I use dd if=/dev/sda1 bs=512 count=1 | hd | head, I see a line beginning with “.R.NTFS”. Exactly the same as with dd if=/dev/sda bs=512 count=1 skip=2048 | hd | head, so I read the partition table information correctly.

Subtract offset of 2048 from the earlier values, and I get relative sectors 501071352-501071359.
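As a quick sanity check of that subtraction in the shell, using the partition start from fdisk and the absolute range worked out earlier (the variable names are just for illustration):

```shell
# Convert absolute disk sectors to sectors relative to the partition start.
part_start=2048          # start of /dev/sda1 as reported by fdisk
abs_first=501073400      # first bad sector, absolute
abs_last=501073407       # last bad sector, absolute

rel_first=$(( abs_first - part_start ))
rel_last=$(( abs_last - part_start ))
echo "$rel_first-$rel_last"
```

This prints 501071352-501071359, in the form that can be handed to ntfscluster -s.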

That’s enough to get some solid info from the filesystem via ntfscluster, part of Debian’s ntfs-3g package. I pass -s to it, and ignoring some irrelevant stuff, get my answer:

ntfscluster -s 501071352-501071359 /dev/sda1
Inode 190604 /System Volume Information/{b4816feb-b609-11e1-a908-50e549b934f7}{3808876b-c176-4e48-b7ae-04046e6cc752}/$DATA

I even reran it with a much larger sector range, just to be absolutely sure I had wiggle room in case calculations had an off-by-one error or something somewhere.

This is really great news, because the file in question is pretty much useless – I believe it’s a system restore point, which I won’t be needing anyhow.

So at this point, all that remains is to reinstall this on a different drive. For that, I could just use my ddrescue image. I thought I would take a second image, just to be very extra careful, and use that; I used:

partclone.ntfs --rescue -c -s /dev/sda1 -o sda1.partclone

although ntfsclone would work just as well. This captures only the partition; I’ll need the partition table as well, and perhaps also the space between the partition table and the first partition. I could capture it separately with dd, but it’s already in the ddrescue image, so there’s no need. (GRUB is installed on this drive, but there is no Linux filesystem on it, so GRUB’s boot code may well extend beyond the MBR.)

Note that for Linux ext[234] filesystems, debugfs can provide the same (and more) info as I got from ntfscluster.
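For the ext case, a rough sketch of the equivalent lookup. debugfs works in filesystem blocks rather than 512-byte sectors, so the partition-relative sector has to be converted first; then its icheck request maps a block to an inode, and ncheck maps an inode to a path. The 4096-byte block size, the device name /dev/sdXN, and the INODE placeholder are assumptions for illustration only, and the commands are printed rather than run:

```shell
# Hypothetical ext[234] example; the drive in this post is NTFS.
# sector_to_fs_block is a made-up helper name.
sector_to_fs_block() {
    # $1 = partition-relative 512-byte sector, $2 = filesystem block size
    echo $(( $1 * 512 / $2 ))
}

block=$(sector_to_fs_block 501071352 4096)

# Print (not run) the debugfs requests; /dev/sdXN and INODE are placeholders.
echo "debugfs -R 'icheck $block' /dev/sdXN"
echo "debugfs -R 'ncheck INODE' /dev/sdXN"
```

The real block size of a filesystem can be read from the "Block size" line of tune2fs -l, and INODE would be replaced with whatever inode number icheck reports.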

I happen to have a drive of the right size sitting here, which I was about to install in a different machine. So a wipe and a swap and a restore later, and I should be good to go.

This scenario is commonplace enough that I thought I’d post how I dealt with it, in case anyone else ever has hard drive issues.

How to debug Linux failure to resume from suspend?

I’m running a computer with a Gigabyte Z68A-D3H-B3 motherboard, and have never been able to get it to properly resume from suspend to RAM in Linux. It has worked fine on the rare occasion I’ve tried it in Windows 7.

My somewhat limited usual tools for debugging aren’t particularly helpful. The system appears to suspend perfectly fine. It just doesn’t resume. To be more precise, when I push the button to resume, the power comes up (fans whir, HDD spins up, etc.) but nothing happens. The USB keyboard and mouse don’t respond, Caps Lock doesn’t toggle any LEDs, it doesn’t respond on the wired LAN, and the display stays off.

Although it’s a desktop, I’d really like to save power on this thing by suspending it when it’s not in use. There’s no sense in wasting power I don’t need to be consuming.

I’ve tried what I used to try on laptops. I tried running in single-user mode, without X, or even the kernel modules for video acceleration loaded. I tried unloading whatever hardware modules I thought I could without completely destabilizing the system. I updated the BIOS to the latest release. I tried various combinations of video tweaks. I tried using s2ram from uswsusp instead of pm-suspend. Nothing made any difference. They all behaved exactly the same.

Googling showed a lot of resources for people that had trouble getting their machines to go to sleep. And also for people whose machines would wake up but just wouldn’t re-activate the display. But precious little for people with my particular symptoms.

What’s a good place to start looking to fix something like this?

Some details…

CPU is Core i5-2400. Kernel is wheezy’s 3.2.0-2-amd64, though this problem has persisted as long as I’ve had this machine, which was running squeeze at install time. Video is NVidia GeForce GTX 560 (GF114). Hard drives are SATA, Ethernet is integrated RTL8111/8168B. Userland is up-to-date amd64 wheezy.