Unreported Disk Data Corruption – Kernel Bug?

Well this is new, and I’m utterly baffled. Here’s a file that’s not in use by anything.


$ md5sum xppro.vdi
589cbb5501dcddda047344a3550aaa95 xppro.vdi
$ md5sum xppro.vdi
a69806ec60d39e06473edbb0abd71637 xppro.vdi

Every time I run md5sum on it, I get a different answer. Same story with sha256sum. If I grab just the first 100MB, it gives the same answer each time. dmesg doesn’t show any sort of errors whatsoever during the time I’m running the tools. The file is 13GB, and was copied from one laptop to another (the new one being a Thinkpad T420s). The old laptop gives the same answer every time. The new one doesn’t.
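The repeat-and-compare check described above can be scripted. This is only a minimal sketch, assuming bash and GNU coreutils (`md5sum`, `head -c`); the function name `distinct_digests` and its arguments are made up for illustration.

```shell
#!/usr/bin/env bash
# distinct_digests FILE [RUNS] [PREFIX_MIB]   (hypothetical helper)
#   FILE        file to hash
#   RUNS        number of hashing passes (default 5)
#   PREFIX_MIB  if given, hash only the first PREFIX_MIB MiB of FILE
# Prints how many distinct MD5 digests were seen across all passes.
# With stable reads the result is 1.
distinct_digests() {
    local file=$1 runs=${2:-5} prefix_mib=${3:-}
    local i
    for ((i = 0; i < runs; i++)); do
        if [[ -n $prefix_mib ]]; then
            # Hash only the leading part of the file.
            head -c "$((prefix_mib * 1024 * 1024))" "$file" | md5sum
        else
            md5sum < "$file"
        fi
    done | awk '{ print $1 }' | sort -u | wc -l
}
```

For example, `distinct_digests xppro.vdi 5` hashes the whole file five times, and `distinct_digests xppro.vdi 5 100` does the same for only the first 100 MiB. Any result other than 1 means the reads are not stable.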

I’ve put the file on different ext4 filesystems on the same machine (one using LUKS encryption, the other not, both under LVM) – same result. This will have also guaranteed different placement on the underlying hard disk.

I verified that nothing is modifying the file by using lsof and inotify. The system is a freshly-installed Debian wheezy running kernel 3.2.0-1-amd64. Any ideas how I go about troubleshooting/fixing this? So far I don’t know if it’s hardware or software, though my gut says software; SMART isn’t showing issues here, and the kernel didn’t log hardware issues, either.

9 thoughts on “Unreported Disk Data Corruption – Kernel Bug?”

  1. Here’s an update. All of a sudden the problem went away. There are three things I did right around then, having given up for the day. One, I power-cycled the 802.11n router/switch/access point that’s about 2 ft from the laptop in question. Two, I moved a different laptop, and three, I unplugged the Ethernet cable. Oh, also I unloaded the VirtualBox kernel modules.

    Putting all those things back where they were doesn’t cause the problem to recur. I can get the same md5sum on every run, and copying the file back produces the correct result.

    However, I still have a corrupted file on disk. I can md5sum it and get the same result each time, but what’s stored isn’t correct.

    Scary.

  2. Sounds like a hardware problem to me. You could try something like booting from a live USB and doing an md5sum of the full disk (while unmounted, i.e. just the raw disk itself). Also check the usual culprits: RAM (memtest) is probably the most likely. Try to figure out what the actual corruption is… Maybe with something as simple as ‘cp bigfile bigfile2; cmp -l bigfile bigfile2’.

  3. I had similar problems on a 6 month old machine.

    Run md5sum 50 times on a file – get the same answer
    Run md5sum 50 times again – get 3 or 4 different answers
    Tried running from tmpfs – same problem
    Tried different fs, different disk – still happened
    memtest didn’t find anything.
    Turned out to be dodgy RAM.
    Changed memory for new RAM, OK now.

  4. Thanks for the tips. Over on the G+ discussion, I got a similar one: https://plus.google.com/107171595803164194992/posts/TnxZM4agwuS. I ran memtest86+ and got errors almost immediately. Yeow.

    Now I’m debating whether to nuke my Debian install and restart from scratch. That’s a lot of time down the drain but might be worth it. I have no idea what could have been corrupted. debsums could verify my installed files, of course. I don’t have much in /home that would go unnoticed, or so I hope, and maybe some fsck runs would catch any fs issues. What do you think?

  5. This happens to me too in Debian/Ubuntu, but the same file works fine on Windows Vista on the same HDD. I wrote to WD about this and they replaced my HDD, but the problem still exists. It usually happens while copying. It may be a RAM problem, but the RAM passed the test at boot time. Sometimes I download a file and it shows up as corrupted, so I have to download it again, and then it extracts successfully. ISO files always get corrupted in Debian; I have to use Vista to download them and write them to CD/DVD.

    I posted this at debian forum a long ago: http://forums.debian.net/viewtopic.php?f=30&t=63313

    1. I wouldn’t count on the boot-time memory test catching issues. It is not particularly thorough. Try apt-get install memtest86+, then reboot and select memory test from the GRUB menu, and let it run. See what it finds.
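The ‘cp bigfile bigfile2; cmp -l bigfile bigfile2’ suggestion from the comments can be wrapped so that it summarizes where two copies disagree. This is a rough sketch, assuming bash, `cmp` and `awk`; the function name `diff_offsets` and the `limit` parameter are hypothetical.

```shell
#!/usr/bin/env bash
# diff_offsets FILE_A FILE_B [LIMIT]   (hypothetical helper)
# Prints up to LIMIT (default 10) differing byte positions, then a total.
# cmp -l reports each difference as: 1-based byte offset in decimal,
# followed by the two differing byte values in octal.
diff_offsets() {
    local a=$1 b=$2 limit=${3:-10}
    # cmp exits non-zero when the files differ, so its status is ignored.
    { cmp -l "$a" "$b" 2>/dev/null || true; } | awk -v limit="$limit" '
        NR <= limit { printf "offset %s: %s -> %s (octal)\n", $1, $2, $3 }
        END         { printf "%d differing byte(s)\n", NR }'
}
```

Running `diff_offsets bigfile bigfile2` after the copy shows whether the differences are isolated bytes or clustered in runs. If the files differ in length, only the common prefix is compared.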
