All posts by John Goerzen

Migrated from Hetzner to OVH hosting

Since August 2011, my sites such as complete.org have been running on a Xen-backed virtual private server (VPS) at Hetzner Online, based in Germany. I had what they called their VQ19 package, which included 2GB RAM, 80GB HDD, 100Mb NIC and 4TB transfer.

Unlike many other VPS hosts, I never had performance problems. However, I did sometimes have hardware problems with the host, and it could take hours to resolve. Their tech support only works business hours German time, which was also a problem.

Meanwhile, OVH, a large European hosting company, recently opened a datacenter in Canada. Although they no longer offer their value-line Kimsufi dedicated servers there — starting at $11.50/mo — they do offer their midrange SoYouStart servers there. $50/mo gets a person a 4-core 3.2GHz Xeon server with 32GB RAM, 2x2TB SATA HDD, 200Mbps bandwidth. Not bad at all! The Kimsufi options are still good for lower-end needs as well.

I signed up for one of the SoYouStart servers. I’ve been pleased with my choice to migrate, and with the possibilities that having hardware like that at my disposal opens up, but it is not without its downsides.

The primary downside is the lack of any kind of KVM (keyboard/video/mouse) console. If the server doesn’t boot, I can’t see the GRUB error message (or whatever) behind the failure. They do provide hardware support and automatic technician dispatch when the server isn’t pingable, but… they state they have no KVM access at all. They support many OS flavors and have premade images for them, but there is no way to install from a custom ISO; if you want ZFS on Linux, for instance, you can’t easily build it into the root filesystem.

My server was promised within 72 hours, but delivered much quicker: within about an hour. Twice within the first day they told me they had to replace a motherboard; one swap took 30 minutes, while the other took 2.5 hours for some reason. They do have phone support, which answers almost immediately, but the people there are not the people actually in the datacenter. It was frustrating to have a server down for hours with nobody really able to say what was going on.

The server performs quite well, and after the initial issues, I’ve been happy.

I was initially planning an all-ZFS installation. SoYouStart does offer a rescue environment, but it doesn’t support ZFS, so I figured I’d better stick with an ext4 root at least. The default Debian install uses RAID1 on md-raid, with a 20GB root partition and the rest of the 2TB drive in /home, plus a swap partition on each drive (mysteriously NOT in the RAID!). So I broke the mirror on /home and converted the two partitions into the legs of a mirrored vdev for a zpool.
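
Roughly, the conversion looked like the sketch below. The device, array, and pool names are placeholders rather than the exact ones on my system, and /home has to be unmounted (and removed from /etc/fstab) first.

  # md2 is the /home array; sda3 and sdb3 are its two legs (placeholder names)
  umount /home
  mdadm --stop /dev/md2
  mdadm --zero-superblock /dev/sda3    # wipe the md metadata from each former leg
  mdadm --zero-superblock /dev/sdb3
  zpool create -o ashift=12 tank mirror /dev/sda3 /dev/sdb3
  zfs create -o mountpoint=/home tank/home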

I run all of the real work inside KVM VMs, so that should minimize the number of times I have to do anything to the root filesystem that could cause trouble.

SoYouStart includes 100GB of space on a separate FTP server for backup purposes. I have scripts that upload nightly tarballs of the root filesystem, plus full “zfs send” streams of everything else. Every hour, it uploads an incremental “zfs send” stream as well. This all works quite nicely; even if the machine is a complete loss, I’d never lose more than an hour’s work, and could restore it completely from a rescue environment. Very nice!
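
The scripts are not much more than pipelines along these lines; the hostname, credentials, and snapshot names below are placeholders, and the snapshot bookkeeping is omitted.

  # Nightly: tarball of the ext4 root, plus a full stream of each ZFS dataset
  tar -czf - --one-file-system / | \
    curl -sS -T - -u user:pass ftp://ftpback.example.net/root-$(date +%F).tar.gz
  zfs snapshot tank/home@$(date +%F)
  zfs send tank/home@$(date +%F) | gzip | \
    curl -sS -T - -u user:pass ftp://ftpback.example.net/home-$(date +%F).zfs.gz

  # Hourly: an incremental stream since the previous snapshot
  zfs send -i tank/home@previous tank/home@latest | gzip | \
    curl -sS -T - -u user:pass ftp://ftpback.example.net/home-$(date +%F-%H).zfs.gz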

I’ll write more in a few days about the ZFS setup I’m using, and some KVM discoveries as well.

VirtFS isn’t quite ready

Despite claims to the contrary [PDF], VirtFS — the 9P-based virtio KVM/QEMU layer designed to pass through a host’s filesystem to the guest — is quite slow. I have yet to get it to perform at even 1/10 the speed of the virtual block device (VBD). That’s unfortunate, because in theory it should be significantly faster. At this rate, I suspect even NFS will be significantly faster.
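
For reference, this is roughly how VirtFS gets attached; the paths and mount tag below are placeholders, and a libvirt <filesystem> element accomplishes the same thing as the qemu option.

  # Host: export a directory to the guest over 9P/virtio
  qemu-system-x86_64 ... \
    -virtfs local,path=/srv/export,mount_tag=hostshare,security_model=passthrough

  # Guest: mount the exported tree
  mount -t 9p -o trans=virtio,version=9p2000.L hostshare /mnt/host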

Beyond that, it seems impossible to use VirtFS as the root filesystem in a VM, at least with Debian; initramfs-tools doesn’t know how to build an initrd in that situation, and the support is just not there.

It would make a great combination with btrfs or zfs, but unfortunately looks to be just not ready yet.

How to fix “fstrim: Operation not supported” under KVM?

Maybe someone out there will have some ideas.

I have a KVM host running wheezy, with wheezy-backports versions of libvirt and qemu. I have defined a guest, properly set discard=unmap in the domain XML file for it, verified that’s being passed to the guest, but TRIM/DISCARD is just not working.
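
For the curious, the relevant portion of the domain XML looks roughly like this (device paths and names are placeholders); the virtio-scsi variant is the one shown, with its controller element included:

  <controller type='scsi' model='virtio-scsi'/>
  <disk type='block' device='disk'>
    <!-- discard='unmap' is what should pass TRIM/DISCARD through to the backing store -->
    <driver name='qemu' type='raw' cache='none' discard='unmap'/>
    <source dev='/dev/zvol/tank/vm-guest'/>
    <target dev='sda' bus='scsi'/>
  </disk>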

Mounting the ext4 filesystem with discard has no effect, and fstrim / always reports:

fstrim: /: FITRIM ioctl failed: Operation not supported

Every single time.

I’ve tried with the virtio, IDE, and SCSI (both default and virtio-scsi) backend drivers. The guest is also running wheezy (i386 version; the host is amd64) and I’ve tried the latest 3.12 backported kernel for it. No dice.

If I shut down the VM and mount the filesystem on the host, fstrim works fine.

Everything says this should work. But it doesn’t.

Any ideas?

Why and how to run ZFS on Linux

I’m writing a bit about ZFS these days, and I thought I’d write a bit about why I am using it, why it might or might not be interesting for you, and what you might do about it.

ZFS Features and Background

ZFS is not just a filesystem in the traditional sense, though you can use it that way. It is an integrated storage stack, which can completely replace the need for LVM, md-raid, and even hardware RAID controllers. This permits quite a bit of flexibility and optimization not present when building a stack involving those components. For instance, if a drive in a RAID fails, it need only rebuild the parts that actually have data stored on them.
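
As a minimal sketch of what that integration looks like in practice (device names are placeholders), one command replaces the usual md-raid + LVM + mkfs stack:

  zpool create tank mirror /dev/sdb /dev/sdc   # mirrored pool, no md-raid or LVM underneath
  zfs create -o compression=lz4 tank/home      # a dataset, mounted at /tank/home by default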

Let’s look at some of the features of ZFS (a short command sketch follows the list):

  • Full checksumming of all data and metadata, providing protection against silent data corruption. The only other Linux filesystem to offer this is btrfs.
  • ZFS is a transactional filesystem that ensures consistent data and metadata.
  • ZFS is copy-on-write, with snapshots that are cheap to create and impose virtually undetectable performance hits. Compare to LVM snapshots, which make writes notoriously slow and require an fsck and mount to get to a readable point.
  • ZFS supports easy rollback to previous snapshots.
  • ZFS send/receive can perform incremental backups much faster than rsync, particularly on systems with many unmodified files. Since it works from snapshots, it guarantees a consistent point-in-time image as well.
  • Snapshots can be turned into writeable “clones”, which simply use copy-on-write semantics. It’s like a cp -r that completes almost instantly and takes no space until you change it.
  • The datasets (“filesystems” or “logical volumes” in LVM terms) in a zpool (“volume group”, to use LVM terms) can shrink or grow dynamically. They can have individual maximum and minimum sizes set, but unlike LVM, where you have to manually allocate more space to /usr if it grows bigger than you expected, ZFS datasets can use any space available in the pool.
  • ZFS is designed to run well on big iron, and scales to massive amounts of storage. It supports SSDs as L2ARC (read cache) and ZIL (intent log) devices.
  • ZFS has some built-in compression methods that are quite CPU-efficient and can yield not just space but performance benefits in almost all cases involving compressible data.
  • ZFS pools can host zvols, block devices under /dev that store their data in the zpool. zvols support TRIM/DISCARD, so they are ideal for storing VM images, as they can instantly reclaim space freed by the guest OS. They can also be snapshotted and backed up like the rest of ZFS.
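
To make several of those bullets concrete, here is a minimal command sketch using placeholder pool and dataset names:

  zfs snapshot tank/home@before-cleanup                     # cheap point-in-time snapshot
  zfs rollback tank/home@before-cleanup                     # instant rollback to it
  zfs clone tank/home@before-cleanup tank/home-experiment   # writeable CoW “clone”
  zfs set compression=lz4 tank/home                         # switch on compression on the fly
  zfs create -V 20G tank/vm-debian                          # zvol at /dev/zvol/tank/vm-debian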

Although it is often considered a server filesystem, ZFS has been used in plenty of other situations for some time now, with ports to FreeBSD, Linux, and MacOS. I find it particularly useful:

  • To have faith that my photos, backups, and paperwork archives are intact. zpool scrub at any time will read the entire dataset and verify the integrity of every bit.
  • I can create snapshots of my system before running apt-get dist-upgrade, making it easy to track down issues or roll back to a known-good configuration. Ideal for people tracking sid or testing. One can also easily boot from a previous snapshot (a minimal sketch follows this list).
  • Many scripts exist that make frequent snapshots and retain them for a period of time, as a way of protecting work in progress against an accidental rm. There is no reason not to snapshot /home every 5 minutes, for instance. It’s almost as good as storing / in git.
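
The pre-upgrade workflow, for instance, is only a couple of commands; the dataset name here is a placeholder:

  zfs snapshot tank/ROOT/debian@pre-dist-upgrade
  apt-get dist-upgrade
  # ...and if the upgrade goes badly:
  zfs rollback tank/ROOT/debian@pre-dist-upgrade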

The added level of security in having cheap snapshots available is almost worth it by itself.

ZFS drawbacks

Compared to other Linux filesystems, there are a few drawbacks of ZFS:

  • The CDDL will prevent it from ever being part of the mainline Linux kernel tree.
  • It is more RAM-hungry than most, although with tuning it can even run on the Raspberry Pi (see the tuning sketch after this list).
  • A 64-bit kernel is strongly preferred, even in low-memory situations.
  • Performance on many small files may be worse than ext4’s.
  • The ZFS cache (the ARC) does not shrink and grow in response to changing memory pressure as gracefully as the normal Linux page cache does.
  • Compared to btrfs, ZFS lacks some features, such as the ability to shrink an existing pool or easily change storage allocation on the fly. On the other hand, the features ZFS does offer have never caused me a kernel panic, while half the things I liked about btrfs seem to have.
  • ZFS is already quite stable on Linux. However, the GRUB, init, and initramfs code supporting booting from a ZFS root and /boot is less stable. If you want to go 100% ZFS, be prepared to tweak your system to get it to boot properly. Once done, however, it is quite stable.
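
On the RAM point: the usual tuning knob on Linux is the zfs_arc_max module parameter, which caps the size of the ARC. The 1GiB value below is only an example:

  # /etc/modprobe.d/zfs.conf
  options zfs zfs_arc_max=1073741824
  # Rebuild the initramfs so the cap also applies at boot:
  update-initramfs -u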

Converting to ZFS

I have written up an extensive HOWTO on converting an existing system to use ZFS. It covers workarounds for all the boot-time bugs I have encountered as well as documenting all steps needed to make it happen. It works quite well.

Additional Hints

If setting up zvols to be used by VirtualBox or some such system, you might be interested in managing zvol ownership and permissions with udev.
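
zvols appear as /dev/zd* block devices (with friendlier symlinks under /dev/zvol/), so a rule along these lines hands them to an unprivileged user; the file name, owner, and group below are hypothetical:

  # /etc/udev/rules.d/90-zvol-vbox.rules
  KERNEL=="zd*", SUBSYSTEM=="block", OWNER="youruser", GROUP="vboxusers", MODE="0660"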

Debian-Live Rescue image with ZFS On Linux; Ditched btrfs

I’m a geek. I enjoy playing with different filesystems, version control systems, and, well, for that matter, radios.

I have lately started to worry about the risks of silent data corruption, and as such, looked to switch my personal systems to either ZFS or btrfs, both of which offer built-in checksumming of all data and metadata. I initially opted for btrfs, because of its tighter integration into the Linux kernel and ability to shrink an existing btrfs filesystem.

However, as I wrote last month, that experiment was not a success. I had too many serious performance regressions and one too many kernel panics and decided it wasn’t worth it. And that the SuSE people got it wrong, deeply wrong, when they declared btrfs ready for production. I never lost any data, to its credit. But it simply reduces uptime too much.

That left ZFS. Before I build a system, I always want to make sure I can repair it. So I started with the Debian Live rescue image, and added the zfsonlinux.org repository to it, along with some key packages to enable the ZFS kernel modules, GRUB support, and initramfs support. The resulting image is described, and can be downloaded from, my ZFS Rescue Disc wiki page, which also has a link to my source tree on github.
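
The live-build configuration amounts to little more than an extra archive and package list. The repository URL, key file, and package names below are from memory of the zfsonlinux.org Debian repository of that era and may have changed:

  # From inside a live-build config tree (after lb config):
  echo "deb http://archive.zfsonlinux.org/debian wheezy main" \
    > config/archives/zfsonlinux.list.chroot
  cp zfsonlinux-keyring.gpg config/archives/zfsonlinux.key.chroot   # repository signing key
  printf '%s\n' debian-zfs zfs-initramfs grub-pc \
    > config/package-lists/zfs.list.chroot
  lb build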

In future blog posts in the series, I will describe the process of converting existing Debian installations to use ZFS, of getting them to boot from ZFS, some bugs I encountered along the way, and some surprising performance regressions in ZFS compared to ext4 and btrfs.

Married!

One week before the wedding, to Laura: “Mono won’t just clear up right away.”

One week before the wedding, to me: “That’s going to need stitches.”

Yes, not long before the wedding, Laura had come down with mononucleosis, and I had cut into my finger with a very sharp knife while cutting bread, requiring a trip to the emergency room to get stitches. Two days before we got married, instead of moving furniture, I was getting the stitches out of my finger.

It wasn’t the kind of week we had planned.

But it was the happiest, most amazing occasion I could have ever imagined.

As I wrote last month, I am richly blessed indeed.

Our wedding was three days after Christmas. The church was still decorated for Christmas, with the tree in one corner, glittering stars suspended in mid-air on cables from the walls, and wreaths and candles in the windows. It was a joy-filled day.

Before the ceremony, we took pictures — the only part of the day Jacob and Oliver weren’t thrilled with. Nevertheless, we got some fun ones.

Laura and I seem to know quite a few pastors between us – and not just because Laura is a pastor. My brother officiated with the wedding vows, his wife with scripture and a prayer of blessing, and the church’s pastor gave the message.

Laura and I wrote in our wedding program, “Music has long been a thread running through both our lives. We have enjoyed singing together, playing piano and pennywhistle duets, attending concerts, and even exploring old hymnals. Music is also one of the best ways to have a conversation – even a conversation with God.” We wrote a page in the program about each of the hymns that were a part of the wedding, and why we picked them. The combined church choirs of my home church and Laura’s church sang John Rutter’s beautiful arrangement of For the Beauty of the Earth (click here to listen to a different choir). Hearing “For the beauty of each hour”, “For the joy of human love”, and “Lord of all, to thee we raise this our joyful hymn of praise” was perfect for the day.

It was with such great happiness that we walked out of the sanctuary, a married couple, to the sound of the congregation singing Joy to the World!

Jacob and Oliver were so very excited on our wedding day. They happily explored the church while waiting for things to happen. We had them help us light our unity candle, and they were pleased with that. Jacob loved his suit, which made him look just like me. And they were, of course, delighted with the cake and in the middle of it all.

For our honeymoon, we managed to get two weeks of vacation, and spent about half of it at home. We had looked at various options for retreats in the country, but eventually concluded that our house is a retreat in the country, so might as well enjoy it at home.

We also went to the Palo Duro Canyon area near Amarillo in the Texas panhandle, staying in a small B&B in Canyon, TX. Palo Duro is the second-largest canyon in North America, and quite colorful year-round. What a beautiful place to go for our honeymoon! By the time we got there, Laura was getting past mono, and we went for hikes in the canyon on two different days — hiking a total of 10 miles, including a hike up the side of the canyon.

After we got back home, on the last weekday of our honeymoon, we went back to the Flint Hills of Kansas, to some of the same places where we had spent our third date. We climbed the windy staircase at the Chase County Courthouse, the oldest courthouse still in use in Kansas.

And peered out its famous oval window.

We found the last remnant of the old ghost town of Elk, and ate at the same restaurant we had eaten at that day. It brought back wonderful memories, and it was a good day in itself. Because even though it was a cold, drizzly, overcast day in January, this time, we were married.

And this year, Thanksgiving is all year.

Richly Blessed

“It’s wedding week! Wedding week! Wedding week! Wedding week! Oh, also Christmas. Oh dad, it’s wedding week! I can’t believe it! It’s finally here! Wedding week!” – Jacob, age 7, Sunday

“Oh dad, this is the best Christmas EVER!” – Jacob, Wednesday

“Dad, is the wedding TODAY?” – Oliver, age 4, every morning this week

This has certainly been a Christmas like no other. I have never known something to upstage Christmas for Jacob, but apparently a wedding can!

Laura and I got to celebrate our first Christmas together this year — together, of course, with the boys. We enjoyed a wonderful day in the middle of a busy week, filled with play, family togetherness, warmth, and happiness. At one point, while I was helping the boys with their new model train components, Laura was enjoying playing Christmas tunes on the piano. Every time she’d reach the end, Jacob paused, and said, “That was awesome!”, beating me to it.

That’s a few days before Christmas — Jacob and Oliver demanding snow ice cream, and of course who am I to refuse?

Cousins opening presents

Ever since his school Christmas program, Jacob has enjoyed singing. Here he is after the Christmas Eve program, where he excitedly ran up into the choir loft, picked up a hymnal, and pretended to sing.

And, of course, opening of presents at home.

Sometimes I think about how I didn’t know life could get this good. Soon Laura and I will be married, and it will be even better. Truly we have been richly blessed.

Delicious Holiday Recipes

I’ve come up with some new favorites this season. The boys and Laura were around for all three, and I am happy to report there were many kitchen smiles over these!

From-Scratch Hot Chocolate

There’s something about hot chocolate made from scratch, with chocolate melted into milk, instead of a powder stirred in. It takes quite a bit more time, and probably has more calories, but it is quite delicious.

The key to a delicious result where milk is concerned is to take things slow and keep stirring. You don’t want the chocolate to scorch at the bottom of the pan. Heating up the milk before the chocolate should help things mix in more easily as well.

  • Begin with 3 cups milk and 1 cup heavy whipping cream. Heat slowly over moderate to low heat, stirring periodically. Once you see bubbles start to form around the edges, it is plenty hot (or even a bit more hot than it needs to be).
  • Add one cup of semisweet chocolate chips, 1 teaspoon sugar, and 1/2 tsp vanilla extract.
  • Stir constantly until all the chocolate is melted and well mixed. There will still be some small bits of chocolate within, but if it is all done slowly like this, the chocolate should be pretty well melted.

The basis for this recipe was here, and it called for 2 cups milk and 2 cups half-and-half. I trust my heavy whipping cream was fine! <grin> There are also some other variations on that site.

This nearly made my little cast iron kettle overflow, so next time I made a 3/4 recipe.

Hot Spiced Cider

We put up a Christmas tree yesterday, so I thought hot spiced cider would be perfect for the occasion. I went searching for recipes, and many of them called for cloves (which have to be sifted out later or put in a spice bag). I wasn’t going to have time to delay two boys from setting up a Christmas tree long enough for that, so I found this basic recipe to work well. However, I, as usual, made some modifications ;-)

  • Warm 4 cups apple cider (not juice, as the recipe suggests) in a pot.
  • Add 1/2 tsp cinnamon
  • Add 1/4 tsp nutmeg or allspice (I used allspice because I was mysteriously out of nutmeg, but will probably use nutmeg next time)
  • Add 1 tbsp brown sugar
  • Stir constantly until sufficiently dissolved. Pour immediately before drinking, as the contents will tend to separate.

Mmmmm…. yum….

Turkey or Chicken Noodle Soup

The annual “what to do with all that leftover turkey” quest strikes again. I like chicken noodle soup, so why not a turkey noodle soup done the same way?

Here’s what I used, roughly, in my large 6-quart cast iron cooking pot (aka “Dutch oven”):

  • 9 cups chicken/turkey broth. Your own if you have it, or the canned variety works, or make it with bouillon.
  • 2 chopped yellow onions. (I added half a chopped red onion as well because I had it sitting around. Nobody complained, but 2.5 onions was a little much.)
  • 4 tsp fresh basil or 1 tsp dried basil
  • 4 tsp fresh oregano or 1 tsp dried oregano
  • 1/2 tsp pepper
  • 1/2 tsp salt
  • 1 tsp beef bouillon
  • 2 bay leaves
  • 20 oz frozen mixed vegetables (I’d probably add more than that next year; this wasn’t quite enough)
  • Plenty of wide egg noodles. The recipe I used called for 1 cup, which was laughably inadequate. I just dumped until it looked right, and then the package was almost empty so I dumped the rest in too.
  • 4 cups cooked turkey or chicken, cubed (a kitchen scissors makes quick work of that)
  • Two 14.5-oz cans diced tomatoes (do not drain)

Start with the broth, onion, basil, oregano, pepper, and bay leaf. Heat up the mixture and add the vegetables. Bring it to boiling, then add the uncooked noodles. Return to boiling, then reduce heat, cover, and simmer for 8 minutes. Add the turkey or chicken and diced tomatoes, and simmer until hot enough to serve.

The nice thing about soups is that they freeze well and make great winter leftovers. This recipe makes quite a lot of soup; you may wish to halve it.

This recipe was adapted from one in a Better Homes & Gardens cookbook.

Results with btrfs and zfs

The recent news that openSUSE considers btrfs safe for users prompted me to consider using it. And indeed I did. I was already familiar with zfs, so considered this a good opportunity to experiment with btrfs.

btrfs makes an intriguing filesystem for all sorts of workloads. The benefits of btrfs and zfs are well-documented elsewhere. There are a number of features btrfs has that zfs lacks. For instance:

  • The ability to shrink a device that’s a member of a filesystem/pool
  • The ability to remove a device from a filesystem/pool entirely, assuming enough free space exists elsewhere for its data to be moved over.
  • Asynchronous deduplication that imposes neither a synchronous performance hit nor a heavy RAM burden
  • Copy-on-write copies down to the individual file level with cp --reflink
  • Live conversion of data between different profiles (single, dup, RAID0, RAID1, etc.); see the sketch after this list
  • Live conversion between on-the-fly compression methods, including none at all
  • Numerous SSD optimizations, including alignment and both synchronous and asynchronous TRIM options
  • Proper integration with the VM subsystem
  • Proper support across the many Linux architectures, including 32-bit ones (zfs is currently only flagged stable on amd64)
  • Does not require excessive amounts of RAM
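
Two of those are one-liners worth seeing; the file names and mount point are placeholders:

  cp --reflink=always disk.img disk-copy.img   # instant CoW copy of a single file
  # Convert the data and metadata of a mounted filesystem to RAID1, live:
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data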

The feature set of ZFS that btrfs lacks is well-documented elsewhere, but there are a few odd btrfs missteps:

  • There is no way to see how much space a subvolume/filesystem is using without turning on quotas. Even then, it is cumbersome and not reported with df as it should be.
  • When a maximum size for a subvolume is set via a quota, it is not reported via df; applications have no idea when they are about to hit the maximum size of a filesystem.

btrfs would be fine if it worked reliably. I should say at the outset that I have never lost any data due to it, but it has caused enough kernel panics that I’ve lost count. Several times I had a file that produced a panic whenever I tried to delete it; several times it took more than 12 hours to unmount a btrfs filesystem; hardlink-heavy workloads took days longer to complete than on zfs or ext4; and those are just the problems I wrote about. I tried to use btrfs balance to change the metadata allocation on the filesystem, and never did get it to complete; it seemed to go into an endless I/O pattern after the first 1GB of metadata and never got past that. I didn’t bother trying the live migration of data from one disk to another on this filesystem.

I wanted btrfs to work. I really, really did. But I just can’t see it working. I tried it on my laptop, but had to turn off CoW on my virtual machine’s disk because of the rm bug. I tried it on my backup devices, but it was unusable there due to being so slow. (Also, the hardlink behavior is broken by default and requires btrfstune -r. Yipe.)
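
For anyone hitting the same problems, the two workarounds mentioned above look roughly like this; the paths and device name are placeholders, and chattr +C only affects files created after it is set on the directory:

  chattr +C /var/lib/libvirt/images   # disable CoW for the directory holding VM images
  # Enable extended inode refs so hardlink-heavy trees don't hit the per-file limit
  # (run against the unmounted filesystem):
  btrfstune -r /dev/sdb1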

At this point, I don’t think it is really all that worth bothering with. I think the SuSE decision is misguided and ill-informed. btrfs will be an awesome filesystem. I am quite sure it will, and will in time probably displace zfs as the most advanced filesystem out there. But that time is not yet here.

In the meantime, I’m going to build a Debian Live Rescue CD with zfsonlinux on it. Because I don’t ever set up a system I can’t repair.

Why are we still backing up to hardlink farms?

A person can find all sorts of implementations of backups using hardlink trees to save space for incrementals. Some of them are fairly rudimentary, using rsync --link-dest. Others, like BackupPC, are more sophisticated, doing file-level dedup to a storage pool indexed by a hash.
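
The hardlink-farm approach is typically a one-liner like the following, with placeholder paths; every unchanged file becomes another hardlink to yesterday’s copy, and every run recreates the full directory tree:

  rsync -a --delete --link-dest=/backups/2014-02-10 /home/ /backups/2014-02-11/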

While these are fairly space-efficient, they are really inefficient in other ways, because they create tons of directory entries. It would not be surprising to find millions of directory entries consumed very quickly. And while any given backup set can be deleted without impact on the others, the act of doing so can be very time-intensive, since often a full directory tree is populated with every day’s backup.

Much better is possible on modern filesystems. ZFS has been around for quite awhile now, and is stable on Solaris, FreeBSD and derivatives, and Linux. btrfs is also being used for real workloads and is considered stable on Linux.

Both have cheap copy-on-write snapshot operations that would work well with a simple rsync --inplace to achieve the same effect as hardlink farms, but without all the performance penalties. When creating and destroying snapshots is a virtually instantaneous operation, when the snapshots work at a file block level instead of an entire-file level, and when they preserve changing permissions and such as well (which rsync --link-dest can have issues with), why are we not using them more?
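
The snapshot-based equivalent, with placeholder dataset names, keeps a single directory updated in place and stores the history as snapshots:

  rsync -a --inplace --delete /home/ /tank/backups/home/
  zfs snapshot tank/backups/home@$(date +%Y-%m-%d)
  # Expiring an old backup is a single cheap operation:
  zfs destroy tank/backups/home@2014-01-01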

BackupPC has a very nice scheduler and a helpful web interface, but its backend has no mode to take advantage of these more modern filesystems. The only tool I see along these lines is dirvish, for which someone made patches to use btrfs snapshots three years ago; as far as I can tell, they were never integrated.

A lot of folks are rolling a homegrown solution involving rsync and snapshots. Some are using zfs send / btrfs send, but those mechanisms require the same kind of FS on the machine being backed up as on the destination, and do not permit excluding files from the backup set.

Is this an area that needs work, or am I overlooking something?

Incidentally, hats off to liw’s obnam. It doesn’t exactly do this, but sort of implements its own filesystem with CoW semantics.