Monthly Archives: February 2009

Dad Have A Light

We got a new camcorder recently. I was trying it out after it was unboxed, of course. Jacob came over and got a glimpse of the LCD screen on it, which he called a light.

“Dad have a light. See it.”

So of course I rotated it around so he could see it. Then I couldn’t see it myself, so I was shooting blind. But he enjoyed it.

Here’s the video:

If the embedded video doesn’t work in your RSS reader, try this link.

That’s the first video I’ve ever uploaded to a public video sharing site. Finally catching up with the Internet, I guess.

How To Think About Compression, Part 2

Yesterday, I posted part 1 of how to think about compression. If you haven’t read it already, take a look now, so this post makes sense.

Introduction

In the part 1 test, I compressed a 6GB tar file with various tools. This is a good test if you are writing an entire tar file to disk, or if you are writing to tape.

For part 2, I will be compressing each file contained in that tarball individually. This is a good test if you back up to hard disk and want quick access to your files. Quite a few tools take this approach — rdiff-backup, rdup, and backuppc are among them.

We can expect performance to be worse both in terms of size and speed for this test. The compressor tool will be executed once per file, instead of once for the entire group of files. This will magnify any startup costs in the tool. It will also reduce compression ratios, because the tools won’t have as large a data set to draw on to look for redundancy.
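A minimal sketch of this per-file approach (directory and filenames hypothetical), showing how the compressor gets invoked once per file:

```shell
# Compress each regular file individually: one gzip process per file,
# which is exactly where per-invocation startup cost gets magnified.
mkdir -p /tmp/demo
echo "hello world" > /tmp/demo/a.txt
echo "more data"   > /tmp/demo/b.txt
find /tmp/demo -type f ! -name '*.gz' -exec gzip -f {} \;
ls /tmp/demo    # each file replaced by its .gz counterpart
```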

To add to that, we have the block size of the filesystem — 4K on most Linux systems. Any file’s actual disk consumption is always rounded up to the next multiple of 4K, so a 5-byte file takes up the same amount of space as a 3000-byte file. (This behavior is not unique to Linux.) If a compressor can’t shrink a file enough to cross at least one 4K boundary, it effectively saves no disk space. On the other hand, in certain situations, saving a single byte could free an entire 4K block.

So, for the results below, I use du to calculate disk usage, which reflects the actual amount of space consumed by files on disk.
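The rounding effect is easy to see directly (file paths hypothetical; the exact block size depends on your filesystem, so the du numbers may differ):

```shell
# A 5-byte file and a 3000-byte file occupy the same number of
# filesystem blocks, even though their byte sizes differ wildly.
printf 'hello' > /tmp/small.txt            # 5 bytes
head -c 3000 /dev/zero > /tmp/medium.txt   # 3000 bytes
ls -l /tmp/small.txt /tmp/medium.txt       # apparent (byte) sizes
du -k /tmp/small.txt /tmp/medium.txt       # actual disk usage
```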

The Tools

Based on comments in part 1, I added tests for lzop and xz to this iteration. I attempted to test pbzip2, but it would have taken 3 days to complete, so it is not included here — more on that issue below.

The Numbers

Let’s start with the table, using the same metrics as with part 1:

Tool MB saved Space vs. gzip Time vs. gzip Cost
gzip 3081 100.00% 100.00% 0.41
gzip -1 2908 104.84% 82.34% 0.36
gzip -9 3091 99.72% 141.60% 0.58
bzip2 3173 97.44% 201.87% 0.81
bzip2 -1 3126 98.75% 182.22% 0.74
lzma -1 3280 94.44% 163.31% 0.63
lzma -2 3320 93.33% 217.94% 0.83
xz -1 3270 94.73% 176.52% 0.68
xz -2 3309 93.63% 200.05% 0.76
lzop -1 2508 116.01% 77.49% 0.39
lzop -2 2498 116.30% 76.59% 0.39

As before, in the “MB saved” column, higher numbers are better; in all other columns, lower numbers are better. I’m using clock seconds here on a dual-core machine. The cost column is clock seconds per MB saved.
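The cost metric is a simple division. For example, back-computing the gzip row (the raw seconds figure here is illustrative, derived from the published cost, not a measured number):

```shell
# cost = clock seconds / MB saved
awk -v seconds=1263 -v mb_saved=3081 \
    'BEGIN { printf "%.2f\n", seconds / mb_saved }'
```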

Let’s draw some initial conclusions:

  • lzma -1 continues to be both faster and smaller than bzip2. lzma -2 is still smaller than bzip2, but unlike the test in part 1, is now a bit slower.
  • As you’ll see below, lzop ran as fast as cat. Strangely, lzop -2 produced larger output than lzop -1.
  • gzip -9 is probably not worth it — it saved less than 1% more space and took 42% longer.
  • xz -1 is not as good as lzma -1 in either respect, though xz -2 is faster than lzma -2, at the cost of some storage space.
  • Among the tools also considered in part 1, the differences in space and time were both smaller. Across all tools, the difference in time is still far more significant than the difference in space.
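The gzip-level tradeoff above is easy to check on any compressible input (sample data hypothetical; exact sizes will vary with the input):

```shell
# Compare output sizes at gzip's fastest, default, and best levels.
head -c 500000 /dev/zero > /tmp/sample.bin   # trivially compressible
gzip -1 -c /tmp/sample.bin | wc -c
gzip -6 -c /tmp/sample.bin | wc -c           # gzip's default level
gzip -9 -c /tmp/sample.bin | wc -c
```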

The Pretty Charts

Now, let’s look at an illustration of this. As before, the sweet spot is the lower left, and the worst spot is the upper right. First, let’s look at the compression tools themselves:

[Chart: compress2-zoomed]

At the extremely fast but lower-compression end is lzop. gzip is still the balanced performer, bzip2 still looks really bad, and lzma -1 is still the best high-compression performer.

Now, let’s throw cat into the mix:

[Chart: compress2-big]

Here’s something notable, that this graph makes crystal clear: lzop was just as fast as cat. In other words, it is likely that lzop was faster than the disk, and using lzop compression would be essentially free in terms of time consumed.

And finally, look at the cost:

[Chart: compress2-efficiency]

What happened to pbzip2?

I tried the parallel bzip2 implementation just like last time, but it ran extremely slowly. Interestingly, pbzip2 < notes.txt > notes.txt.bz2 took 1.002 wall seconds, but pbzip2 notes.txt finished almost instantaneously. This 1-second startup time for pbzip2 was a killer; the test would have taken more than 3 days to complete, so I killed it early and omitted it from my results. Hopefully this bug can be fixed. I expected pbzip2 to help little in this test, and perhaps even to show a slight degradation, but nothing like THAT.
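The discrepancy between the two invocation styles can be demonstrated like this (guarded, since pbzip2 may not be installed; the file is hypothetical):

```shell
if command -v pbzip2 >/dev/null; then
    head -c 100000 /dev/urandom > /tmp/notes.txt
    # Reading from stdin: hits the roughly 1-second startup penalty.
    time pbzip2 -c < /tmp/notes.txt > /tmp/notes.txt.bz2
    # Given the filename directly: finishes almost instantly.
    time pbzip2 -kf /tmp/notes.txt
fi
```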

Conclusions

As before, the difference in time was far more significant than the difference in space. By compressing files individually, we lost about 400MB (about 7%) of space compared to making a tar file and then compressing it. My test set contained 270,101 files.

gzip continues to be a strong all-purpose contender, posting fast compression time and respectable compression ratios. lzop is a very interesting tool, running as fast as cat and yet turning in reasonable compression — though 25% worse than gzip on its default settings. gzip -1 was almost as fast, though, and compressed better. If gzip weren’t fast enough with -6, I’d be likely to try gzip -1 before using lzop, since the gzip format is far more widely supported, and that’s important to me for backups.

These results still look troubling for bzip2. lzma -1 continued to turn in far better times and compression ratios than bzip2. Even bzip2 -1 couldn’t match the speed of lzma -1, and compressed barely better than gzip. I think bzip2 would be hard-pressed to find a comfortable niche anywhere by now.

As before, you can download my spreadsheet with all the numbers behind these charts and the table.

How To Think About Compression

… and the case against bzip2

Compression is with us all the time. I want to talk about general-purpose lossless compression here.

There is a lot of agonizing over compression ratios: the size of output for various sizes of input. For some situations, this is of course the single most important factor. For instance, if you’re Linus Torvalds putting your code out there for millions of people to download, the benefit of saving even a few percent of file size is well worth the cost of perhaps 50% worse compression performance. He compresses a source tarball once a month maybe, and we are all downloading it thousands of times a day.

On the other hand, when you’re doing backups, the calculation is different. Your storage media costs money, but so does your CPU. If you have a large photo collection or edit digital video, you may create 50GB of new data in a day. If you use a compression algorithm that’s too slow, your backup for one day may not complete before your backup for the next day starts. This is even more significant a problem when you consider enterprises backing up terabytes of data each day.

So I want to think of compression both in terms of resulting size and performance. Onward…

Starting Point

I started by looking at the practical compression test, which has some very useful charts. He has charted savings vs. runtime for a number of different compressors, and with the range of different settings for each.

If you look at his first chart, you’ll notice several interesting things:

  • gzip performance flattens at about -5 or -6, right where the manpage tells us it will, and in line with its defaults.
  • 7za -2 (the LZMA algorithm used in 7-Zip and p7zip) is both faster and smaller than any possible bzip2 combination. 7za -3 gets much slower.
  • bzip2’s performance is more tightly clustered than the others, both in terms of speed and space. bzip2 -3 is about the same speed as -1, but gains some space.

All this was very interesting, but had one limitation: it applied only to the gimp source tree, which is something of a best-case scenario for compression tools.

A 6GB Test

I wanted to try something a bit more interesting. I made an uncompressed tar file of /usr on my workstation, which comes to 6GB of data. My /usr contains a large, real-world mix of data: highly compressible material such as header files and source code, ELF binaries and libraries, already-compressed documentation files, small icons, and the like.

In fact, every compression comparison I saw was using data sets less than 1GB in size — hardly representative of backup workloads.
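Scaled way down, the test looks like this (paths hypothetical; the real run used the 6GB tar of /usr):

```shell
# Build an uncompressed tar of a directory, then time each
# compressor against that single stream.
mkdir -p /tmp/corpus
for i in 1 2 3; do head -c 200000 /dev/urandom > /tmp/corpus/f$i; done
tar -cf /tmp/corpus.tar -C /tmp corpus
time gzip -c /tmp/corpus.tar > /tmp/corpus.tar.gz
# ...and likewise for bzip2, pbzip2, lzma -1, and lzma -2.
ls -l /tmp/corpus.tar /tmp/corpus.tar.gz
```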

Let’s start with the numbers:

Tool MB saved Space vs. gzip Time vs. gzip Cost
gzip 3398 100.00% 100.00% 0.15
bzip2 3590 92.91% 333.05% 0.48
pbzip2 3587 92.99% 183.77% 0.26
lzma -1 3641 91.01% 195.58% 0.28
lzma -2 3783 85.76% 273.83% 0.37

In the “MB saved” column, higher numbers are better; in all other columns, lower numbers are better. I’m using clock seconds here on a dual-core machine. The cost column is clock seconds per MB saved.

What does this tell us?

  • bzip2 can do roughly 7% better than gzip, at a cost of a compression time more than 3 times as long.
  • lzma -1 compresses better than bzip2 -9 in less than twice the time of gzip. That is, it is significantly faster and marginally smaller than bzip2.
  • lzma -2 is significantly smaller and still somewhat faster than bzip2.
  • pbzip2 achieves better wall clock performance, though not better CPU time performance, than bzip2 — though even then, it is only marginally better than lzma -1 on a dual-core machine.

Some Pretty Charts

First, let’s see how the time vs. size numbers look:

[Chart: compress-zoomed]

Like the other charts, the best area is the lower left and the worst is the upper right. It’s clear we have two outliers, gzip and bzip2, plus a cluster of pretty similar performers.

This view somewhat magnifies the differences, though. Let’s add cat to the mix:

[Chart: compress-big]

And finally, look at the cost:

[Chart: compress-efficiency]

Conclusions

First off, the difference in time is far larger than the difference in space. We’re talking a difference of 15% at the most in terms of space, but several-fold differences in time.

I think this pretty definitively is a death knell for bzip2. lzma -1 can achieve better compression in significantly less time, and lzma -2 can achieve significantly better compression in a little less time.

pbzip2 can help even that out in terms of clock time on multicore machines, but 7za already has a parallel LZMA implementation, and it seems only a matter of time before /usr/bin/lzma gets it too. Also, if I were to chart CPU time, the numbers would be even less kind to pbzip2 than to bzip2.

bzip2 does have some interesting properties, such as resetting everything every 900K, which could provide marginally better safety than any other compressor here — though I don’t know if lzma provides similar properties, or could.

I think a strong argument remains that gzip is most suitable for backups in the general case. lzma -1 makes a good contender when space is at more of a premium. bzip2 doesn’t seem to make a good contender at all now that we have lzma.

I have also made my spreadsheet (OpenOffice format) containing the raw numbers and charts available for those interested.

Update

Part 2 of this story is now available, which considers more compression tools, and looks at performance compressing files individually rather than the large tar file.

Video Hosting Sites Review

Last July, I wrote about video uploading sites. Now that I’m starting to get ready to post video online, some public but a lot of it just for friends or family, I’ve taken another look. And I’m disappointed in what I see.

Youtube has made the biggest improvements since then. Now, they can handle high-definition video, an intermediate “HQ” encoding, and the standard low-bandwidth encoding. Back then, there was no HD support, and I don’t think any HQ support either.

There are two annoying things about Youtube. One is the 10-minute limit per video file, though that can be worked around. The other is the really quite terrible options for sharing non-public videos. In essence, the only way to do this is to manually select, on each video, which people you want to be able to see it. If a new person gets a Youtube account, you can’t just give them access to the entire back library of videos. What I want is to tell Youtube that all people in a certain GROUP should have access, and then add people to the group as needed. That’s a glaring oversight.

Vimeo, on the other hand, has actually gotten worse. Back a year ago, they were an early adopter on the HD bandwagon. Now that they’ve introduced their pay accounts, the free accounts have gotten worse than before. With a free Vimeo account, you can only upload 1 HD video a week. You also get dumped in the “4-hour encoding” line, and get the low-quality encoding. Yes, it’s noticeable, and much worse than Youtube HQ, let alone Youtube HD. You have no time limit, but a 500MB upload limit per week.

The sharing options with Vimeo are about what I’d want.

blip.tv seems about the same, and I’m still avoiding them because you have to pay $100/yr to be able to keep videos non-public.

Then there’s viddler. I am not quite sure what to make of them. They seem to be, on the one hand, Linux fans with a clue. On the other hand, their site seems to be chock full of get-rich-quick and real estate scheme videos, despite a ToS that prohibits them. They allow you to upload HD videos but not view them. They have a limit of 500MB per video file, but no limits on how many files you can upload or the length of each one, and the sharing options seem good.

So I’m torn. On the one hand, it would be easy to say, “I’ll just dump everything to viddler.” On the other hand, are they going to do what Vimeo did, or worse, start overlaying ads on all my videos?

Any suggestions?

Review: Video Editing Software

We recently bought a Canon Vixia HG20 camcorder. The HG20 records in AVCHD format (MPEG-4 h.264) at up to 1920×1080. To get from the camcorder to a DVD (or something we can upload to the web), I need some sort of video editing software. This lets me trim out the boring bits, encode the video for DVD, etc.

Background

In addition to DVD creation and web uploading, I want the ability to burn high-definition video discs. 1920×1080 is significantly higher resolution than you get from a DVD. There are two main ways to go: a blu-ray format disc, or an AVCHD disc. A blu-ray disc has to be burned onto BD-R media, which costs about $5 each, using a blu-ray burner, which costs about $200. AVCHD discs use the same h.264 encoding that the camcorder does, meaning they have better compression and can be burned onto regular DVD+R media, fitting about 30 minutes onto a DVD. Moreover, it is possible to move AVCHD files directly from a camcorder to an AVCHD disc without re-encoding, resulting in higher quality and lower processing time. The nicer blu-ray players, such as the PS3, can play AVCHD discs.

AVCHD seems pretty clearly the direction the industry is moving. Compared to the tape-based HDV, AVCHD has higher quality with lower bitrates, better resolution, and much greater convenience. Hard disk or SD-based AVCHD camcorders are pretty competitive in terms of price by now too, often cheaper than tape-based ones.

The downside of AVCHD is that it takes more CPU power to process. Though as video tasks are often done in batch, that wouldn’t have to be a huge downside. The bigger problem is that, though all the major video editing software claims to support AVCHD, nobody really supports it well yet.

The Contenders

Back when I got my first camcorder in about 2001 — the one that I’m replacing now — you pretty much had to have a Mac to do any sort of reasonable consumer or prosumer-level video editing. We bought our first iMac back then to work with that, and it did work well with the MiniDV camera.

Today, there’s a lot more competition out there. The Mac software stack has not really maintained its lead — some would even say that it’s regressed — and the extremely high cost of buying a Mac capable of working with AVCHD, plus Final Cut Express, makes that option completely out of the question for me. It would be roughly $2500.

Nobody really supports AVCHD well yet, even on the Mac. Although most programs advertise support of “smart rendering” — a technique that lets the software merely copy unchanged footage when outputting to the same format as the input — none of them have smart rendering that actually works with AVCHD source material. This fact is never documented, though it is discussed on forums.

Another annoyance, having used Final Cut Express in the past, is that with these programs you can’t just go to the timeline and say “delete everything between 1:35 and 3:52”; you have to go in and split up clips, then select and delete them. They seem to be way too concerned about dealing with individual clips.

I briefly used Cinelerra on Linux to do some video editing. It’s a very powerful program, but geared at people that are far more immersed in video editing than I. For my needs, it didn’t have enough automation and crashed too much — and that was with MiniDV footage. It apparently does support AVCHD, but I haven’t tried it.

I’ve tried three programs and considered trying a fourth. Here are my experiences:

Ulead/Corel VideoStudio Pro X2

VideoStudio is commonly referenced as the “go to” program for video editing on Windows, so I started by downloading the free trial from Corel. Corel claims all over its website that the free trial is full-featured, but I could tell almost instantly that it wasn’t. I wound up buying the full version, which came to about $65 after various discounts.

I wanted to like this program. Its output options include AVCHD disc, Blu-ray disc, DVD+R, and the like. Its input options include MiniDV, AVCHD, ripping from DVD, ripping from Bluray, and just about every other format you can think of. And it heavily advertised “proxy editing”, designed to let you edit a scaled-down version of AVCHD video with a low-CPU machine, but refer back to the original high-quality footage for the output.

It didn’t pan out that way.

The biggest problem was the constant crashing. I really do mean constant. It probably crashed on me two dozen times in an hour. If you are thinking that means that it crashes pretty much as soon as I can get it re-opened, you’d be correct. Click the Play button and it hangs. Click a clip and it hangs. Do anything and it hangs.

It did seem to work better with the parts of the source that had been converted to a low-res version with Smart Proxy, though it didn’t eliminate the hangs, just reduced them. And every time I’d have to End Task, it would forget what it had already converted via Smart Proxy — even if I had recently saved the project — and have to start over from scratch.

I spent some time trying to figure out why it always thought my project was 720×480 even when it was 1920×1080, and why the project properties box didn’t even have an option for 1920×1080. After some forum searching, it turns out that the project properties box is totally irrelevant to the program. Yay for good design, anyone?

VideoStudio Pro X2 does have good output options, allowing combination of multiple source files onto a single DVD or AVCHD disc as separate titles. Unfortunately, its DVD/AVCHD rendering process also — yes — hangs more often than not.

The documentation for VideoStudio Pro X2 is of the useless variety. It’s the sort of thing that feels like it’s saying “The trim tool is for trimming your clips” without telling you what “trimming your clips” means, or making it obvious how to remove material from the middle of a clip.

The proxy editing feature isn’t what it should be either. Instead of being something that just automatically happens and Works in the background, you have to manage its queue in the foreground — and it forgets what it was doing whenever the program hangs.

On the rare occasion when pressing Play did not cause a hang, the AVCHD footage played back at about 0.5fps — far, far worse than PowerDirector manages on the same machine. Bits that had been rendered for proxy editing did appear to play at full framerate.

I have applied for a refund for this purchase from Corel under their 30-day return policy, and have already uninstalled it from my disk. What a waste.

CyberLink PowerDirector 7 Ultra

This was the second program I tried, and the one I eventually bought. Its feature set is not quite as nice as Corel’s, especially when it comes to versatility of output options. On the other hand, it feels… done. It only crashed two or three times on me — apparently that’s GOOD on Windows? Things just worked. It appears to have proxy editing support, but it is completely transparent and plays back with a decent framerate even without it. It can output to AVCHD, Bluray, and DVD, though smart rendering doesn’t work with AVCHD source material.

Its weakness compared to the Corel package is that it doesn’t have as many options for formatting these discs. You can have only one title on a disc, though you can have many chapters. You have some, but not much, control over compression parameters. The same goes for exporting files for upload to the web or saving on your disk.

The documentation is polished and useful for the basics, though not extensive.

Overall, this package works and supports all the basics I wanted from it, so I’m using it for now.

Adobe Premiere Elements 7

I downloaded the trial of this one too. I opened it up, and up popped a dialog box asking what resolution my project would be, interlacing settings, etc. I thought “YES — now that’s the kind of program I want.” As I tried out the interface, I kept thinking the same. This was a program not just for newbies, but for people that wanted a bit more control.

Until it came to the question of output. Premiere Elements 7 was the only package I looked at that had no option to burn an AVCHD disc. DVD or Blu-ray only. That’s a deal-breaker for me. There’s no excuse for a program in this price range to not support the only affordable HD disc option out there. So I didn’t investigate much further.

Another annoying thing is that Adobe seems to treat all of their software as a commercial. I’m a user, not an audience, dammit. I do not want to buy some photoshop.net subscription when I buy a video editing program. I do not want to see ads for stuff when I’m reading PDFs. LEAVE ME ALONE, ADOBE.

I just felt sleazy even giving them my email address, let alone installing the program on my system. I think I will feel like a better person once I reboot into Windows and wipe it off my system.

Pinnacle Studio 12

Another program that comes highly rated. But I never installed it because its “minimum system requirements” state that it needs an “Intel Core 2 Quad 2.66GHz or higher” for 1920×1080 AVCHD editing. And I have only a Core 2 Duo 2.66GHz — half the computing horsepower that it wants. And since they offer no free trial, I didn’t bother even trying it, especially since PowerDirector got by fine with my CPU.

Conclusions

This seems to be a field where we can say “all video editing software sucks; some just suck a little less.” I’m using PowerDirector for now, but all of the above programs should have new versions coming out this year, and I will be keeping a close eye to see if any of them stop being so bad.