March 3rd, 2007
Shell scripts are good for a lot of things. It’s quick and easy to design shell scripts that take input from one program, pass it to another program, munge it for filenames, etc.
But there are a few drawbacks to shell scripts.
The #1 drawback, in my opinion, is that it is extremely difficult to get quoting and escaping right. I often see things like $@ in shell scripts (breaks if a parameter has a space in it). I also see people failing to check for errors properly (set -e helps that). It’s also difficult to do a more modern style of exception handling (do a sequence of actions in a temporary directory, and always remove that directory, even if there’s an error, but stop processing and propogate the error). Command-line parsing is esoteric and odd, even with getopt. That’s not to say that it’s impossible to make a secure shell script that handles filenames with spaces in them properly. Just that it’s difficult, and makes using common operators like backticks difficult.
HSH makes it easy to run shell commands, set up pipelines, etc. straight from Haskell. You can either use simple strings to invoke commands (they’ll be passed to sh -c), or you can specify arguments as a list (like exec…() takes), which eliminates the strange filename problems.
But the really cool thing is that HSH doesn’t just let you pipe from one external program to another. It also lets you pipe to/from pure Haskell functions. Yes, you can pipe the output of ls -l straight into a Haskell version of grep. I’ve found it to be very nice, especially for more complex processing tasks.
I put these simple examples on the HSH homepage:
run $ "echo /etc/pass*" :: IO String -> "/etc/passwd /etc/passwd-" runIO $ "ls -l" -|- "wc -l" -> 12 runIO $ "ls -l" -|- wcL -> 12
In this example, wcL is a pure-Haskell line-counting function.
The results were surprising. According to SLOCCount, porting hg-buildpackage from a shell script to a HSH script achieved a 20% reduction in source lines of code. And at the same time, gained better error handling, better safety of filenames, better type safety (compile-time type checking), etc. Yet it does exactly the same thing in almost exactly the same way.
Even greater savings will occur too. I decided to reimplement a small part of sed just for fun, and that code is still in my tree. If I removed that and replaced it with a call to sed as in the shell version, that would probably buy another 5% savings.
I didn’t really expect to achieve a reduction in lines of code. I thought that I’d be lucky to come close to breaking even. After all, who’d expect something other than the shell to be better at shell scripting?
I don’t know if these results are generalizable, but I’m really excited about it.