A better environment for shell scripting

March 3rd, 2007

Shell scripts are good for a lot of things. It’s quick and easy to design shell scripts that take input from one program, pass it to another program, munge it for filenames, etc.

But there are a few drawbacks to shell scripts.

The #1 drawback, in my opinion, is that it is extremely difficult to get quoting and escaping right. I often see an unquoted $@ in shell scripts (it breaks if a parameter has a space in it; the quoted form "$@" is what you want). I also see people failing to check for errors properly (set -e helps with that). It’s also difficult to do a more modern style of exception handling: do a sequence of actions in a temporary directory and always remove that directory, even if there’s an error, but stop processing and propagate the error. Command-line parsing is esoteric and odd, even with getopt. That’s not to say that it’s impossible to write a secure shell script that handles filenames with spaces in them properly. It’s just difficult, and it makes even common operators like backticks hard to use safely.

A while back, I toyed with the idea of making Haskell a shell scripting language. This week, I spent some time making this a reality. I released HSH, a shell scripting environment for Haskell.

HSH makes it easy to run shell commands, set up pipelines, etc. straight from Haskell. You can either use simple strings to invoke commands (they’ll be passed to sh -c), or you can specify arguments as a list (like exec…() takes), which eliminates the strange filename problems.
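To illustrate why the list form sidesteps quoting trouble, here is a minimal sketch. It deliberately uses readProcess from the standard process package rather than HSH's own API, and echoArg is a made-up name for this example. The point is only that each list element reaches the program as exactly one argument, with no shell in between to split or expand it.

```haskell
import System.Process (readProcess)

-- | Hand one argument to the external printf program and return what
-- it prints. (echoArg is a hypothetical helper, not part of HSH.)
-- No shell is involved, so spaces, semicolons and dollar signs in the
-- argument are passed through as ordinary characters.
echoArg :: String -> IO String
echoArg arg = readProcess "printf" ["%s", arg] ""
```

Calling echoArg with a string such as "two words; $HOME.txt" in GHCi should hand the same text back, whereas pasting that string into a command line for sh -c would split it, expand the variable, and treat the semicolon as a command separator.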

But the really cool thing is that HSH doesn’t just let you pipe from one external program to another. It also lets you pipe to/from pure Haskell functions. Yes, you can pipe the output of ls -l straight into a Haskell version of grep. I’ve found it to be very nice, especially for more complex processing tasks.

I put these simple examples on the HSH homepage:

run $ "echo /etc/pass*" :: IO String
 -> "/etc/passwd /etc/passwd-"

runIO $ "ls -l" -|- "wc -l"
 -> 12

runIO $ "ls -l" -|- wcL
 -> 12

In this example, wcL is a pure-Haskell line-counting function.
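For a feel of what such a function can look like, here is a rough sketch. It is not HSH's actual source; countLines and countLinesText are names invented for this example, and it assumes the function is handed its input as a list of lines and returns its output the same way.

```haskell
-- | Hypothetical stand-in for a line counter such as wcL: takes the
-- input lines and produces a single output line holding their count.
countLines :: [String] -> [String]
countLines ls = [show (length ls)]

-- | The same idea on raw text: split into lines, count them, and
-- join the result back into newline-terminated text.
countLinesText :: String -> String
countLinesText = unlines . countLines . lines
```

Because these are ordinary pure functions, they can be tried out in GHCi on a literal list before being dropped into a pipeline.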

The results were surprising. According to SLOCCount, porting hg-buildpackage from a shell script to an HSH script achieved a 20% reduction in source lines of code. At the same time, it gained better error handling, safer handling of filenames, better type safety (compile-time type checking), etc. Yet it does exactly the same thing in almost exactly the same way.

Even greater savings are possible. I decided to reimplement a small part of sed just for fun, and that code is still in my tree. If I removed it and called sed as the shell version does, that would probably save another 5%.

I didn’t really expect to achieve a reduction in lines of code. I thought that I’d be lucky to come close to breaking even. After all, who’d expect something other than the shell to be better at shell scripting?

I don’t know if these results are generalizable, but I’m really excited about them.

Categories: Programming

4 Comments

  1. David Nusinow

    This looks very cool! I’ve been trying to find time to learn Haskell, but I generally spend most of my time doing Debian work, which really involves lots of shell scripting. This will finally give me the chance to do both! I’m looking forward to having it enter the archive.

  2. Gwern

    Yes, this certainly is neat. (Although how hard is “$@”? I know the Bash manuals and FAQs linked to from #bash all say to just use that)

    I have a few questions: how far is this going to go, though? Are you going to add in more programs in native Haskell, like the list of simple unix tools <http://haskell.org/haskellwiki/Simple_unix_tools>? Because that’d be pretty neat and would make pipes even neater.

    Also, why wcL and not wcC or wcW? Seems kind of odd to single out the “wc -l” functionality of wc for implementation.

    Anyway, I’ve already written a somewhat useful program using HSH: <http://en.wikipedia.org/wiki/User:Gwern/Bot>. The nice thing about HSH is that it allows me (albeit not entirely easily – type mismatches make things more difficult than it feels like they should be) to do the thinking and hard stuff in Haskell and let the Python (programs from Pywikipediabot) handle the details of actually connecting to Wikipedia and getting and saving articles. I discovered it can be quite painful to really use the monads and I particularly had trouble dealing with errors.

    For example, if a Python script errors out saying there’s no page, how does one handle it usefully in the IO monad in Haskell without weird hacks like redirecting output to a file and reading the file in? Eventually I found a way to make catch work for me, but that was seriously not fun to work around.

    John Goerzen Reply:

    What you’re talking about is a weakness in HSH, not in Haskell itself. I simply haven’t yet coded up support to let someone retrieve output even in the event of an error. The infrastructure permits it; I just haven’t gotten to it yet.

    catch is generally pretty easy to use; the library reference for Control.Exception has more details.

    The general idea of catch is this. You pass it two functions. The first is the action you want to perform. If it raises no error, catch returns its return value unmodified. If it raises an exception, catch calls your second function, passing the exception in as an argument, and returns the return value of that function to the program (assuming your second function doesn’t re-raise the exception itself). HSH provides some utilities that let you catch only shell command exceptions, BTW.
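
    As a rough sketch of that shape, using catch from Control.Exception rather than any HSH utility: readOrDefault is a made-up helper for this example, and the handler's type annotation is what selects which exceptions get caught (here, only IOException).

    ```haskell
    import Control.Exception (IOException, catch)

    -- | Read a file; if opening it raises an IOException, return the
    -- given default instead. (readOrDefault is a hypothetical name.)
    readOrDefault :: FilePath -> String -> IO String
    readOrDefault path def = readFile path `catch` handler
      where
        -- The second function: receives the exception and supplies the
        -- value that catch returns in place of the failed action's.
        handler :: IOException -> IO String
        handler _ = return def
    ```

    If the file exists, its contents come back unmodified; if it does not, the handler runs and the default comes back instead.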

    There are other functions such as bracket that are akin to Python’s “finally”, which ensure that certain tasks are carried out after an operation is complete, whether or not it was successful.
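
    For instance, the "work in a temporary directory and always remove it" pattern might be sketched like this. It assumes the directory package that ships with GHC; withScratchDir is a name invented for this example, and the caller is assumed to pass a path that does not exist yet.

    ```haskell
    import Control.Exception (bracket)
    import System.Directory (createDirectory, removeDirectoryRecursive)

    -- | Create a directory, run an action that receives its path, and
    -- remove the directory afterwards whether or not the action failed.
    -- (withScratchDir is a hypothetical helper.) An exception raised by
    -- the action still propagates to the caller after the cleanup.
    withScratchDir :: FilePath -> (FilePath -> IO a) -> IO a
    withScratchDir dir =
      bracket (createDirectory dir >> return dir) removeDirectoryRecursive
    ```

    The first argument to bracket acquires the resource, the second releases it, and the third (supplied by the caller here) does the real work in between.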

    Hope that helps.

  3. weakish

    Have you heard about HsShellScript?

    http://www.volker-wysk.de/hsshellscript/

http://changelog.complete.org / A better environment for shell scripting