Tag Archives: haskell

Real World Haskell update

There’s been quite the activity around this book lately.

Pat Eyler of On Ruby published an interview with the three of us RWH authors. It apparently got some very positive comments on Reddit.

Over on Amazon, the book is continuing its streak of 5-star reviews. When I see a review there titled “Finally a Haskell book that I could understand”, I am very glad that we took the time to write Real World Haskell. It is great to know that people find it useful.

Some of you may know that the book is licensed under a Creative Commons license. This is most certainly not common in the publishing world these days, and O’Reilly took a risk by letting us do it. We had tremendous community participation while we were writing it, and O’Reilly is quite pleased with the sales so far. I hope that this bodes well for future books released under such a license.

If Programming Languages Were Christmas Carols

Last spring, I posted If Version Contol Systems Were Airlines, which I really enjoyed. Now, because I seem to have a desire to take a good joke way too far, it’s time for:

IF PROGRAMMING LANGUAGES WERE CHRISTMAS CAROLS

I apologize in advance. (Feel free to add your own verses/carols in the comments.)

Away in a Pointer (C)

(to Away in a Manger)

Away in a pointer, the bits in a row.
A little dereference to see where they go.
I look down upon thee, and what do I see?
A segfault and core dump, right there just for me.

I saw thy init there, a reaping away
My process, from its address space, so sorry to say.
I thought I had saved thee, from void pointers all,
But maybe I missed one, and doomed you to fall.

Be near me, debugger, I ask thee to stay
Close by my terminal, and help me, I pray;
To find all the bugs and the void pointers too,
And if my kernel oopses, help me reboot for you.

Joy to the Wall (Perl)

(to Joy to the World)

Joy to the Wall, the Perl is come!
Let awk receive her King;
Let every grep prepare him room,
And bash and sed shall sing,
And bash and sed shall sing,
And bash, and bash, and sed shall sing.

Joy to the keyboard, we’ll use it all!
Let men, shift keys, employ;
Implicit variables, and globals never fall.
Repeat the line noise now,
Repeat the line noise now,
Repeat, repeat, the line noise now.

Perl rules the world with truth and ASCII,
And makes the doctors prove
The glories of carpal tunnel hands,
And we do it more than one way,
And we do it more than one way,
And we do it, and we do it, more than one way.

Hark! The Herald Coders Sing (Haskell)

(to Hark! The Herald Angels Sing)

Hark! The herald coders sing,
“Map and fold, recursive King;
Recursion and patterns wild,
Pure and IO — they’re reconciled!”
Joyful, all ye functions rise,
Join the typeclasses of the types,
With recursion, do proclaim,
“Laziness is born in this domain.”

Refrain
Hark! The herald coders sing,
“Map and fold, recursive king!”

Monads, by highest Heav’n adored;
Monads, their depths still unexplored;
Late in time, behold they’re good,
Never once were understood.
Veiled in functions, the Monads stay,
Used for IO, and more, each day,
With excitement, Monads say,
“Arrows are stranger, so with us stay.”

(Refrain)

Hail the glorious compiler of Glasgow!
Hail the threaded run-time system!
Join the beautiful Cabal of Hackage,
Upload there thy perfect package.
We know best, what we will Handle,
You’re safe with us: no pointers, no vandals.
Born to make your exceptions throw,
Unless you unsafePerformIO.

(Refrain)

Lispy the Paren

(to Frosty the Snowman)

Lispy the paren was a jolly happy soul,
With a lot of cars and a little cons
And two ends made out of curves.
Lispy the paren is a fairy tale, they say,
He was just common, but the children know
how he came to life one day.
There must have been some magic in that
Old Symbolics they found.
For when they placed him on its disk,
It recursed around and ’round.

O, Lispy the paren,
Was recursive as can be.
And the coders say it would take a day
To put his parens away.
Clunkety clunk clunk,
Clunkety clunk clunk,
Look at Lispy go.
Clunkety clunk clunk,
Clunkety clunk clunk,
Consing on the car.

Lispy the snowman knew
The keyboard was hot the day,
So he said, “Let’s cons and we’ll have some fun
now before they Scheme away.”
Down to the function,
With a list there in his RAM,
Running here and there,
all around the LAN, saying
“cdr me if you can.”
He led them down the streets of disk
Right to the traffic bus.
And only paused a moment when
He heard them holler (quit).

Oh BASIC Night

(to O Holy Night)

Oh BASIC night, the LEDs are brightly glinting;
It is the night of the dear GOSUB’s birth!
Long lay the world in sin and error printing,
Till you appeared and the RAM felt its worth.
Shiver of fear, line numbers do inspire,
For yonder breaks a mostly harmless GOTO.
Fall on your bits, O hear the Visual voices!
O BASIC divine, O BASIC where GOTO was born!
O BASIC, O Holy BASIC, O BASIC, you’re mine!

Some want to say, “GOTO is harmful always,”
But what of them, in their post-modern world.
We PRINT the truth, in the line-numbered goodness,
But Dijkstra appeared, and the faith, it was lost.
A thrill of hope, when .NET BASIC announces,
But Visual BASIC, what kind of thing are you?
Fall on your GUI, O see the old line numbers!
Behold BASICA, O BASIC when DOS was born!
O numbers, O lines, spaghetti divine!

Guido We Have Heard on High (Python)

(to Angels We Have Heard on High)

Guido we have hard on high
Sweetly indenting o’re the code,
And the functions in reply
Their exceptions sweetly flowed.

Refrain

Indent….. in your whitespace careful!
Indent…… in your whitespace careful!

Spaces, why this jubilee?
Why semicolons have you so wronged?
What backslashes must we use
If we want our lines so long?

(Refrain)

Come to Guido here to see
“One Right Way” is good, of course.
There’s no need for Perl, you know,
We have to be more verbose.

(Refrain)

Now the PEP will show the way
To the future, we shall see.
Banish lambda and the rest
Of the things we liked the best.

(Refrain)

Real World Haskell Update

Times are exciting. Our book, Real World Haskell, is now available in a number of venues. But before I get to that, I’ve got to talk about what a thrill this project has been.

I created our internal Darcs repository in May, 2007. Since then, the three of us has made 1324 commits — and that doesn’t count work done by copyeditors and others at O’Reilly.

We made available early drafts of the book online for commenting, which served as our tech review process. By the time we finished writing the book, about 800 people had submitted over 7,500 comments. I’ve never seen anything like it, and really appreciate all those that commented about it.

As for availability, RWH is available:

  • For immediate purchase with electronic delivery, from O’Reilly’s page
  • For immediate viewing on Safari Books Online, at its book page
  • Paper editing timing is still tentative, but we’re estimating arrival in bookstores the week of December 8.

People are talking about it on blogs, twitter, etc. We’re excited!

New version of datapacker

I wrote before about datapacker, but I didn’t really describe what it is or how it’s different from other similar programs.

So, here’s the basic problem the other day. I have a bunch of photos spanning nearly 20 years stored on my disk. I wanted to burn almost all of them to DVDs. I can craft rules with find(1) to select the photos I want, and then I need to split them up into individual DVDs. There are a number of tools that did that, but not quite powerful enough for what I want.

When you think about splitting things up like this, there are a lot of ways you can split things. Do you want to absolutely minimize the number of DVDs? Or do you keep things in a sorted order, and just start a new DVD when the first one fills up? Maybe you are adding an index to the first DVD, and need a different size for it.

Well, datapacker 1.0.1 can solve all of these problems. As its manpage states, “datapacker is a tool in the traditional Unix style; it can be used in pipes and call other tools.” datapacker accepts lists of files to work on as command-line parameters, piped in from find, piped in from find -print0. It can also output its results in various parser-friendly formats, call other programs directly in a manner similar to find -exec, or create hardlink or symlink forests for ease of burning to DVD (or whatever you’ll be doing with it).

So, what I did was this:


find Pictures -type f -and -not -iwholename "Pictures/2001/*.tif" -and \
-not -wholename "Pictures/Tabor/*" -print0 | \
datapacker -0Dp -s 4g --sort -a hardlink -b ~/bins/%03d -

So I generate a list of photos to process with find. Then datapacker is told to read the list of files to process in a null-separated way (-0), generate bins that mimic the source directory structure (-D), organize into bins preserving order (-p), use a 4GB size per bin (-s 4g), sort the input prior to processing (–sort), create hardlinks for the files (-a hardlink), and then name the bins with a 3-digit number under ~/bins, and finally, read the list of files from stdin (-). By using –sort and -p, the output will be sorted by year (Pictures/2000, Pictures/2001, etc), so that photos from all years aren’t all mixed in on the discs.

This generates 13 DVD-sized bins in a couple of seconds. A simple for loop then can use mkisofs or growisofs to burn them.

The datapacker manpage also contains an example for calling mkisofs directly for each bin, generating ISOs without even an intermediate hardlink forest.

So, when I wrote about datapacker last time, people asked how it differed from other tools. Many of them had different purposes in mind. So I’m not trying to say one tool or the other is better, just highlighting differences. Most of these appear to not have anything like datapacker –deep-links.

gaffiter: No xargs-convenient output, no option to pass results directly to shell commands. Far more complex source (1671 lines vs. 228 lines)

dirsplit: Park of mkisofs package. Uses a random iterative approach, few options.

packcd: Similar packing algorithm options, but few input/input options. No ability to read a large file list from stdin. Could have issues with command line length.

Real-World Haskell

Today, Bryan O’Sullivan, Don Stewart, and I are announcing a new book we’re working on with O’Reilly: Real-World Haskell. I’m excited about the book and about working with Bryan and Don on this project.

O’Reilly has agreed to publish this book under a Creative Commons license! We plan to post drafts of chapters incrementally at the book’s website, seeking feedback from readers and reviewers as we go.

Haskell makes a great practical parsing and scripting language, but this aspect of it has been under-documented. I look forward to helping change that!

A better environment for shell scripting

Shell scripts are good for a lot of things. It’s quick and easy to design shell scripts that take input from one program, pass it to another program, munge it for filenames, etc.

But there are a few drawbacks to shell scripts.

The drawback, in my opinion, is that it is extremely difficult to get quoting and escaping right. I often see things like $@ in shell scripts (breaks if a parameter has a space in it). I also see people failing to check for errors properly (set -e helps that). It’s also difficult to do a more modern style of exception handling (do a sequence of actions in a temporary directory, and always remove that directory, even if there’s an error, but stop processing and propogate the error). Command-line parsing is esoteric and odd, even with getopt. That’s not to say that it’s impossible to make a secure shell script that handles filenames with spaces in them properly. Just that it’s difficult, and makes using common operators like backticks difficult.

Awhile back, I toyed with the idea of making Haskell a shell scripting language. This week, I spent some time to make this a reality. I released HSH, a shell scripting environment for Haskell.

HSH makes it easy to run shell commands, set up pipelines, etc. straight from Haskell. You can either use simple strings to invoke commands (they’ll be passed to sh -c), or you can specify arguments as a list (like exec…() takes), which eliminates the strange filename problems.

But the really cool thing is that HSH doesn’t just let you pipe from one external program to another. It also lets you pipe to/from pure Haskell functions. Yes, you can pipe the output of ls -l straight into a Haskell version of grep. I’ve found it to be very nice, especially for more complex processing tasks.

I put these simple examples on the HSH homepage:

run $ "echo /etc/pass*" :: IO String
 -> "/etc/passwd /etc/passwd-"

runIO $ "ls -l" -|- "wc -l"
 -> 12

runIO $ "ls -l" -|- wcL
 -> 12

In this example, wcL is a pure-Haskell line-counting function.

The results were surprising. According to SLOCCount, porting hg-buildpackage from a shell script to a HSH script achieved a 20% reduction in source lines of code. And at the same time, gained better error handling, better safety of filenames, better type safety (compile-time type checking), etc. Yet it does exactly the same thing in almost exactly the same way.

Even greater savings will occur too. I decided to reimplement a small part of sed just for fun, and that code is still in my tree. If I removed that and replaced it with a call to sed as in the shell version, that would probably buy another 5% savings.

I didn’t really expect to achieve a reduction in lines of code. I thought that I’d be lucky to come close to breaking even. After all, who’d expect something other than the shell to be better at shell scripting?

I don’t know if these results are generalizable, but I’m really excited about it.

Haskell Time Travel

There is something very cool about a language in which the easiest, most direct way to explain how it solves a problem is to say, “When we pass the output of [this function] into the input for the oracle we are actually sending the data backwards in time. So when [the code] queries the oracle we get a result from the future.”

Sweet.

The story goes on to say, however, “Time travel is a very dangerous business. One false move and you can create a temporal paradox that will destroy the universe (which in this case means that the computation will diverge). When programming with values from the future, it is important never, never, to do anything with the values that might change the future. This is the temporal prime directive.”

The Haskell Blog Tutorial

The first installment of Mark C. Chu-Carroll’s Haskell tutorial series went up last week.

It begins this way:

Before diving in and starting to explain Haskell, I thought it would be good to take a moment and answer the most important question before we start:

Why should you want to learn Haskell?

It’s always surprised me how many people don’t ask questions like that.

Farther down:

So what makes Haskell so wonderful? Or, to ask the question in a slightly better way: what is so great about the pure functional programming model as exemplified by Haskell?

The answer is simple: glue.

Languages like Haskell have absolutely amazing support for modular development.

An interesting and though-provoking article, even for someone that’s been using Haskell for more than 2 years now. (Yikes, I had no idea it was that long)

You can also see all his posts on Haskell, which include a couple more installments.

Another Haskell Solution to Lars’ Problem

Yesterday, I posted an 18-line solution to Lars’ language problem. One problem with it was that it was not very memory-efficient (or time-efficient, for that matter). In other words, it was optimized for elegance.

Here is a 22-line solution that is much more memory-efficient and works well with his “huge” test case. Note to Planet readers: Planet seems to corrupt code examples at times; click on the original story to see the correct code.

import System.Environment
import Data.List
import Data.Char
import qualified Data.Map as Map

custwords = filter (/= "") . lines . map (conv . toLower)
    where iswordchar x = isAlphaNum x && isAscii x
          conv x = if iswordchar x then x else '\n'

wordfreq inp = Map.toList $ foldl' updmap (Map.empty::Map.Map String Int) inp
    where updmap nm word = case Map.lookup word nm of
                             Nothing -> Map.insert word 1 nm
                             Just x -> (Map.insert word $! x + 1) nm

freqsort (w1, c1) (w2, c2) = if c1 == c2
                                 then compare w1 w2
                                 else compare c2 c1

showit (word, count) = show count ++ " " ++ word
main = do args <- getArgs
          interact $ unlines . map showit . take (read . head $ args) .
                     sortBy freqsort . wordfreq . custwords

The main change from the previous example to this one is using a Map to keep track of the frequency of each word.

A Haskell solution to Lars’ Problem

Thanks to a little glitch in planet, one of Lars’ posts from 2004 came to my attention. In it, he proposes a test for language benchmarking:

Read text from the standard input and count the number of times each word occurs. Convert letters to lower case. Order the words according to frequency, words with the same frequency should be ordered in ascending lexicographic order according to character code. Print out the top N words, where N is a decimal number given on the command line. Each output line must contain the count, a space, and the word (in lower case), and end in an ASCII LINE FEED character. Output must contain exactly N such output lines and no other output lines.

A word contains only ASCII letters A through Z and a through z (convert upper case to lower case) and ASCII digits 0 through 9 and is not empty. All other characters separate words and are ignored except to notice word boundaries. Word boundaries only occur at the beginning and end of the file and at non-word characters. You may not assume a maximum length for the word, line, or input file.

He provides a tarball with sample implementations in C, Python, and Shell.

His C code is 183 lines long, Python 57, and Shell 11. The specs for this test seem particularly suited for shell.

I wrote a version in Haskell, commented and formatted approximately the same as his Python version, but using an algorithm more like the shell version. It comes in at 18 lines. Here it is:

import System.Environment
import Data.List
import Data.Char

custwords = filter (/= "") . lines . map (conv . toLower)
    where iswordchar x = isAlphaNum x && isAscii x
          conv x = if iswordchar x then x else '\n'

wordfreq = map (\x -> (head x, length x)) . group . sort

freqsort (w1, c1) (w2, c2) = if c1 == c2
                                 then compare w1 w2
                                 else compare c2 c1

showit (word, count) = show count ++ " " ++ word
main = do args <- getArgs
          interact $ unlines . map showit . take (read . head $ args) .
                     sortBy freqsort . wordfreq . custwords

Taking a look at this, one thing that might strike you is the function composition in main. This takes the output from one function and feeds it into the next -- and the Haskell syntactic sugar for this makes it look a lot like pipes in the shell version. The interact call takes, as a parameter, a function that takes a string and returns a string. interact supplies stdin as the input and prints the output to stdout. Note that, since Haskell is lazily, this does not mean buffering up the entire input or output -- it is read and written on demand.

The rest of the functions are also standard in Haskell, and you can find them in the index to the library reference if you want to learn more.

I understand and agree that short code doesn't necessarily mean good code, but I think that Haskell provides a very elegant and expressive solution to many problems -- one that also happens to be remarkably concise.

Updated 9/4: Changed isLower to isAlphaNum to fix a bug, and removed unnecessary Data.Map import