datapacker

April 15th, 2008

Every so often, I come across some utility that I need. I think it must have been written before, but I can’t find it.

Today I needed a tool to take a set of files and split them up into directories, each sized to fit on a DVD. I wanted a tool that could either produce the minimum number of DVDs, or keep the files in order. I couldn’t find one. So I wrote datapacker.

datapacker is a tool to group files by size. It is perhaps most often used to fit a set of files onto the minimum number of CDs or DVDs.

datapacker is designed to group files such that they fill fixed-size containers (called “bins”) using the minimum number of containers. This is useful, for instance, if you want to archive a number of files to CD or DVD, and want to organize them such that you use the minimum possible number of CDs or DVDs.
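To make the two modes concrete, here is a minimal sketch of the idea in Python. It is not datapacker's actual code (which is Haskell), and the function names are made up for illustration. The first function uses the common first-fit-decreasing heuristic, which usually gets close to the minimum number of bins but does not guarantee it; the second keeps the files in their original order.

```python
def pack_first_fit_decreasing(files, capacity):
    """Group (name, size) pairs into bins holding at most `capacity` bytes.

    Heuristic: place the largest files first, each into the first bin
    that still has room. Illustrative only, not datapacker's algorithm.
    """
    bins = []  # each entry is [remaining_space, [names]]
    for name, size in sorted(files, key=lambda item: item[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) is larger than a bin")
        for entry in bins:
            # <= (not <) so a file that exactly fills the gap still fits
            if size <= entry[0]:
                entry[0] -= size
                entry[1].append(name)
                break
        else:
            bins.append([capacity - size, [name]])
    return [names for _, names in bins]


def pack_in_order(files, capacity):
    """Fill bins sequentially, never reordering the input files."""
    bins = []
    remaining = 0
    for name, size in files:
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) is larger than a bin")
        if not bins or size > remaining:
            bins.append([])  # current bin is full: start a new one
            remaining = capacity
        bins[-1].append(name)
        remaining -= size
    return bins


files = [("a", 3), ("b", 5), ("c", 3), ("d", 2)]
print(pack_first_fit_decreasing(files, 7))  # [['b', 'd'], ['a', 'c']]
print(pack_in_order(files, 7))              # [['a'], ['b'], ['c', 'd']]
```

With this input, reordering saves a bin: two bins instead of the three needed when the original order is kept.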

In many cases, datapacker executes almost instantaneously. Of particular note, the hardlink action can be used to effectively copy data into bins without having to actually copy the data at all.
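The reason hard links make this cheap is that a hard link is just a second directory entry for the same data on disk. A rough Python sketch of the idea follows; the function name and the `%03d` directory naming are assumptions for illustration, not datapacker's interface, and hard links only work when the bins live on the same filesystem as the source files.

```python
import os


def hardlink_into_bins(bins, base_dir):
    """Create one directory per bin under base_dir and hard-link files in.

    `bins` is a list of lists of file paths. No file data is copied;
    each link is only a new name for the existing file.
    """
    for index, paths in enumerate(bins, start=1):
        bin_dir = os.path.join(base_dir, "%03d" % index)  # 001, 002, ...
        os.makedirs(bin_dir, exist_ok=True)
        for path in paths:
            target = os.path.join(bin_dir, os.path.basename(path))
            os.link(path, target)  # raises OSError across filesystems
```

Each resulting directory can then be handed to a tool such as mkisofs as if it held real copies.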

datapacker is a tool in the traditional Unix style; it can be used in pipes and call other tools.

I have, of course, uploaded it to sid. But while it sits in NEW, you can download the source tarball (with debian/ directory) from the project homepage at http://software.complete.org/datapacker. I’ve also got an HTML version of the manpage online, so you can see all the cool features of datapacker. It works nicely with find, xargs, mkisofs, and any other Unixy pipe-friendly program.

Those of you that know me will not be surprised that I wrote datapacker in Haskell. For this project, I added a bin-packing module and support for parsing inputs like 1.5g to MissingH. So everyone else that needs to do that sort of thing can now use library functions for it.
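For readers who have not met this kind of input before, parsing a size such as 1.5g amounts to splitting off a unit suffix and multiplying. The Python sketch below shows the general idea only; it is not the MissingH function, and the choice of binary units (1k = 1024 bytes) and of the suffixes k, m, g and t is an assumption made for the example.

```python
# Assumed multipliers: binary units, where 1k = 1024 bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert strings like '1.5g', '700m' or '4096' to a byte count."""
    text = text.strip().lower()
    suffix = text[-1] if text and text[-1] in "kmgt" else ""
    number = text[:-1] if suffix else text
    # float() raises ValueError on malformed input such as 'abc' or ''
    return int(float(number) * _UNITS[suffix])


print(parse_size("1.5g"))  # 1610612736
print(parse_size("700m"))  # 734003200
```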

Update… I should have mentioned the really cool thing about this. After datapacker compiled and ran, I had only one mistake that was not caught by the Haskell compiler: I said < where I should have said <= in one place. This is one of the very nice things about Haskell: the language lends itself to compilers that can catch so much. It’s not that I’m a perfect programmer, just that my compiler is pretty crafty.

Categories: Linux


8 Comments

  1. Roland

    You were right: it has been written before. And it’s even in Debian (well, of course it is). Try “gaffitter”.


    John Goerzen Reply:

    Sigh. Indeed it is there.

    But it doesn’t do a lot of things like datapacker does. For instance, its output is not convenient for feeding to xargs, it has no option to pass the results directly to a shell command, etc. So I’m not going to feel TOO bad ;-)

    I did an apt-cache search CD | sort | less and scanned all the descriptions. From the short description, it sounds like this tool pulls out parts of individual files, rather than selecting entire files from a set.


    Jedai Reply:

    In fact, gaffitter has references to other similar tools.

    Still, if you ignore all comments and empty lines, gaffitter is 1671 lines long, while datapacker is 211 lines long, and datapacker has an output format which is more convenient as a filter. So I guess it wasn’t a useless exercise anyway.


    Magnus Reply:

    Ah, gaffitter brings back memories. I hacked up a similar thing, just much more specialised, to create mixed CDs. I’d have to leave my system, Linux on a 486, running for a few hours to be sure to get a reasonable result. Oh, those were the days…


    Cute Ambroso Reply:

    gaffitter was made for the same purpose but with a different approach.

    I think this coding exercise was meant for pure fun, rather than using third-party code.


  2. Georg

    What is the worst case runtime of your algorithm (to compute optimal grouping of files)?


  3. taggart

    http://kitenet.net/~joey/blog/entry/file_set_split_utility/


  4. ynw

    I am actually happy with a remote SVN backing up my files.

    Cheers

