datapacker | The Changelog

I wrote before about datapacker, but I didn’t really describe what it is or how it’s different from other similar programs.

So, here’s the basic problem I ran into the other day. I have a bunch of photos spanning nearly 20 years stored on my disk, and I wanted to burn almost all of them to DVDs. I can craft rules with find(1) to select the photos I want, and then I need to split them up into individual DVDs. There are a number of tools that do that, but none were quite powerful enough for what I wanted.

When you think about splitting things up like this, there are a lot of ways you can split things. Do you want to absolutely minimize the number of DVDs? Or do you keep things in a sorted order, and just start a new DVD when the first one fills up? Maybe you are adding an index to the first DVD, and need a different size for it.
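To make the difference between those first two choices concrete, here is a rough sketch of each strategy as a small shell function wrapped around awk. This is purely illustrative and is not datapacker's actual code; the function names and the "SIZE NAME" input format are made up for the example.

```shell
#!/bin/sh
# Illustrative only: not datapacker's implementation.
# Input on stdin: one "SIZE NAME" pair per line (SIZE is an integer, any unit).
# Output: "BIN NAME" per line, with bins numbered from 1.
# A file larger than the capacity simply ends up alone in an overfull bin.

# Order-preserving: keep filling the current bin and open a new one
# as soon as the next file does not fit.
pack_in_order() {
    awk -v cap="$1" '
    {
        size = $1
        name = substr($0, index($0, " ") + 1)
        if (used > 0 && used + size > cap) { bin++; used = 0 }
        used += size
        print bin + 1, name
    }'
}

# First-fit: put each file into the first bin that still has room,
# which tends to need fewer bins but scatters the original order.
pack_first_fit() {
    awk -v cap="$1" '
    {
        size = $1
        name = substr($0, index($0, " ") + 1)
        for (i = 1; i <= nbins; i++)
            if (used[i] + size <= cap) break
        if (i > nbins) nbins = i     # nothing fit: open a new bin
        used[i] += size
        print i, name
    }'
}
```

With a capacity of 10 and files of size 6, 5, 4 and 5 (in that order), pack_in_order needs three bins, while pack_first_fit gets by with two by slotting the third file back into the first bin.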

Well, datapacker 1.0.1 can solve all of these problems. As its manpage states, “datapacker is a tool in the traditional Unix style; it can be used in pipes and call other tools.” datapacker accepts lists of files to work on as command-line parameters, piped in from find, or piped in from find -print0. It can also output its results in various parser-friendly formats, call other programs directly in a manner similar to find -exec, or create hardlink or symlink forests for ease of burning to DVD (or whatever you’ll be doing with it).

So, what I did was this:

    find Pictures -type f -and -not -iwholename "Pictures/2001/*.tif" -and \
      -not -wholename "Pictures/Tabor/*" -print0 | \
      datapacker -0Dp -s 4g --sort -a hardlink -b ~/bins/%03d -

So I generate a list of photos to process with find. Then datapacker is told to read the list of files in a null-separated way (-0), generate bins that mimic the source directory structure (-D), organize files into bins preserving order (-p), use a 4GB size per bin (-s 4g), sort the input prior to processing (--sort), create hardlinks for the files (-a hardlink), name the bins with a 3-digit number under ~/bins (-b ~/bins/%03d), and finally, read the list of files from stdin (-). By using --sort and -p, the output will be sorted by year (Pictures/2000, Pictures/2001, etc.), so that photos from different years aren’t all mixed together on the discs.

This generates 13 DVD-sized bins in a couple of seconds. A simple for loop can then use mkisofs or growisofs to burn them.
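Such a loop might look something like the following. Treat it as a sketch rather than exactly what I ran: the ~/isos output directory, the make_isos function name, and the MKISOFS override variable are all hypothetical, and you may want different mkisofs options for your discs.

```shell
#!/bin/sh
# Sketch: build one ISO image per bin directory created by datapacker.
# Assumptions: the output directory and the MKISOFS override variable are
# hypothetical conveniences, not anything datapacker provides.

make_isos() {
    bins_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for bin in "$bins_dir"/*/; do
        [ -d "$bin" ] || continue      # skip if the glob matched nothing
        name=$(basename "$bin")
        # -r: Rock Ridge with sane permissions, -J: Joliet names, -o: output file
        "${MKISOFS:-mkisofs}" -r -J -o "$out_dir/$name.iso" "$bin" || return 1
    done
}

# Example: make_isos ~/bins ~/isos
```

Each bin such as ~/bins/001 then becomes ~/isos/001.iso, ready to be written to disc with whatever burning tool you prefer.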

The datapacker manpage also contains an example for calling mkisofs directly for each bin, generating ISOs without even an intermediate hardlink forest.

So, when I wrote about datapacker last time, people asked how it differed from other tools. Many of them had different purposes in mind, so I’m not trying to say one tool or the other is better, just highlighting differences. Most of these appear not to have anything like datapacker --deep-links.

gaffitter: No xargs-convenient output, no option to pass results directly to shell commands. Far more complex source (1671 lines vs. 228 lines).

dirsplit: Part of the mkisofs package. Uses a random iterative approach, few options.

packcd: Similar packing algorithm options, but few input/output options. No ability to read a large file list from stdin. Could have issues with command line length.

