Category Archives: Software

datapacker

Every so often, I come across some utility that I need. I think it must have been written before, but I can’t find it.

Today I needed a tool to take a set of files and split them up into directories in a size that will fit on DVDs. I wanted a tool that could either produce the minimum number of DVDs, or keep the files in order. I couldn’t find one. So I wrote datapacker.

datapacker is a tool to group files by size. It is perhaps most often used to fit a set of files onto the minimum number of CDs or DVDs.

datapacker is designed to group files such that they fill fixed-size containers (called “bins”) using the minimum number of containers. This is useful, for instance, if you want to archive a number of files to CD or DVD, and want to organize them such that you use the minimum possible number of CDs or DVDs.
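To make the two packing goals concrete, here is a minimal Python sketch. It is not datapacker's actual code (that is Haskell), and the first-fit-decreasing heuristic is my choice for illustration: it usually gets close to the minimum number of bins but does not guarantee it. The function names and the (name, size) input format are hypothetical.

```python
def pack_first_fit_decreasing(files, bin_size):
    """Group (name, size) pairs into as few bins as the heuristic can manage."""
    bins = []   # each bin is a list of file names
    free = []   # remaining capacity of the bin at the same index
    # Place the largest files first; small ones then fill the gaps.
    for name, size in sorted(files, key=lambda item: item[1], reverse=True):
        if size > bin_size:
            raise ValueError(f"{name} ({size} bytes) cannot fit in any bin")
        for index, room in enumerate(free):
            if size <= room:   # <= so a file may fill a bin exactly
                bins[index].append(name)
                free[index] -= size
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([name])
            free.append(bin_size - size)
    return bins


def pack_in_order(files, bin_size):
    """Keep files in their given order, opening a new bin when one fills up."""
    bins = []
    room = 0
    for name, size in files:
        if size > bin_size:
            raise ValueError(f"{name} ({size} bytes) cannot fit in any bin")
        if not bins or size > room:
            bins.append([])
            room = bin_size
        bins[-1].append(name)
        room -= size
    return bins
```

With files of sizes 6, 5, 4, and 5 named a through d and a bin size of 10, the first function yields two bins (a with c, b with d), while the order-preserving one needs three (a, then b with c, then d).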

In many cases, datapacker executes almost instantaneously. Of particular note, the hardlink action can be used to effectively copy data into bins without having to actually copy the data at all.
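The reason hard links are so cheap is that a hard link is just a second directory entry for the same data on disk. A rough Python sketch of the idea follows; the numbered directory layout (001, 002, ...) and the function name are assumptions for illustration, not datapacker's documented behavior. Hard links only work within one filesystem, and this sketch does not handle two files in the same bin sharing a base name.

```python
import os


def hardlink_into_bins(bins, dest_root):
    """Hard-link each file into a numbered directory under dest_root.

    `bins` is a list of lists of file paths, one inner list per bin.
    """
    for number, paths in enumerate(bins, start=1):
        bin_dir = os.path.join(dest_root, f"{number:03d}")
        os.makedirs(bin_dir, exist_ok=True)
        for path in paths:
            # os.link creates a new name for the existing data; nothing is copied.
            os.link(path, os.path.join(bin_dir, os.path.basename(path)))
```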

datapacker is a tool in the traditional Unix style; it can be used in pipes and call other tools.

I have, of course, uploaded it to sid. But while it sits in NEW, you can download the source tarball (with debian/ directory) from the project homepage at http://software.complete.org/datapacker. I’ve also got an HTML version of the manpage online, so you can see all the cool features of datapacker. It works nicely with find, xargs, mkisofs, and any other Unixy pipe-friendly program.

Those of you that know me will not be surprised that I wrote datapacker in Haskell. For this project, I added a bin-packing module and support for parsing inputs like 1.5g to MissingH. So everyone else that needs to do that sort of thing can now use library functions for it.
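For the curious, parsing a size like 1.5g boils down to splitting off a unit suffix and multiplying. Here is a small Python sketch of the idea. It is not the MissingH function; the accepted suffixes and the use of binary multiples (1k = 1024 bytes) are assumptions made for this example.

```python
import re

# Assumed multipliers: binary units, single-letter suffixes only.
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a string such as '1.5g' or '700m' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([bkmgt]?)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])
```

Under those assumptions, parse_size("1.5g") returns 1610612736 and parse_size("700m") returns 734003200.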

Update… I should have mentioned the really cool thing about this. After datapacker compiled and ran, I had only one mistake that was not caught by the Haskell compiler: I said < where I should have said <= in one place. This is one of the very nice things about Haskell: the language lends itself to compilers that can catch so much. It’s not that I’m a perfect programmer, just that my compiler is pretty crafty.

At long last, software.complete.org migrated to Redmine

I’ve been writing a bit about Trac and Redmine lately. For approximately one third of the publicly available software that I’ve written, I maintain a Trac site at software.complete.org. This is generally the third that has the most interest from others, and there’s a bug tracker, wiki, download area, etc.

Trac is nice, and much nicer than one of the *Forge systems for a setup of this scale. But it has long bugged me that Trac has no integration between projects. To see what open bugs are out there on my software, I have to check — yes — 17 individual bug trackers.

To keep an eye on the wikis and make sure that nobody is adding spam, I have to subscribe to 17 different RSS feeds.

It took me some time just to hack up a way so I didn’t have to have 17 different accounts to log in to…

So, mainly, my use case for Trac isn’t what it was intended for.

Enter Redmine. It’s similar in concept to Trac — a lightweight project management system. But unlike Trac, Redmine allows you to have separate projects, but still manage them all as one if you please.

Redmine didn’t have Git support in its latest release, but there was a patch in Redmine’s BTS for it. I discussed why it wasn’t being applied with Redmine’s author, and then went in and fixed it up myself. (I used Git to make a branch off the Redmine SVN repo — very slick.) Unlike Trac’s Git support, Redmine’s is *fast*. I tested it against a clone of the Linux kernel repo on my local machine.

There are a few things about Redmine I don’t like, but I have learned that they mainly have to do with Ruby on Rails. As someone pointed out on Planet Debian lately (sorry, can’t find the link), the very nature of Rails makes it almost impossible for OS developers like Debian to include Rails apps in the distribution.

Not only that, but it seems like Rails assumes that even if you are just going to *use* an app, you know how to *write* one. For instance, this is pretty much the extent of documentation on how to set up a Rails app to be able to send out mail:


# See Rails::Configuration for more options

And of course, googling that turns up nothing useful.

Redmine is a rails app, so it cannot escape some of this. It seems to be a solid piece of work, but Rails seems to make things unnecessarily complex. That, and I’ve found some bugs in the underlying Rails infrastructure (like activerecord not quoting the schema name when talking to PostgreSQL) that make me nervous about the stack.

But the site is up and running well now, so I’m happy, and am planning to keep working with Redmine for quite some time.

If Version Control Systems were Airlines

Many of you have seen the net classic If Operating Systems Were Airlines. Today, let’s consider what the world might be like if version control systems were airlines…

Before anyone gets mad, this is all in fun, OK?

RCS Airlines: One of the first airlines, from way back when this whole aviation thing was new and exciting. Each RCS flight carries exactly one passenger, which RCS believes is a superior way to fly. Although most RCS airplanes are rusty and battered today, RCS Airlines still retains its historic dedication to security. Each airplane is kept locked as much as possible for safety. Occasionally flights will be delayed for hours because the pilot can’t open the locked plane. When this happens, the pilot will frantically try to get the cell phone number of whoever it is that has locked the plane. When the plane finally gets unlocked, you may be tempted to ask why it was locked for so long. Veteran RCS users have learned that the answer is usually disgusting, and never ask anymore. Main competitor: CP/M airlines.

CVS Airlines: Founded on the belief that they could be more efficient than RCS by carrying multiple passengers per flight. They still carry each passenger in a separate RCS-built airplane, but the airplanes fly in a goose-like “,V” formation. Watch out for layovers, though. It can take hours to merge new passengers into the formation properly, and it might take several attempts to take off afterwards.

CVS flights often feature fights over who gets to fly. CVS piloting fights are legendary; rumor has it that OpenBSD got started after CVS airlines refused to allow a passenger to board on the grounds that he had in the past refused to stow his tray table in the upright and locked position.

CVS airlines mostly counts as customers the “over-50” crowd who grew up using CVS and don’t like change. Its in-flight magazine features advertisements for balding-reversal treatments and uuencode tools.

Main competitor: AIX airlines.

Subversion Airlines: Started by some grey-haired CVS executives with long, wispy beards, Subversion airlines got started by trying to be “CVS, but better”. Subversion airlines was the first major airline to use planes that seat more than one passenger. Unlike CVS airlines, all passengers on a Subversion flight travel in the same plane.

Subversion airlines is famous for its Soviet-like centralized control. All operations must be approved by the Kremlin, and you are allowed, by the grace of the Party Leader, to gaze at the massive airplanes. Those that have served the Party and Airline well for many years are allowed to enter the Great Shrine of the First-Class Committer, and actually make changes to the airplanes themselves. Plainclothes Subversion Airlines security agents lurk on every flight, and you should not be surprised to be thrown out an airplane window if you make a joke in bad taste about the pilot’s flying skills.

Subversion airlines thrives on the concept that “photocopying is cheap”. You are encouraged to make photocopies of your ticket, or to photocopy your photo ID, and give copies of each to as many people as you can. At checkin time at the gate, if more than one person arrives with a copy of the same ticket, they are ushered into the “merging room” and each person is given a brick. The door is closed, something magical occurs, and the one person that emerges still able to walk is allowed to board the plane.

Main competitor: Windows airlines with no Administrators allowed.

tla airlines: Founded by one of those eccentric British noblemen, Lord Tom’s airline is the utopian philosopher’s airline. Chafed by the heavy-handed control of Subversion Airlines, tla airlines wants every passenger to be created equal. As you approach the gate area in the terminal, you will find many philosophers occupying the gate area, extolling the virtues of tla airlines. They compare tla airlines to reaching out and touching the heavens, leaving behind the bonds of a ground-based life, actually merging with the stars. Oh, the gorgeous beauty of it all! The things we will see!

As you see people arriving from another flight, you observe that some of them have burn marks. One of them comments that “merging with the stars doesn’t work.” Immediately, a dozen philosophers get in a fight with him, claiming that he simply doesn’t understand what it means to merge with the stars, and that if he gets his inner being in the proper state first, he’ll have a much better experience.

As you board the tla airplane, you observe that the jetway is a mile long. The airplane itself reminds you of something of a cross between a gothic cathedral and a level of Doom. There are spectacular archways everywhere, sometimes where they don’t really belong. Each archway is supported by ornate curly braces which you don’t normally see on airplanes, and frankly, you’d rather not, because they look all pointy and confuse the kids.

As you arrive at your destination terminal, you see it too is full of philosophers, most of them dining.

Main competitor: VMS airlines.

Darcs Airlines: Unlike every other airline, this one uses physicists instead of engineers to design its airplanes. One brilliant Darcs physicist has finally come up with The Theory of Everything, and as such, Darcs knows where you want to go before even you do. Darcs airlines prides itself on customer service, and asks your preference for even the tiniest details about your trip.

Each seat pocket features a copy of the Theory of Everything for your reading enjoyment, but nobody actually understands it.

Occasionally, you will find that Darcs pilots get into angry conflicts with the control tower in mid-flight. This results in the control tower revoking your permission to land. Legend has it that one Darcs pilot of a plane with exceptionally large fuel tanks actually resolved his conflict with the tower and landed two weeks after taking off. Experienced Darcs users board with several parachutes: one for themselves, and a few more for the newbies.

The Darcs physicists claim that the Theory of Everything predicted the pilots would act this way, and that all pilots eventually act this way throughout the entire universe. They toil day and night finding a way to adjust the gravitational constant of the universe, thereby reducing the anger factor of the pilots.

Main competitor: OS/2 airlines.

bzr airlines: Founded by a South African who had been injured by a curly brace on tla airlines, bzr airlines aims to be “tla done right”. They have shortened the jetway, gotten rid of the curly braces, chased out the philosophers, and no longer have a vision of merging with the stars. Many that were injured on tla airlines fly bzr airlines, and out of respect for tla airlines, bzr airlines will still honor tla tickets.

bzr passengers consider themselves part of an exclusive club because each flight takes off from a launchpad. They often can be seen standing in the terminal passing out bzr literature, trying to get passengers of other airlines to fly bzr, and can’t understand how other airlines continue to exist while people keep walking past their airplanes.

Main competitor: BeOS Airlines.

Bitkeeper Airlines: One of the world’s faster airlines, Bitkeeper airlines occupied that obscure gate for rich people at the end of the terminal for many years. Tickets on Bitkeeper Airlines were rumored to cost thousands of dollars, and were rare and jealously guarded. Then for a while, Bitkeeper Airlines started giving away tickets for free, though they also kept around the expensive tickets for those with discriminating tastes. Free tickets were made widely available, but the 3-point type on the back of tickets said that you were never allowed to think about another airline before, after, or during your flight, and some people claimed they actually saw the small print morphing right before their eyes.

Bitkeeper flights often featured arguments over whether people were harboring secret thoughts of other airlines. If you were caught thinking about another airline, you were expected to scream vigorously while being thrown out the escape hatch without a parachute. All of this commotion tarnished the rarified air that the rich people paid to experience, so one day it was decided that there would be a Great Purge, because obviously all free ticket holders had harbored lustful thoughts of other airlines, so they were all thrown off the airplanes simultaneously. Today, people aren’t exactly sure where the Bitkeeper gate is, but everyone suspects it still lurks somewhere.

Main competitor: SCO Airlines.

Mercurial Airlines: The “there’s one right way to do it” airline, Mercurial is a sterile, agile, and shiny airline. Every Mercurial airplane looks identical to every other one, shiny and clean. You could swear that all the passengers look alike too, and as you approach the gate, it seems like you too look like everyone else. Mercurial passengers tend to be a happy bunch, who can’t comprehend anybody that flies Git Airlines. Specks of dirt and dust confuse the pilots, so it is best to make sure you have showered before boarding. It is rumored that, through bolting on more engines, some Mercurial airlines can fly to as many places as Git airlines can, but most Mercurial passengers are content to not worry about that.

Main competitor: Python Airlines.

Git Airlines: The “there’s more than one way to do it” airline, Git flies the world’s largest and fastest airplanes. Git Airlines was founded by some priests who were flying for free on Bitkeeper Airlines and survived the fall after the Great Purge. Git airplanes start as spartan, empty cabins, with no carpeting, chairs, or piloting controls. At the departure gate, each passenger is handed a bag containing 173 standard airplane components, accompanied by a 4×5″ sheet of information on the theory of flight, written in 1950. Once onboard, the passengers use these components to finish out the airplane for flight: installing chairs, rudder controls, etc. Every flight results in a plane assembled in a different way, and passengers on each flight believe they are flying the world’s best airplane. Arguments in the terminal after a flight are common, as passengers from different flights debate the merits of their particular design.

Despite all this, Git planes turn out to be safe, and Git passengers believe they get to their destinations in half the time it takes any other passengers, though sometimes they secretly wonder if the Mercurial flight got there faster. Occasionally, passengers on Git airlines build an airplane that appears to go into a tailspin. When that happens, they simply assemble a tool that lets them go back in time and change history so that it doesn’t crash, although it is rumored that if you are a member of the public watching this happen from the ground, it will lead to seizures.

Git airlines takes special pride in the one piece that passengers don’t have to assemble: the plumbing. Every Git lavatory is equipped with state-of-the-art never-fail plumbing, and the best porcelain washroom fixtures money can buy. None of these cheap plastic toilets like you get on every other airline. Here, we have fine porcelain fixtures.

During a flight, after passengers use the lavatory, they frequently get into arguments with each other about which style of porcelain toilet is the best. These arguments are only resolved by the Zen-like Git Priests, who insist that only inferior passengers need to use a toilet while in the air.

Main competition: Perl Airlines.

Git Feature Branches

I’m really liking this.

So I set up some Git feature branches to help get Redmine patches from their BTS into their SVN trunk faster. (I don’t know why, but it seems to take a *very* long time for that to happen.)

Each BTS patch gets a Git feature branch. My Git repo for this project has about 21 branches in it.

So, I pull upstream into a branch called, well, upstream.

Each feature branch is created off upstream.

Then, the master branch merges all the feature branches in. I wrote a simple git-merge-fb shell script that just runs git-merge for each feature branch. Very simple. I expect to have a git-pull-fb script of some sort that merges upstream into each feature branch when I update against upstream. It could also run a diff at the end to see if there is any difference remaining, and if not, delete the branch.
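The merge step is simple enough to sketch. My real script is shell; the following is a Python approximation of the same idea, with hypothetical function names. It builds the list of git commands first, so you can look at them before anything touches the repo. It assumes the integration branch is called master and that you pass in the feature branch names yourself.

```python
import subprocess


def merge_commands(feature_branches, target="master"):
    """Return the git commands that merge every feature branch into target."""
    commands = [["git", "checkout", target]]
    for branch in feature_branches:
        commands.append(["git", "merge", branch])
    return commands


def run_all(commands, repo_dir="."):
    """Run each command in repo_dir, stopping at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError, e.g. on a merge conflict.
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling run_all(merge_commands(["feature-blah"])) would check out master and merge that one branch; a conflict stops the run so it can be resolved by hand.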

It’s trivial to give an updated diff to upstream for any given patch: git diff upstream..feature-blah will do it.

I only wish gitweb had a way to do that so I could just hand out a URL that always corresponds to the latest diff against upstream for a given feature. Now that would rock.

Thoughts on Redmine

A few days ago, I discussed Trac and Redmine. Redmine is a project management tool, similar to Trac, with built-in download tools, bug tracking, etc.

Redmine has a lot of nice features. Chief among them is better integration between multiple projects, so I don’t have to go to 17 separate pages to see the open bugs on my projects.

But I’m worried about the Redmine community. It appears to live in an insular Ruby world, without much participation outside. I wrote about some of those concerns in their forums. I’ve also submitted bugs to Redmine, some with patches.

Also, it’s concerning that, although Redmine includes a very nice forum module, the Redmine forums are still on RubyForge. Also, there are many bugs in the Redmine BTS that have patches but little, if any, comments from the Redmine people that have commit access.

It could just be that Redmine is a fairly new project and just needs some time to get on its feet more. It’s been around since July, 2006, which isn’t all that long on the one hand… or quite a while, depending on how you look at it.

The git support patch for Redmine looks very nice. However, after a month, it still hasn’t been replied to, and there’s no indication why, which is also troubling.

So I think I’ll sit with Trac for a little while until I get a better feel of how Redmine is progressing.

hg.complete.org is no more

As of today, hg.complete.org is no more. I have removed mercurial and hgwebdir from my server, removed hg from my DNS zone, and converted everything that was in Mercurial over to Git. (Except for hg-buildpackage, which I have orphaned.) So there is now stuff at git.complete.org.

I still have a ton of Darcs repos to convert, which will take more time.

Also I have heard a lot of people say that the GitPlugin for Trac is not very good. I have two Trac instances running it: one for commithooks and another for ListLike. Both seem OK so far, but I haven’t pushed them very much yet.

Trac & Git

For quite some time now, I’ve been running Trac over at software.complete.org. Most of my free software projects — well, the ones where I actually go to the effort to make formal releases — have a Trac instance. This Trac instance provides a wiki, bug tracker, downloads area, timeline (with RSS feeds), and VCS integration.

Trac is a nice program, but one thing has bugged me about it all this time:

Every trac instance is its own island.

I have 17 trac instances out there for my projects. To see what bugs are out there on my own server, I have to check 17 websites (or 17 RSS feeds or whatnot). Publishing a new program is not a lightweight process.

So today I started poking around looking for something better. I really like Trac’s way of integrating the wiki with the BTS and the commits; wiki markup can refer to a bug or a changeset, and bugs can use wiki markup too.

I looked at Redmine, Mantis, and Roundup, and I also have experience with RT.

Of these, Redmine looks the most interesting. Multiple projects support, per project wiki and forums, gantt charting even, and support for SVN, CVS, Mercurial, Bazaar, and Darcs — with Git support out there as patches to their development tree already too. Oh, and I saw references to a Trac importer as well. One thing that makes me nervous, though, is that they have no links to sites that use Redmine (except one in the news section), and Google isn’t turning up users either. Does nobody use this thing?

What else should I be looking at?

Over on the Git side, I’m still liking Git. I have now migrated several Mercurial projects over to git (see git.complete.org). I am also playing with Darcs to git migration using darcs2git, which also is going well. Sometimes gitk shows a nicer representation of a Git repo converted from Darcs than I was able to get from Darcs.

Experimenting with Git

I’ve been writing about Git a bit lately.

I’ve decided to switch some of my Debian work over to it to start with, as well as some of my other projects.

Although I was thoroughly frustrated with Git a year ago, now I am quite pleased with it. What’s different? The documentation is a LOT better. So far I have only found one manpage (git-show) that omits lots of its options. The system is friendlier, keystroke-happier, and powerful.

Compared to Mercurial, I’ve found some nice things:

In-directory branching. I didn’t expect to care about this, since both git and hg permit lightweight clones. But it turns out to be so easy to use that it is great. Especially since I don’t have to setup multiple branch repos on the server. I really like this. Note that “hg branch” is not the same as a git branch, and see the discussion on the hg lists about renaming that before 1.0.0 for why.

Flexibility in getting things around. Plain HTTP works fine (no static-http:// hack). ssh. git daemon. rsync. Very slick.

Performance. Surprisingly, git actually feels faster than Mercurial, especially when pushing or pulling. I didn’t expect that.

Tags. They seem smarter in git. No more merging of .hgtags all the time. Also I like that I can attach a message to a tag and sign it.

All that power. There is a *lot* that Git can do. I should have been taking notes about it all.

My main complaint is still that Git doesn’t have something as nice as “darcs send”. Mercurial doesn’t either, but it’s a bit closer. Git has moved closer, but still has room to improve on that.

So I have set up git.complete.org and am starting to publish my Debian stuff on Debian’s alioth server as well.

Also, hg-fast-export in the fast-export project is *awesome*. Branch-aware and everything. It made a perfect Git version of my Mercurial work.

Git looks really nice, until….

So I have been learning about Git this weekend. It has some really nice-looking features for sure — some things Mercurial doesn’t have.

I was getting interested in switching, until I found what I consider a big problem.

Many projects that use git require you to submit things using git-format-patch instead of pushing/pulling from you. They don’t want your merge history.

git-format-patch, though, doesn’t preserve SHA1s, nor does it preserve merges.

Now, say we started from a common base where line 10 of file X said “hi”, I locally changed it to “foo”, upstream changed it to “bar”, and at merge time I decide that we were both wrong and change it to “baz”. I don’t want to lose the fact that I once had it at “foo”, in case it turns out later that really was the right decision.

When we track upstream changes, and submit with git format-patch, the canonical way to merge upstream appears to be:

git fetch; git rebase origin/master

Now, the problem with that is that it loses your original pre-conflict code in a case like this.

There appears to be no clean way around that whatsoever. I tried a separate “submission” branch, that rebases a local development-with-merge branch, but it requires a ton of git rebase --skip during the rebase process.

Thoughts?

Revisiting Git and Mercurial

Exactly one year ago today, I wrote about Git, Mercurial, and Bzr. I have long been interested in VCS, and looked at the three main DVCS systems back then.

A Quick Review

Mercurial was, and for the moment, remains, my main VCS. Bzr remains really uninteresting; I don’t see it offering anything compelling that Mercurial or Git can’t do. My Git gripes mainly revolved around its interface and documentation. Also, I do have Windows people using my software, and need a plausible solution for them, even though I personally do no development on that platform.

Ted Tso wrote his own article in reply to mine, noting that the Git community had identified many of the same things I had and was working on them.

I followed up to Ted with:

… So if Ted’s right, and a year from now git is easier to use, better documented, more featureful, and runs well on Windows, it won’t be that hard to switch over and preserve history. Ted’s the sort of person that usually is right, so maybe I should start looking at hg2git right now.

So I guess that means it’s time to start looking at Git again.

This is rather rambly, I know. It’s late and I want to get these thoughts down before going to sleep…

Looking at Git

I started at the Git Wikipedia page for an overview of the software. It linked to two Google Tech Talks about Git: one by Linus Torvalds and another by Randal Schwartz. Of the two, I found Linus’ more entertaining and Randal’s more informative. Linus’ point that CVS is fundamentally broken, and that SVN trying to be “a better CVS” (an early goal of svn, at least) means it too is fundamentally broken, strikes me as quite sound.

One other interesting tidbit I picked up is that git can show you where functions have moved from one file to another, thanks to its rename-detection heuristic. That sounds really sweet, and is the best reason I’ve yet heard for Git’s stubborn refusal to track renames.
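The general idea of inferring a move from content, rather than recording it, can be sketched in a few lines of Python. This is a toy and not Git's actual algorithm: it compares whole file texts with difflib from the standard library, and the 0.5 threshold and function names are assumptions for the example.

```python
from difflib import SequenceMatcher


def similarity(old_text, new_text):
    """Return a score between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, old_text, new_text).ratio()


def guess_renames(deleted, added, threshold=0.5):
    """Pair each deleted file with its most similar added file, if similar enough.

    `deleted` and `added` map file names to file contents.
    """
    pairs = []
    for old_name, old_text in deleted.items():
        best_name, best_score = None, threshold
        for new_name, new_text in added.items():
            score = similarity(old_text, new_text)
            if score >= best_score:
                best_name, best_score = new_name, score
        if best_name is not None:
            pairs.append((old_name, best_name))
    return pairs
```

Because nothing about the rename is stored in history, the same comparison could in principle be run on pieces of files as well as whole files, which is what makes following moved code plausible.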

The Landscape

I’ve been following Mercurial and Darcs somewhat, and not paying much attention to Git. Mercurial has been adding small features, and is nearing version 1.0. Darcs has completed a major overhaul both of its repository format and internal algorithms and is nearing version 2.0, and appears to have finally killed the doppelganger (aka conflict spinlock) bug for good.

Git, meanwhile, seems to have made strides in usability and documentation in its 1.5.x versions.

One thing particularly interesting to me is: what projects are using the different VCSs. High-profile projects now using Mercurial include OpenSolaris, OpenJDK (Java 7), and Mozilla’s projects. Git has, of course, the Linux kernel. It also has just about everything associated with freedesktop.org, including X. Also a ton of Unixy stuff.

Both Mercurial and Git communities are working on TortoiseHg/TortoiseGit types of GUIs for Windows users. Git appears to have a sane Windows port now as well, putting it on pretty much even footing with Mercurial and Darcs there. However, I didn’t spot anything with obvious Windows ties in the Git “what projects use git” pages.

The greater speed of Mercurial and Git — even for pushing and pulling small patches — likely will keep me away from Darcs for the moment.

Onwards…

As time allows (I do have other things keeping me busy), I plan to install git and work through some tutorials and try to use it in practice as much as possible, to get a good feel for it.

Future

It is beneficial to be using a VCS that is popular, though that is certainly not a major criterion for me. I refuse to use SVN because its lack of distributed functionality makes it too unproductive to be useful. But it looks like Git is gaining a lot of traction these days, especially in Debian circles, which also makes it more interesting.

I notice that Ted did convert e2fsprogs over to git as he said he might, incidentally.