Some more git, mercurial, and darcs

Ted Ts’o had an interesting post about git recently. He has a lot of good thoughts on the subject. He comments that he wound up using git because it’s so Unixy (with its small commands to do things), that he sees the git community developing innovations faster than Mercurial, and that they are working to improve the documentation and user interface problems.

Being so Unixy is a double-edged sword. On the one hand, it can make it easy to write shell scripts to extend Git. That itself can be a double-edged sword (think filename quoting and the like), but one doesn’t have to use the shell. The other downside is that being Unixy makes it hard to run on platforms that aren’t, such as Windows. So if one is working on Unix-only software (X, the kernel, e2fsprogs, etc.), there’s no need to care about it. But if you’re a person like me, who has Windows users using my software, or a large organization like Mozilla, it may be a showstopper. Of course, workarounds exist (cygwin, git-cvsserver), but none of them are particularly nice.

I think that both Git and Mercurial are working to address their shortcomings. I’ve chosen hg for now because it does what I need now, and because there are very nice tools to convert hg to git, and vice-versa. So if Ted’s right, and a year from now git is easier to use, better documented, more featureful, and runs well on Windows, it won’t be that hard to switch over and preserve history. Ted’s the sort of person that usually is right, so maybe I should start looking at hg2git right now.

So following up on my bzr post, here are the things that Mercurial is great at right now:

  1. Performance. Approximately even with git, occasionally faster. Nobody else can compete with these two right now.
  2. Simplicity. It’s almost as easy to get started as with darcs, and with recent patches, will be even closer in the future.
  3. Lots of ways to interact. You can send hg bundles, which preserve all metadata (parents, hash, authors, etc), or you can send git-format email patches, or you can push and pull between repos. The email tools will shortly be able to automatically detect what patches to send. Your choice. git doesn’t seem to support lossless emailing of bundles like this, and bzr doesn’t make emailing of anything easy by default.
  4. Merging. hg seems to be able to automatically resolve more merge conflicts than anything else, and when it can’t automatically resolve them, has a nicely configurable system to let you use your choice of tool to manually resolve them.
  5. Community. The Mercurial community is open and inviting, and open to new/different ideas. It seems similar to Darcs in that respect, and somewhat dissimilar to git.
  6. Rebase does not trash history like it does (barring undocumented manual intervention) in git.

I’ve written before about Darcs, so I won’t duplicate that here.

12 thoughts on “Some more git, mercurial, and darcs”

  1. I also prefer Mercurial to tla/bzr/git, but one thing I hate is the need to ‘hg merge’ after your 2 branches differ by a single (nonconflicting!) changeset.

    Darcs is the clear winner in this one. I wonder why it is the only VCS that does not use the ‘tree of changesets’, but the superb ‘pool of changesets’.

  2. Is Mercurial really fast on big projects?

    Clone OpenSolaris. Do “hg log” and find the last two changeset ids. Now do “hg up <changeset id>” to move between them.

    That’s around 5 seconds even if only one file has changed. That doesn’t seem fast to me. Git is much quicker.

    Other than that Mercurial looks really promising.

  3. 1. Actually, in terms of speed, usually git is slightly faster than hg. There are two reasons for this. First, git repositories now are more disk efficient than hg. An e2fsprogs repository in .hg is currently 20,420k, while the same repository converted to git takes 11,996k. So there is simply less disk to read. In addition, there are a number of optimizations based on git’s tree/subtree revision control model which don’t work as well given how hg stores each revision separately on a per-file basis; git can determine that two subtrees are identical by comparing a single SHA1 hash, whereas hg requires checking every single file’s SHA1 hash.

    2. Simplicity — no contest. Hg is simpler, but part of this is it has less functionality. Still, if it has the functionality you care about, the lack of extra commands is a virtue.

    3. “git bundle” will be in git 1.5.1, which will be released shortly; git 1.5.1-rc2 is quite stable, and is available already today.

    4. “git mergetool” was written by yours truly, and will also be in git 1.5.1. It has pretty much all of the functionality of hg’s merge integration, with the exception of MacOS’s opendiff/FileMerge, for which someone has submitted patches but I haven’t had a chance to download MacOS’s developers tool so I can test the patch.

    5. The mercurial community is probably more receptive to ideas, yes. The active community on the git list seems slightly larger, but that’s hard to judge. The git list is definitely more active. As far as being receptive to new ideas, part of it is that Linus has pretty strong ideas over what will work and not with SCM systems. If you try to do something that doesn’t agree with his philosophy, he isn’t afraid to say you’re wrong. The thing is, very often he’s right, although sometimes I wish he would do so with a tad bit more diplomacy.

    6. Does mercurial have rebase functionality at all? I didn’t think it did. You can pull a branch and merge it into tip of another branch, and it won’t lose the history, yes, but you can do the exact same thing with git. “git rebase” is an option, but you don’t have to use it.

    I’ll note that sometimes trashing history is specifically one of the things that various development processes actually want. For example, requests have come from more than one project to both mercurial and git that they implement “bk collapse”, which takes a series of commits, and makes them disappear into a single commit. It turns out it’s much harder to implement this in hg, because of how it stores its file-level revision information. Fundamentally, hg makes it harder to drop changesets, which can either be viewed as a feature or a bug. Myself, I like to be able to do development in internal branches, and then drop them later on without penalty, and that’s something which in the hg model is best done by cloning an entire separate repository. But everyone’s workflow is different….
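
Point 1 above, that git can tell two subtrees are identical by comparing a single SHA1 hash, can be sketched in a few lines of Python. This is only an illustration under assumptions: the nested-dict tree, the `build` and `diff` helpers, and the hashing layout are made up for the example and are not git's real object format or API.

```python
import hashlib

def build(tree):
    """Hash a toy tree: bytes is a file, dict is a directory.

    Returns (id, children), where children is None for files.
    Hypothetical layout, not git's actual object encoding.
    """
    if isinstance(tree, bytes):
        return (hashlib.sha1(b"blob\0" + tree).hexdigest(), None)
    children = {name: build(sub) for name, sub in tree.items()}
    h = hashlib.sha1(b"tree\0")
    for name in sorted(children):
        # A directory's id depends only on its entries' names and ids.
        h.update(f"{name}\0{children[name][0]}\n".encode())
    return (h.hexdigest(), children)

def diff(old, new, path="", stats=None):
    """List changed paths, skipping any subtree whose id is unchanged."""
    if stats is not None:
        stats["compared"] += 1
    if old is not None and new is not None and old[0] == new[0]:
        return []  # one hash comparison covers the whole subtree
    old_kids = old[1] if old is not None else None
    new_kids = new[1] if new is not None else None
    if old_kids is None or new_kids is None:
        # A file changed, or something was added, removed, or changed type.
        return [path]
    changed = []
    for name in sorted(set(old_kids) | set(new_kids)):
        child_path = f"{path}/{name}" if path else name
        changed += diff(old_kids.get(name), new_kids.get(name), child_path, stats)
    return changed

old = {
    "README": b"hi",
    "doc": {"x.txt": b"x", "y.txt": b"y", "z.txt": b"z"},
    "lib": {"a.c": b"1", "b.c": b"2"},
}
new = {**old, "lib": {"a.c": b"1 changed", "b.c": b"2"}}

stats = {"compared": 0}
print(diff(build(old), build(new), stats=stats), stats)
```

In this toy run the tree has nine nodes but only six are compared, because the unchanged doc directory is dismissed with one hash check and its three files are never visited. A per-file scheme would have to look at every file's hash instead.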

    1. Hi Ted,

      Thanks as always for your feedback.

      Regarding #1, I hg cloned your e2fsprogs repo and get 15,676K in Mercurial. I wonder if your repo hasn’t yet switched to hg’s revlog-NG? Still bigger than git, but about half the difference as before.

      Regarding #6, the transplant extension does the rebase. hg strip can prune unwanted revisions out of hg trees as well.

      1. Good point, yes, I hadn’t converted e2fsprogs to use revlog-NG. I had forgotten about that; I had run across the new format a week or two ago, and wasn’t sure what the compatibility story was with revlog-NG, since I run bleeding-edge development tip for both hg and git, and I couldn’t find any documentation about what the backwards compatibility impacts might be of using the new format. But yes, it would be much fairer to use the revlog-NG size when comparing against git. (That said, if you are willing to set repack.UseDeltaBaseOffset, which will make your local repo directory incompatible with git versions older than 1.4.3, the git repository shrinks from 11,996k to 10,740k, and if you pass --window=100 --depth=30 and manually run git-repack, you can push the repository size down to 9,988k. The tradeoff is that certain repository operations will be slowed down, although with some of the new optimizations in git 1.5.1, not by as much, so perhaps we need to revisit the default settings of --window and --depth when repacking.)

        The trouble with “hg strip” is that it can only strip the most recent revisions that are closest to the tip. If you create internal branches in hg, and then later commit or pull new revisions on top of your changes, and then you want to drop the internal branch, I believe right now the only way to do it requires rewriting the whole repository.

        1. RevlogNG is just a different on-disk representation of the same data structure. While older hg versions can’t read this directly, you can use ‘hg serve’ on this repo and pull with an old client.

          If you use bundle repositories with hg (which are again just a different representation of the same data), you can shrink the size of current e2fsprogs repo to 7948K, though using it is only partly wired to the UI:

          cd e2fsprogs
          hg bundle --base null ../e2fsprogs.hg
          hg init ../dummy
          cd ../dummy
          hg -R ../e2fsprogs.hg

          One “small” downside is that the bundle repository is read-only, but that doesn’t matter if we’re just comparing repository sizes.

  4. Oh, one more thought. Neither git nor hg is doing anything more intelligent with merging other than using 3-way merges. For example, neither is using the Codeville merge algorithm, which works out to be equivalent to what Bitkeeper is using. However, git does have one trick up its sleeve which as far as I know hg doesn’t have today.

    If you have a long-lived branch where you are developing some feature that isn’t quite ready for mainline, and you are constantly pulling and merging from the mainline while you develop it, the same conflicts can need resolving over and over again. This is sometimes called “criss-cross merges”. If you create the directory .git/rr-cache, git will remember how particular merge conflicts were resolved, and try to resolve them the same way in the future, automatically. More information about this can be found here.
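
The idea of remembering how a conflict was resolved and replaying it later (the feature git calls rerere) can be sketched roughly as a lookup table keyed by the conflict itself. This is a toy under assumptions: the `ResolutionCache` class and its `record` and `replay` methods are hypothetical names for this example, and nothing here reflects the actual layout of .git/rr-cache.

```python
import hashlib

class ResolutionCache:
    """Toy conflict-resolution memory; not git's real storage format."""

    def __init__(self):
        self._resolutions = {}

    @staticmethod
    def _key(ours, theirs):
        # Sort the two sides so the key does not depend on which branch
        # was "ours" when the conflict appeared (a choice of this sketch).
        a, b = sorted((ours, theirs))
        return hashlib.sha1(f"{a}\0{b}".encode()).hexdigest()

    def record(self, ours, theirs, resolution):
        """Remember how a conflicting pair of hunks was resolved by hand."""
        self._resolutions[self._key(ours, theirs)] = resolution

    def replay(self, ours, theirs):
        """Return the remembered resolution, or None if the conflict is new."""
        return self._resolutions.get(self._key(ours, theirs))

cache = ResolutionCache()
cache.record("timeout = 30\n", "timeout = 60\n", "timeout = 60  # agreed\n")

# The same conflict shows up again on the next merge from mainline.
print(cache.replay("timeout = 60\n", "timeout = 30\n"))
```

Keying on the conflicting text rather than on commit ids is what lets the same resolution apply again on a later merge, when the commits involved are different but the clash is the same.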

  5. Bazaar makes it easy to send bundles that preserve history:

    bzr merge-directive

    generates a bundle containing all the new changes compared to the parent branch. In the typical use of branching to implement a feature or fix which is then integrated back in, you don’t need to give any other arguments.

    People often save this to a file and then post through their regular mail agent, but you can also send it directly with --mail-to.

    I believe git added support for this after a crossposted thread which discussed how well it worked in Bazaar.
