Thursday, August 10. 2006
Posted by John Goerzen in Software at 19:51
Comments (24) | Trackbacks (3) | Defined tags for this entry: darcs, version control
Whose Distributed VCS Is The Most Distributed?
Lately I have been trying out a number of distributed version control systems (VCS or SCM).
One of my tests was a real problem: I wanted to track the Linux 2.6.16.x kernel tree, apply the Xen patches to it, and pull only specific patches (for the qla2xxx driver) from 2.6.17.x into this local branch. I also wanted to be able to upgrade to 2.6.17.x later (once Xen supports it) and have the version control system properly track which patches I already have. But before going on, let's establish what it means to be an ideal distributed VCS:
There are also some things that we would generally want:
Evaluation

Let's look at some common VCSs against these criteria. I'll talk about Arch (tla, baz, etc.), bzr (bazaar-ng), Darcs, Git, Mercurial (hg), and Subversion (svn) for reference.

1. The fundamental method of collaboration must be a branch

All of the tools pass this test except for svn.

2. Branching should be cheap

Everyone except svn generally does this reasonably well. The tla interface for Arch was pretty terrible here, so branching took a while simply due to all the typing involved. That's better these days.

Darcs supports hardlinking of history to other local repositories and will do this automatically by default. Git also supports that, but defaults to not doing it; alternatively, you can store a path to look in for changesets that aren't in the current repo. I believe Mercurial can also use hardlinks, though I didn't personally verify that. bzr appears to have some features in this area, but not hardlinks, and the features were too complex (or poorly documented) to learn about quickly.

svn does not support branching across repositories, so it doesn't really pass this test. Branches within a repository are not directly supported either, but are conventionally simulated by doing a low-cost copy into a specially-named area in the repository.

3. Merging between branches is intelligent

Arch was one of the first systems to work on this problem. It works reasonably well in most situations, but breaks in spectacular and unintelligible ways in others.

When asked to merge one branch into another, Darcs will simply merge in any patches from the source branch that the destination doesn't already have. This goes further than any of the other systems, which generally store a "head" pointer for each branch that shows how far you've gone.
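The difference between the two merge strategies can be shown with a toy Python model. It is only a sketch, and it rests on some assumptions:

- A branch is an ordered list of patch identifiers.
- A patch keeps the same identifier whichever branch it travels through.
- The function names are hypothetical and not taken from any of the tools.

Real head-pointer systems also run a three-way merge. A change that arrives twice may therefore merge cleanly or may conflict. The duplicate "D" below only marks where the two strategies disagree.

```python
"""Toy model (hypothetical, illustration only) of two ways a merge can
decide which patches to bring over from a source branch."""
from typing import List


def merge_by_patch_set(dest: List[str], source: List[str]) -> List[str]:
    """Darcs-like: add every source patch the destination doesn't have yet."""
    have = set(dest)
    return dest + [p for p in source if p not in have]


def merge_by_head_pointer(dest: List[str], source: List[str], head: int) -> List[str]:
    """Head-pointer style: add everything in `source` after the recorded
    position `head`, without checking what `dest` already contains."""
    return dest + source[head:]


if __name__ == "__main__":
    upstream = ["A", "B", "C", "D"]
    # We merged upstream up to "B" earlier (head=2), then picked up patch
    # "D" by some other route, e.g. from a colleague's branch.
    ours = ["A", "B", "X", "D"]
    print(merge_by_patch_set(ours, upstream))              # ['A', 'B', 'X', 'D', 'C']
    print(merge_by_head_pointer(ours, upstream, head=2))   # ['A', 'B', 'X', 'D', 'C', 'D']
```

The patch-set merge never tries to bring "D" in again, because it asks "which patches are missing?" rather than "what came after the last point I merged?".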
Arch is closer to Darcs in this respect, though ironically bzr is more like the other systems. Merging between branches in svn is really poor. It has no support for recognizing changesets that have been applied in both places, which results in conflicts in many development models.

4. Individual changesets should be mergeable without bringing across the whole history

Darcs is really the only one that can do this right. I was surprised that nobody else could, since it is such a useful and vital feature for me.

Both bzr and git have a cherry-pick mode that simulates this, but these commands really just get a diff from the specific changeset requested, then apply the diff as with patch. So you get a different changeset committed, which can complicate your history later -- AND lead to potential conflicts in future merges. bzr works around some of the conflict problems because, on a merge, it will silently ignore patches that attempt to perform an operation that has already occurred. But that leads to even more confusing results, as the merge of the patch is recorded for a commit that didn't actually merge it. (That could even be a commit that doesn't modify the source.) Sounds like a nightmare for later.

Arch has some support for it, but in my experience, actually using this support tends to get it really confused when you do merges later. Neither Mercurial nor svn has any support for this at all.

5. Branching preserves full history

git, darcs, and Mercurial get this right. Making a branch from one of these repos will give you full history, including individual diffs and commit logs for each changeset.

Arch and bzr preserve commit logs but not the individual changesets on a new branch. I was particularly surprised at this shortcoming with bzr, but sure enough, a standard bzr merge from a remote branch committed three original changesets as one and did not preserve the individual history on that commit.

svn doesn't support cross-repo branching at all.

6. Merging preserves full history

Again, darcs, git, and Mercurial get this right (I haven't tested this in Mercurial, so I'm not 100% sure). Arch and bzr have the same problem: they preserve commit logs, but not individual changesets. A merge from one branch to another in Arch or bzr simply commits one big changeset on the target that represents all the changesets pulled in from the source, so you lose the distinctness of each individual changeset. This can leave you unable to re-create full history without access to dozens of repositories on the 'net.

Subversion has no support for merging across repositories, and its support for merging across simulated local branches isn't all that great, either.

7. It is possible to commit, branch, merge, and work with history offline

Everyone except Subversion does a good job of this.

8. The program is fast enough for general-purpose use

All tools here are probably fast enough for most people's projects. Subversion can be annoying at times because many more svn commands hit the network than is the case with the other tools.

In my experience, Arch was the slowest. Though it was still fine for most work, it really bogged down with the Linux kernel. bzr was next, somewhere between Arch and Darcs. bzr commands "felt" sluggish, but I haven't used it enough to really see how it scales. Darcs is next. It used to be pretty slow, but has been improving rapidly since 1.0.0 was released. It now scales up to a kernel-sized project very well, and is quite usable and reasonably responsive for such a thing. The two main things that slow it down are very large files (10MB or above) and conflicts during a merge. Mercurial and git appear to be the fastest, and are pretty similar in performance.

All of these tools perform best with periodic manual intervention (or a scheduled cron job) -- once a month to once a year, depending on your project's size. Arch users have typically created a new repo each year.
Darcs users periodically tag things (if things are tagged as part of normal work, no extra work is needed here) and can create checkpoints to speed up checkouts over the net. git and Mercurial also use a form of checkpoints (I'm not sure about bzr). Subversion works so differently from the others that it's hard to compare. (For one, a checkout doesn't bring down any history.)

Conclusions

I was surprised by a few things. First, only one system actually got #4 (merging individual changesets) right. Second, if you had to pick losers among VCSs, they seem to be Arch and bzr. The lack of history in branching and merging is a really big issue, and they don't seem to have any compelling features that git, darcs, or Mercurial lack.

#4 was unique to Darcs a few years ago, but I figured it surely would have been cloned by all the other new VCS projects that have popped up since. It seems that people have realized it is important, and have added token workaround support for it, but not real working support.

On the other hand, it was interesting to see how VCS projects have copied from each other. Everyone (except tla) seems to use a command-line syntax similar to CVS. The influence of tla Arch is, of course, plainly visible in baz and bzr, but you can also see pieces of it in all the other projects. I was also interested to see that the Darcs notion of patch dependencies is visible (albeit in a more limited fashion) in bzr, git, and Mercurial.

So, I will be staying with Darcs. It seems to really take the idea of distributed VCS and run with it. Nobody else seems to have quite gotten the merging thing right yet -- and if you are going to make it difficult to do anything but merge everything up to point x from someone's branch, I just don't see how your tool is as useful as Darcs. But I am glad to see ideas from different projects percolating across and getting reused -- this is certainly good for the community.
Updates / Corrections

I got an e-mail explaining how to get the individual patch diffs out of bzr. This will work only for "regular", non-cherry-picked merges, and requires some manual effort. You'll need to run bzr log and find the patch IDs (these are the long hex numbers on the "merged:" line) of the changeset you're interested in, plus the changeset immediately before it on the same branch (which may not be on the same patch and may not be obvious at all on busy projects). Then, run bzr diff -r revid:old-revid-string..new-revid-string. I think this procedure really stinks, though, since it requires people to manually find previous commits from the same branch in the log.

Tuesday, August 8. 2006
First steps with git and I'm not all that pleased
I have been tracking 2.6.16.x here because Xen doesn't have patches for 2.6.17 yet (why on earth that is, I don't know). But I need a few 2.6.17.x patches to the qla2xxx driver. So I figure this is an opportunity to learn git.
I learned git, but then quickly discovered that I can't just pull arbitrary commits from one branch to another. I have to use git cherry-pick, which doesn't actually pull the commit unmodified. Instead, it takes a diff and commits a new patch based on that diff. So I expect problems later when I bump the local branch to 2.6.17. This seems depressingly like arch/baz/bzr, and is a good reason for me to stay with darcs for now. This same operation with darcs would have been trivial, AND darcs would have automatically pulled in prerequisite patches rather than giving a merge failure and making me find them manually.
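The two ways of pulling one patch described above can be sketched as a toy Python model. It is a sketch under stated assumptions, not how either tool works internally:

- Every patch has a stable identifier.
- Every patch has an explicit, acyclic set of prerequisite patches.
- The function names and patch IDs are hypothetical.

The ValueError stands in for the merge failure mentioned above. In git that failure really comes from the diff not applying, not from any recorded dependency.

```python
"""Toy model (hypothetical) of pulling a single patch into a branch.

Assumes each patch has a stable ID and an explicit, acyclic set of
prerequisites; real tools track this very differently."""
from typing import Dict, List, Set


def pull_with_prerequisites(target: List[str], wanted: str,
                            deps: Dict[str, Set[str]]) -> List[str]:
    """Darcs-like: add `wanted` plus any missing prerequisites (recursively),
    prerequisites first, keeping the original patch identities."""
    have = set(target)
    result = list(target)

    def visit(pid: str) -> None:
        if pid in have:
            return
        for dep in sorted(deps.get(pid, set())):
            visit(dep)
        have.add(pid)
        result.append(pid)

    visit(wanted)
    return result


def cherry_pick(target: List[str], wanted: str,
                deps: Dict[str, Set[str]]) -> List[str]:
    """cherry-pick-like: fail if a direct prerequisite is missing, otherwise
    commit a *new* patch (marked with a prime) with a different identity."""
    missing = sorted(d for d in deps.get(wanted, set()) if d not in target)
    if missing:
        raise ValueError(f"cannot apply {wanted}: missing {missing}")
    return target + [wanted + "'"]


if __name__ == "__main__":
    deps = {"fix-qla": {"qla-api"}, "qla-api": {"qla-base"}}
    local = ["qla-base", "xen"]
    print(pull_with_prerequisites(local, "fix-qla", deps))
    picked = cherry_pick(["qla-base", "qla-api"], "fix-qla", deps)
    # An identity-based merge later still sees the original as missing:
    print("fix-qla" in picked)  # False
```

Because the cherry-picked copy carries a new identity, a later merge from the 2.6.17 branch has no way of knowing that the original "fix-qla" is already present in spirit. That is exactly the situation the upgrade worry above is about.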