Rebase Considered Harmful

Today I was musing about different version control systems and merge algorithms. I’ve been thinking specifically about how I maintain Debian packages in Darcs. I tend to import upstream tarballs into one branch, and maintain the Debian packages in another, simply merging when a new upstream is released.

Now, there seem to be two prevailing philosophies on how to handle merges in this case. I’m thinking here about merges back to upstream. Say I want to contribute my Debian patches to them.

  1. Commit “clean” patches upstream. Don’t have a bunch of history — the fixing typos commits, the fixing bugs commits, or the merging to track new upstream releases. Just something like a series of diffs against the current head.
  2. Bring across the full history, warts and all, and keep it around permanently.

git encourages option 1, with its rebase option. Darcs encourages option 2 (though some use its amend-record option to work more like 1).

As I got to thinking about it, it occurred to me that git-rebase would be very nice if you are going to use philosophy 1. In short, rebase will remove your local patches from a repo, update it to the latest upstream, then re-apply your local changesets, aborting to have you fix any conflicts. This is as opposed to a more traditional merge, where you add the upstream changesets to your local branch and then commit new changesets to resolve conflicts. (So a rebase would be totally useless in situation 2.)
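To make the difference concrete, here is a rough sketch in a throwaway repository. The branch names (upstream, debian-merge, debian-rebase), file names, and commit messages are all made up for illustration. It records the ID of a local commit, then checks whether that same commit is still part of the branch after a merge and after a rebase.

```shell
#!/bin/sh
# Sketch only: compare what merge and rebase do to an existing local commit.
# All branch and file names here are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/upstream      # name the initial branch "upstream"
git config user.name "Example Packager"
git config user.email "packager@example.invalid"

# Initial upstream import.
echo "base" > README
git add README
git commit -q -m "upstream: initial import"

# One local packaging commit, shared by both experiment branches.
git checkout -q -b debian-merge
echo "packaging" > control
git add control
git commit -q -m "debian: add packaging"
git branch debian-rebase
local_before=$(git rev-parse debian-rebase)

# Upstream makes a new release.
git checkout -q upstream
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "upstream: new release"

# Traditional merge: upstream history is added on top of the local commit.
git checkout -q debian-merge
git merge -q --no-edit upstream
merge_keeps_original=$(git merge-base --is-ancestor "$local_before" debian-merge && echo yes || echo no)

# Rebase: the local commit is re-created on top of the new upstream.
git checkout -q debian-rebase
git rebase -q upstream
local_after=$(git rev-parse debian-rebase)
rebase_keeps_original=$(git merge-base --is-ancestor "$local_before" debian-rebase && echo yes || echo no)

echo "original local commit still in merged branch:  $merge_keeps_original"
echo "original local commit still in rebased branch: $rebase_keeps_original"
echo "local commit ID before rebase: $local_before"
echo "local commit ID after rebase:  $local_after"
```

The rebased branch carries a new commit with the same change but a different ID, and anyone who had already pulled the old commit would notice exactly that.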

I got to thinking about this, and started wondering what would happen to people that I’m working with that in turn work off my branches. And sure enough, the git-rebase manpage says, “When you rebase a branch, you are changing its history in a way that will cause problems for anyone who already has a copy of the branch in their repository and tries to pull updates from you.”

I maintain, therefore, that git-rebase is evil and should be avoided. It only works for a situation where someone maintains a private branch of a project, never shared in any way except to submit patches to an upstream. Forget it if you have a team maintaining that branch, or want to post that branch online for others to help with (as I do with my Debian darcs package). Even if you keep it private now, do you really want to adopt a work process that forces you to keep it private forever, or else completely change how you work?

And this brings me back to the original question of patch philosophy. Personally, I dislike philosophy 1. I’d much rather have the full history of a change, warts and all. Look at the Linux kernel example: changesets that introduced bugs that made it into the official tree have their fixes documented, but changesets that introduced bugs that were fixed before being merged into the official tree could be lost to the public due to rebasing by submitters. Is that really what we want? I don’t think so.

With Darcs, tagging is very cheap and it is quite trivial to write an “apply a changeset bundle” script that makes a before tag, applies a series of patches, and makes an after tag. One could then run a darcs diff between the two tags to see the net effect on the repository, or could still look at the individual patches. (Or, you can avoid tagging and manually specify the “from” and “to” patches.) I find that a much better model: you can have it both ways. I’d think that most modern VCSs ought to support some variant on that, too.
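A minimal sketch of such a script might look like the following. The function name apply_bundle_with_tags and the before-/after- tag naming are my own inventions for illustration. It assumes darcs's tag, apply, and diff commands with the --from-tag and --to-tag matchers, which are worth double-checking against darcs diff --help for your version.

```shell
# Sketch only: apply a patch bundle between a "before" and an "after" tag.
# apply_bundle_with_tags and the tag naming scheme are hypothetical.
# Usage: apply_bundle_with_tags BUNDLE_FILE LABEL
apply_bundle_with_tags() {
    bundle=$1
    label=$2

    # Mark the state of the repository before the bundle goes in.
    darcs tag "before-$label" || return 1

    # Apply every patch in the bundle; each stays individually visible.
    darcs apply "$bundle" || return 1

    # Mark the state after the bundle.
    darcs tag "after-$label" || return 1

    # Show the net effect of the whole bundle as a single diff.
    # The tag matchers are patterns, so keep labels free of regex characters.
    darcs diff --from-tag="before-$label" --to-tag="after-$label"
}
```

Run inside a Darcs repository, a call such as apply_bundle_with_tags fixes.dpatch upstream-1.2 would leave both tags in place, so the combined diff and the individual patches remain available later.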

And I think that git-rebase should be removed on the grounds that it encourages poor version tracking practices.

22 thoughts on “Rebase Considered Harmful”

  1. Some excellent reasons exist to clean up history before you submit it
    upstream. First, having more buggy versions in version control makes
    git-bisect harder, because you run into those buggy versions. Second, you
    want to ensure that you order patches so that every version builds; hence,
    when submitting a new driver to Linux, you put the patch with the Makefile
    last.

    I agree with you that git-rebase doesn’t make sense on shared repositories.
    Any repository with more than one user should certainly not use git-rebase. A
    “copy-rebase” technique works fine, though; make a new branch at the same
    location, and rebase the *new* branch.

    1. > First, having more buggy versions in version control makes git-bisect harder, because you run into those buggy versions.

      This is like “we have to do this to support the broken tool”. The right thing here is to only count the merge revision when bisecting.

      > Second, you want to ensure that you order patches so that every version builds; hence, when submitting a new driver to Linux, you put the patch with the Makefile last.

      Similarly, the key is to only count revisions committed directly to the mainline, including merges.

      1. I disagree with you on the notion that git is equivocated by design.

        Rebase is a wonderful tool, and bisect works really nice with it. I wouldn’t want to bisect merges only as that would be limited in nature. I like the paradigm of making each commit a complete patch. It makes following the history much easier.

        1. Following history is much easier, but it is a pain for the developer; it is really a pain when you end up spending time resolving the conflicts.

  2. I recently wrote an article or two about distributed version control systems.

    I’ve been using Darcs since 2005. I switched to Darcs, in fact, 10 days after the simultaneous founding announcements of git and Mercurial.

    Overall, I have been happy. I

  3. Rebase is a much misunderstood tool. Use it as intended as Josh mentioned. I agree that one should avoid rebasing any published commits.

    Personally, I like to use rebase -i all the time to reorganize my commits by splitting, squashing and fixing comment etc. It’s part of my dev process for partitioning and composing code. But once it’s published, it’s documented history and should not be changed to avoid confusion.

    I always cringe when I see cvs/p4/svn people pushing huge and sometimes broken stuff into a central repository, simply because they want a “checkpoint”, due to large un-versioned changes already built up in their trees, simply because they don’t have tools like git rebase -i or git add -p to pick and choose patches within files. Ugh.

  4. I agree with vicaya. Having the possibility to edit your history (rebase -i) is a great thing, because it enables you to group changes in a logical way while working on a feature.
    Another use is to keep branches up-to-date: e.g. in a web application, I have a “testing” and a “production” branch, the latter being split off an earlier “testing” version with some additional changes (e.g. db configuration). Now I decide that I want to roll out tested things — I simply do “git rebase testing production”, and the production branch is up-to-date, yet still preserves its additional modifications.
    Apart from that, you write “… rebase will … — aborting to have you fix any conflicts”. This is utter nonsense. Rebase will require you to resolve conflicts in the same way that merge would. Try this:
    git init; echo abc >foo; git add foo; git commit -m 1; git branch test; echo def >>foo; git commit -am 2; git checkout test; echo ghi >>foo; git commit -am 3; git rebase master

    All in all, I could not disagree more with your final conclusion that git-rebase should be removed because it encouraged bad practices. Just because it isn’t useful to you, it doesn’t mean that there’s no use for it at all (see above). Git gives you the tools for both philosophies, it’s your decision whether to rebase or to merge.

  5. Well… should the rm command be removed from the operating system, just because “rm -Rf /” may cause a lot of damage? No, you just have to know how to use a tool and how not to. I’ve found the rebase command very useful in some merging tasks, which I do in a local repository. It is very useful if there is a need to combine, remove or reorder patches before the actual merge.

  6. Hi John,

    I’m glad to have come across your post via the google query ‘git merge rebase’ precisely because I didn’t know when either was more appropriate. Despite the grilling responses, I’ve found the entire thread very informative.

    I am using git locally to sidestep the finer ‘features’ of clearcase. I use git-rebase between my various branches and master where I perform clearcase rebase/deliveries. For me, master is the central repository and cc-rebases are indeed modifications in history.

    Cheers,
    Alex

  7. “Look at the Linux kernel example: changesets that introduced bugs that made it into the official tree have their fixes documented, but changesets that introduced bugs that were fixed before being merged into the official tree could be lost to the public due to rebasing by submitters. Is that really what we want? I don’t think so.”

    I’m not quite following you as to why this would be a bad thing. If a bug was fixed before it makes it into the official tree, why would the public care about it? IOW, what’s so important for the rest of the world to see:

    commit a: the big shining feature!
    commit b: oops, fixed a typo in the big shining feature.

    instead of:

    commit a’: the big shining feature!

    ?

    I doubt even a maintainer needs/wants to see all the noise in the patch series.

  8. Of course it depends on the nature of the patch and the sub-commits. Sometimes it is just noisier when you are doing something locally and just commit often.

    In the end just look at your commit message, if you need to write a whole page of changes you probably should break it into smaller steps.

    The point is that even an ‘atomic’ feature can be broken into logical steps, which are much easier to review. But anyway, git is not forcing you to do anything.

  9. Rebase is a more misunderstood tool. Use it as intended as Josh mentioned. I agree that one should avoid rebasing any published commits.

    Actually, I like to use rebase -i all the time to reorganize my commits by splitting, squashing and fixing comment etc. It’s part of my dev process for partitioning and composing code. But once it’s published, it’s documented history and should not be changed to avoid confusion.

    I always cringe when I see cvs/p4/svn people pushing huge and sometimes broken stuff into a central repository, simply because they want a “checkpoint”, due to large un-versioned changes already built up in their trees, simply because they don’t have tools like git rebase -i or git add -p to pick and choose patches within files. Ugh

  10. IMHO more power is always better. If developers can’t handle the danger of rebase then git isn’t the tool for them (I hear Visual Source Safe is good).

    Even when working with other developers at my job we work for a while, then we clean up the commits via rebasing and start working on a new branch, keeping the old one just in case. It works great. Before we push to production we also rebase in order to clean everything up and make our jobs easier when we have to look through history later.

  11. Any powerful tool can be misused. The git manpage warns against rebasing published commits. What more do you want?

    If you don’t understand how to safely use a chainsaw, then don’t touch it. If you don’t understand how to safely rebase, then don’t do it on any codebase you care a lot about.

    If you don’t think the man page is clear enough, there are plenty of good git books and online resources to help you. Read them and learn, or ask a more experienced git user to help you.

  12. It could be renamed to something like “private-branch-rebase”. Possibly git could work out if this branch or something derived from it has been pushed as well and it could warn or abort the process.
