Category Archives: Technology

Trac & Git

For quite some time now, I’ve been running Trac over at software.complete.org. Most of my free software projects — well, the ones where I actually go to the effort to make formal releases — have a Trac instance. This Trac instance provides a wiki, bug tracker, downloads area, timeline (with RSS feeds), and VCS integration.

Trac is a nice program, but one thing has bugged me about it all this time:

Every trac instance is its own island.

I have 17 trac instances out there for my projects. To see what bugs are out there on my own server, I have to check 17 websites (or 17 RSS feeds or whatnot). Publishing a new program is not a lightweight process.

So today I started poking around looking for something better. I really like Trac’s way of integrating the wiki with the BTS and the commits; wiki markup can refer to a bug or a changeset, and bugs can use wiki markup too.

I looked at Redmine, Mantis, and Roundup, and I also have experience with RT.

Of these, Redmine looks the most interesting. Multiple projects support, per project wiki and forums, gantt charting even, and support for SVN, CVS, Mercurial, Bazaar, and Darcs — with Git support out there as patches to their development tree already too. Oh, and I saw references to a Trac importer as well. One thing that makes me nervous, though, is that they have no links to sites that use Redmine (except one in the news section), and Google isn’t turning up users either. Does nobody use this thing?

What else should I be looking at?

Over on the Git side, I’m still liking Git. I have now migrated several Mercurial projects over to git (see git.complete.org). I am also playing with Darcs to git migration using darcs2git, which also is going well. Sometimes gitk shows a nicer representation of a Git repo converted from Darcs than I was able to get from Darcs.

Experimenting with Git

I’ve been writing about Git a bit lately.

I’ve decided to switch some of my Debian work over to it to start with, as well as some of my other projects.

Although I was thoroughly frustrated with Git a year ago, now I am quite pleased with it. What’s different? The documentation is a LOT better. So far I have only found one manpage (git-show) that omits lots of its options. The system is friendlier, keystroke-happier, and powerful.

Compared to Mercurial, I’ve found some nice things:

In-directory branching. I didn’t expect to care about this, since both git and hg permit lightweight clones. But it turns out to be so easy to use that it is great. Especially since I don’t have to setup multiple branch repos on the server. I really like this. Note that “hg branch” is not the same as a git branch, and see the discussion on the hg lists about renaming that before 1.0.0 for why.

Flexibility in getting things around. Plain HTTP works fine (no static-http:// hack). ssh. git daemon. rsync. Very slick.

Performance. Surprisingly, git actually feels faster than Mercurial, especially when pushing or pulling. I didn’t expect that.

Tags. They seem smarter in git. No more merging of .hgtags all the time. Also I like that I can attach a message to a tag and sign it.

All that power. There is a *lot* that Git can do. I should have been taking notes about it all.

My main complaint is still that Git doesn’t have something as nice as “darcs send”. Mercurial doesn’t either, but it’s a bit closer. Git has moved closer, but still has room to improve on that.

So I have set up git.complete.org and am starting to publish my Debian stuff on Debian’s alioth server as well.

Also, hg-fast-export in the fast-export project is *awesome*. Branch-aware and everything. It made a perfect Git version of my Mercurial work.

Git looks really nice, until….

So I have been learning about Git this weekend. It has some really nice-looking features for sure — some things Mercurial doesn’t have.

I was getting interested in switching, until I found what I consider a big problem.

Many projects that use git require you to submit things using git-format-patch instead of pushing/pulling from you. They don’t want your merge history.

git-format-patch, though, doesn’t preserve SHA1s, nor does it preserve merges.

Now, say we started from a common base where line 10 of file X said “hi”, I locally changed it to “foo”, upstream changed it to “bar”, and at merge time I decide that we were both wrong and change it to “baz”. I don’t want to lose the fact that I once had it at “foo”, in case it turns out later that really was the right decision.

When we track upstream changes, and submit with git format-patch, the canonical way to merge upstream appears to be:

git fetch; git rebase origin/master

Now, the problem with that is that it loses your original pre-conflict code in a case like this.

There appears to be no clean way around that whatsoever. I tried a separate “submission” branch that rebases a local development-with-merge branch, but it requires a ton of git rebase --skip during the rebase process.
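To make the difference concrete, here is a minimal Python sketch of the scenario above. It is a toy model, not how Git stores anything: the `Commit` class and `reachable_values` helper are made up for illustration, and a "commit" records only the content of line 10 of file X plus its parents.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical toy commit: the text of line 10 and the parent commits."""
    line10: str
    parents: Tuple["Commit", ...] = ()


def reachable_values(tip):
    """Collect every value of line 10 that is reachable from the given tip."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        seen.add(commit.line10)
        stack.extend(commit.parents)
    return seen


base = Commit("hi")
mine = Commit("foo", (base,))       # my local change
upstream = Commit("bar", (base,))   # upstream's change

# Merge: the resolution "baz" keeps both sides as parents.
merged = Commit("baz", (mine, upstream))

# Rebase: my commit is replayed on top of upstream and the conflict
# resolution replaces its content, so "foo" is no longer an ancestor.
rebased = Commit("baz", (upstream,))

print(sorted(reachable_values(merged)))   # ['bar', 'baz', 'foo', 'hi']
print(sorted(reachable_values(rebased)))  # ['bar', 'baz', 'hi']
```

In the merged history "foo" is still an ancestor of the tip; in the rebased history it is gone from the branch, which is exactly the loss described above.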

Thoughts?

Revisiting Git and Mercurial

Exactly one year ago today, I wrote about Git, Mercurial, and Bzr. I have long been interested in VCS, and looked at the three main DVCS systems back then.

A Quick Review

Mercurial was, and for the moment, remains, my main VCS. Bzr remains really uninteresting; I don’t see it offering anything compelling that Mercurial or Git can’t do. My Git gripes mainly revolved around its interface and documentation. Also, I do have Windows people using my software, and need a plausible solution for them, even though I personally do no development on that platform.

Ted Tso wrote his own article in reply to mine, noting that the Git community had identified many of the same things I had and was working on them.

I followed up to Ted with:

… So if Ted’s right, and a year from now git is easier to use, better documented, more featureful, and runs well on Windows, it won’t be that hard to switch over and preserve history. Ted’s the sort of person that usually is right, so maybe I should start looking at hg2git right now.

So I guess that means it’s time to start looking at Git again.

This is rather rambly, I know. It’s late and I want to get these thoughts down before going to sleep…

Looking at Git

I started at the Git wikipedia page for an overview of the software. It linked to two Google Tech Talks about Git: one by Linus Torvalds and another by Randal Schwartz. Of the two, I found Linus’ more entertaining and Randal’s more informative. Linus’ point that CVS is fundamentally broken, and that SVN trying to be “a better CVS” (an early goal of svn, at least) means it too is fundamentally broken, strikes me as quite sound.

One other interesting tidbit I picked up is that git can show you where functions have moved from one file to another, thanks to its rename-detection heuristic. That sounds really sweet, and is the best reason I’ve yet heard for Git’s stubborn refusal to track renames.
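The general idea behind content-based rename detection can be sketched in a few lines of Python. This is only an illustration of the concept: it works on whole files rather than functions, it uses `difflib` from the standard library instead of Git's own similarity measure, and the `detect_renames` function and its `threshold` parameter are assumptions made for this sketch.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    `deleted` and `added` map path -> file content. `threshold` is a
    hypothetical cutoff for this sketch, not a Git setting.
    """
    renames = {}
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in added.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames


deleted = {"util.c": "int add(int a, int b)\n{\n    return a + b;\n}\n"}
added = {
    "math.c": "int add(int a, int b)\n{\n    return a + b;\n}\n/* moved */\n",
    "README": "Build instructions go here.\n",
}
print(detect_renames(deleted, added))  # {'util.c': 'math.c'}
```

Because nothing about the rename is recorded at commit time, the same kind of comparison can be run later against any pair of trees.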

The Landscape

I’ve been following Mercurial and Darcs somewhat, and not paying much attention to Git. Mercurial has been adding small features, and is nearing version 1.0. Darcs has completed a major overhaul both of its repository format and internal algorithms and is nearing version 2.0, and appears to have finally killed the doppelganger (aka conflict spinlock) bug for good.

Git, meanwhile, seems to have made strides in usability and documentation in its 1.5.x versions.

One thing particularly interesting to me is: what projects are using the different VCSs. High-profile projects now using Mercurial include OpenSolaris, OpenJDK (Java 7), and Mozilla’s projects. Git has, of course, the Linux kernel. It also has just about everything associated with freedesktop.org, including X. Also a ton of Unixy stuff.

Both Mercurial and Git communities are working on TortoiseHg/TortoiseGit types of GUIs for Windows users. Git appears to have a sane Windows port now as well, putting it on pretty much even footing with Mercurial and Darcs there. However, I didn’t spot anything with obvious Windows ties in the Git “what projects use git” pages.

The greater speed of Mercurial and Git — even for pushing and pulling small patches — likely will keep me away from Darcs for the moment.

Onwards…

As time allows (I do have other things keeping me busy), I plan to install git and work through some tutorials and try to use it in practice as much as possible, to get a good feel for it.

Future

It is beneficial to be using a VCS that is popular, though that is certainly not a major criterion for me. I refuse to use SVN because its lack of distributed functionality makes it too unproductive to be useful. But it looks like Git is gaining a lot of traction these days, especially in Debian circles, which also makes it more interesting.

I notice that Ted did convert e2fsprogs over to git as he said he might, incidentally.

Two new bashisms

I learned about two bash features I hadn’t known about today.

From a colleague, GLOBIGNORE. A colon-separated list of glob patterns naming files to ignore when expanding globs. Helpful when set to “*~” and used with grep.
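As a rough picture of the effect, here is a small Python emulation. It is not bash itself and ignores bash's special handling of dotfiles; `expand_glob` is a hypothetical helper that filters a list of names the way a glob with GLOBIGNORE set would.

```python
from fnmatch import fnmatchcase


def expand_glob(names, pattern, globignore=""):
    """Return names matching `pattern`, minus those matching GLOBIGNORE."""
    ignore = [p for p in globignore.split(":") if p]
    return [
        name for name in names
        if fnmatchcase(name, pattern)
        and not any(fnmatchcase(name, ig) for ig in ignore)
    ]


files = ["notes.txt", "notes.txt~", "main.c", "main.c~"]
print(expand_glob(files, "*"))        # all four names
print(expand_glob(files, "*", "*~"))  # ['notes.txt', 'main.c']
```

With the backup files filtered out of the expansion, a grep over * no longer reports every hit twice.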

From the Git FAQ, in a section explaining that it breaks the Git build process, CDPATH. A colon-separated search path to use when you type cd. Possibly useful to refer to subdirs of ~ or other common areas. Seems like it’s prone to break a ton of scripts if exported though.
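The lookup can be approximated in Python as below. This is a simplified sketch of the search order as I understand it (CDPATH entries first, then the current directory), and `resolve_cd` is a hypothetical helper, not anything bash provides.

```python
import os
import tempfile


def resolve_cd(target, cdpath, cwd):
    """Return the directory a `cd target` would pick, or None if none exists."""
    if os.path.isabs(target) or target.startswith(("./", "../")):
        # Absolute and explicitly relative targets bypass CDPATH.
        search = [cwd]
    else:
        # An empty CDPATH entry stands for the current directory.
        entries = cdpath.split(":") if cdpath else []
        search = [os.path.join(cwd, e) if e else cwd for e in entries]
        search.append(cwd)
    for base in search:
        candidate = os.path.normpath(os.path.join(base, target))
        if os.path.isdir(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    root = os.path.realpath(root)
    home = os.path.join(root, "home")
    work = os.path.join(root, "work")
    os.makedirs(os.path.join(home, "projects"))
    os.makedirs(work)
    # From an unrelated directory, "projects" is found through CDPATH.
    print(resolve_cd("projects", home, work) == os.path.join(home, "projects"))  # True
    # Without CDPATH, the same command finds nothing.
    print(resolve_cd("projects", "", work))  # None
```

The first case is also why an exported CDPATH can surprise a script: a plain cd to a relative name may land somewhere other than a subdirectory of the current directory.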

Registrar Dynadot Conspires to Help Take Down Wikileaks?

Yesterday, Wired ran a story on the Cayman Islands bank that got Wikileaks.org blocked. This story said, in part:

When the bank’s lawyers indicated they would be filing a suit, she asked them to tell her where so that Wikileaks could find an attorney in the appropriate jurisdiction to represent it. She says the lawyers refused to tell her. Two and a half weeks later, the bank filed a restraining order against Dynadot and Wikileaks in San Francisco. Wikileaks received notice only a few hours before the case went to a judge who accepted the agreement between Dynadot and the bank.

(emphasis mine)

Now, Dynadot and this Cayman Islands bank apparently had an agreement to block wikileaks.org already. Not only did Dynadot effectively take wikileaks.org down, but also they “lock(ed) the wikileaks.org domain name to prevent transfer of the domain name to a different domain registrar.” The U.S. District Court for Northern California issued this injunction without ever giving Wikileaks a chance to respond. The bank only filed the request against Dynadot. Apparently Wikileaks received notification that this was going to happen only 6 hours before the hearing (an incredibly short time in legal terms), nowhere near enough time to prepare a case.

Now, the reason I post this is because I have looked at Dynadot as a registrar before. They have good prices and a whois privacy service that makes sense: where you remain the owner of record, making it easier if you need to transfer the domain or prove your ownership of it.

But before signing up, I read their AUP carefully. Among many alarming things, I noticed this paragraph:

You further agree that Dynadot, in its sole discretion and without liability to You for any resulting loss or damages, may take immediate corrective action, including, but not limited to, removal of all or a portion of Your domain services and/or deletion, suspension, cancellation, termination, or other interruption of domain services or Your customer account with Dynadot, at any time during the term of this Agreement, in the event of notice of any possible violation of this Agreement by You or Your end users, or if such service or account is used in association with morally objectionable activities, or for any reason whatsoever. In such cases, any and all fees paid to Dynadot will be non-refundable and ineligible for account credit.

So, I thought I would write to them about it. Here is an excerpt from their response:

We always conduct an investigation before taking action against a domain. We will give you a chance to respond to the complaints.

From Wired’s story, it doesn’t look like that really happened. The US Government has already issued advisories about Cayman Islands banks, and it is unclear (to me at least) what law Wikileaks broke, or how Dynadot could find their actions of exposing fraud “morally objectionable”. What’s more, collaborating with the bank to get a takedown order written, but not talking to Wikileaks, seems to go against their statements to me (assuming again that the Wired story is accurate).


Here is my entire mail. You may also find archive.org’s copy of the AUP from last July to be helpful. (I wrote the email in November, and they’ve added some sections since then, so the section numbers don’t necessarily match up)

Subject: Re: SITE: Questions about your AUP
From: Dynadot Info <info@dynadot.com>
Date: Sat, 3 Nov 2007 13:42 -0800
To: jgoerzen@complete.org

Hello,

Thank you for your email. Responses are below.

Best Regards,
Dynadot Staff

--------------------------------------------------
DYNADOT... $8.99 domain names... $1/mo. web hosting
http://www.dynadot.com



Hi,

I currently have several domains being hosted with Gandi.  I have long been looking for someone that can provide a level of privacy for my whois data in a sane way.  I think Dynadot is the first I've seen that looks like it can do that and still be affordable.

In preparation to transfer over a first test domain, I read your AUP and frankly am quite troubled by what I saw.  I hope that you can allay my fears.

In section 4 of part 2, the last paragraph states that Dynadot "for any reason whatsoever" may delete, cancel, or terminate my domain services or customer account.  It also lists "notice of possible violation" as a justification for that.  That makes me even more nervous -- any random person could send you an email claiming something nefarious is happening with my domain name, and I'm agreeing to just let you delete it because of that?


We always conduct an investigation before taking action against a domain. We will give you a chance to respond to the complaints. 

A very similar clause appears in section 7 of part 1 ("cancellation of services").  It goes on to cast a very broad net around objectionable material and says that Dynadot decides what's objectionable.  Now, if you go to my website at www.complete.org or blog at changelog.complete.org, you'll see I'm an upstanding netizen.  But the AUP says that things that "are designed to or effectively... embarrass" third parties could get my domain cancelled.  So if I post a review of Vista that says I think Microsoft did a poor job of engineering, would my account be yanked if one of their engineers complained?  What if I (legally!) linked to a Comedy Central sketch using real TV footage to mock George W. Bush or Hillary Clinton?


The cases you described above would never happen with us. Complaining about Microsoft or making fun of George Bush are protected free speech. The service agreement is designed to give us some flexibility in dealing with customers that break the law. 

I'm particularly concerned about this because apparently DynaDot felt it worthwhile to try to take down the website of someone that found a security hole: http://www.jhuskisson.com/friends/dynadot-fights-back-bans-nick-from-everything

While I wouldn't condone step-by-step cracking instructions in most cases, this is concerning to me.


He was posting a step by step guide to hacking our website, which is illegal. The hack did not work, but we noticed an upsurge in strange activity in our logs, so we requested the hacking guide be taken down. 

Finally, item (i) under the Domain Privacy Service section says that you could immediately reveal all my information upon the receipt of merely a *claim* (not even a court order), even if it's invalid.  But I thought that NOT doing this was what you were saying made your service better, over at http://www.dynadot.com/resource/article/qa.html?aid=0


Once again we need to build some flexibility into our service agreement to deal with people who use their domains for criminal activity. Otherwise we could be liable for the damages that they cause to others. So far, we have never dropped anyones privacy except in the few cases we were forced to by a FBI subpoena.

I know this is a long message, and I appreciate your time.  I really do want to use your service, but -- no offense intended -- I want to make sure I'm not dealing with scammers first, and from reading the AUP, I'm not so sure!


No offence taken. I will email our counsel to see if we can tighten up the agreement a bit. 

-- John Goerzen




A Cloud Filesystem

A Slashdot question today about putting to use all the unused disk space on corporate desktops got me to thinking. Now, before I start, comments there raised valid points about performance, reliability, etc.

But let’s say that we have a “cloud filesystem”. This filesystem would, at its core, have one configurable parameter: how many copies of each block of data must exist in the cloud. Now, we add servers with disk space to the cloud. As we add servers, the amount of available space on the cloud increases, subject to having enough space for replication according to our parameters.

Then, say we want a minimum of 3 copies of each block replicated. Each write to the filesystem will then cause a write to at least 3 different servers. Now, what if one server goes down? If the cloud filesystem is short on space, we may be down to only 2 copies of some blocks until that server comes back up. Otherwise, space permitting, it can rebuild that third copy on other servers.
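The bookkeeping for this could look something like the following Python sketch. Everything in it is hypothetical: the `Cloud` class, the "most free space first" placement rule, and counting capacity in whole blocks are assumptions made for illustration, and a failed server coming back online is not modeled.

```python
class Cloud:
    """Toy model of block placement with a fixed replication count."""

    def __init__(self, copies):
        self.copies = copies   # required replicas per block
        self.free = {}         # server name -> free block slots
        self.placement = {}    # block id -> set of server names

    def add_server(self, name, capacity):
        self.free[name] = capacity

    def _pick(self, count, exclude=()):
        """Pick up to `count` servers, preferring the most free space."""
        candidates = [s for s in self.free
                      if s not in exclude and self.free[s] > 0]
        candidates.sort(key=lambda s: (-self.free[s], s))
        return candidates[:count]

    def write(self, block):
        servers = self._pick(self.copies)
        if len(servers) < self.copies:
            raise RuntimeError("not enough servers with free space")
        for s in servers:
            self.free[s] -= 1
        self.placement[block] = set(servers)

    def fail_server(self, name):
        """Drop a server, then rebuild missing copies where space permits."""
        del self.free[name]
        for servers in self.placement.values():
            servers.discard(name)
            missing = self.copies - len(servers)
            for s in self._pick(missing, exclude=servers):
                self.free[s] -= 1
                servers.add(s)


cloud = Cloud(copies=3)
for name in "abcd":
    cloud.add_server(name, capacity=2)
cloud.write("b1")
cloud.write("b2")

cloud.fail_server("a")   # spare space exists, so the third copies are rebuilt
print({b: len(s) for b, s in cloud.placement.items()})  # {'b1': 3, 'b2': 3}

cloud.fail_server("b")   # no space left, so both blocks drop to 2 copies
print({b: len(s) for b, s in cloud.placement.items()})  # {'b1': 2, 'b2': 2}
```

The two failures show both outcomes described above: a rebuild when there is room, and a degraded copy count when there is not.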

Now, has this been done before? As far as I can tell, no. Wouldn’t it be sweet?

But there are some projects that are close. Most notably, GlusterFS. GlusterFS does all of the above, except the automated bits. You can have this 3 copy redundancy, but you have to manually tell it where each copy goes, manually reconfigure if a server goes offline, etc. Other options such as NBD, OpenAFS, GFS, DRBD, Lustre, etc. aren’t really well-suited for this scenario for various reasons.

So, what does everyone think? Can this work? Has it been done outside of Google?

LinuxCertified Laptop LC2100S

As you might know from reading my blog, at my workplace, we have largely standardized on Linux on the desktop and laptop.

We use systemimager to maintain a standard desktop image and a separate standard laptop image. These images differ because there are different assumptions. The desktop machines mount /home over NFS, authenticate to LDAP, etc. This doesn’t work on laptops. Moreover, desktops don’t use network-manager or wifi, but laptops do.

Our desktop image uses Debian’s hardware autodetection — plus a little hacking in /etc/init.d/gdm — to automatically adjust to a wide range of hardware. So far this has worked well.

Laptops are much more picky. Our standard laptop model had been the HP nc4400 — a small and light 12″ model that people here loved. HP discontinued that model. Their replacement was the 2510p. We ordered one in here for evaluation. Try as we might, we couldn’t get it to suspend and resume properly in Linux.

So I went out scouring the field of Linux laptops. Companies such as Emperor Linux buy retail laptops from people like Lenovo, test them for Linux, and sell them — at a premium. These were too expensive to justify at the quantities we need them.

Then I stumbled across Linux Certified. I’d never heard of them before. I called them up and asked a few questions. They don’t buy retail laptops, but instead have OEMs in Taiwan build laptops to their spec. They happen to use the same OEM that Fujitsu does, I believe. (No big company builds laptops in the USA these days). I asked them about wifi chipsets, video chipsets, whether they use stock kernels. I got clueful answers to all of these.

So we ordered one of their LC2100s models. They didn’t offer Debian preinstalled, but did offer Ubuntu, so I selected that. The laptop arrived a couple of days (!!) later, configured with the particular CPU, etc. that I selected.

I was surprised at the thrill I felt at taking a brand new laptop out of its box, turning it on, and watching Grub appear before my eyes. Ubuntu proceeded to boot. I then of course installed our regular Debian image on the thing to check it out.

It needed a kernel and xserver-xorg-video-intel from lenny, as well as the ipw3945 driver for wifi, but otherwise worked with the exact same software as our HP nc4400 image. (In fact, it wasn’t hard to support both laptops with that image, since both use a lot of Intel hardware.) The one trick was making hibernate call /etc/init.d/ipw3945d stop so that the ipw3945 module could be unloaded before suspend. (Why this particular chipset needs a daemon is beyond me, but oh well.)

The hardware is great. As far as I know, the ipw3945 was the only component that wasn’t directly and automatically supported by DFSG-free software in lenny main. The screen is sharp and high-contrast (it’s glossy, which I personally don’t like, but I bet our users will). The device itself feels sturdy. It’s small and dense. I haven’t opened it up, but it looks like all you need is a screwdriver to do so.

The only downside is that they don’t sell docking stations for it. Their standard answer on that is to buy a USB docking station. That’s a partial answer, but can’t handle power or video like a standard docking station will.

Also, the LC2100s is much cheaper than the HP laptop, even when configured with nicer specs in every way. That is no doubt partially due to the lack of the Windows tax.

I’m sending off an order for 4 more today, I believe.

Viper

Well, now this is quite the experience.

I’ve been trying Viper for the past few days. Viper, for those that don’t know, is usually described as a set of Vi bindings for Emacs.

After reading the nearly 100 pages of documentation and trying it a bit, I have realized that this is not really an accurate description. Viper is a port of vi to Elisp.

But that doesn’t really do it justice. Viper seems to have pretty much everything going for it that Vim does, and then some. It is extensible with Elisp, and works with all the Emacs major modes (indentation and so forth). It is also a very authentic Vi implementation, yet more customizable than Vim. And, in my opinion, more capable than Vim too.

On the one hand, this is a really neat combination: the power of the vi editing commands with the power of Emacs and Elisp for indentation, customization, etc.

On the other hand, it makes my head hurt. While Viper and Vim both are supersets of the vi command set, they don’t always implement extensions (such as multiple windows) the same way or with the same keys. Of course, you could remap them in both, but it’s a bit jarring to run Viper in expert mode, press C-w to start creating a new window, and have it run the Emacs cut command. (You can run Viper in a more limited mode where it does not recognize any regular Emacs keys if you don’t want that)

It’s just weird. It mostly looks like Emacs. It is modal like Vim, and responds to all Vi and most Vim commands. It has an additional mode: the Emacs mode. Also if configured to run in expert configuration, Emacs commands are accepted most places. Yes, you can move with h, j, k, l and C-n, C-p, C-f, C-b all at the same time.

The main drawback I can see is that Viper mode doesn’t work well with Info mode, which has other bindings for keyboard shortcuts… so all of a sudden, hjkl don’t work in info mode.

I don’t know yet if I’ll use viper much, but it is a slick program.

A little more on Vim and Emacs file handling

Yesterday’s post about switching back to Emacs saw quite a few comments from people (most of them useful, even). I learned a few things.

My biggest gripe about Vim was that for the file types I worked with most, its indentation and syntax highlighting were inferior to those of Emacs. I’d like to illustrate that with an example.

Let’s consider one of those file types: XML containing DocBook markup.

Vim has a DocBook mode. It doesn’t autodetect DocBook files, so I have this at the top of each one:

<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

Now, why should Vim need a separate DocBook mode? DocBook is just XML or SGML, and these things have a well-formed nature. Well, part of the reason is that /usr/share/vim/vim71/syntax/docbk.vim has a ton of lines like this:

syn keyword docbkKeyword chapter citation citerefentry citetitle city contained

Yes, they are hard-coding all the DocBook element names into the editing mode. It’s probably used for completion, highlighting, maybe indentation. I’m not sure, really. I remember that editing these files without the DocBook mode was much more painful anyway, but that was 8 months ago and I can’t quite remember why.

Now, what about Emacs? I don’t know if Emacs even has a DocBook mode, mainly because I don’t have to care. The Emacs psgml mode actually parses the DTD for your XML or SGML files. It knows exactly what the valid tags are from doing so. This means it has full functionality not just for DocBook, but for any XML or SGML file with a DTD.

Not only that, but it knows more about the files than Vim does. For instance, both Emacs and Vim can do completion of various things. Vim </ C-x C-o (ooo, sounds like Emacs!) can complete my closing tags. But it can’t autocomplete my opening tags, and it certainly isn’t aware

Not only can Emacs autocomplete opening and closing tags, but it knows exactly what tags are valid at a given place in the document (thanks to the DTD) and will only consider those tags for completion. Moreover, depending on how you have configured it, it could also insert spots for you to add any required attributes. So, for instance, if you’re editing XHTML and autocompletion gives you an <img> tag, it would add src="" in it for you, as a reminder that src is required.
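The principle is easy to sketch in Python. This is not how psgml is implemented; the `CONTENT_MODEL` and `REQUIRED_ATTRS` tables below are tiny hand-written stand-ins for what a real DTD parser would produce, and the function names are made up for the example.

```python
# Hypothetical, heavily simplified content model standing in for a parsed DTD.
CONTENT_MODEL = {
    "body": ["p", "img", "ul"],
    "ul": ["li"],
    "p": ["em", "img"],
}
REQUIRED_ATTRS = {"img": ["src"]}


def complete(parent, prefix=""):
    """List elements that are valid inside `parent` and start with `prefix`."""
    return [t for t in CONTENT_MODEL.get(parent, []) if t.startswith(prefix)]


def insert_tag(tag):
    """Build an opening tag with empty slots for required attributes."""
    attrs = "".join(f' {a}=""' for a in REQUIRED_ATTRS.get(tag, []))
    return f"<{tag}{attrs}>"


print(complete("ul"))      # ['li']
print(complete("p", "i"))  # ['img']
print(insert_tag("img"))   # <img src="">
```

Since the tables come from the DTD rather than from the editor, the same completion logic works for any document type without a hard-coded keyword list.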

There are a host of other smart things that Emacs can do with XML or SGML documents. For instance, you can get a list of all tags valid at the current point with C-c C-t or Shift-RightClick — useful if you’ve forgotten the name of a tag for a moment.

The difference isn’t as great with everything. But it sure is noticeable as I work with XML and Haskell files.