Category Archives: Online Life

Suspicious Blog Activity – any advice?

I’ve been noticing a number of odd things happening surrounding my blog lately, and I thought it’s about time to figure out what’s going on and how to stop it.

The first problem is that people are illegally copying my posts, probably using RSS scraping, and putting them up on their own ad-infested sites. It is trivial to find them by searching Google for any somewhat unique word or phrase from one of my posts. Lately one of them actually sends me pingbacks announcing the fact that they’ve scraped me! Most of these sites seem to be nothing but content farms for selling ad impressions, and almost none of them have any identifiable names for the owners.

(There is an exception: I have specifically set up sites like Planet Debian and Goodreads to copy my blog posts.)

I’m obviously an advocate of open content, but I do not feel it right that others should be profiting by putting photos and stories about Free Software, or photos of my family, on their ad farms. While I release a great deal of content under GPL or Creative Commons licenses, I have never done so with my blog – an intentional decision.

What should I do about this? Is it worth fighting a battle over, or is it about as useless as trying to block every spam follower on my twitter account?

So that’s the first weird thing. The second weird thing just started within the last few weeks. I have been getting a surprising amount (a few a week) of email addressed to me. It does not bear the appearance of being 100% automated spam, though it is possible that it is. It’s taken a few forms:

  • Someone wanting to buy an ad on my blog
  • Someone wanting to send me a story hyping their product (and intending me to pretend that I wrote the story)
  • Someone wanting me to write a story about their website and link to it

The profit motive in all of these is high, and in at least the second and third, so is the sleaze factor.

I’ve gotten two emails lately of this form:

Hi John,

I am curious if you are the administrator for this site:

I am a researcher / writer involved with a new project whose mission it is to provide accurate and useful information for those interested in the practice of law, whether as a lawyer or paralegal. I recently produced an article detailing the complex relationship between law and technology and the legal implications on personal privacy and free speech. I would love to share this resource with those who might find it useful and am curious of you are the correct person to contact about such a request?

Thank you!

All my best,

The details vary – the URLs appear to be random (the one cited above was little more than a link to an article), the topics the website claims to discuss range from law to schizophrenia (that one actually came with a link to the site, which again seemed to be a content farm). I am slightly tempted to reply to one of these and ask where the heck people are getting my name. It seems as if somebody has put me into a mailing list they sell containing sleazebag bloggers.

Frankly, I am puzzled at this attention. I guess I haven’t checked, but I can’t imagine that my blog has anything even remotely resembling a high PageRank or anything else. It’s not high-traffic, not Slashdot, etc. Either people are desperate, naive, failing to be selective, or maybe working some scam on me that I don’t know yet.

In any case, I’m interested if others have seen this, or any advice you might have.

Social Overload

I’m finding social media is becoming a bit annoying. I enjoy using it to keep in touch with all sorts of people, but my problem is the proliferation of services that don’t integrate well with each other. Right now, I have:

  • A blog, which I have had for years. I used to post things like short links, daily thoughts, etc – almost every day. It seems that there is some social pressure to not do that on blogs anymore, so I don’t too much. My blog gets mostly edited, more carefully thought-out, longer-form posts now. I’m not entirely happy with that direction though, since it means I don’t post much on the blog because it takes a lot of time to compose things nicely for it.
  • A twitter account, which I sometimes use to post links and such. However, I have noticed a significant decline in the number of actual conversations I have on Twitter since Google+ came out, and I wonder how relevant Twitter will remain to people in the future.
  • I also have an account, though I almost never have any interactions there anymore.
  • A Facebook account, which is mostly used to keep in touch with people I know offline in one way or another. Many of them use Facebook exclusively, sometimes even more than email.
  • A Google+ account. I post similar content there as I do on twitter, though probably more of it because it doesn’t have a character limit. I really enjoy the community on Google+ – there are few people I’ve met in person in my circles, but many people I know from various online activities. And many just plain brilliant, engaging, or interesting people. As an example: I follow Edd Dumbill, the (former?) chair of OSCon, on Google+. He started talking about his Fitbit getting broken, which led me to ask him some questions about it – which he, and others, answered – and me ordering one myself. I just don’t have that kind of interaction anywhere else.
  • A Diaspora account that I created but honestly haven’t had time to use.

So my problems are:

  1. Posting things multiple places. I currently can post on, which automatically posts to twitter, which automatically posts to Facebook. But then I’d still have to post to Google+, assuming it’s something that I’d like to share with both my Facebook friends and my Google+ circles – it usually is.
  2. The situation is even worse for re-tweeting/re-sharing other people’s posts. That is barely possible between platforms and usually involves cutting and pasting. Though this is somewhat more rare.
  3. It’s probably possible to make my blog posts automatically generate a tweet, but not to automatically generate a G+ post.

All the hassle of posting things multiple places leads me to just not bother at all some of the time, which is annoying too. There are some tools that would take G+ content and put it on Twitter, but without a character counter on G+, I don’t think this would be useful.

Anyone else having similar issues? How are you coping?

Download A Piece of Internet History

Back in the early 1990s, before there was a World Wide Web, there was the Internet Gopher. It was a distributed information system in the same sense as the web, but didn’t use hypertext and was text-based. Gopher was popular back then, as it made it easy to hop from one server to the next in a way that FTP didn’t.

Gopher has hung on over the years, and is still clinging to life in a way. Back in 2007, I was disturbed at the number of old famous Gopher servers that had disappeared off the Internet without a trace. Some of these used to be known by most users of the Internet in the early 90s. To my knowledge, no archive of this data existed. Nobody had ever attempted to save Gopherspace.

So I decided I would. I wrote Gopherbot, a spidering archiver for Gopherspace. I ran it in June 2007, and saved off all the documents and sites it could find. That saved 40GB of data, or about 780,000 documents. Since that time, more servers have died. To my knowledge, this is the only comprehensive archive there is of what Gopherspace was like. (Another person is working on a new 2010 archive run, which I’m guessing will find some new documents but turn up fewer overall than 2007 did.)

When this was done, I compressed the archive with tar and bzip2 and split it out to 4 DVDs and mailed copies to a few people in the Gopher community.
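For anyone wanting to do something similar, here is a rough Python sketch of a compress-and-split step. It is not the set of commands actually used for the archive; the function names, the part-file naming, and the part size are all assumptions made up for illustration.

```python
import tarfile
from pathlib import Path


def make_archive(src_dir, archive_path):
    """Pack src_dir into a bzip2-compressed tar file."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)
    return Path(archive_path)


def split_file(path, part_size, buf_size=1024 * 1024):
    """Split path into numbered parts of at most part_size bytes each."""
    path = Path(path)
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            # Read the first chunk of the next part; stop at end of file.
            chunk = src.read(min(buf_size, part_size))
            if not chunk:
                break
            part = path.with_name(f"{path.name}.part{index:03d}")
            with open(part, "wb") as dst:
                written = 0
                while chunk:
                    dst.write(chunk)
                    written += len(chunk)
                    remaining = part_size - written
                    if remaining <= 0:
                        break
                    chunk = src.read(min(buf_size, remaining))
            parts.append(part)
            index += 1
    return parts
```

Concatenating the parts in order gives back the original archive, so a part size just under the capacity of one disc would produce one file per DVD.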

Recently, we’ve noted that hard disk failures have hobbled a few actually maintained Gopher sites, so I read this archive back in and posted it on BitTorrent. If you’d like to own a piece of Internet history, download the torrent file and go to town (and please stick around to seed if you can). This is 15GB compressed, and also includes a rare video interview with two of the founders of Gopher.

There are some plans to potentially host this archive publicly in the manner of; we’ll have to wait and see if anything comes of it.

Finally, I have tried to find a place willing to be a permanent host of this data, and to date have struck out. If anybody knows of such a place, please get in touch. I regret that so many Gopher sites disappeared before 2007, but life is what it is. This is the best snapshot of the old Gopherspace that I’m aware of, and I would like to make sure that this piece of history is preserved.

Update: The torrents are now permaseeded at ibiblio. See the 2007 archive and the 2006 mirror collection.

Update: The ibiblio mirror is now down, but you can find them on See the 2007 archive and the 2006 mirror collection.

Review: Linux IM Software

I’ve been looking at instant messaging and chat software lately. Briefly stated, I connect to Jabber and IRC networks from at least three different computers. I don’t like having to sign in and out on different machines. One of the nice features about Jabber (XMPP) is that I can have clients signing in from all over the place and it will automatically route messages to the active one. If the clients are smart enough, that is.


I have been using Gajim as my primary chat client for some time now. It has a good feature set, but has had a history of being a bit buggy for me. It used to have issues when starting up: sometimes it would try to fire up two copies of itself. It still has a bug when being fired up from a terminal: if you run gajim & exit, it will simply die. You have to wait a few seconds to close the terminal you launched it from. It has also had issues with failing to reconnect properly after a dropped network connection and generating spurious “resource already in use” errors. Upgrades sometimes fix bugs, and sometimes introduce them.

The latest one I’ve been dealing with is its auto-idle support. Sometimes it will fail to recognize that I am back at the machine. Even weirder, sometimes it will set one of my accounts to available status, but not the other.

So much for my complaints about Gajim; it also has some good sides. It has excellent multi-account support. You can have it present your multiple accounts as separate sections in the roster, or you can have them merged. Then, say, all your contacts in a group called Friends will be listed together, regardless of which account you use to contact them.

The Jabber protocol (XMPP) permits you to connect from multiple clients. Each client specifies a numeric priority for its connection. When someone sends you a message, it will be sent to the connection with the highest priority. The obvious feature, then, is to lower your priority when you are away (or auto-away due to being idle), so that you always get IMs at the device you are actively using. Gajim supports this via letting you specify timeouts that get you into different away states, and using the advanced configuration editor, you can also set the priority that each state goes to. So, if Gajim actually recognized your idleness correctly, this would be great.
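To make the priority idea concrete, here is a small Python model of it. This is only a sketch: it is not how Gajim or any server implements routing, and the status-to-priority numbers and the names `Session` and `route` are invented for the example. The one protocol detail it leans on is that a connection with a negative priority is not offered messages sent to the bare address.

```python
from dataclasses import dataclass

# Hypothetical status-to-priority table, in the spirit of what an
# advanced configuration editor might let you set; the numbers are made up.
PRIORITY_BY_STATUS = {"available": 50, "away": 20, "xa": 10}


@dataclass
class Session:
    resource: str              # e.g. "home", "work", "phone"
    status: str = "available"

    @property
    def priority(self):
        return PRIORITY_BY_STATUS[self.status]


def route(sessions):
    """Return the resource that should receive a message to the bare JID."""
    # Connections with a negative priority never receive such messages.
    candidates = [s for s in sessions if s.priority >= 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.priority).resource
```

With a `home` session that has gone auto-away and a `work` session that is available, `route` picks `work`. That only works if the client notices idleness and drops its priority. What happens when two connections tie is up to the server, and this sketch simply takes the first.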

I do also have AIM and MSN accounts which I use rarely. I run Jabber gateways to each of these on my server, so there is no need for me to use a multiprotocol client. That also is nice because then I can use a simple Jabber client on my phone, laptop, whatever and see all my contacts.

Gajim does not support voice or video calls.

Due to an apparent bug in Facebook, the latest Gajim release won’t connect to Facebook servers, but there is a patch that claims to fix it.


Psi is another single-protocol Jabber client, and like Gajim, it runs on Linux, Windows, and MacOS. Psi has a nicer GUI than Gajim, and is more stable. It is not quite as featureful, and one huge omission is that it doesn’t support dropping priority on auto-away (though it, weirdly, does support a dropped priority when you manually set yourself away).

Psi doesn’t support account merging, so it always shows my contacts from one account separately from those from another. I like having the option in Gajim.

There is a fork of Psi known variously as psi-dev or psi-plus or Psi+. It adds that missing priority feature and some others. Unfortunately, I’ve had it crash on me several times. Not only that, but the documentation, wiki, bug tracker, everything is available only in Russian. That is not very helpful to me, unfortunately. Psi+ still doesn’t support account merging.

Both branches of Psi support media calling.


Kopete is a KDE multiprotocol instant messenger client. I gave it only about 10 minutes of time because it is far from meeting my needs. It doesn’t support adjustable priorities that I can tell. It also doesn’t support XMPP service discovery, which is used to do things like establish links to other chat networks using a Jabber gateway. It also has no way to access ejabberd’s “send message to all online users” feature (which can be accessed via service discovery), which I need in emergencies at work. It does offer multimedia calls, but that’s about it.

Update: A comment pointed out that Kopete can do service discovery, though it is in a very non-obvious place. However, it still can’t adjust priority when auto-away, so I still can’t use it.


Pidgin is a multiprotocol chat client. I have been avoiding it for years, with the legitimate fear that it was “jack of all trades, master of none.” Last I looked at it, it had the same limitations that Kopete does.

But these days, it is more capable. It supports all those XMPP features. It supports priority dropping by default, and with a plugin, you can even configure all the priority levels just like with Gajim. It also has decent, though not excellent, IRC protocol support.

Pidgin supports account merging — and in fact, it doesn’t support any other mode. You can, for instance, tell it that a given person on IRC is the same as a given Jabber ID. That works, but it’s annoying because you have to manually do it on every machine you’re running Pidgin on. Worse, they used to support a view without merged accounts, but don’t anymore, and they think that’s a feature.

Pidgin does still miss some nifty features that Gajim and Psi both have. Both of those clients will not only tell you that someone is away, but if you hover over their name, tell you how long someone has been away. (Gajim says “away since”, while Psi shows “last status at”. Same data either way.) Pidgin has the data to show this, but doesn’t. You can manually find it in the system log if you like, but unhelpfully, it’s not on the log for an individual person.

Also, the Jabber protocol supports notifications while in a chat: that the contact is typing, is paying attention to the conversation, or has closed the chat window. Psi and Gajim have configurable support for these; you can send whatever notifications your privacy preferences say. Pidgin, alas, removed that option, and again they see this as a feature.

Pidgin, as a result, makes me rather nervous. They keep removing useful features. What will they remove next?

It is difficult to change colors in Pidgin. It follows the Gtk theme, and there is a special plugin that will override some, but not all, Gtk options.


Empathy supports neither priority dropping when away nor service discovery, so it’s not usable for me. Its feature set appears sparse in general, although it has a unique desktop sharing option.

Update: this section added in response to a comment.


I also use IRC, and have been using Xchat for that for quite some time now. I tried IRC in Pidgin. It has OK IRC support, but not great. It can automatically identify to nickserv, but it is under-documented and doesn’t support multiple IRC servers for a given network.

I’ve started using xchat with the bip IRC proxy, which makes connecting from multiple machines easier.

Mailing List Hosting

I’ve hosted email lists of one sort or another probably all the way back to 1995, when I first bought as an email-only domain fed off a UUCP connection on a long-distance dialup link.

I’ve only used two mailing list hosting programs in that whole time: Majordomo and Ecartis (used to be known as Listar). Unfortunately, Ecartis has not seen upstream work in several years, and as a result was removed from Debian in May.

That got me to thinking: what am I going to do with the mailing lists I host? I’m not pleased with my current list archives, which are very similar to what you get from Mailman: no search engine, and every thread is broken at the end of each month.

I also want to preserve my archives.

I’m presently looking at whether to continue hosting the lists myself, or turn to something like Google Groups or Nabble. Hosting it myself, the main choice is Mailman, which really has more features than I need in most areas, and fewer than I need for archiving.

For other hosts, I’ve looked at Google Groups, Yahoo Groups, and Nabble. Google Groups looks like the best option, and even has a (somewhat hidden) way to subscribe via pure email without having a Google account. They can import and export subscriber lists, though not archives.

I’m thinking I’ll also make sure all the lists have full archives at Gmane. Then, based on message IDs, I can generate a bunch of RedirectPermanent lines for Apache so links to the archives don’t get broken.
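Generating those lines could look something like the following Python sketch. It assumes you already have a mapping from each Message-ID to the path the message had in the old archive, and that the new archive can be reached by Message-ID under some base URL. The `redirect_lines` name, the example paths, and `https://archive.example.org/msgid/` are placeholders, not real Gmane URLs.

```python
from urllib.parse import quote


def redirect_lines(old_paths_by_msgid, new_base):
    """Yield one Apache RedirectPermanent line per archived message.

    old_paths_by_msgid maps a Message-ID to the URL path the message had
    in the old archive; new_base is the (hypothetical) prefix under which
    the new archive exposes messages by Message-ID.
    """
    for msgid, old_path in sorted(old_paths_by_msgid.items()):
        # Strip the angle brackets and escape anything unsafe in a URL.
        target = new_base + quote(msgid.strip("<>"), safe="@")
        yield f"RedirectPermanent {old_path} {target}"
```

Feeding it `{"<abc.123@example.org>": "/lists/foo/2008-05/msg00001.html"}` yields a single `RedirectPermanent` line that can be pasted into the Apache configuration for the old archive host.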

My current thought for list hosting itself is Google Groups. It would be nice to be free of the hassle of administering a mailing list host, which is nothing special these days. Another benefit of Google Groups is that those people that like web forums (who ARE those people anyway?) get a forum-like interface to the list if they so choose.

Nabble has some interesting features, and can optionally import a full history of a list, but it concentrates far more on the forum than the email aspects. It doesn’t even appear to have basic moderation settings.

Web Design Companies That Understand Technology

There are a lot of companies out there that do web design work that looks fabulous.

Unfortunately, a lot of these sites look fabulous only when viewed in IE6 build xxxx, with a 75dpi monitor, fonts set to the expected size, running on Windows XP SP2, with JavaScript enabled. Try looking at the site through Safari or Firefox, or with larger-than-expected fonts, and things break down: text boxes overlap each other, buttons that should work don’t, and it becomes a mess.

So, if your employer wanted a web design company that has a good grasp of Web standards and the appropriate use of them, where would you look? A company that can write good HTML, CSS, and JavaScript, and still make the site look appealing? A company that has heard of Apache and gets the appropriate nausea when someone mentions ColdFusion or Frontpage?

So far, I’ve seen these places mentioned by others:
Happy Cog
Crowd Favorite

Converted to WordPress

I have been using Serendipity on my blog for some time now. Overall, I’ve been pleased with it, but the conversion was a pain.

Serendipity is a simple blog engine, and has a wonderful built-in plugin system. It can detect what plugins need upgrading, and install those upgrades, all from directly within the management interface. There’s no unzipping stuff in install directories as with WordPress.

Twitter and Identica Dilemma

Since July, I’ve been trying out Twitter and its open-source competitor, Identica. Both are microblogging sites, with Twitter being the larger and more well-established of the two.

Both let you follow people with their 140-character updates via the web, or with alerts on your phone.

My dilemma involves how to make this work for me.

For some people, I’d like to get an alert as soon as they post an update. For others, maybe get a non-intrusive alert a couple of times a day. I want to get these notices on my computers, whichever one I’m using.

In theory, Twitter lets you follow updates on IM with Jabber. But their Jabber gateway has been down for literally a month now, and though they still have a note saying it will be back RSN, there’s little hope. Identica has a working Jabber gateway. But unlike Twitter, you can only specify if you want notices from everyone, or nobody; with Twitter, you can sign up for IM notices from just a few people. I already have a Jabber client on all my machines.

So here are my options:

First, I could just use the Twitter and Identica web interfaces only. Not really all that appealing; I don’t want to have to go load up a webpage a few times a day. Also it is annoying to have to open a web browser, pull up a web page, just to enter 60 characters of status.

Second, I could use the Twitterfox and Identifox Firefox plugins. They look nice, but add yet more bloat to Firefox, and that’s two more plugins per machine to set up and maintain, not to mention that one machine is not aware of what I’ve already seen elsewhere. They do make it easier to post updates.

Third, I could use RSS feeds for reading in Bloglines. Not all that realtime though.

Fourth, I could set up two Identica accounts, one which sends all notices to my IM and one which doesn’t. It’d be annoying, and still doesn’t solve my problem with Twitter at all.

Fifth, I could install some Twitter-watching app on all my machines. That’s annoying as it’s yet another piece of software to maintain everywhere, and yet another one to keep updated, AND if that wasn’t annoying enough, they still don’t know what I’ve seen everywhere.

How are all of you using Twitter or Identica?

Also, I’m curious how all these companies that use Twitter and instantly find out when anyone mentions Dell or JetBlue are able to do that. I don’t see a “search everyone’s tweets” feature anywhere.

Video uploading sites?

I’m working on switching from using a Mac to using Linux for editing video. I have a mini-DV camcorder that I bought a few years back, and I’ve been looking at capture and editing software for Linux.

Along with that, I want to post some videos online for family to be able to see. I want to preserve the original quality as much as possible, offer the option to download the video, and be able to share some videos with family only (not the entire Internet).

I’ve been looking at various reviews of video sites (such as this PCWorld one) and decided to look at Blip and Vimeo in more detail.

Blip seems to have lots of controls, options, etc. And, they seem to really care about end users, respond fast, and care about freedom. There’s an impressive response from their support team concerning Ogg Theora out there. They offer FTP uploads (which are a huge improvement over HTTP POST uploading, in my opinion, and easily scriptable). They can also automatically post your video to or about a dozen other video or blogging sites.
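As an illustration of why FTP uploads are easy to script, here is a minimal Python sketch using the standard `ftplib` module. The host, login, and file names you would pass in are placeholders. Nothing here reflects Blip’s actual FTP setup, which isn’t described above, and `upload_video` and `upload_all` are names invented for the example.

```python
from ftplib import FTP
from pathlib import Path


def upload_video(ftp, local_path):
    """Upload one file over an already-connected ftplib.FTP session."""
    local_path = Path(local_path)
    with open(local_path, "rb") as fh:
        # STOR sends the file in binary mode under its own name.
        ftp.storbinary(f"STOR {local_path.name}", fh)
    return local_path.name


def upload_all(host, user, password, paths):
    """Log in once and upload every file in paths."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return [upload_video(ftp, p) for p in paths]
```

A cron job or a post-export hook in the video editor could call `upload_all` with a list of freshly rendered files, which is the kind of thing that is awkward to do with a browser upload form.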

But what I want to do is not really what they are aiming at. They are set up for “channels” (you can apparently only have one channel per user), and for more professional users. Most notably, you can’t make videos private or restricted without paying for their $100/year or so “pro” account.

Vimeo looks very much like the Flickr of video. They do offer various options for restricting who can see a video. When they transcode video to Flash, they have the option of preserving it in HD, which Blip doesn’t (both go to 640×480 or so by default, and Blip maxes out at that), though both offer the option to download the full, unmodified original. Vimeo has only one option for uploading, and it doesn’t seem to work well with Firefox. They have little detail about anything in their docs. Maybe it’s more the Photobucket of video than the Flickr of video. (Oh, who am I kidding? That’s Youtube.)

Of course, there is Youtube. Maxes out at 320×240, doesn’t offer the original for downloading. Doesn’t make me think all that positively about them.

I could also use Flickr. I’m not sure if they offer the original, but there’s a 90-second limit on uploads there.

Any other thoughts?

Towards Better Bookmark Syncing: Delicious and Diigo

I use Firefox (well, Iceweasel) from several machines. On a daily basis, at least three: my workstation at home, my workstation at work, and my laptop. I have wanted to have my bookmarks synced between all three of them for some time. I’ve been using unison to sync them, which mostly works. But Firefox likes to store a last-visited timestamp in bookmarks.html, so if I have a browser open in more than one place, I get frequent unison conflicts.
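One workaround for the timestamp problem would be to strip the volatile attribute before syncing, so two copies that differ only in visit times compare equal. The Python sketch below assumes the timestamp is stored as a `LAST_VISIT="..."` attribute on each link, as in the Netscape-style bookmarks.html format. `normalize_bookmarks` is a made-up helper, not part of unison or Firefox.

```python
import re

# Attribute that changes just from browsing. Other volatile attributes
# may exist depending on the Firefox version; this list is an assumption.
VOLATILE = re.compile(r'\s+LAST_VISIT="[^"]*"')


def normalize_bookmarks(html):
    """Drop last-visited timestamps from a bookmarks.html string."""
    return VOLATILE.sub("", html)
```

Run over a copy of the file before handing it to the sync tool, this leaves the URLs, titles, and folder structure untouched and removes only the visit times.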

I started searching for better alternatives again, and noticed that the new Delicious plugin for Firefox supports a version of the traditional Firefox Bookmarks Toolbar. I use that toolbar a lot, and anything I use in place of standard Firefox bookmarks absolutely must support something like it.

I imported my Firefox bookmarks (about 900 or so) into Delicious. They arrived OK, but flattened, as Delicious doesn’t have a hierarchical structure like Firefox does. After a good deal of experimentation, I have mostly gotten it working how I want. I’m using the bundles mode of the Delicious extension toolbar in Firefox, and simulating subfolders by using certain tags. It works fine; not quite what I’d want out of it ideally, but everything else is so much better that I’m happy with it.
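Here is one way the folders-as-tags idea could be expressed in Python. The tag naming scheme (a `dir:` prefix plus the path so far) is purely an illustration and not necessarily the scheme described above; `folder_tags` is a hypothetical helper.

```python
def folder_tags(folder_path, prefix="dir:"):
    """Turn a folder path like ["Linux", "Debian"] into flat tags.

    Each level gets a tag naming the full path down to that level, so a
    bookmark stays findable under every ancestor folder.
    """
    tags = []
    for depth in range(1, len(folder_path) + 1):
        # Tags cannot contain spaces, so replace them within each folder name.
        joined = "/".join(part.replace(" ", "_") for part in folder_path[:depth])
        tags.append(prefix + joined)
    return tags
```

A bookmark that lived in Linux > Debian would get the tags `dir:Linux` and `dir:Linux/Debian`, which a bundle could then group to mimic the old folder tree.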

The social bookmarking aspects of Delicious sound interesting, too, but I haven’t started looking at that stuff very much yet. Delicious also has a new “Firefox 3” extension that is also documented to work fine in Firefox 2. It has a few new features but nothing I care all that much about.

My main gripe at this point is that the Firefox extension doesn’t allow me to set things as private by default. It also doesn’t propagate my changes to the site immediately, which led to a considerable amount of confusion initially. On the plus side, it does do a synchronization and store a local cache, so I can still use it offline to load up file:/// links.

Some things about Delicious bug me. There are very limited features for editing things in bulk (though Greasemonkey scripts help here). It has a published API, but it seems quite limited (I couldn’t find out from their documentation how to add a tag to an existing bookmark, for instance). Delicious lets you export all your bookmarks, so you have the freedom to leave. Also, if you poke around, you can find Free Software alternatives that actually emulate the Delicious APIs and site.

I also looked at alternatives, and it seems that the most plausible one is Diigo. But I’m going to refuse to use it right now for two reasons: 1) its Firefox plugin has nothing like the Firefox bookmarks toolbar, and 2) its hideous Terms of Service. If you go to their ToS and scroll down to “Content/Activity Prohibited”, you’ll see these gems:

6. provides any telephone numbers, street addresses, last names, URLs or email addresses;

7. promotes information that you know is false or misleading or promotes illegal activities or conduct that is abusive, threatening, obscene, defamatory or libelous;

11. furthers or promotes any criminal activity or enterprise or provides instructional information about illegal activities including, but not limited to making or buying illegal weapons, violating someone’s privacy, or providing or creating computer viruses;

So, in other words, they can delete my account if I bookmark the contact page, or if I bookmark the opinions of someone I disagree with. Good thing the Vietnam War protesters in the 70s didn’t use Diigo, because they’d be kicked off if they wrote about their sit-ins at Berkeley. Also, I didn’t even quote the other section that says they get to remove anything you post that they think is offensive, in their sole judgment. Goodbye, links to EFF’s articles about RIAA.

Since we can’t use last names, I guess it’s just “Hillary” and “John” instead of “Clinton” and “McCain”. Oh, and don’t get me started about the folly of operating a social bookmarking site where you aren’t allowed to post URLs. That’s right up there with Apple releasing a Windows version of Safari that you aren’t allowed to install on PCs.

Compare that to the Delicious terms and privacy policy, and the contrast is stark indeed.