July 2003


30 Jul

This article will make more sense if you have read Why I'm Not Afraid of AOL Weblogs and The [d]Evolution of Online Communities: a Case Study first, since they form the background to this article. (Originally, all three were one article called "I'm Not Afraid of AOL Blogs, I'm Afraid of Javablogs!", but it was way too long for a single post.)

Javablogs in its current form is fundamentally unsuited to the movement it was created to capture, because it attempts to map the centralized structure of a newsgroup on top of the decentralized nature of weblogs. In doing so, and by taking a powerful role at the centre of the Java weblogging community, it is holding the community back in some ways, even as it enables it in others.

This argument can be extended to encompass any topic-based aggregator that is adopted as a tool by a weblogging community. It's just, IMHO, the wrong tool for the job. That takes nothing away from Mike and the Atlassian guys: Javablogs was timely, is well-written, and is a very useful tool; I visit it several times a day myself. I just can't help thinking it's not quite the right tool.

As I mentioned before, weblogs form a "collaboratively filtered trust network". People read the weblogs that interest them, and link to posts in other weblogs they find interesting. Through these networks, stories tend to propagate to people who want to read them, and people tend to find what they want to read. People also feel free to indulge themselves on their own blogs, covering any topic they fancy that day, knowing their readers are all there because of personal interests shared with the author.

Mike's original Java bloggers page (now link-rotted) was a discovery service, rather than an aggregator. You used it to find new entrants in the blogosphere (each had a short blurb describing their interests and projects), and went to read their sites. If you liked one, you added it to your own personal news aggregator. Everyone had their own, different, personal Javablogs on their desktop.

The Javablogs aggregator replaces this loose coupling with a newsgroup structure where all posts get thrown into the same bucket. Worse, it's an 'rn'-vintage newsgroup, without threading or killfiles, and with pretty primitive navigation[1]. As such, the social rules of a newsgroup become very important: rules designed to increase 'signal' and reduce 'noise' in a centralized environment.

  1. stay on topic
  2. avoid redundant content that doesn't add any new information ("me too" posts)

As a result, participants are pressured to avoid doing the two things that make blogging what it is: writing whatever the hell you want, and linking wildly to anything that you find interesting.

I find the former very difficult: what is on-topic for Java? Where do I draw the line? Should I have to? Right now my "nerd" category feeds my Javablogs RSS feed, but I could easily change that to a Java-only feed. I just feel that would be too big and clumsy a gag to wear. Most (but not all) of my technical posts are about programming and software development, and I feel that makes them applicable to a Java blogging audience, who presumably have to do development now and then. I'm a Java programmer, this is about my art, hence it must somehow be about Java, right? (The posts about Apple you'll just have to live with. Get a Mac already!)

A Javablogs that was strictly "You must be talking about Java or else" would basically be Javalobby with decentralised posting, and why go to all this effort to duplicate a site that already exists?

And then, of course, there's the spectre of the social problems that centralising a community brings: problems we have not yet fully encountered, but whose first signs are already emerging. Is this the right direction to travel?

It's insidious, because you can't opt out. Either because of the social pressure against "me too", or because a large number of Java bloggers don't remember life before Javablogs, there is very little inter-blog linking in the community. You don't see Java news in the Daypop Top 40, for example. Java bloggers expect everyone to be following the aggregator, so there isn't a need to further spread news that has already passed through there. As such, for a Java hacker, opting out means cutting yourself off from your most likely audience.

This is, of course, a rant without a solution. Javablogs exists, and is doing a good job at being what it is. As much as I think it should have been otherwise, you can't stuff the genie back in the bottle. And it's not all bad. Newsgroups still survive and flourish, after all, and Javablogs has the advantage of being actively administered to keep the trolls and spam to a minimum. So long as Atlassian add filtering and threading features to Javablogs in imitation of the evolution of mail and news clients, the Java blogging community will continue to grow and share good information.

I just can't help thinking it will be missing something. Something important: that ineffable element that makes weblogging different.

[1] The "popular posts" feature isn't worth much: essentially you "vote" for a story by clicking on it, before you even get the chance to read it. Since the only clues anyone gets as to a post's worth before they click on it are the author and title, the popularity adds no information but "this is the sort of author or title that people feel they wish to click on". On the other hand, this sort of information would be invaluable to writers of text-ads. :)

Excerpted from The Noble Eightfold Path:

Right Speech (samma vaca)

  1. Abstaining from false speech (musavada veramani)

    Herein someone avoids false speech and abstains from it. He speaks the truth, is devoted to truth, reliable, worthy of confidence, not a deceiver of people. Being at a meeting, or amongst people, or in the midst of his relatives, or in a society, or in the king's court, and called upon and asked as witness to tell what he knows, he answers, if he knows nothing: “I know nothing,” and if he knows, he answers: “I know”; if he has seen nothing, he answers: “I have seen nothing,” and if he has seen, he answers: “I have seen.” Thus he never knowingly speaks a lie, either for the sake of his own advantage, or for the sake of another person's advantage, or for the sake of any advantage whatsoever.

  2. Abstaining from slanderous speech (pisunaya vacaya veramani)

    He avoids slanderous speech and abstains from it. What he has heard here he does not repeat there, so as to cause dissension there; and what he has heard there he does not repeat here, so as to cause dissension here. Thus he unites those that are divided; and those that are united he encourages. Concord gladdens him, he delights and rejoices in concord; and it is concord that he spreads by his words.

  3. Abstaining from harsh speech (pharusaya vacaya veramani)

    He avoids harsh language and abstains from it. He speaks such words as are gentle, soothing to the ear, loving, such words as go to the heart, and are courteous, friendly, and agreeable to many.

Online, where our speech makes us who we are, these are even more important. If only I had the will.

I was reading Clay Shirky's article about online communities, and it reminded me of this story. Along with my previous article about AOL weblogs, this story serves as background to the point I will eventually be trying to make about Javablogs.

Once upon a time, probably around 1994/5, there was a little-known newsgroup called alt.sysadmin.recovery. It was populated by a group of truly vicious (and knowledgeable) system administrators, with seemingly enough time on their hands to be very amusingly bitter. The newsgroup was intelligent and funny, finding the absurd, and often painful, humour in the life of sysadmins having to deal with broken hardware, crufty software and clueless users. After a while they were responsible for a significant proportion of the content of alt.humor.best-of-usenet, until reposting between the two groups was specifically forbidden in alt.sysadmin.recovery's FAQ.

It is prohibited to re-post alt.sysadmin.recovery messages to alt.humor.best-of-usenet. Most ASR denizens have nothing against that group itself, but sometimes in the past we have averaged a few messages a day there. This has drawn the lusers here like moths to a candle—more unpleasant for the moths than for the candle, but we don't care about the moths. We strongly recommend that you put "X-No-Ahbou: yes" in your headers.

The reposts were causing the little-known newsgroup to get a lot of attention. Everybody wants to associate with, and become a part of, something that is funny and successful, and the resulting flood of newcomers inevitably lowers the overall quality of the place. Eventually, a Usenet moderation hack was employed to prevent the totally clueless from posting, but it was generally held that the classic era of the group was over.

Around this time, there were a few posts to ASR, saying that while the newsgroup was (in their eyes) dead, there was this neat place on the web that hosted interesting discussions, and seemed to contain a lot of the pith, interesting personalities and technical knowledge that had gone from the newsgroup.

That place was a little-known website called... Slashdot.

Seen in the Ruby Application Archive:

FooApp v0.34: FooApp for Ruby with a user-friendly API, akin to BarApp, but feature complete and significantly faster.

Dude... if it's feature complete, why is it only version 0.34? (To confuse matters a little more, the long description page lists it as being “Production Quality”)

Too many Open Source projects treat Version 1.0 as some kind of Holy Grail that can only be reached when the project is perfect. I find that highly annoying, because it makes it really, really difficult to tell a sketchy alpha from production code that is just still in pre-1.0 because the author wants it to do everything.

My best advice to anyone thinking of starting such a project: put together a road-map now. List the features you want to see in your program, and then cull it down to three milestones:

  1. The bare minimum you would need to implement before you let anyone see your code.
  2. The bare minimum functionality you would need to implement before people could legitimately make use of your code.
  3. As above, but stable and well-documented enough to be used in a production system (for libraries), or by non-programmer end-users (for applications).

Milestone 1 is your v0.5. Milestone 2 is v0.9. Milestone 3 is v1.0.
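The mapping above can be sketched in a few lines of (purely illustrative) Python of my own; the milestone descriptions and the "0.1" pre-milestone fallback are paraphrases, not anything from an actual project:

```python
# Illustrative sketch only: map the three culled-down milestones onto
# version numbers, so the number advertises how useful the code really is.
MILESTONES = [
    ("0.5", "bare minimum before letting anyone see the code"),
    ("0.9", "bare minimum before people can legitimately use it"),
    ("1.0", "stable and documented enough for production or end-users"),
]

def suggested_version(milestones_done: int) -> str:
    """Return the version matching the last completed milestone."""
    if milestones_done <= 0:
        return "0.1"  # pre-milestone: clearly a sketchy alpha
    return MILESTONES[min(milestones_done, len(MILESTONES)) - 1][0]
```

The point of writing it down (in code or anywhere else) is that the version number becomes a statement about which milestone you've actually hit, rather than a guess about perfection.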

Note the “Minimum”. If you are over-eager, you will fall into the same trap as the programmer who developed FooApp: a version number that contradicts the documentation in terms of how useful the application may in fact be. Don't cull so far that you're releasing something you wouldn't use yourself, but don't fall into the trap of thinking that 1.0 is some magic number you will achieve when you've implemented everything.

The plan may be subject to change, too, especially if you find features being contributed by interested developers. However, if you start with a plan, you'll find it (and more importantly your users will find it) easier to plot how far you are from the magical stable release.

The idea is to get something by 1.0 that is useful, usable and stable. After you have accomplished 1.0, you can sit down and come up with a similar road-map for 2.0.

We've all heard by now that AOL is introducing itself into the world of weblogs. Some people believe this is a bad thing. I am, however, not convinced it will have any real effect at all. And that's speaking from the position of an ex-member of not one, but two alt.aol-sucks secret cabals.

There are basically two kinds of people who have tried IRC. The first kind connected on a whim and randomly tried a few channels. While a few of them got lucky and found something interesting, most became annoyed at all the complete lamers they found and left in disgust. The other kind already had some kind of destination in mind, people they wanted to talk to. They made use of the medium, and if they ever moved to another channel, it was because it had been recommended to them by someone they knew.

Blogging is like that too. There are already several million weblogs out there that aren't (in my opinion, anyway) worth reading. If you're not convinced, try The Random Livejournal Link a few times, and see if you find anything particularly interesting there. It's unlikely that you will.

If you sample weblogs at random, of course you're unlikely to find anything that interests you. The presence or absence of AOL will not change that one iota. On the other hand, if you enter into blogging because there are specific sites you have found you like to read, and then follow hyperlinks from those sites to new blogs (how most of us operate), you're much more likely to only see that portion of the blogosphere that you are likely to find interesting.

Blogging is a collaboratively filtered trust network. This is a fancy way of saying “people who link to each other”.

In this community, a blog post comes into existence as a web of people's attention. At the centre of the web is the blog on which the post lives. Radiating out from that centre are the people who subscribe to that blog. Traditionally, people who find a particular post interesting will create a link to it on their blog, extending the web to their readers. The reach of a particular post becomes a function of how many readers you have and how interesting the post is.

It's a diffuse, loosely-coupled community: an informal reputation system, based on the ability to choose who you trust to point out interesting stuff. It's that reputation system that makes the blogosphere work. After all, Sturgeon's Law applies everywhere. The problem is, like the 80/20 rule of software development, everyone's opinion on which ten percent isn't crap differs.

One or two blogs in any interlinked community act as hubs, their authors committed enough to read a large number of weblogs themselves and post a large number of links. An individual finds new, interesting blogs by following links: if people on your blogroll (often the aforementioned hubs) link to a particular person a few times, you get to recognise their site, and eventually you decide you like it enough to add it to your own subscription list.

New sites' main avenue of promotion, on the other hand, is through trackbacks, comments and referrer logs. You get attention by commenting on some existing conversation in the blogosphere. If your comment is interesting, people will follow it back to the source, read some more, and perhaps subscribe.

Think of the thing that AOL users were most pilloried for on Usenet: the “Me Too!” post. On Usenet, because it is assumed that everyone within a particular group follows that group, making a post that agrees with another without adding any additional content is the height of bad netiquette. On blogs, such behaviour is de rigueur, because you don't assume that your readers read the same sites as you, so passing on links (even without comment) is a vital way to spread the ideas you feel are worth spreading.

All this is how blogs generally work[1]. It has its good points, such as the ability to quite accurately subscribe to your particular areas of interest. It also has very little acrimony over redundant or “off-topic” content, because nothing on a blog is truly off-topic: you read a blog because you are interested in what the author has to say, and if you are not interested you can easily tune out.

It also has its bad points, in that it generally creates an A-List of bloggers with an influence perhaps beyond their merits, just because they happened to be there first. Also, it can be very frustrating when your particular audience doesn't find a post interesting, but the community is structured in such a way as there's no way to punch your ideas through to the wider audience. In that way, you can find yourself stuck in a niche.

What it is, however, is highly resistant to floods of crap. The network routes around such damage easily. Nobody finds it interesting, nobody links to it, so it may as well not be there.[2] Which isn't to say that AOL weblogs are going to be all crap: they'll just follow the general distribution such services have shown elsewhere. Those that aren't crap will be linked to, and become part of a larger section of the blogosphere.

The things that might suffer the brunt of damage from a few million AOL weblogs coming online are the services that try to treat all weblogs as equal, rather than divided into niches of interest. Services like weblogs.com would likely be overwhelmed by the sheer volume, not to mention the continuing drift away from the ascendancy of technical and current-events blogs towards the personal diary. Just compare, for example, the Daypop Top 40 with the Livejournal Meme Tracker and you'll see what I mean.

[1] I say generally, because a dominant topic-based aggregator like Javablogs can change the shape of the community markedly, turning it into an entirely new beast. But that's a story for a later article.

[2] This is, of course, subjective, where “nobody” really means “nobody I know”. Of course these blogs will link to each other, and to their friends in other blogging or diary systems, and create their own communities of interest, over things that I just don't find interesting. They just won't intersect enough with mine for them to exist for me.

Musing about yesterday's post on the “Snapster” idea, I started dredging up memories from when I was learning Company Law.

What occurred to me was the well-known fact that the modern corporation exists as a means by which people can get involved in risky ventures, and then be protected by the government from having to pay the venture's debts if it fails. Put that in your pipe and smoke it, Libertarians.

I studied Law for three and a half years. I even went to some of the lectures before I dropped out and became a computer nerd. The attitude of nerds to the law has always interested me, especially what we think the law can, and can not be made to do.

It has the same basic rules. Rules like gravity. What you must learn is that these rules are no different to the rules of a computer system. Some of them can be bent. Others can be broken. —Morpheus, in The Matrix

Programmers, or people who associate with programmers too long, often fall into the trap of believing the law is like a software system: that, like vulnerabilities in code, logical flaws in the law can be exploited to break the system wide open and make it do things it was designed to prevent. Viz, today's much-linked-to Cringely “Son of Napster” article:

First the law. Snapster is built on the legal concept of Fair Use, which allows people who purchase records, tapes, and CDs to make copies for backup and for moving the content to other media.

Cringely proposes a single company buy copies of all available music. That company issues shares, making everyone a part-owner of that music. As owners, the shareholders then (according to Cringely) have Fair Use rights to space- and time-shift the music, and listen to it whenever they want.

All in all, a brilliant hack of the legal code.

And totally useless.

Cringely's plan quite obviously turns the concept of Fair Use on its head. It's a classic computer hack, where two components that were developed separately (Corporations Law and Copyright Law) interact in an unexpected way, to produce results that the designers quite obviously never wanted to occur. As such, the courts will not have even the slightest problem in declaring it illegal.

The law is not code. It is not compiled into an inviolate binary and run by a deterministic system. It is passed through the heads of human beings whose job it is to interpret the intent of the law. Courts generally look with disdain upon ‘clever’ interpretations of the law, unless that interpretation follows the court's conception of justice. The human beings themselves can be hacked, but by out-of-band methods requiring money or political clout.

In Cringely's case, any court would take one look at the idea and laugh. It is so obviously a perversion of the concept of Fair Use that it would never survive the judicial process.

Dear Vodafone

  • 3:42 PM

I have about four months remaining in my twenty-four month contract with Vodafone Australia, and my trusty handset has become decidedly less trusty of late. It's been doing things like deciding not to charge, fading out the screen and so on. I'm sure it's repairable, but with four months left before I can renew my contract and get a free phone anyway, that's not really a worthwhile investment.

So I go into my trusty Vodafone dealer, and ask them what the options are.

Apparently, I'm not eligible for a phone upgrade until six weeks before my contract is up (which is basically so close to my final bill there's no difference), so I would have to pay out all of the remainder of my current contract up front. Admittedly, that's only about $120, but it's the principle of the thing. If the dealer had offered me a substantial discount, I'd have signed up then and there, probably getting myself a pretty expensive GPRS phone in the process.

Instead, I was offered absolutely no incentive to remain with Vodafone. Since we now have number portability, the net difference in cost and convenience between me staying with Vodafone or switching to a new provider is... zilch. Hence, a certain sale was turned into a vaguely pissed off customer with an incentive to see if he can get a better bargain elsewhere.

Nice one. Very clever marketing.

On Convergence.

  • 3:55 PM

Alan pointed me to Hacknot, a software engineering blog. This line from The Soporific Manifesto stood out:

Q: What can you brush your teeth with, sit on, and telephone people with?
A: A toothbrush, a chair and a telephone.

I can't agree more. Attempting to shoe-horn unrelated functionality into a single application just creates a confusing and harder-to-use app. This, for example, is why the Mozilla project have recently broken their web browser and email software into separate applications. “What can you browse web pages with and send email with? A: Firebird and Thunderbird.”

(I disagree about Unfinished Sympathy, though. While it is a very good pop song, it's probably beaten in the ‘best pop song of all time’ stakes by Blur's For Tomorrow, or perhaps even Elliott Smith's Everybody Cares, Everybody Understands.)

(Update: And how could I have forgotten Sometimes, by James?)

When I started using OS X, two applications that were highly recommended were Tinderbox and Spring, both applications that promised to revolutionise the way I would store and retrieve information. I would try Haystack as well, but I'm told it's still really slow, and it's not available for OS X anyway. I'm always interested in innovative ways to organise the huge amount of information I tend to just leave in text files all over the place.

I produce information at a prodigious rate. The blog helps me record some of the bigger ideas, but I download a lot, I write a lot. I have random OmniOutliner files lying all over the place that I wish would just magically arrange themselves into an uber-information-store. I'm lazy damnit. My computer has all this processing power and spends 99% of its time idling, why can't it organise itself? Why do I have to intervene at all?

The other side of my laziness, however, lies in the “Fifteen Minute Test”. This was never a conscious decision; I didn't turn around one day and impose this test on new software, it just came from having a full-time job and wanting to spend at least some time away from the computer. The test is simple: I'll install a demo and play with it, but if after about a quarter of an hour I haven't found something impressive that the application will do for me, something interesting and novel that promises to streamline my nerd-life, I'll turn it off, and probably won't ever try it again.

One of the reasons I switched from Linux to OS X as my primary platform was because I don't have time to play around with software any more. I want it to work in predictable, obvious ways, and OS X (mostly) does that for me. I don't have time to spend all day messing with my Apache configuration file or installing a new MTA just to see if it will rescue me from Sendmail, and I don't have the time to submit myself to a new GUI application with an unfamiliar metaphor and confusing interface in the vain hope that it might eventually become easier.

Maybe that means I'm not an alpha nerd any more. In twenty years some ten-year-old kid is going to try to teach me how to use the descendant of Tinderbox because I never caught on at the beginning, and Just Don't Understand...

Still, I think I'm more tolerant than most. So if you're coming up with the next big paradigm shift that will change the world, ask yourself: “How can I pass the fifteen minute test?”

Update: James Strachan suggests I check out Voodoopad, a wiki-like information manager. It seems like a neat idea, as Wikis definitely passed the 15 minute test for me.

Update 2: Voodoopad not only passes the fifteen minute test, it passes the three minute test. If only it had outlining support, it would be fantastic.

Dear Mum

  • 8:31 AM

Dear Mum,

You will be happy to know that after three weeks of my house-sitting for you, both cats are still very much alive and healthy.

Which brings us onto the subject of the plants on the balcony...

I think snopes.com is one of the better achievements of the Internet[1]. One thing the Internet is really good at is providing an exhaustive reference on a particular subject that might be considered either too on the fringe, or too big for print. (Another example of this is the amazing IMDB. All I want now is an online equivalent of the Guinness Book of Hit Singles. Is there such a thing? English charts only, please...)

Anyway, one story from Snopes that I quite like is the story of Van Halen and the Brown M&Ms. The story was that Van Halen's rider contained a clause that required a bowl of M&Ms to be supplied backstage, but with the brown ones removed. If they found any brown M&Ms, they could terminate the contract with the venue without penalty, and not perform.

It's true. Snopes quotes the following passage from David Lee Roth's autobiography:

. . . Van Halen was the first band to take huge productions into tertiary, third-level markets. We'd pull up with nine eighteen-wheeler trucks, full of gear, where the standard was three trucks, max. And there were many, many technical errors -- whether it was the girders couldn't support the weight, or the flooring would sink in, or the doors weren't big enough to move the gear through.

The contract rider read like a version of the Chinese Yellow Pages because there was so much equipment, and so many human beings to make it function. So just as a little test, in the technical aspect of the rider, it would say "Article 148: There will be fifteen amperage voltage sockets at twenty-foot spaces, evenly, providing nineteen amperes . . ." This kind of thing. And article number 126, in the middle of nowhere, was: "There will be no brown M&M's in the backstage area, upon pain of forfeiture of the show, with full compensation."

So, when I would walk backstage, if I saw a brown M&M in that bowl . . . well, line-check the entire production. Guaranteed you're going to arrive at a technical error. They didn't read the contract. Guaranteed you'd run into a problem. Sometimes it would threaten to just destroy the whole show. Something like, literally, life-threatening.

[1] Although their use of embedded MIDI on some pages... It burns, my precious! It burns!

Some “Director's Cuts” of movies are really great. They give you the chance to see the movie as the director wanted you to see it, before it was emasculated by test-screenings or the whims of the studio executive (e.g. Blade Runner). Alternatively, the extended version allows the director to provide a version of the movie that contains worthwhile material that would have made the movie too long for the cinema, but that still works in home viewing (e.g. Fellowship of the Ring).

Some Director's Cuts are a rather sad attempt to rewrite history (e.g. Star W... er... A New Hope?).

Other Director's Cuts are a marketing exercise designed to sell more copies of a video (or now DVD) by adding those bits of the movie that were filmed, but removed from the final cut because they screwed up the pacing, made the movie drag and generally added nothing to the film itself. These are the ones James Cameron tends to produce (The Abyss, Aliens, and now Terminator 2 have all suffered this treatment).

The extended version of Terminator 2 (which was on television tonight) lies firmly in the third category. All of the restored scenes are basically dialogue that was cut from the original release for very good reasons. Their reintroduction detracts from what is pretty much the ultimate early-90's action movie. T2 took the genre as far as it could go with existing technology: no comparable blockbuster effects/action movie came along until The Matrix seven years later.

Annoyingly, if I want to get this movie on DVD with all the additional features, the extended version seems to be my only option. If that's the case, I'm just not going to buy it.

Meet Melvin

  • 11:56 PM

Sadly, Melvin is pretty much impossible to describe in alt text

This is Melvin the Money-box. He belongs to my mother, for whom I am cat-sitting while she is on holiday in England. Melvin is not quite as much fun as the cats, but at least he doesn't wake me up at 5:30am demanding to be fed.

Stack Overflow

  • 7:49 PM

The maximum number of significant tasks I can work on concurrently is two.

The optimal number, of course, is one. Anyone who has read Peopleware will be aware that forcing programmers to task-switch frequently is a recipe for destroying their productivity. However, today I was working on two tasks that didn't require significant switching: running some long-lived tests on one machine while I coded on the other. I could hold the state from the second task in my head while I went to check on the first.

Still, it was two significant tasks running concurrently.

Near the end of the day, the boss walked up to my desk and asked me how a third task (that had neatly slipped my mind) was progressing. What I experienced could only be described as a stack overflow. The attempt to add the third task to my stack caused the whole mess to fall over, leaving me babbling like an idiot for a few seconds while I tried to reconstruct enough state to allow me to provide an intelligent response.

So that's it. I have a stack for significant tasks that is two deep. Any more and I crash.
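In (entirely hypothetical, tongue-in-cheek) code, the failure mode might look like this; the class and messages are my own invention, not anything real:

```python
# A tongue-in-cheek sketch of a task stack that is only two deep.
class BrainStack:
    MAX_DEPTH = 2  # empirically determined, see above

    def __init__(self) -> None:
        self.tasks = []

    def push(self, task: str) -> None:
        """Take on another significant task, if there's room."""
        if len(self.tasks) >= self.MAX_DEPTH:
            # The whole mess falls over; babbling ensues.
            raise OverflowError("stack overflow on: " + task)
        self.tasks.append(task)
```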

Minor note:

If you could care less about something, you are implying that the thing in question has some measurable degree of importance, however slim, in that it is possible for that degree of importance to lessen.

If, on the other hand, you couldn't care less, you are describing a situation in which the object of your disdain has sunk so low that it is impossible to imagine being able to have any less interest in it.

It's an important distinction. Thank you.

In a roundabout way, I came across a page about an IRC channel discussing whether the channel should have a clear set of defined rules.

As an IRC user for eight years, here's the formula I have found to be the only one that actually works:

  • Everyone who is considered a “regular” by the other regulars gets ops
  • The standard of behaviour on the channel is determined by the rough consensus of whoever happens to be there at the time
  • Ops are required not to get into stupid kick/ban wars with each other

You get fewer problems with this model than most others. Few real trouble-makers have the patience to sit around and interact levelly with a group long enough to be considered a part of it. Even the occasional idiot doesn't cause problems for long, and anyway, the alternative is creating a bureaucracy.

It also means that there's almost always someone around to deal with the passing dickheads who plague IRC. Channels with rigid ops structures almost always have far too few ops to cover the whole day, leaving those poor regulars stuck in the down-time to be the victims of whoever feels like being annoying1.

Here's the counter-intuitive part: having a clear command structure and defined rules creates more conflict than not having them. People, instead of dealing with problems between each other, take them to the arbitrating body. People don't compromise when they can ask the command structure for a black and white decision.

When the strict rules and command structure go away, people have no recourse but to settle their differences one way or another.

Sometimes, that settlement can't be made, and the group forks. This is a good thing. The alternative would be either one side of the argument being disenfranchised anyway, or worse, the group staying together because they are “accepting the judge's decision”, but festering dislike, resentment and backstabbing.

1 back when I was an IRCop, this was one of my pet peeves. I would be asked to ‘babysit’ a channel that was op-less and being harassed, but the channel owner would refuse to add any more ops because it was ‘against policy’. I would quickly stop going to the aid of such channels.

Part two of the “Lessons Learned When My Blog Died” trilogy:

Lesson Two: Handling errors is not enough.

An error is handled if the program is able to recognise that something unexpected has occurred, and trigger some alternative but explicit execution path as a result. For example, if you are loading data from a corrupt file and your program does not handle errors, the corrupt data in the file will propagate through your program unexpectedly, leaving people stuck with names like ‚ƒ„…†‡ˆ‰Š.

Or another example: Java performs implicit bounds-checking on arrays, which allows it to automatically detect when too much data is being stuffed into too small a space. By throwing an exception, Java handles that error condition, at least at a low level. C, on the other hand, does not. Where a C program does not explicitly handle the buffer overrun case, the error can cause unpredictable events elsewhere in the program, leading to the classic stack-smashing security exploit.
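As a minimal sketch of that difference (the class and method names here are my own invention), the JVM's bounds check turns a potential memory-corruption bug into an explicit, handleable error:

```java
public class BoundsDemo {
    // Tries to write `count` values into `buffer`. The JVM checks every
    // array index at runtime, so an overrun raises an exception instead
    // of silently scribbling over adjacent memory, as it could in C.
    static String fill(int[] buffer, int count) {
        try {
            for (int i = 0; i < count; i++) {
                buffer[i] = i;
            }
            return "ok";
        } catch (ArrayIndexOutOfBoundsException e) {
            // The error is handled: execution follows an explicit,
            // predictable path rather than propagating corrupt state.
            return "overrun detected";
        }
    }

    public static void main(String[] args) {
        System.out.println(fill(new int[4], 4));  // prints "ok"
        System.out.println(fill(new int[4], 10)); // prints "overrun detected"
    }
}
```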

Handling an error means:

  1. Noticing that something unexpected has happened
  2. Triggering some alternative logic to return the program to a predictable state.

That “predictable state” could be the program printing out a cryptic error message like “Can't use an undefined value as a SCALAR reference at lib/MT/ObjectDriver/DBM.pm line 354.” and then dying. It doesn't matter: the error is “handled” if you do something predictable with it.

Obviously, handling an error is not enough. One should also attempt to recover from an error. The error quoted above is happening when I log in to MT, because it tries to list the five most recent comments, and my comments db was corrupted when I ran out of disk space. MT handles that error by throwing up an error screen, and inviting me to do something else that isn't broken. The error is handled, but not recovered from.

To recover from an error, you must first handle the error, but in the new path of execution in the error handler, you must:

  1. Recognise the cause of the error
  2. Take steps, if possible, to prevent the error from happening again
  3. Take steps, if possible, to continue the user-requested action by working around the error condition
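The three steps above can be sketched in a few lines (the line-based record format here is invented purely for illustration). A loader that merely handled a corrupt record would print an error and die; one that recovers recognises the bad record, skips it, and carries on with the action the user actually requested:

```java
import java.util.ArrayList;
import java.util.List;

public class RecordLoader {
    // Parses one integer record per line of a (hypothetical) data file.
    static List<Integer> load(List<String> lines) {
        List<Integer> records = new ArrayList<>();
        for (String line : lines) {
            try {
                records.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                // 1. recognise the cause: this particular record is corrupt
                // 2. prevent recurrence: don't re-process the bad record
                // 3. continue the requested action with the remaining data
            }
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(load(List.of("1", "garbage", "3"))); // prints [1, 3]
    }
}
```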

Sometimes, it's possible to recover gracefully from an error. In the case of a corrupt comment in my comment database (I can still post new comments and retrieve old ones), MT should recognise that the corrupt entry really isn't going anywhere on its own, and have some strategy to deal with it.

Files get corrupted all the time, especially heavily used ones. I once lost an entire Windows 2000 installation because a single, 10Mb file became corrupt. Needless to say, I wasn't happy about it.

Obviously, recovering is a lot more difficult than just handling. To handle something, you just need to know that something bad happened. To recover, you need to know just what the bad thing was, and think of ways to get around it. That's a lot more work, and in the face of things that aren't expected to happen often (like files getting corrupted), the need to get a feature written seems far more important than handling every little thing that might go wrong.

Design can also play a significant part in aiding (or hindering) error recovery. The standard Java error-handling policy of “just throw an exception in the air and hope somebody deals with it” is particularly deficient in this regard: with every layer of encapsulation the error passes up through, you lose more of the ability to deal with the issue at the lower level.

Design can also make error recovery harder. Staying with the subject of file formats, if you place all your data in a single file with variable-length records and no recognisable record separator, you're really quite dead if you get a single bad entry. Similarly, if you're using a transaction log to keep anything in sync, each log entry relies on all the previous entries having been performed, so if a corruption appears halfway through the transaction log, everything after it will be rendered completely useless.

So, the moral of today's story. Don't just think about how to handle an error, also put thought into how you might recover from it.

Note: I'm using a pretty ancient version of Movable Type here, mostly because I'm scared of upgrading. Any or all of these issues may have been fixed in later versions. I'm not trying to criticise the product, I'm just trying to make a few general points and make some good of an annoyingly broken weekend.

Part one of my “Lessons Learned When My Blog Died” trilogy.

Lesson one: Soft boundaries vs Hard boundaries

There are two types of disk quota. Soft limits are a “nag threshold”: you are allowed to exceed them, but the system will start nagging you if you do. If you stay over your soft limit too long, you won't be allowed to write any more data, but the limits are set up such that it would be very hard to do so accidentally.

Attempts to cheat the soft-limit system by dropping down below quota and then putting the files back straight afterwards can be easily detected and dealt with administratively.

Hard limits, on the other hand, impose a strict cap on the disk space a particular user can be assigned. Attempts to exceed that usage are met with a reaction similar to the disk being full.

Very few programs have been programmed to cope with running out of disk space. It is a very rare program that, faced with a full disk, will not trash at least some of your vitally important data. In my case, this meant a couple of the Berkeley DB files that power this site became corrupted by failed writes, leading to portions of the site still being quite spectacularly broken.

Hard limits on disk space should always be set to at least twice the soft limit (that gives each user enough temporary space to back their data up to a tar file and transfer it elsewhere). People will mostly stay below the soft quota, so this will not create a serious problem of over-use of disk. On the other hand, the higher hard limit is much, much less likely to break anything and annoy the users.

Dear $VENDOR.

When it says “24/7 Technical Support” on the box, I explicitly do not expect that to mean “you can email us any time of the week, but we'll only respond the next business day.”

And while we're at it, don't you think that it would be a good engineering principle to put some cheap, write-once memory in your device containing some form of rescue firmware, so there's less chance of it turning into a brick from a corrupted upgrade?

HTH, FOAD, and do NOT HAND.

Charles

Working...

2:35 PM: Various cow orkers sitting around a table...

I wanted to write a response to the circulating The Internet is Shit meme, but all I really kept coming back to was this piece of Usenet history from Russ Allbery:

...because the thing that Usenet did, the important thing that Usenet did that put everything else to shame, was that it provided a way for all of the cool people in the world to actually meet each other.

Sure, I've been involved in Usenet politics for years now, involved in newsgroup creation, and I enjoy that sort of thing. If I didn't, I wouldn't be doing it. But I've walked through the countryside of Maine in the snow and seen branches bent to the ground under the weight of it because of Usenet, I've been in a room with fifty people screaming the chorus of "March of Cambreadth" at a Heather Alexander concert in Seattle because of Usenet, I've written some of the best damn stuff I've ever written in my life because of Usenet, I started writing because of Usenet, I understand my life and my purpose and my center because of Usenet, and you know 80% of what Usenet has given me has fuck all to do with computers and everything to do with people. Because none of that was in a post. I didn't read any of that in a newsgroup. And yet it all came out of posts, and the people behind them, and the interaction with them, and the conversations that came later, and the plane trips across the country to meet people I otherwise never would have known existed.

I'll probably come up with my own response later, though.

I've been watching the RSS vs nEcho debate rage back and forth, back and forth for a week or so now. For those who came in late: don't worry about what nEcho is. Go home.

It doesn't matter.

I've been in so many arguments like this over the years. When I was helping on an IRC network, I think I got in two or three of them every week. At the time, they seem like the most important thing in the world: every minor issue is a grand, moral point you must stand up for.

They don't matter.

nEcho will happen, or it won't. RSS will survive, or it will die. nEcho will be the greatest thing since sliced bread, or it will make stupid design decisions and everyone will laugh at it. Maybe it will all be taken over by Microsoft, or even worse, end up at Sun as part of the JCP.

The net change to the world as a result of this will be unnoticeable. A couple of programmers will, or will not, spend a day adding another parser to their application. Maybe we'll have another format added to the existing mass of mutually incompatible syndication formats. We'll end up with a marginally better API for weblog posting, or a marginally worse one, but whatever it is will be ‘good enough’ for all practical purposes. We'll continue to whine about how Dave Winer Just Doesn't Get It, or we'll whine about how standards bodies always screw up good ideas when they get hold of them.

I can't see a single significant difference between a world without nEcho, and a world with it. That said, if a bunch of engineers want to create a Better Way To Do It, then more power to them: I hope they're successful, are happy making something they are proud of, and get it adopted in the wider community.

It's all just another one of those rows geeks get into without realising they're fighting over thin air. I've been fantastically guilty of this over the years, and probably will be again, so I know the signs.

None of this is worth either side expending any emotional energy over whatsoever.

An architectslobby headline on Javablogs pointed out by a co-worker reads:

XML Beans: The Best of Both Worlds

While I haven't read the article it points to, I'm sure I could think up some better headlines:

  • XML Beans: Looks great on the proposal.
  • XML Beans: Because we didn't have quite enough incompatible XML object serialization formats.
  • XML Beans: Two points in buzzword bingo!
  • XML Beans: If you start running now, you just might escape.

Joseph Ottinger asks Where are the Writers?

...It seems like everyone wants to write very focused, pigeon-holed articles about this tiny behaviour. Nobody writes anything sweeping, nothing is submitted that actually shows a lot of organized thought. I'm impressed (occasionally) in fits and starts, but nothing really seems to stand out and make me think, "Now this article is going to be talked about for a long time."

The industry has had some of it, to be sure - Fred Brooks, and Dijkstra, McConnell as well… but those are old works, from the industry's infancy. Java has nothing of that stature, unless perhaps it's the GoF book - but even there you're talking about something that's fairly hoary with age.

Well, to start with, all the examples in the GoF book are in C++ or Smalltalk, so Java can't really adopt that either.

You very rarely see a lasting work of “stature” in computing that is tied to a particular programming language. In The Mythical Man-Month, Brooks' subject was writing a mainframe OS, but the book transcended that subject to be about the truths of development in general. Knuth invented MIX specifically to make The Art of Computer Programming independent of any particular architecture.

There are books being written today on development methodologies and technologies that use Java for its example code. I've seen books on design patterns, OO architecture, genetic algorithms, compiler design and so on, all with Java code inside them. Think of AOP: a general programming technique with its most well-known implementations in Java.

Why then, don't all these nifty things percolate down to the pages of the Java Developers Journal?

The real answer is that the JDJ is a magazine for practitioners. It's the same reason that the “Software Engineering” shelf of my bookcase contains all the interesting stuff, while the Java shelf is full of useful but uninspiring books on specific tools and APIs. Interesting movements in computer science end up in more general journals (I first read about AOP in the Communications of the ACM on a co-worker's desk), while the JDJ attracts more practical articles.

This makes the rest of my essay largely pointless, and I'm not sure I even agree with it entirely myself. It's an alternate explanation, though.

Expansive, inspiring, sweeping developments in the field of computer science just don't suit Java's character.

Java fits into a particular, closed programming niche. It is a practical language, designed from the start to leverage existing technologies to solve certain problems. It was designed to be easy to pick up, to protect you against certain common coding mistakes, and to be friendly to migrating C++ programmers. The Java Virtual Machine was a pretty exciting deployment platform, but unlike its forefathers, the JVM is a black box: you have no real ability to alter the behaviour of the VM from inside the language, so when you are programming, it is largely an irrelevance.

You won't find too many ground-breaking articles or books written about Java, because Java is not a ground-breaking language. Java was a language born into middle-age. It's solid, rather set in its ways, bloated here and there, but it doesn't throw wild parties or disturb the status-quo. It certainly doesn't fire the imagination.

You really have to blame its parents. It was a designer baby. When something is born as a product, rather than as a language in its own right, it is always going to be pushed into maturity a little faster than it is able to handle.

While Java itself is rather boring, its boringness has brought it a quick ubiquity. It's got a huge range of available libraries and tools, and it's in the area of these tools that Java as a platform continues to grow. Hence, while Java does nothing to fire the imagination of the language purist, there is more than enough happening around Java in a practical sense to keep people interested. Which explains why there is an interest in writing, and reading, articles about particular tools.

In addition, Java has been quickly working its way into the middleware world: Java's niche of being the COBOL replacement for the new century. And when I say niche, I must point out that this is a big niche. When viewed from the perspective of programmers employed, lines of code written and the direct influence on people's lives from day to day over the last half-century, COBOL is the elephant in the programmers' kitchen that everyone seems to try to ignore.

But in terms of advancing the art of computer programming, it's a niche nonetheless. The history of COBOL development has lain in advancing the art of COBOL, without much of that art making it beyond that barrier. When you're sitting in this niche, what you want out of your trade journal are not articles about the next big thing, but articles on how to make better use of the thing you have: which leads us nicely back to Joseph's complaint.

At work at the moment, we're using a commercial Rules Engine product (that I will not name for obvious reasons1). Rules are entered through a Graphical User Interface that obviously looks really nice when you're giving a demonstration: “Look! With a few clicks you can add a new business rule. All the available objects and methods are in these convenient drop-down menus. It's so intuitive!” The rules are then saved in a set of XML files that are, like most machine-friendly XML, totally opaque to a human author2.

There are a number of really annoying user-interface glitches. It's incredibly inconsistent. Some simple things (like cut/paste, or multiple selection) are broken in various subtle or less-subtle ways. As a result, while the UI is perfectly passable for adding one or two rules, or making minor modifications to existing rules, it really, really, really sucks when you need to input two hundred or so of the bastards.

I'm not making a particularly controversial (or new) statement when I say that when developing some project, having the development team “eat their own dogfood” is a very useful technique. If you are writing an email application, have the developers adopt it for their email. If you are writing an IDE, have the developers use the IDE to write its own next version. Nothing focuses a developer more than the desire to fix something that's annoying them personally.

When you're writing a commercial rules engine product, you're not likely to be eating your dogfood. You're going to be testing it, you're going to be demo-ing it, but you're not going to have that day-in, day-out experience where every problem becomes a personal annoyance, rather than an abstract bug report.

This is an advantage Open Source can have. The successful Open Source frameworks are those that the authors wrote to use themselves as part of some larger project, and then released to the outside world. A commercial product is more likely to have been written specifically for the purpose of being sold commercially, and not used in-house (of course, many commercial products can be both). While this may sound like you end up in Open Source with something that was a by-product, an after-thought, the truth is that what you get is some very well-chewed dogfood3. You get the John West Seal of Approval: the product that the developer wanted to use personally.

There's a flip-side, of course. One example of this is JBoss, and its mass of thinly documented XML configuration files. Of course the developers know exactly how they work. They know where to find everything so they don't find the complexity annoying4. Find that a problem? It's Open Source and you can fix it yourself. However, by the time you know enough about a product to fix a problem like this yourself, you know enough not to be experiencing the problem any more. Hence it falls down the back of your priority list. Catch-22.

1 except to say, since some people might assume it from the fact I'm primarily a Websphere consultant, it's not an IBM product.
2 for God's sake. If you're going to use XML as a serialisation format, design your schema to be human-readable and human-editable. Otherwise what's the point of using XML? A binary format would be so much more efficient.
3 the great thing about this supposedly approving metaphor is how disgusting it is when you really think about it.
4 insert your favourite theory about wanting to sell documentation and/or consulting here, if you like.

Online polls are one of the facts of life of the blogging ecosystem. There are usually one or two of them on the various blogging meme lists, and Livejournal's Meme page is usually full of them.

The way these polls work is you fill in the answers to a series of (often quite personal) questions, and after pressing the submit button you are told which Buffy character you are, what car you should drive, how long you have to live, or which fruit you should be using as a sex-aid, all provided with a cute graphic, and a snippet of HTML for you to cut/paste into your weblog or Livejournal to tell the rest of the world what some CGI script thinks about you.

Personally, I think such polls are a blight on society. If I had a lot of spare time and absolutely no morals, I'd fight back.

Step one: Create a site where people can create their own polls. This will mean a lot of the work will be done for you, and you'll benefit from the viral memes of others. Seed the site with a few somewhat raunchy tests (stealing questions from the Purity Test would work well) that are prominently linked, and then create one or two quizzes with your more quiz-using friends to spread use of the site through the network.

Obviously, the quizzes shouldn't ask for the identity of the person answering them. People don't want to think their exact answers can be traced back to them, after all. They just want a funny picture for their website.

As for the funny pictures, your system should work by having people upload those graphics to the quiz server. The HTML snippet from the results page will refer to the image on the quiz server - you don't encourage the user to copy the image elsewhere. This will cost a lot of bandwidth, but it's vital for the exercise.

Six months later, open a new web-site. On the front page, people can enter a Livejournal username or blog URL, and see all the questions that person has answered, and what the answers were. Hopefully, your high-profile Purity Test quizzes will have netted quite a lot of people by now.

How's it done? Easy. Every result graphic URL, faithfully cut and pasted from the quiz site, is unique, and linked to the quiz that was answered. A web bug. By tracking your referrer logs, it should be easy to connect the recorded quiz answers to the person on whose site the results were posted.
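The mechanism can be sketched in a few lines (the host name, path and in-memory storage here are all hypothetical): each submission mints an image URL containing a unique token, and a referrer-log hit on that URL later maps the embedding site back to the stored answers:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class QuizWebBug {
    // token -> the answers recorded when this result image was minted
    private final Map<String, String> answersByToken = new HashMap<>();

    // Called on quiz submission: store the answers and return an image
    // URL unique to this submission (hypothetical host and path).
    String mintResultImageUrl(String answers) {
        String token = UUID.randomUUID().toString();
        answersByToken.put(token, answers);
        return "http://quiz.example.com/img/" + token + ".png";
    }

    // Later, a referrer-log hit on that URL links the embedding site
    // back to the recorded answers via the token.
    String answersFor(String imageUrl) {
        String token = imageUrl.substring(imageUrl.lastIndexOf('/') + 1)
                               .replace(".png", "");
        return answersByToken.get(token);
    }

    public static void main(String[] args) {
        QuizWebBug bug = new QuizWebBug();
        String url = bug.mintResultImageUrl("the recorded answers");
        System.out.println(bug.answersFor(url)); // prints "the recorded answers"
    }
}
```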

Did I mention I hate these quiz things?

(Issue ROL-216)

Dear Roller maintainers. &apos; is not a valid HTML entity reference. The definitive list of HTML entity references is here, and &apos; is not on it.

&apos; was introduced as a standard entity in XML, and thus is also standard in XHTML. Even if you are using XHTML, if you wish to produce web-pages that are backwards compatible with browsers that do not support XHTML (and IE is one of them), you should avoid &apos;.

If you're desperate, you can use &#39; instead. (See also: the backwards-compatibility section of the XHTML standard)

Even if you're serving valid XHTML with an XML DOCTYPE, there is still significant controversy as to whether user-agents should handle it as XML unless it is also served with the text/xml MIME-type (which would cause IE to display the page as a parse-tree).

Most of the time, it is fine to leave apostrophes unescaped in XML. XML's escaping rules are such that if some character's meaning is unambiguous, it need not be escaped. For example, if you have the tag <foo>, you only need to escape it as &lt;foo>. By just escaping the less-than sign, you make the meaning of the greater-than unambiguous, and therefore it need not be escaped itself. Since apostrophes have no special meaning outside of tags, they need not be munged in regular text.
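A minimal escaper along those lines (my own sketch, not anything Roller actually does) only needs to touch the two characters that are ambiguous in text content:

```java
public class MinimalEscape {
    // In XML text content, only '&' and '<' are ambiguous; '>' and
    // apostrophes can safely be left alone, as described above.
    // Note: '&' must be replaced first, or we'd re-escape our own output.
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeText("it's a <foo> tag"));
        // prints: it's a &lt;foo> tag
    }
}
```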

As an aside, it's amusing to see IE penalised for following the standard. :)

Spotted this review on IMDB:

Summary: Probably one of the most intelligently written screenplays of alltime.

Any idiot can sit down and spend four or seven years of his life writing out his "masterpiece." You do some research, you do some hard work, you get a little help from friends and family, and you get it done. But, it takes a true writing genius (or geniuses, in this case) to create something as original as "Bill & Ted's Excellent Adventure."

A tiny corner of my mind screams at me that this guy could actually have been serious.