October 2002


31
Oct

Misc 0x0001

  • 10:09 AM
  • Jamie Zawinski has had it up to here with GNOME, and is considering porting xscreensaver to OS X.
  • A Brief Introduction to Writing a Brief Introduction... —a funny-once take on why kuro5hin is such a load of wank.
  • An old .sig quote from alt.sysadmin.recovery: “The only thing worse than a user with a good idea is a user with a good idea and a screwdriver.” —Peter da Silva in <6qsme8$7jn@web.nmti.com>
  • Dear Sun. By your own coding conventions, it should be arrayCopy, not arraycopy. Bastards.

Mark Pilgrim came across a wav of someone shouting ‘Hey! I'm looking at gay porn!’ in his referrer logs. A co-worker at a previous job once encountered this file in his inbox. Of course, the fates being as they are, he encountered it precisely at the moment the managing director of the company was in an adjacent office.

Soon after this, we discovered a whole bunch of new entries in /etc/aliases. Every incoming mail was being silently redirected to the director's inbox. By the next staff meeting, he had a list of all the distribution and mailing lists we were to get off now. Needless to say, that's about when I started looking for a new job. Aside from the issue of having what we thought was private mail silently redirected, we all (well, with one significant exception, but that's a story for another day) worked our asses off, and he wanted to begrudge us the five minutes a day it took to read a few jokes.

So 2001...

  • 6:54 AM

If you look really closely at the icon for OmniOutliner files, it's a rendition of ‘All Your Base’.

Personal Space

  • 6:35 AM

One of the surest ways to get on my nerves very, very quickly is to touch me. I have a very strong sense of personal space, and strangers touching me just makes me cringe.

I don't mind the normal, incidental contact that's a part of living in a big, crowded city. When the train's full, I'll sardine myself in with the rest of them. But on the other hand, I tend to only shake hands when it's necessary. In family gatherings, I'm the one standing at the back trying to avoid having to hug a thousand people I only see every five years. And I absolutely loathe having a conversation with someone who feels it's necessary to put their hand on your arm to emphasise whatever point they decided to make. It makes me want to crawl out of my skin.

Exceptions are made for close friends, family, or people I'm attracted to.

Well, the Caesar salad was nice, even if I did get more and more annoyed as the evening went on.

Sometimes, you just have one of those days. One of those days where you manage to find flow, and stay in it. One of those days where you move from test, to code, back to test, back to code with ease. One of those days when you don't feel like you're fighting the environment (the fact that I didn't have to do any EJB work helped there).

One of those days when coding feels like you've been given a block of wood, and you're carefully carving the edges away, polishing the grain. You may not end up with a big pile of code, but everything that's left serves some purpose. One of those days when you delete a method and smile, because your code is that much more elegant. Every line is justified, and contributes to the beauty of the work.

One of those days when you can stick "Everything below this point is done by magic" in a comment.

One of those days you remember why you got into this fucking annoying business in the first place.

Addendum: Found this quote in the Jargon File entry for hack mode:

...the sensation of being in hack mode is more than a little habituating. The intensity of this experience is probably by itself sufficient explanation for the existence of hackers, and explains why many resist being promoted out of positions where they can code.

favicon.ico

  • 10:52 AM

Just to show off my complete lack of artistic skill, both The Fishbowl and my livejournal have a k-rad favicon.ico. It took me about 45 seconds to throw together in the GIMP, but hey, it almost looks somewhat like it might be a goldfish, right?

I have become vanity site, destroyer of worlds.

Addendum: you know you're in for a really bad day at work, when “BCDIC or EBCDIC?” becomes an important question.

Given the massive confusion exhibited here, I've written a nice, simple guide on how to implement HTTP's Conditional GET mechanism, with regard to producers and consumers of RSS feeds.

This article presumes you are familiar with the mechanics of an HTTP query, and understand the layout of request, response, header and body.

What is a Conditional GET?

My full-length RSS feed is about 24,000 bytes long. It probably gets updated on average twice a day, but given the current tools, people still download the whole thing every hour to see if it's changed yet. This is obviously a waste of bandwidth. What they really should do is first ask whether it has changed, and only download it if it has.

The people who invented HTTP came up with something even better. HTTP allows you to say to a server in a single query: “If this document has changed since I last looked at it, give me the new version. If it hasn't, just tell me it hasn't changed and give me nothing.” This mechanism is called “Conditional GET”, and it would turn 90% of those significant 24,000-byte queries into trivial 200-byte ones.
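A rough back-of-the-envelope check of that figure, as a Python sketch. The polling rate (hourly) and update rate (exactly twice a day) are assumptions taken loosely from the paragraph above, and the 200-byte cost of a 304 exchange is the estimate quoted there, not a measurement.

```python
FULL_RESPONSE_BYTES = 24_000  # full RSS feed, per the figure above
NOT_MODIFIED_BYTES = 200      # assumed cost of a bodiless 304 exchange
POLLS_PER_DAY = 24            # assumed: one poll per hour
CHANGES_PER_DAY = 2           # assumed: the feed changes twice a day


def daily_bytes(conditional: bool) -> int:
    """Bytes one subscriber transfers per day under the assumptions above."""
    if not conditional:
        # Every poll downloads the whole feed.
        return POLLS_PER_DAY * FULL_RESPONSE_BYTES
    # Only polls that follow a change get the full feed; the rest get a 304.
    unchanged_polls = POLLS_PER_DAY - CHANGES_PER_DAY
    return CHANGES_PER_DAY * FULL_RESPONSE_BYTES + unchanged_polls * NOT_MODIFIED_BYTES


plain = daily_bytes(conditional=False)       # 576,000 bytes
conditional = daily_bytes(conditional=True)  # 52,400 bytes
saving = 1 - conditional / plain
print(f"{plain:,} vs {conditional:,} bytes/day, {saving:.0%} saved")
# 576,000 vs 52,400 bytes/day, 91% saved
```

Under these assumptions 22 of every 24 polls come back as 304s, which is roughly where the 90% figure comes from.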

Client implementation

The mechanism for performing a conditional get has changed slightly between HTTP versions 1.0 and 1.1. Like many things that changed between 1.0 and 1.1, you really have to do both to make sure you're satisfying everybody.

When you receive the RSS file from the webserver, check the response header for two fields: Last-Modified and ETag. You don't have to care what is in these headers, you just have to store them somewhere with the RSS file.

Next time you request the RSS file, include two headers in your request. Your If-Modified-Since header should contain the value you snagged from the Last-Modified header earlier. The If-None-Match header should contain the value you snagged from the ETag header.

If the RSS file has changed since you last requested it, the server will send you back the new RSS file in the perfectly normal way. However, if the RSS file has not changed, the server will respond with a ‘304’ response code (instead of the usual 200), where 304 means ‘Not Modified’. In the case of a 304, the response will have an empty body and the RSS file won't be sent back to you at all.

There's a temptation for clients to put their own date in the If-Modified-Since header, instead of just copying the one the server sent. This is a bad thing: what you should be sending back is exactly the same date the server sent you when you received the file. There are two reasons for this. Firstly, your computer's clock is unlikely to be exactly synchronised with the webserver's, so the server could still send you files by mistake. Secondly, if the server programmer has followed this guide (see below), it'll only work if you send back exactly what you received.
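Here is a minimal Python sketch of the client side using only the standard library's `urllib.request`. The `CachedFeed` class and the function names are illustrative, not part of any real aggregator. The important part is that the stored Last-Modified and ETag strings are echoed back verbatim. Note that `urllib` reports a 304 as an `HTTPError`, so the sketch catches it and returns the cached copy.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedFeed:
    body: bytes
    last_modified: Optional[str]  # Last-Modified value, stored verbatim
    etag: Optional[str]           # ETag value, stored verbatim


def conditional_headers(cached: Optional[CachedFeed]) -> Dict[str, str]:
    """Echo back exactly what the server sent; never substitute a local date."""
    headers: Dict[str, str] = {}
    if cached is not None:
        if cached.last_modified:
            headers["If-Modified-Since"] = cached.last_modified  # HTTP/1.0 style
        if cached.etag:
            headers["If-None-Match"] = cached.etag  # HTTP/1.1 style
    return headers


def fetch_feed(url: str, cached: Optional[CachedFeed] = None) -> CachedFeed:
    """Fetch url, returning the cached copy unchanged if the server says 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request) as response:
            return CachedFeed(
                body=response.read(),
                last_modified=response.headers.get("Last-Modified"),
                etag=response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as err:
        # urllib raises for any non-2xx status, including 304 Not Modified.
        if err.code == 304 and cached is not None:
            return cached
        raise
```

A caller would fetch once with no cache, keep the returned `CachedFeed` alongside the RSS it holds, and pass it back in on the next poll.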

Server Implementation for Static Files

If you are using one of those weblogging tools that just sticks regular files on a regular webserver (e.g. Movable Type), your webserver will almost certainly already support Conditional GET. HTTP 1.1 has been around 3 years now¹, and there's really not much of an excuse for anyone not to be following it.

One thing you'll have to watch out for, though, is your site's RSS file being regenerated frequently even when it hasn't changed. If that happens, the server won't be able to keep track of the last-modified time properly, and people will download the file even when it hasn't changed. The solution is for the writers of weblogging tools to make sure their software only rewrites files that have actually changed in some way (i.e. have it generate the new file, compare it with the old one, and if they're the same, leave the old one untouched).
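The generate, compare, and replace step could look something like this Python sketch. `write_if_changed` is a hypothetical helper, not part of any weblogging tool; it relies on the assumption that the webserver derives Last-Modified from the file's modification time, which is the usual behaviour for static files.

```python
from pathlib import Path


def write_if_changed(path: Path, new_content: bytes) -> bool:
    """Write new_content to path only if it differs from the current file.

    Leaving an identical file alone preserves its modification time, which
    a static webserver typically reports as Last-Modified.
    Returns True if the file was (re)written.
    """
    if path.exists() and path.read_bytes() == new_content:
        return False  # unchanged: keep the old file and its timestamp
    path.write_bytes(new_content)
    return True
```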

Server Implementation for Dynamic Content

If you've got a weblogging tool that re-generates the RSS file every time a request is made, there's a little more work to do. This section is aimed more at the writers of the tools than at the user, because it's the tool writers that need to fix their software so that it follows the specs.

I'll concentrate purely on RSS files, but the concepts used here can be applied to any page in the weblog, and may further reduce the bandwidth usage for your users.

In your RSS feed generator, you'll have to keep track of two values: the time the file was last modified (converted to Greenwich Mean Time), and an “etag”. According to RFC 2616, the etag is an “opaque value”, which means you can put anything you like in it, provided you stick double-quotes around the whole lot. The time in the Last-Modified header needs to be formatted in a specific way, though: the same format used in email headers. For example, ‘Mon, 17 Sep 2001 11:54:29 GMT’.
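In Python, the standard library's `email.utils.format_datetime` already produces that email-header date format. The sketch below wraps it and adds the double quotes an etag needs. The helper names are illustrative, and treating naive datetimes as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_header(modified: datetime) -> str:
    """Format a timestamp for the Last-Modified header, converted to GMT."""
    if modified.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return format_datetime(modified.astimezone(timezone.utc), usegmt=True)


def quote_etag(opaque: str) -> str:
    """Wrap an opaque value (assumed to contain no double quotes) in quotes."""
    return f'"{opaque}"'


print(last_modified_header(datetime(2001, 9, 17, 11, 54, 29, tzinfo=timezone.utc)))
# Mon, 17 Sep 2001 11:54:29 GMT
```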

Whenever someone requests your RSS file, send those values in the Last-Modified and ETag headers. Every web scripting language allows you to add and remove headers like that at will; just check the manual if you don't know how.

Now for the other bit. Whenever someone requests your RSS file, check the headers of their request for an If-Modified-Since header or an If-None-Match header. If either of them is there, and every one that is there matches the value you were planning to send out with the file, then don't send the file. Once again, consult your manual to see how to send back a "304 Not Modified" reply instead of the "200 OK" that you normally would. If you send back the 304 reply, you don't have to generate the RSS file at all. Just send out the headers, followed by the blank line that marks the end of the headers, and the client will know there's nothing else coming.

Technically, what you should do with an If-Modified-Since header is convert it to a date, and compare it with your stored date. However, 90% of the time you can get away with just doing a straight match, so it's probably not worth the effort.
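A sketch of the server-side decision in Python, combining the matching rule with the optional date comparison: it tries the straight string match first and falls back to parsing both dates. `should_send_304` is a hypothetical name. The sketch treats If-None-Match as a single etag (real clients may send a comma-separated list or `*`, which it does not handle), and it assumes `request_headers` supports `.get()`; a case-insensitive mapping such as the `headers` attribute of `http.server.BaseHTTPRequestHandler` is safest.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _not_modified_since(if_modified_since: str, last_modified: str) -> bool:
    """True if the client's copy is at least as new as our Last-Modified."""
    if if_modified_since == last_modified:
        # The straight match, enough for clients that echo back what we sent.
        return True
    try:
        # The more thorough check: parse both dates and compare them.
        return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable or incomparable dates: play safe and send the file.
        return False


def should_send_304(request_headers: Mapping[str, str], last_modified: str, etag: str) -> bool:
    """Decide whether to answer 304 Not Modified instead of 200 OK.

    At least one conditional header must be present, and every one that is
    present must match what we would send out with the file.
    """
    if_modified_since: Optional[str] = request_headers.get("If-Modified-Since")
    if_none_match: Optional[str] = request_headers.get("If-None-Match")
    if if_modified_since is None and if_none_match is None:
        return False  # unconditional request: send the file
    if if_none_match is not None and if_none_match != etag:
        return False
    if if_modified_since is not None and not _not_modified_since(if_modified_since, last_modified):
        return False
    return True
```

When it returns True, a handler would send the 304 status line, the headers and the terminating blank line, and skip generating the feed entirely.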

How do I calculate the Last-Modified date?

Easy. It's the time that the most-recently-changed item in the RSS file was modified. Something like that should be pretty easy to store and fetch.
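For example, a Python sketch that takes the newest modification time across the feed's items and formats it for the header. `FeedItem` is a hypothetical stand-in for whatever a weblog tool actually stores per entry; the sketch assumes its timestamps are timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Iterable


@dataclass
class FeedItem:
    """Hypothetical per-entry record; only the modification time matters here."""
    title: str
    modified: datetime  # timezone-aware time the entry was last changed


def feed_last_modified(items: Iterable[FeedItem]) -> str:
    """Last-Modified for the feed: the newest modification time of any item.

    Raises ValueError if the feed has no items.
    """
    newest = max(item.modified for item in items)
    return format_datetime(newest.astimezone(timezone.utc), usegmt=True)
```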

What should I put in an etag?

The Apache server builds its etag from file metadata (the file's inode number, size and modification time). This isn't necessary, though. All the etag has to be is something that changes every time the file changes. So it could be a hash of the contents, a version number, or it could even be exactly the same as the Last-Modified date, just in double-quotes.
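A minimal Python sketch of two ways to produce an etag: hashing the feed body, or quoting a version counter that the tool bumps on every change. The function names are illustrative, and SHA-1 is an arbitrary choice; any hash works, since the value only needs to change when the content does.

```python
import hashlib


def etag_from_content(body: bytes) -> str:
    """Opaque etag from a hash of the feed body; changes whenever the body does."""
    # SHA-1 here is for change detection, not security.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def etag_from_version(version: int) -> str:
    """Opaque etag from a counter the tool bumps every time the feed changes."""
    return f'"{version}"'
```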

2002-11-11 Update: A number of people have written to remind me of HTTP's gzip Content-Encoding (compressing the files during transfer). This is a little beyond the scope of this essay. The worst thing you can do when suggesting a solution to a problem is to provide alternatives: people end up arguing about the alternatives instead of implementing the fix.

¹ In the original version of this document, this read ‘13 years’, because the author cannot count. Mea culpa.

Lutz Prechelt, An Empirical Comparison of C, C++, Java, Perl, Python, Rexx, and Tcl. Emphasis mine.

80 implementations of the same set of requirements are compared for several properties, such as runtime, memory consumption, source text length, comment density, program structure, reliability, and the amount of time required for writing them. The results indicate that, for the given programming problem, which regards string manipulation and search in a dictionary, “scripting languages” (Perl, Python, Rexx, Tcl) are more productive than “conventional languages”. In terms of run time and memory consumption, they often turn out better than Java and not much worse than C or C++. In general, the differences between languages tend to be smaller than the typical differences due to different programmers within the same language.

The number one rule for starting an Open Source project. Never, ever, ever start a project without having working code that people can compile, run and play with.

If you don't start with it, you'll never develop anything.

If Linus had posted to the comp.os.minix newsgroup: “I have this neat idea. Let's write a free Unix-like operating system that runs on a 386”, maybe FreeBSD would be a lot more popular these days. Linus didn't, though: he posted working code that could run a variety of Unix applications and could be expanded in lots of different directions.

Having something that works gives you three important things (because all lists must have three parts):

  • People can work on the feature that interests them the most, and not worry about all the un-interesting-to-them things that might have to be done to get it to work. And since everybody finds different things interesting, you'll expand in lots of directions.
  • The person who wrote the original code is the project's natural leader. You get a natural respect from bootstrapping the project, even when everyone else's code starts to outweigh yours. And small projects need a dictator.
  • The base architecture of the project is already laid out, so you don't need to spend six months arguing over interfaces and coding standards.

The code doesn't have to be perfect. If your VM handler sucks, somebody interested in virtual memory will rewrite it. But if you don't have a VM handler at all, the guy who wants to make your scheduler better is going to look for somewhere else to exercise his talents.

After a thread on the webappsec mailing list, I spent some of yesterday coming up with a guide to password recovery practices for public web applications. It's still under development, of course, so any suggestions are welcome.

It's available as a PDF, and, to fulfil my obligations under the GNU FDL, as LaTeX source.

The kind folk at Google have also saved me some effort by caching the document as HTML.

The Manual

  • 6:02 PM

The KLF: The Manual: (How to have a number one the easy way). Annoying lack of capitalisation from the original:

be ready to ride the big dipper of the mixed metaphor. be ready to dip your hands in the lucky bag of life, gather the storm clouds of fantasy and anoint your own genius. because it is only by following the clear and concise instructions contained in this book that you can realise your childish fantasies of having a number one hit single in the official u.k. top 40 thus guaranteeing you a place forever in the sacred annals of pop history.

other than achieving a number one hit single we offer you nothing else. there will be no endless wealth. fame will flicker and fade and sex will still be a problem. what was once yours for a few days will now enter the public domain.

I was reading this thread on www-html about the possible addition of contentEditable to HTML (found via manero.org), and specifically this post by Christian Hujer:

Forms are not the task of HTML anymore. They are the task of XForms. Why should only HTML contain forms, but not XSL:FO, SVG, SMIL, MathML, DocBook? So XForms will be the pluggable language to add forms to content.

This really sums up the situation with HTML as far as the W3C is concerned. HTML4 is dead. XHTML1 was just a transition from a monolithic SGML format to a modular XML format. The as-yet-unfinished XHTML2 is the only way forward for the standardized web. Thus, any improvements we might want to make to the specification must end up in, and conform to the goals of, XHTML2.

For those not following the game, XHTML2 is most notable for the fact that it is incredibly backwards-incompatible with HTML. Where XHTML1 just required you to remember to close your tags in the right order, and maybe replace one or two attributes, XHTML2 does some pretty radical things like get rid of the <img/> tag in favour of <object/>, and completely replace HTML forms with XForms, which is about twice as complicated.

Add the three years it's going to take for the standard to be acceptably implemented in browsers to the three years after that until the majority of users have got around to upgrading, and you're looking at six years. And even then, it's going to be even harder to convince web authors to use XHTML2 than it was to convince them to use XHTML1, because it's such a radical change from what they are used to. HTML4 is going to live a very long time.

(As an aside, if you haven't yet, go read Ian Hickson's description of why we shouldn't even be using XHTML now, because browsers don't properly accept text/xml documents.)

So, as far as the W3C are concerned, if we want any new, standard features, we're going to have to wait six years for them, and we're going to have to dive into the XHTML 2 Brave New World in order to use them. This leaves the browsers in an interesting position. The browser vendors want to deliver nifty things that users like. With the standards stuck in this time-warp, and the average web-hacker most likely to want to stick with the moribund HTML4, seeing no reason to switch to something more complex and less forgiving, it is inevitable that the browsers are going to extend HTML4 in weird directions out of the control of the W3C.

The W3C, on the other hand, can't really go back to adding things to the old SGML-based HTML, because that would undermine the efforts of the push towards XHTML, an important effort that is being done for all the right reasons.

It's a tough one, innit.

My life

  • 5:55 PM

The Wayback Machine found something I'd forgotten about. Once upon a time, back in 1998 or so, I kept a list on my homepage of things that I wanted. On it was “A life. Preferably in a metallic blue, or maybe dark green.” Soon after, Lonita sent me this:

If you were using a graphical browser, or had images turned on, you'd see a little pulsating ball.

In The Daily Adventures of Mixerman, a sound engineer keeps a diary of his life making an album, the names being changed to protect the guilty. The whole thing may very well be an invention, but it's an entertaining one. It gives a lot of insight into how a band with a $2 million advance, an incompetent drummer and a producer who doesn't even show up the first week might make an album.

Or fail to make one. (Link found via RockNerd)

Links for Hire

  • 10:29 AM

Mobius
Russell Beattie, on the whole “Microsoft pays for bloggers to attend a conference, bloggers don't mention it” saga:

I really only have one thought about this: WHEN THE HELL IS SUN GOING TO SEND THE JAVA BLOGGERS OUR TICKETS TO A SIMILAR GIG?!?!?!?

Sun! Get on it! I have no scruples! I promise I won't mention ANYTHING about you're paying for a flight back to San Francisco for me!

Dear Sun. I also have no scruples. If you pay for me to fly to San Francisco, I will blog whatever the hell you want me to.

I found the winners of the 2002 Bulwer-Lytton Fiction Contest via RageBoy's weblog. First, the contest itself:

An international literary parody contest, the competition honors the memory if not the reputation of Victorian novelist Edward George Earl Bulwer-Lytton (1803-1873). The goal of the contest is childishly simple: entrants are challenged to submit bad opening sentences to imaginary novels. Although best known for The Last Days of Pompeii (1834) and the phrase, "the pen is mightier than the sword," Bulwer-Lytton opened his novel Paul Clifford (1830) with the immortal words that the "Peanuts" beagle Snoopy plagiarized for years, "It was a dark and stormy night."

The Winner...

On reflection, Angela perceived that her relationship with Tom had always been rocky, not quite a roller-coaster ride but more like when the toilet-paper roll gets a little squashed so it hangs crooked and every time you pull some off you can hear the rest going bumpity-bumpity in its holder until you go nuts and push it back into shape, a degree of annoyance that Angela had now almost attained.

...who last year won the ‘Detective’ category with:

The graphic crime-scene photo that stared up at Homicide Inspector Chuck Venturi from the center of his desk was not a pretty picture, though it could have been, Chuck mused, had it only been shot in soft focus with a shutter speed of 1/125 second at f 5.6 or so.

Russell Beattie asks, “What were you doing in 1993?”

In 1993, I was going through the first year of Arts/Law (a degree I was destined not to complete) at the University of Western Australia, having finished high-school the year before. I studied (or attempted to avoid studying) Contemporary English Literature, Philosophy, Psychology, Contract Law, Criminal Law, and The Legal Process. I actually got quite good grades in them, except for Legal Process which had lectures very late on a Thursday afternoon, and to which I (and several others) would inevitably show up drunk.

I spent most of the year hanging around with people who I knew from school but didn't particularly like, because at least it avoided the hassle of trying to meet new people. Big mistake. Other interesting things that happened that year mostly involved me being thrown out of the University tavern for being underage, and having my eyebrow shaved off at someone else's 18th birthday party.

I had not yet discovered the Internet.

I expect most of you have read this before, but Terry Bisson's They're Made out of Meat is always worth a read, or, as I did just now (thanks to Lonita), a re-read after a few years of forgetting it was there.

From RFC 2616:

The Referer[sic] request-header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained (the "referrer", although the header field is misspelled.) The Referer request-header allows a server to generate lists of back-links to resources for interest, logging, optimized caching, etc. It also allows obsolete or mistyped links to be traced for maintenance. The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.

Could authors of news aggregators please stop putting the URL of their product page in the Referer header? It is in no way the “resource from which the Request-URI was obtained”, and as such it's a clear violation of the RFC. The place to identify the client is the User-Agent header.
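A minimal Python sketch of a well-behaved feed request built with `urllib.request`: the client identifies itself in User-Agent and sends no Referer at all, since a scheduled feed poll has no referring resource. The product name and URL in `USER_AGENT` are placeholders.

```python
import urllib.request

# Placeholder product name and URL: identify the client here, not in Referer.
USER_AGENT = "ExampleAggregator/1.0 (+http://aggregator.example.com/)"


def build_feed_request(feed_url: str) -> urllib.request.Request:
    """A feed poll has no referring page, so it carries no Referer header."""
    return urllib.request.Request(feed_url, headers={"User-Agent": USER_AGENT})
```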

Book Idea

  • 4:59 PM

New idea for a book, suggested by a cow orker:

Pair Programming with Tyler Durden

Why I'm not hanging out for handwriting recognition any time soon:

a page from my security notes shows how abysmal my handwriting is

UML Uses

  • 11:20 PM

Russell Beattie writes:

Anyways, I've since seen the value of UML, but not as a programming aid per se. I need to see that in practice with my own eyes, because I don't see it helping all that much. But what IS nice is when you're trying to communicate process flow on the whiteboard, or in a document and having a common graphical vocabularly to use.

Me too. I know about enough UML to draw pretty sketchy class, state and interaction diagrams. UML works very well as a common vocabulary for communicating design between developers, or for documenting critical parts of a design. There's nothing magical about UML, though, and sometimes it's possible to go overboard and spend more effort drawing the pictures than you get value out of them.

I'll happily admit that I probably get a thousand technical details in my diagrams wrong. That's cool, because I'm only using them to communicate a pretty broad intent; I don't even expect the code to be 100% faithful to the diagram. That's another advantage of whiteboards: you can wipe the pictures into oblivion when they're no longer useful, and they don't leave confusing artifacts behind.

If you're doing enough detailed UML that you can take advantage of those tools that generate code from the diagrams, you're doing far too much up-front design, in my book.

Disclaimer: I'm slowly moving this weblog from its rather soupy default templates to correct markup, but I generally can't play with HTML for more than about fifteen minutes at a time before getting bored. Do what I say, not what I do. :)

Dave Winer, What Is Tag Soup?

The Web is tag soup. People use blockquotes to indent. Even though the REST folk argue that it's anti-Web to do RPC, people do RPC anyway. There's a never-ending list of complaints, but they can be resolved. That's why I'm writing this little essaylet. [...] You can't put the genie back in the bottle. Only by making your world very small can you fail to see the enormity of getting everyone to see it your way. Better to adapt your thinking to their way, and see how you can make your vision fit into what is.

No.

The most significant cause of Tag Soup has been the fact that the tools we have to work with are always catching up with what we want to do with them. The reason people use <blockquote> for indenting (as they used <ul> before it) is because not long ago it was the only way to indent text without using really complicated tables or spacer images. Now that the technology to indent text using CSS is widespread, we developers can migrate to more correct, and more reliable ways to indent.

The tools aren't there yet, of course. The standards don't support everything we want to do, and the browsers don't support all the standards that do exist. We've got years to go yet, but that doesn't mean we shouldn't be moving in the direction of semantically correct web-pages.

The movement is already happening. Like all good Internet movements, it started in the hands of the technical practitioners, shown by the proliferation of weblogs that are moving away from table-based layouts and towards CSS styling. Any web designer who shows pride in their work will want to use the tools correctly. If there are two ways of doing something, the correct way and the hacky way, a professional will want to do things the correct way.

Wired has shown that a professional publication can follow the same route.

There are three things that, if we continue to do them as a community, may not eliminate tag soup, but will ensure everyone knows it's a bad thing, and at least tries to avoid it:

  1. Continue to improve both the standards and the browsers. Wherever web designers are doing something the hacky way because there is no way to do it correctly, provide a way to do it that doesn't break the semantic structure of HTML.
  2. Continue to evangelize the correct way. Continue to let designers and page authors know that there is a correct way, and there are good reasons to adopt it.
  3. Write applications that take advantage of the benefits of correctly structured web pages. Nothing would speed up the adoption of standards more than a killer application, the function of which applies better to valid pages than it does to invalid pages.

The Price Curve

  • 7:18 PM

Dave Johnson on the price/quality curve:

Let's talk about you. What do you think? Is there any truth to my theory? Am I totally wrong about this? Is there any insight here, or am I just being petty? Does the ROI of the big expensive enterprise software make it the most valuable of all, quality be damned?

Big expensive enterprise software occupies a completely different sphere of existence compared to consumer products such as Photoshop or Visual Studio. Since I've previously used Websphere as an example, and my employer sells Websphere, this time I'll pick a completely different product to talk about that I've dealt with, but have no financial interest in: SAP.

SAP qualifies as being really big, expensive software for which you need a team of consultants just to install. It doesn't hold a lock-in position on the market, but people continue to use it. There has to be a reason that this doesn't matter, otherwise SAP would have a competitor with a one-click installation procedure. That's how the market works. SAP requires a similarly big team of consultants to customize, but there has to be a reason this is necessary, otherwise SAP would have a competitor with an easier customisation system.

The answer is that there is no way SAP could ever come up with a product that fits even a single business out of the box. SAP isn't an application you can just plug in to a business. It's not like Quicken or MYOB, which are sold to companies small enough to be able to mold their business processes around their accounting software. While SAP (arguably) delivers a very valuable framework into which you can encode business processes, in order to get SAP to do anything worthwhile for your business you are going to have to do complicated things with it. And as programmers, witnesses to a thousand failed attempts to “dumb down” programming, we know that in order to do complicated things, you need to employ skilled people.

If you're going to have to employ very skilled people to get your appserver, e-commerce suite or ERP package to fit in with your business, then the additional overhead of having said people perform installation, and deal with the niggly annoyances of the product's user-hostility is only a very small fraction of their total cost. If you're hiring these guys for six months, and it'll take one of them two days to set up the software right, it's not really a competitive advantage for another product to have a really simple installation procedure.

Crappy development tools, on the other hand, I could rant about for a long time. If you have development tools that suck, you move from your expensive developers losing a small, fixed amount of time on the inefficiencies, to losing a proportion of every day you're paying them for wrestling with the product. At that point, I start wondering what the hell is wrong with some people.

Ravioli Code

  • 3:57 PM

A co-worker emailed this to me—the Complete Pasta Theory of Software:

The ideal software structure is one having components that are small and loosely coupled; this ideal structure is called ravioli code. In ravioli code, each of the components, or objects, is a package containing some meat or other nourishment for the system; any component can be modified or replaced without significantly affecting other components.

We need to go beyond the condemnation of spaghetti code to the active encouragement of ravioli code.

I've been making tweaks to the look of my weblog ever since I put it up, gradually adding the various bits and pieces required to make it work. (For example, until this morning it didn't contain any reference to my real name, which confused at least one reader, and could possibly get me in trouble with the (ISC)² code of ethics¹.) As I go, I've been testing it with all the browsers I have lying around: Chimera, Galeon, an obsolete beta of OmniWeb, lynx, Internet Explorer 5 for Mac, and Internet Explorer 5.5 for Windows. It seems happy with all of them, although IE5 Mac refuses to display the title graphic.

That is, until a friend looked at the page in Internet Explorer 6, and asked why the text on the right-hand side kept vanishing.

I found a machine in the office that had IE6 installed, and easily replicated the problem. Now I don't pretend that my site even comes close to validating as correct HTML, but that's no excuse for a browser to be so flaky as to have things randomly appearing and disappearing as you scroll up and down the page.

This isn't the first time I've had problems like this with Internet Explorer and CSS. Not by a long way. In a normal competitive environment, a browser that behaved so flakily the moment you tried to do anything complicated would lose market-share to its more able competitors, and be forced to improve its rendering. However, once a product attains the near-monopoly position that IE has, a reality distortion field comes into effect. It's no longer IE's fault that my page doesn't display correctly; it's my fault for writing a page that triggers one of IE's bugs.

This is not a good state of affairs.

1 While it doesn't appear on that page, in order to enrol for my CISSP exam, I had to sign a form stating that I had never gone by any alias online that wasn't identified with my real identity.

Joel Spolsky notes the release of the second edition of Dynamic HTML: The Definitive Reference. This is the only HTML book I've found more useful than the original specifications. It's as comprehensive as the specs, while also including a lot of important information on cross-browser compatibility, and cross-referencing between the four different standards that make up web pages (HTML, DOM, CSS and ECMAScript).

Too many notes?

  • 12:33 AM
From the programme of the opera I went to see tonight:

Britten was once asked to identify the difference between his new opera, The Turn of the Screw, and his opera of 1946, The Rape of Lucretia. ‘The title is different, and the story’, replied Britten. ‘Oh yes, of course, but the music, Mr Britten – what would you say was the difference between the music of The Rape of Lucretia and The Turn of the Screw?’ ‘The notes are the same, but they are in a different order.’

Ned Batchelder owns up to a heinous sin:

Brent's Law of CMS URLs is simple: the more expensive the content management system, the longer and uglier the URLs they produce.

This resonated with me, both because I have experienced these impossible-to-use URLs as a web surfer, but also because I helped make some of them, by being one of the developers of the Domino Web Server.

Domino produces URLs that look like this: http://www-10.lotus.com/ldd/sandbox.nsf/85d5b6903071400e8525676d0079b3ae/6bcca234153471348525689a0070bc43?OpenDocument

As someone whose job involves frequent visits to IBM's website, I am intimately familiar with those kinds of URL. Sir, I hereby pronounce you evil, and sentence you to a year with only Jakob Nielsen for company.

Ask Slashdot: What Would You Do With a New Form of Encryption?: “I have come up with a new form of encryption that's better than a one-time pad. Should I patent it?” Or, “Dear Slashdot editors. April 1 was six months ago.”

Some guy comes up with a claim that any reader of Crypto-gram would know is laughable, and then uses it to push Slashdot's hot-button on patent issues. Ten points out of ten for trolling audacity, minus several million points to Cliff for posting it.

Ask Slashdot always makes me laugh, though. Who in their right mind, when confronted with a problem, would think “I know! I'll ask Slashdot readers!”

Shelley Powers has a strange brain.

Today, though, the group was quiet, much quieter than usual, because one of their members, PHP, was not its usual cheerful self. In fact, one could say that PHP was in a true funk, if one had a mind to say something like that aloud, or within the hearing of one's boss. Or doctor.

Why the blues, PHP, the other languages asked. All the languages that is but C, because all C ever said was "bite me", being a rude language and hard to live with, but still respected because it was such a good worker.

Exhibit A, your honour. An options list from a website's registration form containing 341 different salutations, including “Honourable Judge”, “Senator”, “Dpty. Commissioner”, “Kepala” and “YB Dato' Paduka Bijaya Di Raja”, but shamefully missing ‘Mademoiselle’.

It was mailed around the office by a cow-orker. The fact that this was found on a real website is truly, truly scary. The fact that another cow-orker replied with a similar example from a completely different site is even scarier. You'd think at some point, someone would have thought “Hey, let's just use a text field.”

Hours of fun.

  • 4:40 PM

YKYBHTLW...

  • 3:27 PM
You know you've been hacking too long when somebody says “Axis of evil”, and you think “Hey, SOAP isn't that bad...”

The Joy of HTML

  • 1:01 PM

I just looked at my weblog homepage in Netscape 4.78 for Linux. If you get the chance, you should try it; it's really funky. It looks like William Burroughs cut my page up and pasted it together in random blocks.

The page seems randomly broken into chunks and then pasted together.

Mark Pilgrim talks about evolvable formats, specifically the mess that is RSS 0.9x/2.0:

The problem with that list of RSS deficiencies is that it is also a list of necessities—RSS has flourished in a way that no other syndication format has, not despite many of these qualities but because of them. The very weaknesses that make RSS so infuriating to serious practitioners also make it possible in the first place. (Pilgrim)

Exactly how many times are we going to rewrite Richard Gabriel's The Rise of “Worse is Better” in our lifetimes? While it's great to strive for perfection, it seems inevitable1 that whoever sacrifices perfection for convenience is far more likely to succeed.

From Worse is Better we have the “correct” approach:

  • Simplicity-the design must be simple, both in implementation and interface. It is more important for the interface to be simple than the implementation.
  • Correctness-the design must be correct in all observable aspects. Incorrectness is simply not allowed.
  • Consistency-the design must not be inconsistent. A design is allowed to be slightly less simple and less complete to avoid inconsistency. Consistency is as important as correctness.
  • Completeness-the design must cover as many important situations as is practical. All reasonably expected cases must be covered. Simplicity is not allowed to overly reduce completeness.

vs the approach that inevitably succeeds:

  • Simplicity-the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.
  • Correctness-the design must be correct in all observable aspects. It is slightly better to be simple than correct.
  • Consistency-the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
  • Completeness-the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.

The original essay was about C vs Lisp, and played very hard on the computational efficiency side of the argument, but “Worse is Better” holds true across the board. Just look at Java vs Smalltalk, or RSS vs RSS.

Addendum: Mark's essay on RSS is adapted from Clay Shirky's essay on HTML/HTTP, which is really just more evidence for the underlying conclusion. Worse is better.

1 This was originally the typo ‘ineviable’, which I am convinced was a Freudian concatenation of ‘inevitable’ and ‘unenviable’.

From Joe's Jelly:

Booo. XML is designed for representing data. Programming languages are designed for representing behavior. Have a look at the examples of oXML and try writing them out again in a dynamic OO language like Python or Ruby. The XML versions are massively overcomplicated and very hard to read (XML has a noisy syntax). Remember, machines have no problems reading code, it's humans you need to think about.

I cannot agree with this enough. XML is a great language for structured data, but a terrible one for programming anything more complicated than a simple series of instructions (à la Ant). This is also one of the many reasons I detest XSLT.

The Move

  • 11:48 PM

Warning, I've had a few beers and I'm rambling.

So. Why did I move? A combination of reasons, really. The general clunkiness of Radio. The inability to update unless I had my laptop with me. The insufficient access logs from Userland's server. The annoyance of being “0100190” instead of using the domain I've owned for five years now. The artificial separation of my two weblogs. Inertia.

But I'm here now. I have given up what is a pretty influential position with Google, even after the update, to strike out under my own name.

Movable Type looks bloody good so far. It seems very full-featured, very easy to use, and it didn't take very long to set up at all (although I'm not sure how that would translate to someone who hadn't once made a living as a Perl programmer and Unix sysadmin). The instructions are comprehensive, and the interface is very elegant.

I need to write my own page templates, though. I'll probably base it on my livejournal template, but that's a job for next weekend.

Exporting from Radio took a while. I found a tool that would do the job for me here, but whoever wrote it (a) knows nothing about localisation, and (b) got it wrong anyway. I made a huge number of false starts because the MT importing format wants dates to be mm/dd/yyyy, but the exporter writes them out (at least on my machine) as d/m/yy. Also, MT expects a newline between BODY: and the body of a post, but the exporter doesn't provide one. Luckily, that's exactly the sort of thing that can be solved with regular expressions.
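A minimal sketch of that kind of regular-expression clean-up, in Python. This is an illustration rather than the conversion actually used here, and it rests on assumptions not checked against MT's documentation: each date sits on a line beginning `DATE:`, it arrives day-first as d/m/yy, and every two-digit year belongs to the 2000s. The function and pattern names are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: each date sits on its own line as "DATE: d/m/yy ...", day first.
# The trailing \b stops this from matching a year that already has four digits.
DATE_LINE = re.compile(r"^(DATE:[ \t]*)(\d{1,2})/(\d{1,2})/(\d{2})\b", re.MULTILINE)

# "BODY:" with text glued onto the same line (the lookahead needs a non-space).
BODY_LINE = re.compile(r"^BODY:[ \t]*(?=\S)", re.MULTILINE)


def _swap_date(match: re.Match) -> str:
    """Rewrite d/m/yy as mm/dd/yyyy, assuming every year is in the 2000s."""
    prefix, day, month, year = match.groups()
    return f"{prefix}{int(month):02d}/{int(day):02d}/20{year}"


def fix_radio_export(text: str) -> str:
    """Apply both fix-ups; input that is already correct passes through unchanged."""
    text = DATE_LINE.sub(_swap_date, text)
    # Push whatever followed "BODY:" onto the next line.
    return BODY_LINE.sub("BODY:\n", text)
```

Anything after the date on the `DATE:` line (such as a time) is left exactly as it was, and running the function twice gives the same result as running it once.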

Exporting from Livejournal took a lot less time. I wrote the first part of a Java library for talking to Livejournal in January. On top of that, the exporter was only 130 lines of code. It was fun watching STDOUT as my journal was exported. I commented to Lonita that it was “like watching your life flash before your eyes in fast-forward”.

So The Fishbowl is a merging of The Fishbowl Diaries with The Desktop Fishbowl. I'm not sure how merging my personal and geeky writings is going to work out. It works for Mark Pilgrim, but he's quite a bit more eloquent than I am. In the near future I shall create categorised RSS feeds, so people can subscribe to what they're interested in, and no more.

Two important things I learned this week:

  1. It is very unfashionable to drink Chardonnay
  2. Thursday is the new Friday, except in London where Wednesday is rapidly becoming the new Thursday.

I just ran the MacOS X Software Update, and one of the items in it was the fix for the Internet Explorer certificate-chains bug. Great, I'll install that; I use IE every so often, when a site is too broken to work in anything else. After the installation was complete, I went back to NetNewsWire Lite, double-clicked on an article, and what do you know, instead of another tab opening in Chimera, Internet Explorer opens. The update had changed my default browser preference.

Bastards.

To anyone working in a software company: my computer is my own. You do not make assumptions as to how I want to use my computer, nor do you make assumptions as to how much I want to use your product, just because I happen to be installing it.

The worst culprits in this kind of thing are probably Real. Real seem to have an entire marketing strategy focused on annoying the fuck out of their users until said users refuse to ever install a Real product again. I know that's the state I'm at, and so are quite a few of my friends. When there's a site that says “requires Real Player”, my reaction is “Oh well, I can't hear that”.

My most common mistake in Java is typing StringBugger. The F and G are right next to each other.

[Joe's Jelly]

I find I rarely make mistakes like that while coding Java, since I use autocomplete for anything longer than three characters anyway. On the other hand, when I was pair-programming on a Java 1.1 project and dictating code for my pair to type, I used to say things like “enumeration dot nextElephant” to see whether I could get him to type it by mistake.

Snippet

  • 11:36 PM

Any sufficiently advanced bureaucracy is indistinguishable from magic. Complete with the strange incantations and sacrifices.

My two most common typos when writing HTML by hand are trying to close <acronym> tags with </a>, and more embarrassingly, mistyping <cite> as <cute>.

Quoth Mike

In the office all our machines are named after Muppets (Gonzo, Scooter, Beaker, Bunsen, Cookie, Grover etc), our servers are all named after Greek gods (Zeus, Bacchus - son of Zeus etc).

At my university, the servers were all named after composers (Mozart, Liszt, Handel etc).

What wacky naming schemes do you have?

Let's see. My personal machines are named after states of religious enlightenment, because I started running them back at Uni halfway through an Eastern Philosophy course. The desktops are satori, nirvana and gnosis, while the laptop is epiphany. That last one is a very obscure religious joke that nobody has ever managed to get without an explanation.

At the ISP I used to work for, all the servers were named after casino- or card-games (hearts, craps, lotto, keno, baccarat, poker...) Late in my employment there, we started running out of names, so one of the Cisco RAS boxen got stuck with the name ‘genting-wheel’.

One morning at that job I got a phonecall from my brother, who is a journalist for the West Australian newspaper. Apparently, the ISP across the road (to whom we supplied all their bandwidth by running an ethernet cable under the street, and who were constantly getting us on the MAPS RBL by hosting spammers) had been hosting an illegal online casino. Because our nameservers were called “keno” and “blackjack”, we had to spend a lot of time explaining that no, we weren't involved.

When I moved across within the ISP to do web conslutting, I got to name the servers, and I chose to name them after characters from The Hitchhiker's Guide to the Galaxy. This was done purely for the amusement value of having a host named “Slartibartfast”.

At my current employer, all our boxen are named after heavenly bodies: generally stars, planets or constellations. However, I maintain that calling my next box “Sarah Michelle Gellar” would fit perfectly into the naming scheme.

Jamie Zawinski talks about the joys of computing:

And -- let me emphasize -- I do not enjoy this! Oh sure, you say, why do you keep doing it? I don't know. I think I still enjoy writing software, usually. But what I end up spending almost all of my time doing is sysadmin crap. I hate it. I have always hated it. Always. If you made a Venn diagram, there would be two non-overlapping circles, one of which was labeled, "Times when I am truly happy" and the other of which was labeled, "Times when I am logged in as root, holding a cable, or have the case open."

It seems Michael Palin is not only really, really, really nice, but also has interesting opinions about fish.

Fish are funny, he says, in a way dogs and cats are not. “There is just something about fish. And their silly names. Halibut.” And haddock? “Haddock is very, very funny.” Pilchard? “Hilarious.”

Dave Winer thinks that because he's not ranked where he wants to be, “Google's algorithm is quirky, or the implementation is buggy, or both.” I've also noticed that I'm no longer the first non-Apple search result for Janie Porche, and I've even fallen to third-place on naked desktop people (although amusingly enough, the top two are Ugo Cei linking to me).

Google must be in a tough situation. When they started out, PageRank was a really neat idea, and it worked very well. What they couldn't have counted on was the huge increase in small, insular communities who do nothing but link to each other every day. Look at the Java-blogs community, for example. Half the entries I read are links to other blogs I'm subscribed to. (Which is a good thing: each post accretes a little more information, like a pearl forming around an irritant in the blogosphere.)
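For the curious, a toy version of the textbook PageRank calculation (power iteration with the commonly quoted damping factor of 0.85) shows how a ring of sites that link to one another keeps rank circulating among themselves. This is a simplified sketch under stated assumptions, not Google's actual implementation, and the page names used with it are made up.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration over a dict of outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small "random surfer" share...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ...plus an equal slice of the rank of each page linking to it.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

Feeding it three hypothetical blogs that only link to each other, plus one outside page that links in (`{"blog_a": ["blog_b", "blog_c"], "blog_b": ["blog_a", "blog_c"], "blog_c": ["blog_a", "blog_b"], "outsider": ["blog_a"]}`), leaves the three blogs holding about 96% of the total rank between them, while the outside page, which nobody links to, stays at the 0.0375 floor.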

While there are a lot of weblogs, we're still a tiny percentage of the population of the web, and even ‘popular’ weblogs still only have a few thousand visitors a day. Thus, Google are probably doing quite a lot of work behind the scenes to reduce the inordinate influence blogs have on their page rankings.

And more power to them.

A year ago, Mark Pilgrim was fired because his boss wanted him to stop writing in his weblog. His reply was, in part:

Writers will write because they can’t not write. Repeat that over and over to yourself until you get it. Do you know someone like that? Someone who does what they do, not for money or glory or love or God or country, but simply because it’s who they are and you can’t imagine them being any other way?

Design Advice

  • 4:55 PM

In web pages, as in many other pursuits, anything that is ‘nifty’ will become less so with each viewing, and will probably become annoying in the end. Elegance, on the other hand, never gets old.

From Bryan Dollery on the Extreme Programming mailing list:

If business leaders were interested in money then they'd use JBoss over WebSphere or WebLogic. The three products are, for most uses, identical -but JBoss is free while Web* can cost around $50,000 per processor. If money is that important, why do these products sell?

I make a living as a Websphere consultant, so I'm going to put my devil's advocate hat on. Please note that I really do like JBoss, I just don't get paid to use it. I don't want to start a ‘My Appserver is better than yours!’ battle, I just want to demonstrate that there are reasons that Websphere and its ilk exist in the marketplace, even though its free (or at least orders of magnitude cheaper) competitors are generally more up-to-date with the standards, easier to use and faster.

I can't speak for Weblogic, having never used it in anger, but as far as Websphere goes:

  • Trust is an asset (1). In the J2EE Container Shootout, JBoss's Marc Fleury said that of his competitors he'd either choose Orion because it's superior technically, or Websphere “because IBM will be around for ever and ever - it is always a safe choice.” As good as Open Source support generally is, a support contract is a far more tangible asset.
  • Trust is an asset (2). JBoss is around the same place Linux was five years ago. It's the same Catch-22. To be trusted, it has to be seen running critical applications, but to be deployed in critical applications, it needs to be trusted. Linux managed that through two prongs, firstly by being deployed in thousands of Internet providers who didn't want to pay for commercial Unices any more, and secondly by system administrators sneaking Linux boxes in while nobody was looking, so a year later they could say “Yeah, we run Linux, it's been delivering your mail the last twelve months without a hitch.”
  • Scalability. Clustering is a very new feature in JBoss, and it's hideously under-documented. I'm told it's very good, but all I can get from the JBoss site is “We have clustering, but to find out anything about it you have to buy our book”, which doesn't fill me with confidence.
  • Documentation. Giving away the product and then hoarding the documentation is just plain stupid, not to mention being very anti-GNU. Remember, software is only free if your time has no value, and the time of most IT contractors is very valuable indeed. Hiding information on how to use your product adds a high hidden cost to its use. Contrast the JBoss clustering book with the Workload Management Redbook for Websphere 3.5, free for download, and 600 pages long.
  • Bundling. This is the biggest reason. Most people who buy Websphere don't buy it in a vacuum. They're buying a package deal of hardware, software and consultancy. The cost of the individual Websphere licenses generally becomes part of a big lump sum that covers the entire project from conception to post-deployment support. That big sum gets negotiated, and divided up internally, but to the customer it's only really the big number that matters. And bluntly, if it weren't for the software margins, the programmers would be far more expensive.