December 2003

31
Dec

Valid HTML Woes

  • 1:02 PM

SGML (and, by extension, HTML) can be rather annoying to deal with. I am beginning to understand why everyone was so happy when XML turned up and started to supplant it. For example, did you know that the following is a completely valid HTML document?

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<head>
   <title>I am a fish</title>
                                                                                
<p>Blah

Run it through the W3C validator if you don't believe me. That page will earn you a nice "Valid HTML4" badge. The closing of the <head> and <p> tags and the entire <html> and <body> tags are all implied by the positioning of the other elements in the document. When you run the document through an SGML parser, it should insert all the ‘missing’ bits for you, so when you look at it through the parser, it magically becomes this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
  <head>
     <title>I am a fish</title>
  </head>
  <body>
    <p>Blah</p>
  </body>
</html>

A more relevant annoyance is the handling of <foo />. In XML (and in the weird hybrid XHTML that people are encouraged to write to be backwards compatible with modern browsers), <foo /> is assumed to be equivalent to the empty tag <foo></foo>. Unfortunately, this only works because web browsers aren't SGML parsers. In SGML, <foo /> is considered equivalent to <foo >>.

This would be OK if it were just XHTML documents that contained the <foo/> notation. XHTML is required to be valid XML, so you can recognise it and run it through a straight XML parser instead. However, it's started to creep into regular HTML documents as well, through cut-and-paste page writing, developers' finger-macros and guru incantations adopted half-understood by willingly ignorant acolytes. So if you pass one of those through an SGML parser, you'll end up with all sorts of extraneous greater-than signs lying around the place.
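To make the stray greater-than signs concrete, here is a toy Python sketch. It is not an SGML parser: the function name and the regular expression are purely illustrative, and it simply rewrites XML-style empty tags into the <foo >> reading described above.

```python
import re

# Matches XML-style empty tags such as <br /> or <hr class="x"/>.
# Toy pattern: it does not handle attribute values containing "/" or ">".
EMPTY_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>/]*)/>")


def as_sgml_would_read(markup: str) -> str:
    """Rewrite <foo /> as <foo >>, the SGML reading described in the text."""
    return EMPTY_TAG.sub(r"<\1\2>>", markup)


if __name__ == "__main__":
    # The tag closes at the first ">" and the second ">" is left behind as text.
    print(as_sgml_would_read("<p>one<br />two"))
```

Ordinary start and end tags pass through untouched; only the trailing-slash form picks up the extra character.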

This is far too much to think about on New Year's Eve. Maybe I should go somewhere, get drunk and watch some fireworks.

(Random note. My site doesn't validate. Nor do I expect that it ever will. There's a certain effort/reward ratio involved in maintaining a valid site, and for me, the reward simply doesn't even approach the required effort. Especially considering the work that would be required in going back through over a thousand old entries and giving them valid, semantically appropriate markup.)

What the subject says, really. Eclipse and IDEA both have really cool templating features for new classes to set up things like default doc comments and copyright notices and so on. Unfortunately, they set them up on a global basis, not per-project.

This is a Bad Thing, because different projects rarely use exactly the same templates.

Of course, maybe I'm just embarrassed because I was working from home yesterday, and managed to check a bunch of files into CVS with my standard boilerplate "This is my code, why are you even reading it?" license at the top.

Since the web was in its infancy, HTTP has come with a built-in authentication protocol that was simple, if not particularly secure on the wire, supported by all browsers, and almost never used. Even with its security improved by digest authentication, pretty much every major website ignores HTTP Authentication in favour of a form-based approach that is in many ways worse. Why don't we like HTTP Authentication?

  1. Jarring User Interface

    The login dialog, as implemented by web browsers, looks like part of the browser, not part of the site. The only explanatory text is what you can shoehorn into the realm-name, something that field was not intended to do. In addition, it's impossible to present the user with alternatives to logging in while they have the (at least page-modal) login dialog in front of them. You cannot give them the option to sign up for a new account or request a password reminder beside the login form.

    All this is just confusing and annoying to the user. Worse, it's ugly: a login form embedded in and styled with the site is orders of magnitude more aesthetically pleasing.

  2. Authentication Is Not Optional

    Either a page requires authentication by returning a 401 response code, or it does not. Many websites, on the other hand, allow users to access pages whether they are logged in or not, controlling instead what content appears on those pages based on login status. Thus, HTTP Authentication is unsuitable.

  3. No Logout

    It's really annoying and fiddly to code a logout option for HTTP Authentication, and you're never really sure the browser will take the hint. Configuring a time-out for idle users is just impossible.

  4. Poor Server Support

    Historically, web and web-application servers have dealt with HTTP Authentication badly: making it fiddly to configure, hard to plug in alternate providers, and hard to integrate with your applications.

On the other hand, there are certain things that are right about HTTP Authentication:

  1. Browser Support

    The browser understands, at the protocol level, how to determine if a page requires authentication, and which credentials to send. Modern browsers give users the option to remember credentials, and seamlessly remain logged into the site between sessions.

    The alternative, as we've all probably had dealings with, involves messing around with user sessions, and putting the user at risk with authentication tokens buried in cross-site-scripting-vulnerable persistent cookies.

    Just as annoyingly, there's the idle timeout problem. We sometimes want to time idle user logins out (for their own safety), but we really have no idea whether the user is idle or not. We've all had that nasty experience of spending a little too long filling out a form, only to lose everything we entered because we had been logged out by the time we came to submit it.

  2. Simplicity

    It is far, far easier to build a tool that can perform HTTP Auth, than it is to build one that can navigate the myriad login forms that clutter the web. For example, consider a password-protected RSS feed. It would be easy to write an RSS Newsreader that could retrieve it if HTTP Auth were used, much harder if anything else were put in its place.
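As a rough illustration of how little a tool needs in order to speak HTTP Basic authentication, here is a minimal Python sketch using only the standard library. The feed URL, the credentials and the function name are placeholders invented for the example.

```python
import base64
import urllib.request


def build_feed_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP Basic credentials (RFC 2617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # feeds.example.com is a placeholder, so this only prints the header
    # rather than sending the request with urllib.request.urlopen(request).
    request = build_feed_request("https://feeds.example.com/private.rss", "user", "pass")
    print(request.get_header("Authorization"))
```

A form-based login would need the same tool to fetch a page, find the right form, guess the field names and hang on to a session cookie, all of which differ from site to site.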

My solution is pretty straightforward. Back-port current practice (optional login and in-page login forms) into the standards. First, provide for optional authentication in HTTP:

WWW-Authenticate

The server MAY include a WWW-Authenticate header (as defined in RFC2617) with any successfully retrieved (2xx response code) document. This header denotes that the document was retrieved, but further information may be available if the user authenticates to the realm provided in the header.

On receiving a WWW-Authenticate header with a 2xx response, any user-agent that has credentials cached for the realm SHOULD repeat the request, including those credentials. If the user-agent has no credentials cached for the given realm, it SHOULD NOT interrupt the delivery of the response to the user, but MAY provide some indication that the page accepts authentication, and some mechanism to enter credentials.

As in RFC2617, the user-agent MAY preemptively send the same credentials for any resource located at a URI beneath the one at which the WWW-Authenticate header was received.
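The user-agent side of this proposal could be sketched roughly as follows. This is only one reading of the rules above, not an implementation of any existing browser behaviour: the function name, the returned labels, and the sent_credentials flag (added here to avoid retrying forever) are all assumptions.

```python
import re
from typing import Dict, Mapping, Tuple

Credentials = Tuple[str, str]  # (username, password)

# Pulls the realm name out of a challenge such as: Basic realm="members"
REALM = re.compile(r'realm="([^"]*)"')


def next_step(status: int, headers: Mapping[str, str],
              cache: Dict[str, Credentials], sent_credentials: bool = False) -> str:
    """Decide what a user-agent does with a response under the proposal.

    Returns "show", "show_and_offer_login", "retry_with_credentials" or "prompt".
    Header lookup is case-sensitive here, which a real client would not be.
    """
    challenge = headers.get("WWW-Authenticate")
    if challenge is None:
        return "show"
    match = REALM.search(challenge)
    realm = match.group(1) if match else ""
    if status == 401:
        # Existing behaviour: authentication is mandatory for this resource.
        if realm in cache and not sent_credentials:
            return "retry_with_credentials"
        return "prompt"
    if 200 <= status < 300:
        if sent_credentials:
            return "show"  # assumption: never retry a request that already had credentials
        if realm in cache:
            return "retry_with_credentials"  # the SHOULD-repeat rule
        return "show_and_offer_login"  # deliver the page, MAY hint that login exists
    return "show"
```

The important branch is the last 2xx case: with nothing cached, the page is shown straight away and the login is merely offered.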

Then, provide for HTTP Auth from within web pages.

When a FORM element has a method attribute value of "Auth", this defines a form for providing HTTP Authentication credentials. Its action attribute value is taken to be the name of an HTTP Authentication realm as defined in RFC2617. Authentication forms may contain the following form input elements:

  • username, which will be the username for the login credentials.
  • password, which will be the password for the login credentials.
  • timeout, which will be an optional idle-timeout (in minutes) for the credentials provided in this form.

    The exact definition of idleness is left to the user-agent, but is roughly defined as the amount of time since the user last interacted with a page that required authentication to that particular realm. If timeout is zero, or this field is not provided, the login is assumed to never time out.

On submitting an "Auth" form, the user-agent should cache the credentials for the given realm. If the page on which the form was located was part of that realm and new credentials have been provided, the page should be re-requested with the new credentials. If the page on which the form was located was not part of that realm, it MUST NOT be requested with those credentials, for obvious reasons.

An "Auth" form may also have an optional input element of type logout. User-agents should render this element in the same manner as inputs of type cancel. Selecting this element will cause the user-agent to forget all authentication tokens for the given realm.
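The credential handling an "Auth" form implies (cache per realm, optional idle timeout, explicit logout) might look something like this Python sketch. The class and method names are hypothetical, the clock is passed in by the caller, and "idle" is approximated as the time since the credentials were last handed out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    username: str
    password: str
    timeout_minutes: int  # 0 means the login never times out
    last_used: float      # seconds, from whatever clock the caller uses


class RealmCredentials:
    """Per-realm credential cache with idle timeout and logout."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def submit_auth_form(self, realm: str, username: str, password: str,
                         now: float, timeout_minutes: int = 0) -> None:
        """Cache the credentials supplied by an "Auth" form for its realm."""
        self._entries[realm] = Entry(username, password, timeout_minutes, now)

    def credentials_for(self, realm: str, now: float) -> Optional[Tuple[str, str]]:
        """Return cached credentials, or None if absent or idle for too long."""
        entry = self._entries.get(realm)
        if entry is None:
            return None
        idle_seconds = now - entry.last_used
        if entry.timeout_minutes and idle_seconds > entry.timeout_minutes * 60:
            del self._entries[realm]  # timed out: forget the login
            return None
        entry.last_used = now  # using the credentials counts as activity
        return (entry.username, entry.password)

    def logout(self, realm: str) -> None:
        """What selecting the logout input does: forget the realm entirely."""
        self._entries.pop(realm, None)
```

Because the timeout lives in the user-agent, it can be based on what the user is actually doing rather than on the server's guess.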

Voilà. Now all we need is server and tool support, and we're on our way back to a better web.

Update: Zoe points out that the authentication form has appeared before in a 1999 W3C Draft. Which just goes to show that few good ideas are ever original. Shame it was never implemented though.

Most countries have had this movie for a week or two, but Return of the King only opened in Australia today. Traditionally, I've been going to the Lord of the Rings movies with my family, and with my brother in town today, this was no exception.

In no particular order:

  • This movie kicked more arse than a platoon of arse-kicking machines set on overdrive.
  • Sam and Frodo really should have just shagged and got it over with.
  • Too much 'meaningful' slow-motion. Especially of the aforementioned hobbits staring into each other's eyes. Not only did it feel manipulative, if you'd run more of the slo-mo stuff at full-speed, the movie would have been substantially shorter (and maybe you could have kept Saruman in).
  • My mother, when questioned about her preferring Legolas to Aragorn, replied "It must be the pointy ears". I'm starting to wonder if the elf thing is hereditary.
  • Sam spent almost all of the movie crying. A little more contrast of moods might have been good, but I guess there's only so much you can do with the source material.
  • While the ending was drawn out, the story had to end with the departure of Frodo into the West, and there's little of the stuff in between that could have been cut out without making the departure meaningless.
  • I was surprised to learn Aragorn hadn't had Anduril for the last two movies. I'd just assumed the sword had been one of those details that got skipped. I dunno, I just thought the reforging sequence was dead weight. The movie would have worked just as well if he'd had the sword with him all along.
  • Aside from that, most of the other changes they made to the books were pretty justifiable in terms of making the movie work better.
  • Particularly, the decision to be up-front about who Eowyn-in-disguise was from the start was a good one: Miranda Otto wouldn't have made a convincing bloke, and the effect would just have been comical.
  • As visually impressive as the beacon sequence was, I was left wondering how the hell they had people at the top of those really high mountains all day manning them.
  • When going to see a three hour movie, buy the small drink. I learned this lesson years ago, but from the rash of departures around the two-hour mark, a lot of people didn't realise that most people's bladder capacity is somewhat less than one of those big-gulp coke cups.
  • The movie really did kick arse.

It took me possibly a little too long to work this out, but I thought I'd better share this advice with the world.

If on a major holiday or birthday you buy a girl a gift from The Body Shop, you may as well be writing "I didn't have the faintest clue what to get you, so I ended up taking the path of least resistance" on the wrapping paper. Because that's what the gift will be interpreted to mean, and let's face it, guys, that's exactly why you ended up in The Body Shop in the first place.

Hope this helps, and may you all have a very happy holiday.

Stop Words

  • 8:56 AM

The iTunes music store, like most search engines, ignores certain stop words -- words that are so common that they'll rarely be useful in a search. Now: how do you search for seminal '80s band The The?

(The answer, of course, is to search by song and then backtrack. But it's an interesting side-effect of what is otherwise a pretty sensible idea.)

A cow orker and I once tried to work out just how many different languages you had to know to understand all of a J2EE project. Between programming languages, markup and templating languages and a thousand different flavours of configuration files (each a language in its own right), the assumed knowledge for a pretty simple project was pretty daunting.

And we weren't even using Jelly or Groovy.

You don't see nearly so much of this with other languages. Not just the proliferation of different languages, but also the proliferation of configuration files. Everything that's optional or that needs to be easily changed (scripted) gets dragged out into something that isn't Java code, where in other languages it would much more likely be written in one language, and programmatically configured.

A configuration/scripting file makes sense when it's one component and one config file. But when you start gathering components together, you end up with one or more configuration files per component, and things start to get brittle and fiddly: especially if you want to make regular switches from one configuration mode to another (for example testing and production), in a way that crosses a number of different files.

It's tempting to say that this is because of a flaw in Java: there's something wrong with the language that makes people want to get away from it. This could partially be true (certainly I'd love to switch regularly to a dynamically typed language with closures[1]), but it'd be lazy to say it's the real answer. For all Java's flaws, it's still a relatively clear and flexible language. A little verbose, perhaps, but not that much. And certainly not compared to, say, anything written in XML.

As far as I can tell, there's just a firm cultural belief amongst Java developers, perhaps influenced by its C ancestry, that anything written in Java code is inviolate: set in stone. If there's anything that could possibly ever change, it should be immediately externalised. If there's a process that might need to be configured, it will need to be done in its own scripting language outside the Java code. Programmatic configuration is bad, because it's putting something that might change inside the code.

This is rubbish. Java is a nice readable and writeable language. It's been designed to have small, independent compilation units so the compile cycle is short. It doesn't suffer from the "fragile base class" problem. Changing a Java class really isn't that big a deal most of the time.

But we feel the Java code must remain inviolate, and anything that might change must live outside, where it's safe to put volatile things.

Of course we don't expect end-users of an application to configure it by changing the code and recompiling. And anything you want to change during deployment needs to be external as well. But really, with the number of configuration files that a Java server-side app accumulates, maybe one tenth of it is actually stuff you'll eventually want exposed to the end-user. And rather than spread that tenth across a bunch of different configuration files, you're much better off if you're able to write some kind of code that centralises it in a way that makes sense for your application, rather than for the various libraries it's making use of.
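As a sketch of what "centralising it in code" might look like, here is a small example. It is written in Python for brevity, though the same shape works as a plain Java class. Every setting name, URL and mode below is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """The handful of settings this (hypothetical) application cares about."""
    database_url: str
    mail_host: str
    cache_enabled: bool


def config_for(mode: str) -> AppConfig:
    """One place that knows how testing differs from production."""
    if mode == "testing":
        return AppConfig(
            database_url="sqlite:///:memory:",
            mail_host="localhost",
            cache_enabled=False,
        )
    if mode == "production":
        return AppConfig(
            database_url="postgresql://db.example.com/app",
            mail_host="smtp.example.com",
            cache_enabled=True,
        )
    raise ValueError(f"unknown mode: {mode!r}")
```

Switching modes is then one argument in one place, instead of coordinated edits across a config file per library.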

Update: Daniel Sheppard contributed some nice tips on the subject.

1 ...because I know someone will at least be thinking this. Anonymous inner classes are not closures. They do a reasonable (if ridiculously verbose) job of faking it, but that's all.

Overly Mocked

  • 10:55 PM

As far as Unit Testing goes, mock objects are a pretty useful addition to the toolkit. In a componentized system, by mocking out all the other components, you can isolate the one you're testing that much more effectively. Mock object frameworks exist to make the job a little easier: providing simple ways to provide substitute objects, and record and predict method calls.

Unfortunately, once you get in the mock object habit, it becomes too tempting to test things that you really shouldn't be testing. Because you can test to a granularity of which methods get called on which components, because it's easy, you find that you do it as a matter of course.

Often, you shouldn't be. Let's get down to what a unit test really should be. A unit test should predict some observable effect, and test that it can make that effect occur. How the object makes the test pass is irrelevant, so long as the test passes. Mock object testing looks a little too closely at what lines of code are actually going to be written to make the test pass. Testing lines of code directly, rather than the effects of those lines of code, can lead to irrelevant and hard-to-maintain tests.

If you start writing mocks to predict (and test for) every call to the objects that are being mocked out, you end up with fragile tests that break trivially as the result of refactorings that change how the object being tested does something, without changing what the object does.
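The difference can be sketched with Python's unittest.mock (the same contrast applies to Java mock frameworks). The Checkout class and its tax-service collaborator are invented for the example.

```python
from unittest.mock import Mock


class Checkout:
    """Object under test; the tax service is the collaborator we mock out."""

    def __init__(self, tax_service):
        self._tax = tax_service

    def total(self, net_cents: int, region: str) -> int:
        rate_percent = self._tax.rate_for(region)
        return net_cents + net_cents * rate_percent // 100


def test_effect():
    # Predicts an observable result; the mock only supplies a canned answer.
    tax = Mock()
    tax.rate_for.return_value = 10
    assert Checkout(tax).total(1000, "AU") == 1100


def test_interaction():
    # Predicts *how* the result is produced. It would break if Checkout
    # started caching rates or looked them up twice, even though the
    # total it returns stayed exactly the same.
    tax = Mock()
    tax.rate_for.return_value = 10
    Checkout(tax).total(1000, "AU")
    tax.rate_for.assert_called_once_with("AU")
```

Both tests pass today. Only the first keeps passing through a refactoring that changes how the total is computed without changing what it is.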

But that's not all.

Most of the systems I've seen over the last few years have been written in layers. Down the bottom, you have components that do the important stuff, but don't talk to other components much. The layer above coordinates between the lower-level components, but doesn't do much interesting itself apart from that delegation.

So what happens if you test those higher-level components with mock objects? Well, you sort of end up with tests that do absolutely nothing but test a series of method calls on mock objects. This is, I would suggest, worse than useless. What are you testing, really? You're not putting forward any kind of verifiable hypothesis about what the method you're testing should be producing: you're just writing it twice, once in the test and once in the object being tested.

If you get the series of calls wrong in the test, you'll get it wrong in the object too. You have exactly the same margin for error with and without the test, you've just moved the risk around a bit.

It pays with mocks, therefore, to always keep in mind the question: "what exactly am I testing?"

Sick

  • 10:16 PM

I'm sick again. This is starting to freak me out. Normally, I'm the sort of person who doesn't take a sick day for years, but looking back recently, I seem to have been coming down with exactly the same ailment every couple of months for the last year.

So I'm going to have to take the radical step of seeing a doctor. My guess is that it's a resurgence of the tonsillitis I managed to get rid of back in 1997: the doctor told me it might be back some day. Ho hum.

One of the more... interesting aspects of this particular ailment is the delirium. Each time I get it, I end up spending about a day (on and off) in bed with very little control over my own brain. Being a delirious computer programmer is weird. This time, I kept believing that I was a program, and desperately tried to search through my own source-code for the bug that was causing me to be ill.

It's a pity you can't do that. If I had access to my own source-code, and could recompile myself whenever anything went wrong, life would be so much easier.

Update: Went to the doctor. Turns out it was my sinuses, not my tonsils. I'm on a two-week course of bastard-strength antibiotics that should kill the bug off dead instead of leaving it hanging around to come back in a few months' time.

I ended up sitting in the waiting room for half an hour beyond the time my appointment was scheduled for. Given that I was really not very well, this was not a particularly pleasant experience to say the least. I would have probably been waiting there another half hour, but there was a fortunate mix-up with another patient called Charles who had mysteriously ducked out for a moment. Score.

By the time I got in to see the doctor, my fever was running at 39.4°C (103°F). I got this look that I'm sure meant "wow... and you're still conscious?"

If it happens again, I get to have lots of tests on my immune system, plus a CAT-scan of my sinuses to see why they're getting blocked up. I have this mental image of the conclusion to the latter test that goes something like this:

Wow! My bicycle! I lost that when I was five!

One of the perils of maintaining a suite of unit tests is the need for each test to be run against some stable baseline. Any resource that is changed by a test must be cleaned up afterwards, so that when you run all your tests as a single suite, they don't interfere with each other.

You can go a long way to avoiding this by using component models that allow you to replace components with mock objects and with in-memory databases that can be completely disposed of between tests, but there's always something that can leak, and if it can leak, Murphy's law kicks in with a vengeance.

A test that doesn't clean up after itself is a menace because if it causes an error, the error will only be apparent in the next test that relies on that particular resource... which means a long, frustrating search backwards through the test suite to find the last person to play with that particular piece of code.

Enter the LeakDetectorTest.

In the JUnit extensions package lies the TestDecorator. Like the name suggests, a TestDecorator wraps around a test, allowing you to extend it without changing the original class. Decorators allow you to add time constraints to a test, or repeat a test a certain number of times, or in the case of the LeakDetectorTest, add a post-condition check to each test class to see if it was the one that had forgotten to tear down properly.

The only problem, of course, is that the decorator has to be applied to every single test class. Which means one gigantic suite() method that would be tedious to write, and even more tedious to maintain. In a more dynamic language I could have temporarily added the check to the TestCase superclass, or applied an aspect to every test* method instead, but neither was possible in this case.

This is why every programmer should know at least one scripting language.

It took about a minute to write a Ruby five-liner that would generate my test suite for me, populating it with every test class in the project, each decorated with the LeakDetectorDecorator. Once you've got the script, it's a simple matter, whenever you find something isn't being cleaned up when it should, of building a new decorated suite to track down the leak, and fixing it.
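The original script and decorator aren't reproduced here, but the idea translates to other test frameworks. The following is a rough Python unittest analogue, not the JUnit code: it runs each test class on its own and checks a post-condition afterwards. The snapshot function, the shared REGISTRY and both test classes are hypothetical stand-ins.

```python
import unittest


def find_leaky_test_classes(test_classes, snapshot):
    """Run each test class in isolation and report those that leave the
    shared resource (as seen by snapshot()) different from how they found it."""
    loader = unittest.TestLoader()
    leaky = []
    for test_class in test_classes:
        before = snapshot()
        suite = loader.loadTestsFromTestCase(test_class)
        suite.run(unittest.TestResult())
        if snapshot() != before:  # the post-condition check
            leaky.append(test_class.__name__)
    return leaky


# Hypothetical shared resource and test classes, only to exercise the check.
REGISTRY = []


class TidyTest(unittest.TestCase):
    def setUp(self):
        REGISTRY.append("tidy")

    def tearDown(self):
        REGISTRY.remove("tidy")

    def test_something(self):
        self.assertIn("tidy", REGISTRY)


class LeakyTest(unittest.TestCase):
    def setUp(self):
        REGISTRY.append("leaky")  # never removed: this is the leak

    def test_something(self):
        self.assertIn("leaky", REGISTRY)


if __name__ == "__main__":
    print(find_leaky_test_classes([TidyTest, LeakyTest], lambda: list(REGISTRY)))
```

Because the check runs straight after each class, the culprit is named directly instead of surfacing as a mysterious failure several tests later.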

Happy birthday to me,
Happy birthday to me,
Two years 'til I'm thirty,
Fuck fuck fuck fuck fuck fuck.

I turned 28. I have absolutely no words of wisdom for you to mark the occasion, so I will leave you with a quote from the narrator of that great TV show from my childhood, Monkey:

"The pilgrims still have as far to go as they have travelled. What end can there be to a journey as long as life? What end can there be to life? It is very hard to want nothing and to move on endlessly, and it is very easy. There is no end. There is one life, one pattern, and this is the pattern which is being followed."

Moving On...

  • 8:07 AM

Friday was my last day of employment at Cirrus Technologies, the firm at which I've worked for the last three and a half years. They're a terrific bunch of blokes, and great developers.

I'd like to thank the Three Amigos: David, David and Jeff, for employing me on the strength of a phone interview from Perth that I was sure I'd screwed up beyond belief.

To single (triple?) out my three co-bloggers from Sydney: Alan, David and Keith -- I've learned a hell of a lot from the three of you, and had a lot of fun doing it. Even those times the project itself wasn't particularly enjoyable, there was always something positive to get out of it.

And to everyone else I worked at Cirrus with: Caragh, Chris, Daniel, David, David (yes, these are all different Davids), Gavin, Mark, Neville, Pete, Reynaldy, Richard, and the two whose names I can't remember right now and am kicking myself over, thanks for three great years.

Update: Ack! And Leon and Shane. Neither of them being one of the two people I was thinking about above, but couldn't remember the names of. For some reason, they were just missing off the association list I used to pull together the names. For which I apologise profusely: my brain really sucks.

Robert Martin's Artima article: Debuggers are a wasteful Timesink:

Since I started using Test Driven Development in 1999, I have not found a serious use for a debugger. The kinds of bugs I have to troubleshoot are easily isolated by my unit tests, and can be quickly found through inspection and a few judiciously placed print statements.

...

I consider debuggers to be a drug -- an addiction. Programmers can get into the horrible habbit of depending on the debugger instead of on their brain. IMHO a debugger is a tool of last resort. Once you have exhausted every other avenue of diagnosis, and have given very careful thought to just rewriting the offending code, *then* you may need a debugger.

Ron Jeffries, describing Kent Beck in the "Practices of the C3 Project":

Something goes wrong. The code doesn’t work. You start to think: "What could cause that to happen?" Kent doesn’t think about what the problem is. He just sets a halt in the system and lets Smalltalk tell him what the problem is.

Sometimes you’re right about what the problem is. If you’re really quick you’ll be able to tell Kent [Beck] what to edit in the window he’s already looking at. If you’re really quick.

Sometimes you’re not right about what the problem is. Forget it, he has already fixed it.

Train yourself to think about where to put the halt, not to think about what the problem is. Of course it’s a great feeling when you can reason to the problem. But we’re not here to make our brains feel good, we’re here to get the code working as quickly as possible. Setting the halt and letting Smalltalk tell you will help you build working code faster.

For fear of being called a blinkered XP groupie, I'm going to have to side with the second quote here (and with other Smalltalkers who have chimed in, despite never having programmed in the language myself).

If you have a problem: your tests have gone red, or you've been handed a bug report, you have two tools available to you: static analysis, or dynamic debugging. With static analysis, you look over the code, and in understanding what it is doing, you find the logical error that is causing the bug. With dynamic debugging, you step through the code until you can see it start going wrong, and then you fix it. In a good debugger, you can then hit continue after you've finished editing, and it'll all work.

Static analysis can work, and can be fast. Unlike Beck, I would give the offending code a once- or twice-over eyeball before switching to the debugger, in case it's something obvious I missed. That said, if it's that amenable to static analysis, why on earth did I write the bug there in the first place? I'm not always that sloppy. Often, the problem requires a deeper look. That's where the debugger can save you time.

Adding a few printlns can also be useful, but it's a very limited technique. Unless it's tracking the progress of a variable over the course of a long-lived loop, I have no idea how one can justify putting in a println as being more useful than setting a breakpoint at the same line, and then being able to examine all available variables, and evaluate arbitrary expressions in that context. Most of the time, the first thing you try to println is the wrong thing, or in the wrong place. Far better to be able to refine your investigation in the one pass than having to change the print statement, recompile and re-run.

I find, however, that the quality of the debugger is the biggest decider on whether I do static analysis or full debugging. In VisualAge for Java, I would spend far more time in the debugger than I have since in Eclipse/WSAD. Things that worked reliably and helpfully in VAJ (rewinding the stack, fix-and-continue, object inspection, expression evaluation) only seem to work reluctantly or sporadically in Eclipse's front-end to the standard Java debugger.

If I'm reluctant to use the debugger, it's not because I don't want to use it, it's because I want it to be a better debugger.