January 2003


30 Jan

So there's yet another article online that claims to discuss a downfall of XP. And once again, it follows a very familiar formula:

‘x’ is a property of Extreme Programming (for some value of ‘x’ including pair programming, short iterations, doing the simplest possible thing, or having no up-front design). We did ‘x’, but it didn't work, so XP sucks.

Someone on the XP mailing list was a little less generous:

Oooh, my project failed! It was vaguely XP-like, so I'm going to blame XP, instead of myself, and then run at the mouth on the net!

In this case, the article talks about the dangers of short delivery cycles when it comes to manual testing. XP has already thought of that, and insists on the full suite of automated tests that the article's author admits would have been the most effective solution to her problem.

XP is a set of twelve practices. They're balanced against each other. One of the biggest difficulties in doing XP is the discipline involved in keeping the more difficult practices going. Getting significant customer involvement is hard, but you really need it if you're not nailing down all of the specs beforehand. Coding all the automated tests can be tedious, but it's necessary in the absence of a long testing phase. Refactoring is really hard to justify to management because it looks like time spent doing nothing, but it's vital in the absence of up-front design. Lots of people aren't particularly happy about Pair Programming, but it keeps the code quality up, increases team awareness of the codebase, and helps prevent an individual from forgetting the other practices.

People expect Extreme Programming to be undisciplined. That's not true. People seem to want to label any undisciplined process as being ‘agile’ or ‘extreme’. That's not true either. Writing an article of the form above, and sticking an inflammatory headline about Extreme Programming on it gets you lots of page-views. That, on the other hand, could be true.

But it's bad. There are real problems with XP, but when it's done properly, it's still one of the better processes I've worked with. The whole straw-man backlash thing not only fosters an unnecessary prejudice against XP; its sensationalist debate from ignorance also stifles any informed (but perhaps less headline-worthy) debate that may be waiting in the wings.

Dear Cityrail

  • 9:46 PM

Dear Cityrail

On Friday January 17th, 2003, I forgot to renew my weekly train ticket. Since I boarded the train at Newtown, which has no ticket gates, I had no idea I had forgotten until it was too late. I attempted to reason with the gate attendants, but they made it clear to me they had no choice but to issue me a $100 fine.

On Wednesday January 29th 2003, I boarded a train from Central that was clearly posted on the platform indicators as an all-stops train, stopping at Newtown. When the train failed to stop at Macdonaldtown, the Cityrail security employees on the train were informed of the error. Despite obviously carrying radio transmitters capable of contacting someone with authority, they said there was nothing they could do about the situation. As a result, I ended up travelling several stations past my destination, waiting fifteen minutes on the platform for a return train, and travelling back again.

Everybody makes mistakes. My fine is attached. I believe we're even.

Yours sincerely,

Charles Miller

Mark Paschal takes exception to a post by Dan Benjamin about Objective-C vs Java.

Objective-C has a lot of things in its favour. It's dynamically typed. It's enormously less picky. You can add methods to existing classes. You can drop back to straight C if you want to, which many people consider a benefit, though I was never a big C programmer. If you're on OS X, you get to use Cocoa, which kicks ass. But I'll have to agree with Dan on one point. To quote Jamie Zawinski:

Java doesn't have free(). I have to admit right off that, after that, all else is gravy. That one point makes me able to forgive just about anything else, no matter how egregious. Given this one point, everything else in this document fades nearly to insignificance.

Once you're used to automatic memory management, being forced to do it all manually is a real hassle.

Every time I post to a large mailing list, I inevitably get a huge pile of nice messages from some guy called Mr MAILER-DAEMON, telling me that half the people on the list don't actually exist any more. I also get quite a few messages from helpful Notes systems telling me that some of the list subscribers are on holiday. Which I really care about. I do.

But today, the following bounce-message won the award for Mailing-List Stupidity of the Year (Well, the year so far. Which really means this month.)

      This is a system generated message.   
   * Your message has NOT been delivered *

This mailbox is protected with an email password system, to have your email delivered please resend the message and include the string BLUEHILL in the subject. Thank You!

Thank-you! Thankyou for subscribing to a mailing-list, and then bouncing every message you receive from it because you don't trust it to send you mail. You GENIUS!

Responding to Andy Oliver's Why C# is better than Java, part two (points 6-10).

  1. Structs: In C#, a struct is a not-quite-object that lives on the stack and can't do inheritance. I assume structs get passed by value too? Hello, serious bug pattern. This adds a great deal of “When is an object not an object? When it's a value type that can't be extended!” complexity, which isn't worth the pretty trivial gain.

  2. Substitution Parameters: java.text.MessageFormat has done this since the JDK 1.1 days (there's an example after this list).
  3. #define: For the example given, look up Java's conditional compilation (an if block guarded by a static final boolean constant gets compiled away). For other uses of #define, the preprocessor is a hack. There's generally a more elegant solution staring you in the face.
  4. Verbatim Strings: These are nice syntactic sugar. Notice how the only things that pass the test of really being advantages of C# over Java are “nice syntactic sugar”? If you really cared, you could hack verbatim strings into the Jikes compiler in an hour or two.
  5. Foreach: Evil syntactic sugar. Foreach is a lame, feeble excuse for not having blocks/closures. It embarrasses me that Java is planning on copying this mistake.
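To illustrate point 2, here's a minimal sketch of MessageFormat doing substitution parameters. The class name and the values being formatted are made up for the example.

    import java.text.MessageFormat;
    import java.util.Date;

    public class SubstitutionDemo {
        public static void main(String[] args) {
            // Positional parameters, with optional per-argument formatting.
            String message = MessageFormat.format(
                    "{0} has {1,number,integer} unread messages as of {2,time,short}.",
                    new Object[] { "Charles", new Integer(42), new Date() });
            System.out.println(message);
        }
    }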

I normally don't get into these language dick-size wars, but this was amusing.

Responding to Andy Oliver's Why C# is better than Java, part one (points 1-5).

  1. The Filesystem: This is, at worst, a very minor annoyance. I work in an IDE, so the base unit is the class anyway, not the file.
  2. Case-sensitivity: In a world where you must interoperate with both case-sensitive and case-insensitive filesystems, being case-sensitive is the safest way to avoid major tangles later on. Such as when you MADETHEMBIG (as Oliver suggests), and then try to check them back into the case-sensitive CVS. Really Bad Move.
  3. Hashtable syntax candy: Being able to overload [] is, I admit, pretty neat. But it's pretty mild syntactic sugar.
  4. Less reluctance to change the VM: Big, big problem for C# and .NET in general. If Microsoft runs true to form, incompatible VM updates will be used as a goad to force upgrades (“Sorry, you need CLR 553 for that, which is only supported on Windows 2010”), and to keep projects like Mono chasing after a moving target.

    In addition, .NET is substantially younger than Java. When Java was the same age as .NET is now, it was making pretty big changes to the VM, for example introducing the concept of nested classes.

  5. Spec of the Day: The examples Oliver gives are stupid. Of course email, MOM and XML messaging are different specs; they're totally different things. Why don't we use SMTP for middleware messaging? Oh, because e-mail is best-effort store-and-forward, while MOM is reliable pub-sub.

    On the other hand, any perusal of the JCP site will find a bunch of specs that should never have been proposed. Do we really need a standard Sun-mandated API for rules engines? Should Java Server Faces really exist when there's so much competition happening in the real world to try to create better web frameworks?

    Once again, C# is young. Think C# is going to avoid having spec-of-the-day disease? You're on crack. Microsoft is famous for it.

    Think of the history of data access strategies to come out of Microsoft. ODBC, RDO, DAO, ADO, OLEDB, now ADO.NET - All New! Are these technological imperatives? The result of an incompetent design group that needs to reinvent data access every goddamn year? (That's probably it, actually.) But the end result is just cover fire. The competition has no choice but to spend all their time porting and keeping up, time that they can't spend writing new features. —Joel Spolsky, Fire and Motion

SQL Worm Irony

  • 10:24 AM

Microsoft's Homepage, on the day the SQL Server worm crippled large sections of the Internet:

Today's News: Bill Gates reports on security progress made and the challenges ahead.

On Bugtraq, Jason Coombs wrote:

As of now we don't know who wrote the worm, but we do know that it looks like a concept worm with no malicious payload. There is a good argument to be made in favor of such worms.

There are many arguments against them, too.

  1. They have the potential to seriously disrupt delivery of important services.
  2. It takes one bug in the worm to turn it from "mostly harmless" into "crippling", and nobody has a spare Internet to test a worm on before releasing it.
  3. They cause enormous problems even for those who are not directly to blame. Consider a co-location facility. You can keep your own systems up to date, but that's really no consolation if some idiot in the same facility hasn't patched their SQL Server boxen, and they melt the router.
  4. They feed a cycle of short, sensationalist incidents that target a single vulnerability, and then fade into the background.
  5. They feed the myth that it is "good enough" to be reactive when it comes to security.
  6. They have no appreciable long-term benefit. Last year, it was Code Red and Nimda. Everyone patched their IIS servers. There was the Apache mod_ssl worm (to a lesser extent) that reminded everyone to patch their Apache servers. This year, there's Sapphire, and everyone patches their SQL Server boxes. The next vulnerability and the next worm, even if they're in IIS, Apache or SQL Server again, will catch the same people, again.

The solution isn't defensive worms. The solution lies in the recognition (seldom expressed, lest we later regret it ourselves) that the failure to patch a seven-month-old bug is negligence, that the failure to firewall non-essential open ports on network servers is negligence, and that the failure to implement egress filtering is negligence. We could probably come up with a pretty good baseline of what is obvious systems administration negligence when it comes to security.

Few worms exploit vulnerabilities that are new and unknown. Most exploit those that have been known for months. That it is cheaper for negligent administrators to wait until the worm hits, suffer a day of disruption and then fix the problem du jour is simply unacceptable. The only solution, however, is to somehow make it more expensive to be negligent than it is to be diligent.

This is difficult. Tort law really isn't very good at handling cases where a lot of people each do small amounts of damage to a lot of other people. Even though the aggregate effect is significant, you can't really put your finger quite firmly enough on who did what to whom. And since the Internet is decentralised, you can't be slapped down by some authority in charge of keeping the 'net healthy.

There's always the doomsday scenario. Maybe if this worm had caused major data-loss, there would be some lasting effect? Or maybe the admins would have just restored from backup. Who knows?

I was wondering why my network connection was running so slowly, and why my modem was blinking when I wasn't really doing anything, so I pulled up tcpdump... and discovered thousands upon thousands of UDP packets coming in, trying to find something on my home network (three machines on eight publicly routed IPs) that responded on port 1434.

Looks like there's a new Code Red, except this time it preys on Microsoft SQL Server. (1434 is SQL Server's "server resolution" port, for which a remote root exploit was discovered in July last year.) My network connection is being hammered; I can only imagine the rest of the Internet isn't faring very well either.

Hello, Internet-wide denial of service attack. Fuck you, Microsoft. Fuck you, incompetent server administrators who are not only too lazy to get off their stupid asses and upgrade a broken piece of software, but are too fucking clueless to put their servers behind a firewall and block access to administrative ports.

Categorisation

  • 11:26 AM

Jorge Luis Borges, The Analytical Language of John Wilkins:

The ancient Chinese encyclopaedia entitled Celestial Emporium of Benevolent Knowledge classifies animals into the following categories:

“(a) those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those that are included in this classification, (i) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camel's hair brush, (l) others, (m) those that have just broken a flower vase, (n) those that resemble flies from a distance.”

Enh...

  • 9:21 AM

They're creating a hilarious new movie about Websphere developers. It's called “Dude! Where's My Classpath?”

I've always been amused with those online quiz things, you know, the ones where you fill out ten completely irrelevant questions and they tell you which Spice Girl you are. Sometimes, I even do them myself, although I'm happy to say I don't bore any of the rest of the world with the results. (I'm too busy boring the world with my geekiness instead.) Anyway, I took a “personality test” this morning. Based on things like what I dream about, what position I sleep in, and how I stand when I'm talking to people, this was my result:

Others see you as fresh, lively, charming, amusing, practical, and always interesting; someone who's constantly in the center of attention, but sufficiently well-balanced not to let it go to their head. They also see you as kind, considerate, and understanding; someone who'll always cheer them up and help them out.

What a load of utter bollocks.

Where I live

  • 11:15 PM

Dan ‘DJB’ Bernstein can be a dickhead, but you have to respect software of the quality of qmail, and you have to respect his amusing fight against his university's patent policy. Read from the top down. (Link from Boingboing)

Dear Operating System Vendors.

I no longer want to know where my files are stored. I no longer care. I have hordes of directories on my various computers called stuff, downloads and documents, and the effort that it would take to organise them into a proper hierarchy is just not worth it. The hierarchical filesystem is a really wonderful thing for programmers and websites, but it just doesn't cut it for personal use.

I no longer care where my files are stored. Here is all I want to see when I save a file:

A dialog that simply asks the user to type in a few words describing the file to be saved.

The operating system can put the file wherever it wants, I don't care, so long as it remembers when it was created (and by whom), when it was modified (and by whom), what type of file it is, and the magic words I typed when I saved it. If the OS knows something about the file format, such as being able to interpret MP3 ID3 tags, a document's author, or a download's originating site, it should remember that sort of thing as well.

When it comes time to load the file again, it's a simple matter of selection. First eliminate all the files that the application doing the loading can't operate on, and all files the current user can't access. Prefer files that were modified recently, by the current user, but provide a few simple menus for changing that. Add a simple, forgiving search on the description, and a more in-depth content-aware search if you're really stuck. Voila. No more messing around with the hierarchical filesystem. We got rid of hierarchical databases in the 1970s, after all.
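To make the idea concrete, here's a rough sketch in Java of the kind of record and lookup I'm imagining. All the class and field names are invented, the matching is deliberately simplistic, and a real version would live inside the OS rather than in application code.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.Date;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical sketch: the metadata the OS would keep about each saved file.
    class FileRecord {
        String description;        // the "few words" typed into the save dialog
        String contentType;        // e.g. "audio/mpeg"
        String createdBy, modifiedBy;
        Date created, modified;
        String location;           // wherever the OS chose to put the bytes; not the user's problem
    }

    class Catalogue {
        private final List records = new ArrayList();

        void add(FileRecord record) { records.add(record); }

        // "Loading" becomes selection: keep only what the application can open,
        // match the description, newest first. (This collapses "can't access" and
        // "prefer the current user" into one crude filter; a real version would
        // separate them, and would want a far more forgiving search.)
        List find(String contentType, String user, String keywords) {
            List hits = new ArrayList();
            for (Iterator i = records.iterator(); i.hasNext();) {
                FileRecord r = (FileRecord) i.next();
                if (!r.contentType.equals(contentType)) continue;
                if (user != null && !user.equals(r.modifiedBy)) continue;
                if (r.description.toLowerCase().indexOf(keywords.toLowerCase()) < 0) continue;
                hits.add(r);
            }
            Collections.sort(hits, new Comparator() {
                public int compare(Object a, Object b) {
                    return ((FileRecord) b).modified.compareTo(((FileRecord) a).modified);
                }
            });
            return hits;
        }
    }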

Of course, for reasons of efficiency, the regular filesystem will still exist in the background. If I'm feeling incredibly nerdy, I might even drop down to it occasionally. And I'll need to know a little about it if I'm reading files off a CD, or trying to organise a website. But mostly, that sort of thing is just going to get in the way, and I don't want to know about it.

Some people may get a bit upset about the fact that they no longer know where their files are. Ignore them. They're just a little resistant to change. These same people have gone years without knowing which inodes their files inhabit, or precisely which sectors on disk. Why the hell should they care where in the filesystem their files reside, so long as they have a reliable way of getting them back again when they want them?

Thankyou,

Charles Miller, Alpha Nerd.

My distrust of Relational Databases comes from a time before I was a full-time programmer. I was a part-time programmer, part-time sysadmin. I wanted to install an SNMP client for Linux. The problem? The client wanted to store data somewhere, and the writer of the client had decided to store it in MySQL. This was a desktop application, and a pretty small one at that, but the author had decided I needed to install an RDBMS just to store a few hundred KB of data. Ever since, I've been careful to question the assumption, every time I find myself making it, that the answer to “Where do I store my data?” is a relational database.

RDBMS are slow. They're a massive generalisation, and thus will waste an awful lot of cycles trying to cater for possibilities that you'll never encounter in your application. RDBMS require specialist expertise to tune effectively: DBAs are a particular breed of wizard, with an incredible store of arcane knowledge. RDBMS are large applications that require additional maintenance: you suddenly have to wonder about the bugs and security holes of a whole new big application. When a new major version of the database comes out and you want to upgrade, you have to dump and restore all your data. If you upgrade your Linux distribution, you might find your data has been left behind entirely.

I feel pretty confident saying that a significant number of people who are using a Relational Database shouldn't be. They shouldn't be for one of the following reasons:

  • They're not storing enough data.
  • They don't need to make arbitrary relational queries.
  • They don't need ACID transactions.

As an illustration, I'm going to pick on one application: the QDB (Quotes Database) at bash.org, which I go to occasionally, and which is often painfully slow. As a disclaimer: I know that the site uses MySQL for data storage because it tells me it does. I don't know what caching the application does internally; it could be they're using a really smart architecture and the database isn't the bottleneck. It certainly appears to be, though. Quite often, I get an error message telling me it timed out connecting to the database, which seems to point to the DB as the point of failure.

Looking at the statistics on QDB, there are fewer than 15,000 quotes in the database. Very generously figuring an average quote at 1 KB, that's 15 MB of memory. That's nothing. That's room for the database to grow by a factor of twenty and still fit in the RAM of a low-powered server. And think how much cheaper RAM will be by the time the database hits 300,000 quotes.

The select operations that need to be performed on the quotes are really simple. The only ones seem to be sorting by rating, random selection, and text search. The first two can be handled incredibly efficiently by in-memory data-structures. Even if you're really strapped for memory and can't hold all the data there, the application could store the quotes in something simple like Berkeley DB, keep sorted indexes in memory, maintain an LRU cache, and cheat the random function by keeping a “random pool” of quotes in memory that just cycles every so often. Remember, you can serve the same “random” page to as many people as you like, so long as you never give it to the same person twice.
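To show how little machinery that actually needs, here's a rough sketch of such an in-memory store. The class and method names are mine, not bash.org's, the pool size is arbitrary, and a real version would persist the quotes and ratings somewhere simple like Berkeley DB.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // Hypothetical sketch: nothing here needs an RDBMS.
    class QuoteStore {
        static class Quote {
            final int id;
            final String text;
            int rating;
            Quote(int id, String text, int rating) {
                this.id = id; this.text = text; this.rating = rating;
            }
        }

        private final List byRating = new ArrayList(); // kept sorted, highest rating first
        private List randomPool = new ArrayList();     // small pool, refreshed every so often
        private final Random random = new Random();

        synchronized void add(Quote quote) {
            byRating.add(quote);
            resort();
        }

        synchronized List topRated(int n) {
            return new ArrayList(byRating.subList(0, Math.min(n, byRating.size())));
        }

        // Everybody can be served from the same small "random" pool, so long as
        // it cycles often enough that no one person sees the seams.
        synchronized void refreshRandomPool(int size) {
            List pool = new ArrayList(byRating);
            Collections.shuffle(pool, random);
            randomPool = pool.subList(0, Math.min(size, pool.size()));
        }

        synchronized Quote randomQuote() {
            if (randomPool.isEmpty()) refreshRandomPool(50);
            return (Quote) randomPool.get(random.nextInt(randomPool.size()));
        }

        private void resort() {
            Collections.sort(byRating, new Comparator() {
                public int compare(Object a, Object b) {
                    return ((Quote) b).rating - ((Quote) a).rating;
                }
            });
        }
    }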

The only complex query in this application is a full-text search, and there are specialist full-text search packages available that don't rely on an RDBMS. Java hackers should seriously consider Lucene for this purpose, for example. Remember, RDBMS's are slow, so the same full-text technology will be faster outside an RDBMS than inside.
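For the Java hackers, a sketch of what indexing and searching quotes with Lucene looks like. This is written against a much more recent Lucene API (roughly the 8.x style) than anything that existed when this post was written, and the field names and query text are invented for the example.

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.*;

    public class QuoteSearch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(Paths.get("quote-index"));

            // Index a quote: one document per quote, full-text on the body.
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
            Document doc = new Document();
            doc.add(new StringField("id", "4281", Field.Store.YES));
            doc.add(new TextField("body", "my karma ran over your dogma", Field.Store.YES));
            writer.addDocument(doc);
            writer.close();

            // Search it: no RDBMS anywhere in sight.
            DirectoryReader reader = DirectoryReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new QueryParser("body", new StandardAnalyzer()).parse("karma");
            TopDocs hits = searcher.search(query, 10);
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                System.out.println(searcher.doc(hits.scoreDocs[i].doc).get("body"));
            }
            reader.close();
        }
    }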

One of the big benefits of a real database is ACID transactions. Most applications don't need them. If you're storing financial data, ACID is a necessity. If you're running a web application, it's overkill, and once again it's overhead that your application will suffer from. MySQL became the most popular Open Source database long before it had transactional support, because it was fast. Why was it so fast? One might guess: partly because it didn't support transactions. And nobody but a few reviewers at Ars Technica really noticed for years.

Sometimes, you'll need an RDBMS. You're storing massive amounts of data. ACIDity is something you just can't do without. You want to be able to do arbitrary relational queries on your data, or access your data in so many different ways that you'd basically have to reinvent the relational model just to implement them. If you find yourself in this position, go for it.

Sometimes, you'll want an RDBMS. You need to store data somewhere, you're familiar with SQL and the database API, you don't want to worry about coming up with your own data model, dealing with concurrency issues, or messing around with different ways to safely persist your data to permanent storage. This is when you should take a good, long look at yourself. This sort of decision-making will come back and bite you in the long run.

Morning-itis

  • 8:37 AM

When I get to work, there's always this gap where I sit, staring at the monitor, thinking “Damnit. What was the last thing I thought before I left work last night?” This is especially a problem on Thursday mornings, since Wednesday night involves a significant amount of beer and pool-playing.

So this morning I thought: I do test-first programming. So, what if I made sure that I never left work without at least one failing test? I could then walk in, run the test suite, and the red bar would point to exactly the spot in the code that I need to be working on next.

It's a thought.
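Something like this, in JUnit: a deliberately failing test left behind as a bookmark. The test name and message here are placeholders for whatever tomorrow's task happens to be.

    import junit.framework.TestCase;

    // Deliberately-failing test, left behind as tomorrow morning's bookmark.
    public class WhereWasITest extends TestCase {
        public void testDiscountAppliesToBulkOrders() {
            // Wednesday-night pool will have erased the details,
            // but the red bar will point straight at this spot.
            fail("Next up: make bulk orders actually get the discount");
        }
    }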

Think before you paint
Painted on the road at a crosswalk: ‘Thnik Before You Cross’

Apropos a discussion of the non-final-ness of final fields, a colleague came up with this gem. What does the following code do?


    // By Jed Wesley-Smith
    // throws Exception: getDeclaredField and Field.set throw checked reflection exceptions
    public String getAString() throws Exception {
        //get a reference to the private field 
        //value in String class.
        java.lang.reflect.Field stringValue = 
            String.class.getDeclaredField("value");
        //make it accessible
        stringValue.setAccessible(true);
        //unsuspecting string
        String sittingDuck = "sittingDuck!!!!!";
        //black magic happening here
        stringValue.set(sittingDuck, 
              "hastaLaVistaBaby".toCharArray());
        //guess the output of this!
        return "sittingDuck!!!!!";
    }

Clarification: the value field in String isn't actually final; if it were final, we wouldn't be able to change the string's value from outside the String class. The thing about final fields not really being final was a catalyst for this code, but wasn't used in it. (Regarding final fields, the JVM only prevents the changing of a final field from outside a class. Technically, a class can change its own final field at any time, and is only prevented from doing so by the compiler. So you can get around the finality of final fields using bytecode manipulation.)

Prior art: Java Specialists' Newsletter: Insane Strings

Correction of Clarification: You could still pull an equivalent trick if the value field were final. It's an array, so even if the field itself couldn't be reassigned, the contents of the array would be wide open.
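A tiny illustration of that point, with a made-up class and field:

    class FinalArrayDemo {
        // The reference is final, but the array it points to is still mutable.
        private static final char[] VALUE = "immutable?".toCharArray();

        public static void main(String[] args) {
            VALUE[0] = 'I';                        // perfectly legal
            System.out.println(new String(VALUE)); // prints "Immutable?"
        }
    }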

This quote from Hibernate's “Why This Project is Successful” page caught my eye:

Good standards can provide interoperability and portability. Bad standards can stifle innovation. “supports XXX standard” is not a real user requirement, particularly if XXX was designed by a committee of “experts” who, throughout the entire process, never once ate their own dogfood. The best software is developed by trial, error and experimentation. De facto standards are usually a much better fit to user requirements than a priori ones.

Jeremy Zawodny has discovered the NullPointerException, and laughs because he's been told that Java is “the language without pointers”. Claiming that Java is a language without pointers is, of course, rubbish. All variables that refer to objects are references (i.e. pointers) to that object, which resides somewhere in the heap.

What Java lacks is pointer arithmetic. The only operation that is permitted on a pointer is dereferencing (the ‘.’ operator), which allows you to perform actions on the object being pointed to. There are two reasons for this. Firstly, it frees the JVM to do memory management however it wants, rather than being a slave to the “pointer equals memory address” thing. Secondly, and more importantly, it prevents a large class of memory-corrupting, security-destroying, application-crashing bugs that can result from direct manipulation of pointers.

Which brings us to null. Null is the value for a reference which means “this reference does not point to any object”. If you try to dereference null, you get a NullPointerException.

Some languages treat null differently. Objective-C's nil is a reference to the null object, which responds to any message you throw at it with nil. This makes an unassigned reference something like a black hole: you can send anything in, but you'll get nothing back. Smalltalk gives you both worlds: nil is a singleton that passes every message to doesNotUnderstand. In development environments this throws up the debugger when it's reached, but common practice is to redefine nil to observe the Objective-C behaviour in production.

Some people prefer the Smalltalk/Obj-C approach, although the argument is based on something of a straw-man example of Java code. I prefer the Java approach, and my reasoning goes back to programming by intention. (See also: Programming by Coincidence)

One good piece of advice from the Pragmatic Programmers is: “Crash Early: a dead program normally does a lot less damage than a crippled one.” Most of the time, sending a message to a null reference is a mistake. Most of the time, you expect there to be a real object at the end of your message, and if there's no object there, it's quite likely because you forgot to put it there, or mistyped an earlier assignment. If a message to the null object just returns null, that mistake isn't going to be flagged anywhere near where the problem is. It's just going to introduce some crawling data corruption which will spread throughout your program until you have nulls all over the place, each the result of a call to one of the previously created nulls. Eventually something will break, but it'll break unpredictably, a very long way (in the code) from the original error.

I am aware that many people disagree with me on this point, and that with sufficient unit tests it's much less of a problem. I'm not trying to start a flame-war; this is just my opinion, and my experience. It's also the way nulls work in Java, and while we're being paid to hack in that language we're stuck with it anyway, so we may as well remind ourselves of its rationale.

So, without further ado, here are Charles' guidelines for dealing with how null works in Java:

  1. Before you start writing a method, decide whether it will ever return null. Document this decision in the @return section of the method's Javadoc.
  2. Never have a method return null unless there's a really good reason for it.
  3. If your method returns an array or a collection, there's no reason to ever return null. Return an empty array or collection instead (there's a sketch of this, and of the Null Object pattern, after this list).
  4. If you have a situation where null is a valid value for a variable, you can make use of the Introduce Null Object refactoring, to make the behaviour explicit.
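As promised, a minimal sketch of guidelines 3 and 4; the class names are invented for the example.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    class MailingList {
        private final List subscribers = new ArrayList();

        /**
         * @return the subscribers; never null. An empty list simply means
         *         nobody is subscribed (guideline 3).
         */
        public List getSubscribers() {
            return Collections.unmodifiableList(subscribers);
        }
    }

    // Guideline 4: a Null Object stands in for "no logger configured", so
    // callers never need a null check before logging.
    interface Logger {
        void log(String message);
    }

    class NullLogger implements Logger {
        public void log(String message) {
            // deliberately does nothing
        }
    }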

(Obviously, this post was written long, long before I encountered the “Option” type.)

Legend

  • 9:47 AM

Steve Waugh is a dead-set legend. Possibly even more so than Boonie.

Matt Olson: How to Write Like a Wanker

A popular misconception about text messages on the Internet is that, to be an effective communicator and earn the respect and admiration of your peers, you must be able to write lucid prose; that your messages, articles, posts and pages must be easy to understand and pleasant to read.

Nothing could be further from the truth.