April 2005

<edit date="December 2010"> This blog post describes an old Java bug that has since (to the best of my knowledge) been fixed and all affected versions EOL'd. Regardless, it remains a cautionary tale about the problem of leaky abstractions, and why it's important for developers to have some idea of what's going on under the hood. </edit>

Every Java standard library I have seen uses the same internal representation for String objects: a char[] array holding the string data, and two ints giving the offset into the array at which the String starts and the length of the String. So a String with an array of [ 't', 'h', 'e', ' ', 'f', 'i', 's', 'h' ], an offset of 4 and a length of 4 would be the string "fish".
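To make the layout concrete, here is a minimal sketch of it. OffsetString is a made-up stand-in, not the JDK's String source, and it keeps the caller's array as-is purely to mirror the three fields described above.

```java
/** Hypothetical stand-in for the String layout described above; not JDK code. */
final class OffsetString {
    private final char[] value; // backing array, possibly longer than the string
    private final int offset;   // index in value where this string starts
    private final int count;    // number of characters in this string

    OffsetString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        // Every access is shifted by the offset into the backing array.
        return value[offset + index];
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] data = { 't', 'h', 'e', ' ', 'f', 'i', 's', 'h' };
        OffsetString fish = new OffsetString(data, 4, 4);
        System.out.println(fish); // prints "fish"
    }
}
```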

This gives the JDK a certain amount of flexibility in how it handles Strings: for example it could efficiently create a pool of constant strings backed by just a single array and a bunch of different pointers. It also leads to some potential problems.

In the String source I looked at (and I'm pretty sure this is consistent across all Java standard library implementations), all of the major String constructors do the 'safe' thing when creating a new String object - they make a copy of only that bit of the incoming char[] array that they need, and throw the rest away. This is necessary because String objects must be immutable, and if they keep hold of a char[] array that may be modified outside the string, interesting things can happen.

String has, however, a package-private "quick" constructor that just takes a char array, offset and length and blats them directly into its private fields with no sanity checks, saving the time and memory overhead of array allocation/copying. One situation this constructor is used in is String#substring(). If you call substring(), you will get back a new String object containing a pointer to the same char[] array as the original string, just with a new offset and length to match the chunk you were after.

As such, substring() calls are incredibly fast: you're just allocating a new object and copying a pointer and two int values into it. On the other hand, it means that if you use substring() to extract a small chunk from a large string and then throw the large string away, the full data of that large string will continue to hang around in memory until all its substrings have been garbage-collected.

Which could mean carrying around the complete works of Shakespeare in memory, even though all you wanted to hang on to was "What a piece of work is man!"
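The sharing is easy to model. The sketch below is not the JDK source: SlicedString, retainedChars() and the argument order of the private "quick" constructor are all invented for illustration. It only shows how a short substring can keep a large array alive.

```java
import java.util.Arrays;

/** Hypothetical model of the old String layout; not JDK code. */
final class SlicedString {
    private final char[] value;
    private final int offset;
    private final int count;

    /** "Safe" constructor: copies only the slice it needs. */
    SlicedString(char[] source, int offset, int count) {
        this.value = Arrays.copyOfRange(source, offset, offset + count);
        this.offset = 0;
        this.count = count;
    }

    /** "Quick" constructor: keeps the caller's array, no copy, no checks. */
    private SlicedString(int offset, int count, char[] shared) {
        this.value = shared;
        this.offset = offset;
        this.count = count;
    }

    /** Shares the backing array, as the old substring() is described as doing. */
    SlicedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SlicedString(offset + begin, end - begin, value);
    }

    int length() {
        return count;
    }

    /** Size of the array this object keeps reachable. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] works = new char[1_000_000]; // stand-in for the complete works
        Arrays.fill(works, 'x');
        SlicedString big = new SlicedString(works, 0, works.length);
        SlicedString quote = big.substring(10, 38);
        System.out.println(quote.length());        // 28
        System.out.println(quote.retainedChars()); // 1000000
    }
}
```

Even after big becomes unreachable, quote still references the million-element array, so the garbage collector cannot reclaim it.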

Another place this constructor is called from is StringBuffer. StringBuffer also stores its internal state as a char[] array and an integer length, so when you call StringBuffer#toString(), it sneaks those values directly into the String that is produced. To maintain the String's immutability, a flag is set so that any subsequent operation on the StringBuffer causes it to regenerate its internal array.

(This makes sense because the most common case is toString() being the last thing called on a StringBuffer before it is thrown away, so most of the time you save the cost of regenerating the array.)

The potential problem again lies in the size of the char[] array being passed around. The size of the array isn't bound by the size of the String represented by the buffer, but by the buffer's capacity. So if you initialise your StringBuffer to have a large capacity, then any String generated from it will occupy memory according to that capacity, regardless of the length of the resulting string.
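Here is a rough model of that hand-off, including the "shared" flag. Everything in it is an assumption made for illustration: SharingBuffer and Snapshot are invented names, Snapshot stands in for the String that toString() would produce, and the growth policy is simplified rather than copied from the JDK.

```java
import java.util.Arrays;

/** Hypothetical model of a buffer that lends its array to the string it produces. */
final class SharingBuffer {

    /** Stand-in for the resulting String: a view onto the buffer's array. */
    static final class Snapshot {
        private final char[] value;
        private final int count;

        Snapshot(char[] value, int count) {
            this.value = value;
            this.count = count;
        }

        /** Size of the array this snapshot keeps alive (the buffer's capacity). */
        int retainedChars() {
            return value.length;
        }

        @Override
        public String toString() {
            return new String(value, 0, count);
        }
    }

    private char[] value;
    private int count;
    private boolean shared; // true once the array has been handed to a Snapshot

    SharingBuffer(int capacity) {
        value = new char[capacity];
    }

    SharingBuffer append(String s) {
        int needed = count + s.length();
        if (shared || needed > value.length) {
            // Copy before writing, so an existing Snapshot never sees the change.
            value = Arrays.copyOf(value, Math.max(value.length, needed));
            shared = false;
        }
        s.getChars(0, s.length(), value, count);
        count = needed;
        return this;
    }

    /** Hands the array over without copying and marks it as shared. */
    Snapshot snapshot() {
        shared = true;
        return new Snapshot(value, count);
    }

    public static void main(String[] args) {
        SharingBuffer buffer = new SharingBuffer(32768);
        Snapshot fish = buffer.append("fish").snapshot();
        System.out.println(fish);                 // prints "fish"
        System.out.println(fish.retainedChars()); // 32768
    }
}
```

The four-character snapshot drags the whole 32,768-element array along with it, which is the effect described above.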

How did this become relevant?

Well, some guys at work were running a profiler against Jira to work out why a particular instance was running out of memory. It turned out that the JDBC drivers of a certain commercial database vendor (who shall not be named because their license agreement probably prohibits the publishing of profiling data) were consistently producing strings that contained arrays of 32,768 characters, regardless of the length of the string being represented.

Our assumption is that because 32k is the largest size these drivers comfortably support for character data, they allocate a StringBuffer of that size, pour the data into it, and then toString() it to send it into the rest of the world.

Just to put the icing on the cake, if you have data larger than 32k characters, you overflow the StringBuffer. When a StringBuffer overflows, it automatically (roughly) doubles its capacity.

As a result, every single String retrieved from the database takes up some multiple of 64KB of memory (Java uses two-byte Unicode characters internally), most of it empty, wasted bytes.
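The gap between length and capacity, and the arithmetic behind the 64KB figure, can be checked with the real StringBuffer API. This is only a sketch: CapacityDemo and arrayBytes() are invented names, the two-bytes-per-character figure is the one used in this post, and as far as I know a current JDK copies in toString() and may store text more compactly, so this shows the sizes involved rather than reproducing the leak.

```java
/** Sketch of capacity versus length; class and helper names are made up. */
final class CapacityDemo {
    static final int BYTES_PER_CHAR = 2; // two-byte chars, as assumed in the post

    /** Bytes taken by a char[] of the given capacity, ignoring object headers. */
    static long arrayBytes(int capacity) {
        return (long) capacity * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer(32768);
        sb.append("fish");
        System.out.println(sb.length());               // 4
        System.out.println(sb.capacity());             // 32768
        System.out.println(arrayBytes(sb.capacity())); // 65536, i.e. 64KB

        // Push the buffer past its initial capacity to force it to grow.
        sb.append(new char[32768]);
        // The exact new capacity depends on the JDK's growth policy.
        System.out.println(sb.capacity());
    }
}
```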


The first computer I owned had 64KB of memory, and almost half of that was read-only. Which means every String object coming out of that driver is at least twice the size of a game of Wizball.

The claim about read-only memory turned out to be false. According to a reddit comment: “The c64 used bank switching to allow for a full 64KB of RAM and still provide for ROM and memory-mapped I/O.”

One possible workaround is that the constructor new String(String s) seems to "do the right thing", trimming down the internal array to the right size during the construction. So all you have to do is make an immediate copy of every String you get from the drivers, and make the artificially bloated string garbage as soon as possible.
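In code, the workaround amounts to something like the sketch below. StringCompactor and compact() are invented names, and the trimming itself is behaviour observed on the JDKs discussed here, not something the API promises; on a current JDK the copy should be redundant.

```java
/** Sketch of the copy-on-arrival workaround; names are made up for illustration. */
final class StringCompactor {

    /**
     * Returns a fresh copy of s. On the affected JDKs the copy appeared to get
     * a right-sized array, letting the bloated original be garbage-collected.
     */
    static String compact(String s) {
        return s == null ? null : new String(s);
    }

    public static void main(String[] args) {
        String fromDriver = "fish"; // stand-in for a value read from the driver
        String trimmed = compact(fromDriver);
        System.out.println(trimmed.equals(fromDriver)); // true
        System.out.println(trimmed == fromDriver);      // false: a distinct object
    }
}
```

The important part is to drop every reference to the original string straight away, since the copy helps only once the original is unreachable.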

Still, ouch.

A comparison between my new Mac Mini and a Motorola SURFboard cable modem

The device in the foreground is a moderately powerful general-purpose computer, including 40GB of storage and an optical drive. The device behind it, occupying approximately the same volume, is a cable modem.

The cable modem is the most hated device on my desk. Everything else was chosen by me, to be an efficient part of my lifestyle. The modem was supplied by my cable company, with the most likely motive of shaving a few bucks off the cost of each new subscription.

I see no reason the cable modem couldn't be small, unobtrusive, and (pet peeve) USB-powered so I don't have to add yet another power-adapter under my desk. No reason aside from the fact that in this market the major purchasing decisions are made by a middle-man working for an effective monopoly (there are two cable companies here, and one refuses to run cable into apartments), not the end-user.

Maciej Ceglowski, who for a while has been a member of my list of bloggers I desperately wish I could write as well as, has poked a well-aimed skewer into Paul Graham's essay, "Hackers and Painters". It's one of those articles where, in trying to find an excerpt to quote here, I had trouble finding any paragraph that wasn't a gem.

It's surprisingly hard to pin Paul Graham down on the nature of the special bond he thinks hobbyist programmers and painters share. In his essays he tends to flit from metaphor to metaphor like a butterfly, never pausing long enough for a suspicious reader to catch up with his chloroform jar.


Great paintings, for example, get you laid in a way that great computer programs never do. Even not-so-great paintings - in fact, any slapdash attempt at slapping paint onto a surface - will get you laid more than writing software, especially if you have the slightest hint of being a tortured, brooding soul about you.


I blame Eric Raymond and to a lesser extent Dave Winer for bringing this kind of shlock writing onto the Internet. Raymond is the original perpetrator of the "what is a hacker?" essay, in which you quickly begin to understand that a hacker is someone who resembles Eric Raymond.

One of the commonly repeated lies spread about the Mac by non-believers is that it is somehow 'limited' by the fact that it only ships with a single-button mouse. You see this 'fact' trolled across every single Apple-related discussion on the Internet.

In fact, nothing could be further from the truth.

Since 2003, all desktop Macintosh computers have shipped without a keyboard or mouse. Instead, they come with a single button bearing the Apple logo. If you get a chance to see one of these buttons, make sure you do, because they're a triumph of Jonathan Ive's industrial design: sleek and elegant, white plastic for the iMac, brushed aluminium for the PowerMac.

Each press of the button activates a sophisticated, context-sensitive OS X user interface, which will then proceed to do whatever it was you pressed the button for. The Mac uses a powerful predictive heuristic to determine what it is you are trying to accomplish, and just does it for you at the press of one button, no mess, no fuss.

Of course, sometimes this interface works a little too well. I remember one night I came home from work feeling frustrated at my lack of progress on a project. I pressed the button, expecting my iMac to make me a nice, hot cup of tea. Five minutes later, Jennifer Garner showed up at my front door carrying a case of beer.

Macintosh folk feel sorry for their PC-using brethren, though, so most of us plug in a keyboard and mouse anyway, just so they don't feel jealous.

This is Linux

  • 10:11 AM
(to co-worker, who is reading email)

If you just follow that link there...


(right-clicks on the link, copies it to his clipboard, and launches Firefox)

(Getting over a moment of surprise)

Oh, of course. You can't just click on the link, this is Linux.

Oh, I could, it would just bring up Mozilla.
Like I said, this is Linux.
I could change it. It would just take too long.

(Any comments explaining how to change your default browser in Linux will be pointed at and laughed upon for entirely missing the point.)

I've been playing quite a bit of World of Warcraft over the last few weeks. Not a huge amount, mind you, but enough to progress from "total clueless newbie" to just "somewhat green".

I've tried a few MMORPGs in the past. I tried Ultima Online soon after it was released and gave up after a few days. I came back a year or two later and played it on and off with an online friend who had vanished completely into the game, but it just didn't stick.

Everquest, if memory serves me correctly, had me playing for all of three hours before I decided it wasn't for me. The Sims Online was a really neat chat-room hobbled by a painfully bad in-game economy, and insufficient mass-appeal.

World of Warcraft, on the other hand, sucked me in quite effectively. I think what really did it was the well-designed quest system. From the moment you appear in the world of Azeroth, you're not just plonked down in a free-form world and left to your own devices, you're given things to do. The game is balanced so that you can do most of these things alone if you want, but sometimes you might form an ad-hoc group with other people at the same level of questing to finish off a particular foe.

This is neat, because it's like being back in one of the old Ultima games, before they started sucking. The world is full of people with problems, big or small, and your job as the hero is to go around solving them. But instead of the world being populated with filler characters who can only say "I'm too busy to talk right now", it's full of other real people, who can only say "STFU n00b".

I like the questing, because the social aspects of the game are really only a cool background thing. It's nice to occasionally grab other people to help with quests, but given I'm only a casual player, logging in at random times when I'm not busy with something else, the chance of even meeting the same person more than once is slim.

The ultimate problem, as I've started to discover, is that thanks to the necessary mechanics of an MMORPG servicing hundreds of thousands of users, questing is a frustrating thing. In the Ultima games, after you'd cleansed some town of the evil that had been afflicting it, the evil would be gone, and the town would be full of villagers who would thank you profusely before saying "I'm too busy to talk right now". In an MMORPG, once you've cleansed the town of evil, it hangs in limbo for five minutes... and then resets itself for the next guy.

This behaviour is absolutely necessary for the game to work, but at the same time you're left feeling powerless. I mean what's the point of doing the quests in the first place, if you're not really ridding the world of that ancient darkness, and the big evil red crystal will be there for hundreds of thousands of other people to destroy every day for the lifetime of the game?

(I mentioned this to Alan on Friday, and he pointed me to "I Saw God and I Killed It" -- the story of a group of Everquesters who banded together to kill a supposedly invincible monster)

It's no wonder that games like this quickly end up revolving around leveling and getting more powerful items - you can't change the world around you one iota, so the only thing you have the power to do is change yourself.

Logging into the Apple Developer Connection today, I was presented with this stark warning:

Mostly unremarkable. Everyone knows that Apple are serious about stopping people putting their developer seeds on BitTorrent the moment they're released.

Unremarkable, except that the word "your" changes what should be a warning into an accusation. Should I be battening down the hatches in case a pack of rabid NDA lawyers descend on me?

In the continuing hunt to find an accurate metaphor for software development, Monty Python offers:

When I started here, all there was was swamp. Other kings said I was daft to build a castle on a swamp, but I built it all the same, just to show 'em. It sank into the swamp. So, I built a second one. That sank into the swamp. So, I built a third one. That burned down, fell over, then sank into the swamp, but the fourth one... stayed up! And that's what you're gonna get, lad: the strongest castle in these islands. -- Monty Python and the Holy Grail

Every Tomcat host configuration has a "webapps" directory. By default, web applications dumped into this directory are automatically deployed. So if you put an application into webapps/foo, it will be deployed as http://www.example.com/foo

Simple enough so far.

Tomcat host configurations also allow you to specify web applications explicitly using Context declarations. You add a Context tag, point the docBase parameter at the root of your web application, and voila.

This is where the fun begins.

  • By default, docBase is relative to the webapps directory.
  • If something is in webapps and specified in a Context, it is deployed twice: once for the automatic deployment and once for the manual.

It's a subtle usability issue. Every default behaviour or configuration tells the user "this is how the application expects to be used". So if you default to auto-deploying anything placed in a certain directory, and you default to looking for manually deployed applications in the same directory, you're creating an affordance for a broken configuration.

The fact that the fix is dead simple (deploy the app elsewhere, or turn off auto-deployment) and that the problem is documented doesn't make it any less of a bad design decision.