June 2004



Preface. I DO NOT HAVE ANY GMAIL INVITATIONS. I have given them all to friends, co-workers, and one random IRC acquaintance.

We know that Google's parceling out of Gmail invitations was a really smart marketing move. The invite system made the service seem that much more exclusive, and thus that much more desirable. However, I'm betting it wasn't their primary motivation. I suspect there was a far more important reason to launch the service this way: stability.

Here's a possible scenario.

Google finish the beta program for Gmail, and open subscriptions to the public. Over the next few days, millions of people subscribe and explore the interface. Many start pumping their mail archives into the service. This creates a load-spike orders of magnitude higher than the service would normally have to maintain.

At the same time, some Google engineer starts discovering that a number of parts of Gmail that they thought would scale linearly, don't.

Over the next month Gmail is down more often than it is up, none of the programmers are getting any sleep because they're too busy putting out fires, and some sub-editor at the New York Times decides "Gfail?" would make a good headline for the post-mortem, which really isn't good publicity for a company looking towards its IPO.

Invite codes are a really smart way to control growth. After a few weeks, anyone "in the know" enough to really want a Gmail account can have one. At the same time, Google can control the service's rate of growth. There's no explosive rush, and if anyone notices something isn't scaling as well as it should, they can throttle back the issue of new invitations until the problem is solved, without too many sleepless nights.

Unfortunately, it's the sort of tactic that only works if your service is as desirable as Google's: giving Bob three invitations only helps if Bob knows three people who want to sign up. If you're just a startup with a good idea, an invitation program would just kill any chance of your service getting traction.

I wouldn't be surprised, though, if I saw future big-name MMORPGs opening their doors post-beta with an invitation program.

Caption competition:

  • Imagine a Beowulf cluster of....
  • When I asked for a G4 tower on my desktop...
  • So, if one Mac is a lover, what's six stacked on top of each other?
  • Recycled aluminium: 10c per can.
  • Desperately trying to prove we belong at WWDC.
  • Maybe if you hook them all together, they'll be able to run Doom 3?
  • Confluence. Because if you can't trust a room full of crazed Mac bigots, who can you trust?
  • All that's missing are a bunch of monkeys running around the base beating each other with sticks.
  • Now we know what they're doing all day instead of fixing Javablogs...

Edit: I forgot...

  • Hey, one of us has a girlfriend, too!

One trend that's been wandering casually around the Internet lately has been the use of Javascript to highlight words in a page, if you visit that page via a Google search.

Like many of these web-tricks, it was interesting and 'neat' at first, but now I think its 15 minutes is up, and can we please quietly pack that script away and move on to something else?

The benefit of search-term highlighting is that it allows you to see where in a document the match occurred, which may sometimes be hard to spot.

The practical result, however, is different. It's very rare that I ask Google to give me a page that occasionally mentions the terms I am searching for. If it does, then Google is either not doing its job, or it's a really obscure search. What I usually want (and end up with) is a page that is largely about the terms I am searching for.

Even if I'm searching for something really specific that might only be mentioned in one section of a document, I'm probably going to have used several search terms that occur all the way through the page, and then added the specific, narrowing term on the end.

Which means, from experience, that these Javascript-enhanced pages light up like a Christmas tree.

The effect of the highlighting is to completely disrupt the flow of the page. The highlighted terms are dotted pretty evenly through the page (making having your eyes drawn to their location pointless), and the highlighting is usually more colourful and 'interesting' to your eyes than the page's headings, which might be more useful in locating the precise information you were after.

I Love Postgres

  • 3:34 PM

Here's a quick quiz. You have a table in Postgres 7.3.x with BIGINT primary keys.

Q: What is the difference between the following queries?
  1. select * from foo where pkey = 12345;
  2. select * from foo where pkey = '12345';

A: The latter does an index lookup. The former performs a full table scan.

We discovered this while messing around with Postgres's EXPLAIN command, trying to work out why Javablogs was running so slowly. It turned out that without the quotes, selecting a single blog entry by its primary key cost 5000 of whatever metric Postgres measures query performance in. With the quotes, the cost dropped to... 5.

If you make the Primary Key a NUMERIC instead of a BIGINT, you get the index scan both ways (but it looks like the index scan over the numeric key is slower than it is over the bigint key...)

Anton tried to explain it to me, but my brain seized up. It was something to do with casting between INT4 and INT8. You can get an unofficial patch for the Postgres JDBC driver that fixes the problem with explicit casts, but it's ugly and untested.
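For what it's worth, here is a minimal sketch of the explicit-cast idea from the application side, so the comparison is BIGINT against BIGINT instead of BIGINT against INT4. The class and method names are made up for illustration, the table and column come from the quiz above, and I'm assuming a plain JDBC connection; this is not the unofficial driver patch, and I haven't measured it against 7.3.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: looks up a row in "foo" by its BIGINT primary key.
final class FooLookup {
    private FooLookup() {}

    // The explicit CAST is the point of the sketch: the parameter is compared
    // as a BIGINT rather than being left for the server to type on its own.
    static final String BY_PKEY_SQL =
            "SELECT * FROM foo WHERE pkey = CAST(? AS BIGINT)";

    // Returns true if a row with the given key exists.
    // Assumes the caller supplies an open JDBC connection.
    static boolean exists(Connection conn, long pkey) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(BY_PKEY_SQL)) {
            ps.setLong(1, pkey);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

Running EXPLAIN on the resulting query is still the only way to confirm which plan you actually get.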

Alan: "Now you know why good DBAs get paid so much, and why they are so horrified at the thought of developers writing SQL."

Apropos a conversation with cow-orkers on the way to lunch yesterday:

If those kooky stem-cell researchers were to discover a way to grow human meat in a vat, such that it was never at any point a real, living human being...

Would you eat it?

Sagittarius: Your hard disk light will not go out today. Plan for occasional bursts of activity. Your lucky number is 2.6.5-gentoo-r1.

Name: The File/Stream Duality

Context: Your API retrieves data from, or writes data to a file


  • The naive approach is to have the API take in a filename, or a File object to work on.
  • Far too many developers believe this is sufficient.
  • Unless the file is random-access, reading from or writing to it means turning it into a byte stream anyway.
  • One day, somebody is going to have a stream of bytes that is not a file, but that they want to pass through your API.
  • When that happens, this person will curse you, your parents, and the town you grew up in.


If you are writing an API that takes a filename, instead provide an API that does precisely the same thing to an arbitrary stream of bytes, and then add "convenience" methods that apply those stream-based methods to files.
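A minimal sketch of what that might look like in Java. The class name and the newline-counting task are invented purely for illustration; the shape is what matters: the real work lives in the stream method, and the File method is a thin convenience wrapper around it.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical API: counts newline bytes in some input.
final class NewlineCounter {
    private NewlineCounter() {}

    // Core method: works on any stream of bytes, wherever it came from.
    // The caller owns the stream and is responsible for closing it.
    static long countNewlines(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long count = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    // Convenience method: opens the file and delegates to the stream version.
    static long countNewlines(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return countNewlines(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // A stream of bytes that never touched the disk.
        byte[] data = "one\ntwo\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countNewlines(new ByteArrayInputStream(data)));
    }
}
```

The person with a socket, a zip entry, or a byte array in memory calls the first method; everyone else keeps passing files and never notices the difference.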

"You need to buy a printer."

"No, I don't. Pretty much everything I do is electronic, all of my records are electronic, why do I need to print anything out?"

"No seriously, you need a printer."

"I've done quite well without one so far. The rare times I need hard copy for a letter or something, I just, er, steal office supplies."

"Go buy a printer."

"Look, you know me. I never throw anything out. Within six months, my life would be completely buried in paper."

"But you need one!"

"We've gone through this. On the computer, everything's sorted, searchable, and even if it's clutter it only takes up virtual space. Why do I need anything on paper?"



"Yeah. You can't scribble on electronic documents. Some documents are easily editable. Some document formats can be annotated. Some let you 'cross items off' a list. Inevitably, though, this means you have to create a document with the specific intention of being editable, or annotated, or things-to-do-ish."

"...whereas any document can be printed out and scribbled on. You've got a point."

"Well, I am you."

"You are? Why am I talking to myself then?"

"Call it a literary conceit."

"Ah, I'm being pretentious again, aren't I."

"Hey, you said it, not me."

(Prediction: bc298e458cdacac56b5247bc5f8f1a62)

Statement 1:

Bob said something that I disagree with. This is why I think he's wrong.

Statement 2:

Bob is an idiot. This is why I think he's an idiot.

I hope it's pretty clear that the first statement is at least vaguely productive, while the second is not.

Which isn't to say I restrict myself to being productive all the time. This is my blog, and what's the point of self-publishing if you can't be self-indulgent and insulting when you feel like it?

What's annoying is when you try quite hard to write a type-1 post, and someone comes in and writes a type-2 comment. Any work you did to stick to the facts is immediately undermined: however balanced your post may or may not have been, the presence of the comment skews any perception of the original post.

What do you do? If you leave the comment un-challenged, you look like you're providing a supportive environment for random flamers. If you try to interject, you're getting into an argument that's really between the commenter and Bob, that you really have no personal interest in being a part of. Ultimately, you don't care, and any time spent debating the matter is time that could be better spent drinking tea and watching documentaries about just how many maggots can legally be contained within a can of tomatoes according to the US FDA[1].

And, of course, if you delete the comment then you feel like you're silencing dissent, which is totally counter to the reason you opened up comments in the first place.

[1] Two, I think.

John Gruber of Daring Fireball dares ask the question:

Why are Windows users besieged by security exploits, but Mac users are not?

Boiled down, his answers are:

  1. Market-share is a factor, but there has to be some other explanation for the fact that Windows' market-share in malware vastly outstrips its market-share on the desktop
  2. There are fewer places to hide bad programs on the Mac
  3. Mac users are far less tolerant of programs that spread malware

I disagree with the first point. You can explain almost all of the relative safety in running Mac OS X with its low market-share.


This argument ignores numerous facts, such as that the Mac’s share of viruses is effectively zero; no matter how you peg the Mac’s overall market share, its share of viruses/worms/Trojans is significantly disproportionate.

In order to spread, viruses, worms and trojans rely on network effects. The value of a network grows as the square of the number of users. Therefore viruses, trojans and other malware are simply orders of magnitude more effective when targeted against a widely deployed platform.

Imagine you send the latest Mac-targeting email trojan to 100 random addresses. If you're lucky, three of them might be Mac users. If you're lucky, one of them might open the attachment, causing the trojan to be sent to all of the people in that person's address-book, most of whom will also be Windows users. Meanwhile all the Windows users will receive this attachment that they can't run, and get back to the person who sent it to them.

The trojan's just not going to get off the ground. The effectiveness of sending a Windows-targeting trojan is just several orders of magnitude higher. Even if your initial mail-out went only to Mac users, it would probably fizzle out after the first generation.
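To put rough numbers on that, here is a back-of-the-envelope sketch. The 3-in-100 Mac share and the one-in-three open rate are the "if you're lucky" figures from the example above; the address-book size of 100 and the 95% Windows share are my own assumptions, picked only to make the arithmetic easy, and the class is invented for illustration.

```java
// Hypothetical model: expected spread of an email trojan, generation by generation.
final class TrojanSpread {
    private TrojanSpread() {}

    // Expected new infections caused by one infected machine mailing its
    // whole address book: contacts x share on the target platform x open rate.
    static double nextGeneration(int addressBookSize, double platformShare, double openRate) {
        return addressBookSize * platformShare * openRate;
    }

    // Expected infections in the n-th generation from a single starting
    // infection, if every generation multiplies by the same factor.
    static double infectionsAfter(int generations, double perGeneration) {
        return Math.pow(perGeneration, generations);
    }

    public static void main(String[] args) {
        double openRate = 1.0 / 3.0;                          // from the example: one in three opens it
        double mac = nextGeneration(100, 0.03, openRate);     // 3 Mac users per 100 addresses
        double windows = nextGeneration(100, 0.95, openRate); // assumed Windows share

        System.out.println("Mac, per generation:     " + mac);
        System.out.println("Windows, per generation: " + windows);
        System.out.println("Windows/Mac after 3 generations: "
                + infectionsAfter(3, windows) / infectionsAfter(3, mac));
    }
}
```

Even on the lucky figures, the Mac trojan only just replaces itself each generation (roughly one new infection per infected machine), while the Windows one multiplies by about thirty every time. Compound that over three generations and the gap is already in the tens of thousands.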

Even with spyware and adware that do not propagate over the network, the Mac is a small enough target that it is not worth tackling.

For packaged software, there are market segments. There's value in targeting a product at a small market, so long as the market wants the software, and the competition is perhaps less cut-throat than in the dominant market. That's why software exists for the Mac. Malware has no market segments, because people aren't looking to install malware. If someone has one piece of spyware installed, that doesn't mean they're not going to get another: on the contrary, it means they're more likely to install another. There's no value in targeting malware at a niche market.

I would dispute that there are fewer places for malware to hide on the Mac: I could think of some pretty interesting places you could hide programs in the Unix subsystem, or by playing tricks inside existing Application bundles. I would also dispute that any UI measures make the Mac inherently safer from malware: if you convince someone they really want to open that attachment, or download that "login application" they need to access the porn site, no amount of warning dialogs will make any difference.

I also dispute the "broken windows" theory, just on the basis that it's easy to assume ever-vigilance against something that has not yet shown any sign of existing. Communities exist in the Windows world to warn of adware-infested applications, but there are still too many people who just want to get on the file-sharing network, and don't do their homework.

As Gruber says, even if market-share is the dominant reason for the Mac's relative security, this isn't a bad thing: since that share is unlikely to rise significantly, the Mac will stay safe from general threats.

What I'd like to add, though, is that there is still no room for complacency, because none of this keeps you safe from specific threats. Specific threats get no value from the network effect. If I want to get into your computer, I no longer care about the market-share of your operating system: the only target I care about is you.

Three years or so ago, the only thing on TV was Starship Troopers. Dutifully I succumbed, because when there's only one thing on, you have to watch it. I then wrote a quick review:

Boy did this movie suck.

It sucked badly.

On the suck-o-meter, it rates slightly more suckage than "bowling balls through a garden hose".

I've never spent an action movie wishing that all of the lead characters would just die. Not one of them had a single redeeming feature. I wanted to see each of them eviscerated by a CGI monster. This is a movie where the computer-generated aliens were better actors than the humans. If this is the future of the Earth, then for God's sake just let the bugs kill us all.

Tonight, the only thing on TV is Starship Troopers. So guess what I'm doing?