October 2004



Ruby Performance

  • 11:03 AM

If you read my blog with any regularity, you'll know that I really like the Ruby programming language. It's the language I feel most fits the way my brain works, and when I'm writing Ruby code, I feel happier than when I'm writing code in other languages.

That said, I have some pretty serious doubts about the ability of the Ruby interpreter to do real work in its current form.

Take, for example, a 25MB XML file that I wanted to investigate the contents of. I thought it would be cool to load it up inside Ruby, because then I could use the interpreter to give me an interactive shell to play around with the contents of the document.

Anyway, first here's my baseline: loading the file into a DOM Document object using dom4j on my 1.5GHz Powerbook:

Epiphany:/tmp cmiller$ time java -Xmx256M DomTest

real    0m19.413s
user    0m17.270s
sys     0m1.070s

Now, it's Ruby's turn, using REXML to load a DOM tree:

Epiphany:/tmp cmiller$ ruby -v
ruby 1.8.2 (2004-07-16) [powerpc-darwin]
Epiphany:/tmp cmiller$ time ruby read.rb

real    33m22.680s
user    14m44.710s
sys     1m3.670s

OK, that's a pretty huge difference. Going by "user" time, Ruby is a factor of fifty slower at parsing XML, and by wall-clock time it's closer to a factor of a hundred. At first I thought this might just be REXML's fault. It admits only to being "reasonably fast" on its homepage. Maybe there's just some pathological algorithm going on inside that particular library.

Then, on a hunch, I turned off Ruby's garbage collection with GC.disable and ran the test again.

Epiphany:/tmp cmiller$ time ruby read-nogc.rb

real    45m20.425s
user    5m52.830s
sys     2m37.110s

The "real" time went up, partly because I was busy doing other stuff at the time, and partly because the amount of memory that was eaten up pushed my Powerbook into swap. The "user" time is the interesting number, though: it fell from 14m44s to 5m52s, so the microbenchmark seems to show that in the first run, Ruby was spending more than half its CPU time (about 60% of it) doing memory management.
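For anyone who wants to repeat the experiment, here is a minimal sketch of the comparison. It is not the original read.rb or read-nogc.rb (those aren't shown here); the helper name `compare_gc_cost` and the stand-in workload are assumptions made for illustration. It runs the same block once with the collector on and once after `GC.disable`, and reports CPU time for each.

```ruby
require 'benchmark'

# Run the given block twice: once with the garbage collector working
# normally, once with it switched off. Returns CPU seconds (user + system)
# for each run. The method name is hypothetical, not from the original scripts.
def compare_gc_cost
  GC.start                                  # begin each run from a collected heap
  with_gc = Benchmark.measure { yield }

  GC.start
  GC.disable                                # the switch used in the no-GC run
  begin
    without_gc = Benchmark.measure { yield }
  ensure
    GC.enable                               # always turn collection back on
  end

  { with_gc: with_gc.total, without_gc: without_gc.total }
end

# Stand-in workload (an assumption): allocate a lot of small strings,
# roughly what building a DOM tree does to the heap.
result = compare_gc_cost { 200_000.times.map { |i| "node-#{i}" } }
puts format('with GC: %.3fs  without GC: %.3fs', result[:with_gc], result[:without_gc])
```

To approximate the test in this post, the block would instead build the document, something like `REXML::Document.new(File.new('big.xml'))` after `require 'rexml/document'`, where `big.xml` is a placeholder file name. Expect memory use to climb quickly with the collector off.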

Hopefully, this is something that will be fixed in Ruby 2.0. The plans are to move to bytecode compilation and generational GC, both steps in the right direction. For now, though, 2.0 is quite decidedly vaporware.

So yes. As much as I love the Ruby language, I'm not sure that I'd trust it with too much heavy lifting just yet.


  • 8:41 AM

Last night, amongst other things, I was watching day two of the Australia vs India test match. (That's cricket, for those of you who think having two countries involved makes a "World Series").

Now you can say what you like about Shane Warne. He's not the brightest, nor the most couth card in the pack. He's kept the tabloids busy, what with dodgy Indian bookmakers and mobile phone calls to strange women. Most of Australia is glad that he was never made captain, even though from his on-field performance he would have well deserved it.

Watching Warne play cricket, though, you get the impression that the reason he lacks wit off the field is because he's saving it for the game. What Warne (and Australia's other premium, and now veteran bowler, Glenn McGrath) brings to the game is absolute focus. You can see it in his eyes: on form, Warne approaches every delivery utterly convinced that it is going to result in a wicket. Warne bowled twenty-one overs yesterday. That's one hundred and twenty-six balls, each of which was driven with an unshaking conviction that this one would send the batsman back to the pavilion with his bat under his arm.

Players complain that Warne tries to intimidate batsman and umpire alike by appealing everything that's even remotely close. And he does. But he's appealing because when the ball left his hand, he was already convinced he'd got you out. And when you hit him for six, as will commonly happen to slower bowlers, he's twice as focused because you just slogged the ball that should have got you out, and he blames himself and needs to make amends.

So here's to you, Warney. Bit of a joke off the field, dead-set legend on it.

Microsoft, on their front page, are offering "5 Steps to Improve Your Online Security", decorated with a photo of a woman typing on a laptop.

In the stock photograph from which Microsoft's designers ever-so-carefully cropped the image, we learn the real advice Microsoft is subliminally offering its customers. Want to be secure online?

Get a Mac.

(From macpro.se via the Livejournal MacOSX community)

Update: Microsoft have since updated their homepage, so for posterity (and so you know I'm not just making this up), here's a screen grab I took of the incriminating photo in context.

[Note for the mentally infirm: This is ironic humour. Flames will be redirected to /dev/null. If you want my serious opinion on the security of the Macintosh platform, you can find it here.]

It may just be me, but every time I come across a reference to John Kerry (or any other politician) "flip-flopping", all I can think of is that episode of The Simpsons when Sideshow Bob ran for mayor.

Bob: Young friends, my opponent, Joe Quimby, is confused about your school system. Do you know what he does? He flip-flops. [does backflips; children marvel] Sometimes he doesn't know whether he's coming or going. [walks funny; children clap and cheer] He wants to sell your future short. [shrinks, walks sideways; children clap more]

(It's funnier to watch, I assure you)

We know Fox is helping write the Republican election script, but I never suspected those notoriously liberal Simpsons writers to be on the team. Damn that Murdoch!

Meanwhile, this quote from The New York Times has been making the rounds, and each time I've read it my brain has rebelled and assured me that this can't possibly have really happened. But it still makes a great story, and it sums up why admitting a mistake, changing your mind, or even having reservations about a decision are now political mortal sins: they're all evidence of reality-based thinking.

In the summer of 2002, after I had written an article in Esquire that the White House didn't like about Bush's former communications director, Karen Hughes, I had a meeting with a senior adviser to Bush. He expressed the White House's displeasure, and then he told me something that at the time I didn't fully comprehend -- but which I now believe gets to the very heart of the Bush presidency.

The aide said that guys like me were "in what we call the reality-based community," which he defined as people who "believe that solutions emerge from your judicious study of discernible reality." I nodded and murmured something about enlightenment principles and empiricism. He cut me off. "That's not the way the world really works anymore," he continued. "We're an empire now, and when we act, we create our own reality. And while you're studying that reality -- judiciously, as you will -- we'll act again, creating other new realities, which you can study too, and that's how things will sort out. We're history's actors . . . and you, all of you, will be left to just study what we do."

JIRA 3.0

  • 10:32 PM

Somehow, during the Confluence team's Friday night Quake tournament¹, I missed the release of JIRA 3.0. (See also: The ServerSide and Javalobby.) I rarely go into shameless promotional mode on my blog, but I think this is a big enough occasion for me to make an exception.

Mad props to the JIRA team. They've been working their arses off getting this release together, and they've come up with some great stuff. I've become accustomed to going out in the evening, coming back to the office late at night to pick up my stuff on the way home, and finding someone still there, working to put the finishing touches on 3.0.

In my previous job we evaluated JIRA for one project (but despite a strong push from the development team couldn't get approval to buy it), used it intensively on another, and when Atlassian started advertising available positions it was pretty easy for me to say "I want to work at the sort of place that writes software like that."

Or, as John Davies said on the ServerSide thread:

If you don't know what Jira is, take a look, if you have a bug/issue tracking system and it's not Jira then take a look. If you just like looking at really cool well written software then take a look.

¹ Notable, mostly, for the way it transforms meek, polite programmer Dave into trash-talking "I'm going to kick all your asses at once" Quake-God Dave.

You may or may not have noticed that Javablogs has been going through another of its periodic bouts of low uptime. It's not been nearly as bad this time as it has been in the past, though, mostly because the Contegix guys have been doing such a great job of monitoring the server and kicking it when it goes down.

After a week of totally failing to locate the performance drain, I managed to catch the site just as it was falling into a deep hole. 'top' and 'vmstat' both told me that we weren't using a third of the available memory, and the CPU load rarely hit double-digit percentages. All of the resources seemed to be sinking into I/O.

So I did what you always do when you're lost in I/O, but too lazy to pinpoint it exactly:

  • Increase Postgres' shared memory setting
  • Add an index or two where we were still doing occasional full-table scans on the blog_entry table
  • Cache a few expensive DB queries that nobody will notice aren't performed live
  • Disable swap on the server (swapoff -a)

OK, maybe the last one isn't something you always do, but I had good reason to try it. In advance, I'd like to say that I am not a professional sysadmin, nor am I a kernel guru. I could be getting this entirely wrong. It seems to be working for me, though, so I thought I'd share.

A naïve implementation of virtual memory will fill up the physical RAM first, maybe leaving a little space for disk caching. Once there is no more RAM left, it will reluctantly start paging unused blocks of memory out onto disk. Advanced VM managers, on the other hand, recognise that you get the best performance through a balancing act between disk cache and virtual memory.

This is why, on a modern Linux box, you find yourself using significant amounts of swap long before you've run out of memory. If a page of virtual memory is accessed less often than a particular block on the disk, you get better performance caching the disk block than you do keeping the memory block in RAM. Thus, it's possible to get better performance on a system with swap than one that holds everything in physical RAM, purely because it gives the OS more breathing-room to do clever things with caches.

I couldn't help but think, though, that Linux was getting it wrong in Javablogs' case. Almost half of the Java process was paged out, as were substantial amounts of the Postgres connection processes. At any moment, there was a lot of stuff moving in and out of swap, even though physical RAM was divided 30/70 in favour of disk cache.

Paging out bits of the Java VM is a pretty bad idea. Java does a lot of traversing, compacting, slicing and dicing its heap to collect garbage and maintain performance, and if it has to start pulling chunks of that off disk, your whole application's performance is going to suffer.

Similarly, I'd granted more memory to the Postgres connections specifically because I wanted to give Postgres more room to cache things in fast RAM. Having Linux in turn push parts of that back onto disk was defeating the purpose entirely.

Ultimately, I realised that of the two main processes on the box, I trusted them to regulate their own memory more than I trusted Linux.

So I turned off swap. And everything seems to be running pretty smoothly. There are still bursts of I/O as the database does something complicated, but in general, the box is back to barely working up a sweat.

Of course, the other thing turning off swap does is remove the breathing-room you get if you run out of physical memory. So I did some back-of-the-envelope calculations. Typically, the box Javablogs is on is using about a third of its physical RAM for programs and data, with the OS holding on to the remaining two thirds as disk cache. Even if Javablogs were pushed up to using its configured maximum heap size, the memory usage would still only go up to 60-65%.

Like I said to Matthew Porter at the time: "The box may die hard. But only if I'm very, very wrong."
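The back-of-the-envelope calculation above can be sketched in a few lines. The figures below are hypothetical (the real RAM and heap sizes of the Javablogs box aren't given here); they are chosen only to match the proportions described: about a third of RAM in use now, and a worst case somewhere between 60% and 65%.

```ruby
# Fraction of physical RAM in use if the Java heap grew from its current
# size to its configured maximum, with everything else staying the same.
# The method and parameter names are illustrative, not from any tool.
def worst_case_usage(total_mb:, used_mb:, heap_now_mb:, heap_max_mb:)
  (used_mb - heap_now_mb + heap_max_mb).to_f / total_mb
end

# Hypothetical numbers: 2GB box, 680MB used by programs and data,
# Java heap currently at 200MB with an 800MB ceiling.
fraction = worst_case_usage(total_mb: 2048, used_mb: 680,
                            heap_now_mb: 200, heap_max_mb: 800)
puts format('worst case: %.1f%% of physical RAM', fraction * 100)
```

As long as that fraction stays comfortably below 100%, the box shouldn't run out of physical memory with swap off. This only accounts for the Java heap, so growth in Postgres or anything else would need its own margin.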

The 2.6 Linux kernel has a "swappiness" sysctl that allows the sysadmin to fine-tune the balance between swap and cache. It would have been interesting to play with that, but for the fact we're not running 2.6.
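On a kernel that has it, the setting is exposed as the file /proc/sys/vm/swappiness. Here is a minimal sketch for checking the current value; the helper name `read_swappiness` is made up for illustration, and the script only reads the value, it doesn't change it.

```ruby
# Where 2.6 and later kernels expose the vm.swappiness sysctl.
SWAPPINESS_PATH = '/proc/sys/vm/swappiness'

# Returns the current swappiness as an Integer, or nil when the file is
# missing or unreadable (non-Linux systems, or kernels older than 2.6).
def read_swappiness(path = SWAPPINESS_PATH)
  return nil unless File.readable?(path)
  Integer(File.read(path).strip)
end

value = read_swappiness
puts value.nil? ? 'vm.swappiness not available here' : "vm.swappiness = #{value}"
```

Lower values tell the kernel to prefer keeping process memory in RAM, and higher values make it swap more eagerly. Changing the value needs root, for example with `sysctl -w vm.swappiness=10`, where 10 is just an example figure.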

It would be even cooler if you could do some kind of per-process "swap-niceness". In the same way that 'nice' raises or lowers a process's priority in the scheduling queue, "swap-nice" could raise or lower a process's priority for having bits of it swapped out. Maybe such a thing exists already. I couldn't find it.

During World War Two, a number of Pacific islands that had previously been left undisturbed by civilisation found themselves home to military bases. These bases had airstrips on which landed planes full of clothing, food, and all the other things you need to run an outpost.

After the war, the bases were vacated and dismantled, leaving behind a lot of bemused islanders. When the strange people were around doing strange things, food fell from the sky. When they left, it stopped.

So some islanders began to mimic the activities of the soldiers who had left. They sat in towers, carved new equipment out of wood, lit signal fires, and waited for the food to fall again. Anthropologists dubbed these religious activities "cargo cults". (Wikipedia)

Sometimes, I find myself performing cargo cult programming¹.

A piece of code works. I don't really understand how it works or what it's doing. But if I copy it over here, tweak the edges and poke it a few times, I'm pretty sure I can get it doing what I want it to do well enough that I don't have to know how it works.

I'm not really programming, I'm just standing on someone else's runway waving a torch, hoping it'll work.

You don't need me to tell you that this is a Really Bad Thing. The resulting code is error-prone, brittle, inefficient, and don't even think about trying to remove duplication or foster reuse.

That said, sometimes you have a task that's about important enough to deserve an hour's work, but that would take a day to understand. You can either abandon the task as unschedulable, or you can cargo-cult it. "So long as you understand you don't know what you're doing", I tell myself with a knowing wink. "It will be fine."

This explains how I could have spent the last year on a project that uses Maven, written several build targets (and a tool to build and test against multiple databases and application servers), and still have no real clue how Maven works.

¹ This is related to cargo cult software engineering, but not the same since I'm talking about code, not process.

Charles and Ang get home, turn on Rage, and the Rolling Stones are playing.

Charles: They truly were an ugly band.
Ang: I can't get over how big his mouth is.
Charles: If he and Julia Roberts kissed, it would swallow the sun!

Anyway. By some total stroke of luck, I scored a ticket to the Sydney première of We Will Rock You, the Ben Elton-scripted musical that's really just a flimsy excuse to string a bunch of Queen songs together. It was way cool. If you like Queen songs, go see it.

The tickets were in the middle of the theatre, five rows back from the stage. Go me. Guy Sebastian was sitting in the seat in front of me, but I was too polite to tell him he was a soulless slave of corporate rock. He probably knows already. He seemed to take all the jokes about Australian Idol in the show pretty well.

By some even greater stroke of luck, I scored a pass in to the post-première party. It was full of Beautiful People, so we spent most of the time in a corner, feeling a little out of place in that crowd, and occasionally half recognising a face as it walked past. Later in the night, though, the pay-off. Brian May and Roger Taylor had turned up to the show, and took the stage at the party.

I was ten feet away from Brian May kicking some serious ass on guitar. Loud cries of "We are not worthy" were heard from the crowd.

Abu Ghraib

  • 1:17 AM

Abu Ghraib

Like most Iraqi women, Alazawi is reluctant to talk about what she saw but says that her brother Mu'taz was brutally sexually assaulted. Then it was her turn to be interrogated. "The informant and an American officer were both in the room. The informant started talking. He said, 'You are the lady who funds your brothers to attack the Americans.' I speak some English so I replied: 'He is a liar.' The American officer then hit me on both cheeks. I fell to the ground."

Alazawi says that American guards then made her stand with her face against the wall for 12 hours, from noon until midnight. Afterwards they returned her to her cell. "The cell had no ceiling. It was raining. At midnight they threw something at my sister's feet. It was my brother Ayad. He was bleeding from his legs, knees and forehead. I told my sister: 'Find out if he's still breathing.' She said: 'No. Nothing.' I started crying. The next day they took away his body."