Ruby Performance

If you read my blog with any regularity, you'll know that I really like the Ruby programming language. It's the language I feel most fits the way my brain works, and when I'm writing Ruby code, I feel happier than when I'm writing code in other languages.

That said, I have some pretty serious doubts about the ability of the Ruby interpreter to do real work in its current form.

Take, for example, a 25MB XML file whose contents I wanted to investigate. I thought it would be cool to load it up inside Ruby, because then I could use the interpreter as an interactive shell to play around with the contents of the document.

Anyway, first here's my baseline: loading the file into a DOM Document object using dom4j on my 1.5GHz PowerBook:

Epiphany:/tmp cmiller$ time java -Xmx256M DomTest

real    0m19.413s
user    0m17.270s
sys     0m1.070s

Now, it's Ruby's turn, using REXML to load a DOM tree:

Epiphany:/tmp cmiller$ ruby -v
ruby 1.8.2 (2004-07-16) [powerpc-darwin]
Epiphany:/tmp cmiller$ time ruby read.rb

real    33m22.680s
user    14m44.710s
sys     1m3.670s
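
The read.rb script itself isn't reproduced here, but a minimal sketch of loading a DOM tree with REXML might look something like the following. The helper names `load_dom` and `count_elements` are made up for illustration, and the file path is assumed to arrive as the first command-line argument; the script that was actually timed may well have differed.

```ruby
require 'rexml/document'

# Hypothetical stand-in for read.rb: parse a whole XML source into a REXML
# DOM tree. `source` may be an open File (or other IO) or a String of XML.
def load_dom(source)
  REXML::Document.new(source)
end

# Count an element plus all of its descendants, as a quick sanity check
# that the tree really was built.
def count_elements(element)
  total = 1
  element.elements.each { |child| total += count_elements(child) }
  total
end

# Only runs when a path is given on the command line (assumed usage).
if __FILE__ == $0 && ARGV[0]
  doc = File.open(ARGV[0]) { |file| load_dom(file) }
  puts "Loaded #{count_elements(doc.root)} elements"
end
```

Everything here builds the full tree in memory before you can touch any of it, which is the point of the exercise: once `doc` exists, you can poke at it from the interpreter.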

OK, that's a pretty huge difference. Measured by user CPU time, Ruby is a factor of fifty slower at parsing XML, and by wall-clock time it's closer to a factor of a hundred. At first I thought this might just be REXML's fault. Its homepage only claims that it's "reasonably fast". Maybe there's just some pathological algorithm going on inside that particular library.

Then, on a hunch, I turned off Ruby's garbage collection with GC.disable and ran the test again.

Epiphany:/tmp cmiller$ time ruby read-nogc.rb

real    45m20.425s
user    5m52.830s
sys     2m37.110s
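
For anyone wanting to repeat the experiment, here is a rough sketch of how a no-GC run could be wrapped and timed from inside Ruby rather than with the shell's `time`. The helpers `without_gc` and `user_cpu_seconds` are hypothetical names, not part of read-nogc.rb; the only real calls relied on are `GC.disable`, `GC.enable` and `Process.times`.

```ruby
# Run the block and return the user CPU seconds it consumed.
def user_cpu_seconds
  before = Process.times.utime
  yield
  Process.times.utime - before
end

# Run the block with the garbage collector switched off, then put the
# collector back the way it was found.
def without_gc
  was_disabled = GC.disable   # true if GC had already been disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Example workload (an assumption, standing in for the XML load):
# allocate a pile of short-lived strings with and without GC.
if __FILE__ == $0
  work = lambda { 200_000.times { |i| "node-#{i}" } }
  with_gc = user_cpu_seconds { work.call }
  no_gc   = user_cpu_seconds { without_gc { work.call } }
  puts "with GC: #{with_gc}s, without GC: #{no_gc}s"
end
```

Bear in mind that with the collector off, nothing is ever freed, so a big enough workload will push the machine into swap just as it did here.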

The "real" time went up, partly because I was busy doing other stuff at the time, and partly because the extra memory use pushed my PowerBook into swap. The user CPU time, though, fell from 14m44s to 5m52s, so this microbenchmark seems to show that in the first run, Ruby was spending more than half its CPU time doing memory management.
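
The ratios quoted in this post can be checked straight from the `time` output above. The snippet below is just arithmetic on those figures; `to_seconds` is a helper invented for the purpose, and it assumes the `NmS.SSSs` format that the shell prints.

```ruby
# Convert a `time`-style stamp such as "14m44.710s" into seconds.
def to_seconds(stamp)
  match = stamp.match(/\A(\d+)m([\d.]+)s\z/)
  raise ArgumentError, "unrecognised time: #{stamp}" unless match
  match[1].to_i * 60 + match[2].to_f
end

java_user  = to_seconds("0m17.270s")   # dom4j, user CPU
java_real  = to_seconds("0m19.413s")   # dom4j, wall clock
ruby_user  = to_seconds("14m44.710s")  # REXML, user CPU, GC on
ruby_real  = to_seconds("33m22.680s")  # REXML, wall clock, GC on
nogc_user  = to_seconds("5m52.830s")   # REXML, user CPU, GC off

slowdown_user = ruby_user / java_user          # Ruby vs Java, CPU time
slowdown_real = ruby_real / java_real          # Ruby vs Java, wall clock
gc_share      = 1.0 - nogc_user / ruby_user    # CPU time that vanished with GC off

puts "CPU slowdown:  %.1fx" % slowdown_user
puts "Wall slowdown: %.1fx" % slowdown_real
puts "GC share:      %.0f%%" % (gc_share * 100)
```

The GC share comes out at roughly 60% of user CPU time, which is where "more than half" comes from. It's only an estimate, since it assumes the two runs did otherwise identical work.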

Hopefully, this is something that will be fixed in Ruby 2.0. The plans are to move to bytecode compilation and generational GC, both steps in the right direction. For now, though, 2.0 is quite decidedly vaporware.

So yes. As much as I love the Ruby language, I'm not sure that I'd trust it with too much heavy lifting just yet.

The Fishbowl
