October 28, 2004

Ruby Performance

If you read my blog with any regularity, you'll know that I really like the Ruby programming language. It's the language I feel most fits the way my brain works, and when I'm writing Ruby code, I feel happier than when I'm writing code in other languages.

That said, I have some pretty serious doubts about the ability of the Ruby interpreter to do real work in its current form.

Take, for example, a 25MB XML file whose contents I wanted to investigate. I thought it would be cool to load it up inside Ruby, because then I could use the interpreter as an interactive shell to play around with the contents of the document.

First, here's my baseline: loading the file into a DOM Document object using dom4j on my 1.5GHz PowerBook:

Epiphany:/tmp cmiller$ time java -Xmx256M DomTest

real    0m19.413s
user    0m17.270s
sys     0m1.070s

Now, it's Ruby's turn, using REXML to load a DOM tree:

Epiphany:/tmp cmiller$ ruby -v
ruby 1.8.2 (2004-07-16) [powerpc-darwin]
Epiphany:/tmp cmiller$ time ruby read.rb

real    33m22.680s
user    14m44.710s
sys     1m3.670s

OK, that's a pretty huge difference. Going by user CPU time (14m44s against 17s), Ruby is a factor of fifty slower at parsing XML, and by the wall clock it's closer to a factor of a hundred. At first I thought this might just be REXML's fault. It admits only to being "reasonably fast" on its homepage. Maybe there's just some pathological algorithm going on inside that particular library.
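The read.rb script itself isn't shown here, so the following is only a minimal sketch of what loading a DOM tree with REXML looks like, not the script that produced the numbers above. The helper names `sample_xml` and `load_dom` and the small generated document are assumptions for illustration; for a real file you would pass `File.new(path)` to `REXML::Document.new` instead of a string. On Ruby 3.0 and later REXML ships as a bundled gem, so it may need to be installed separately.

```ruby
require 'rexml/document'
require 'benchmark'

# Hypothetical stand-in for the 25MB file: a small generated document.
def sample_xml(items)
  body = (1..items).map { |i| %(<item id="#{i}"><name>item #{i}</name></item>) }.join
  "<spec>#{body}</spec>"
end

# Parse the whole input into an in-memory REXML tree.
# REXML::Document.new accepts either a String or an IO object.
def load_dom(source)
  REXML::Document.new(source)
end

doc = nil
elapsed = Benchmark.realtime { doc = load_dom(sample_xml(1_000)) }

puts "root name: #{doc.root.name}"
puts "child elements: #{doc.root.elements.size}"
puts format('parsed in %.3fs', elapsed)
```

Once the tree is loaded, `doc` can be poked at interactively (for example from irb), which is the use case described above.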

Then, on a hunch, I turned off Ruby's garbage collection with GC.disable and ran the test again.

Epiphany:/tmp cmiller$ time ruby read-nogc.rb

real    45m20.425s
user    5m52.830s
sys     2m37.110s

The "real" time went up, partly because I was busy doing other stuff at the time, and partly because the amount of memory that was eaten up pushed my PowerBook into swap. The "user" time, though, dropped from 14m44s to 5m52s, about 60% less, so the microbenchmark seems to show that in the first run, Ruby was spending more than half its CPU time doing memory management.
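For anyone wanting to repeat the experiment on a smaller scale, here is a rough sketch of the same comparison: run a workload once normally and once with `GC.disable`, and compare user CPU time. The `allocate_nodes` workload and the `measure` helper are assumptions made up for this example (they stand in for the XML parse, which isn't reproduced here), and the result counts on `GC.count` to confirm whether the collector actually ran.

```ruby
# Hypothetical workload standing in for the XML parse:
# allocate a large number of small string objects.
def allocate_nodes(count)
  Array.new(count) { |i| "node #{i}" }
end

# Run the block and report the user CPU seconds it consumed,
# plus how many garbage collections happened while it ran.
def measure
  gc_before = GC.count
  cpu_before = Process.times.utime
  yield
  { user: Process.times.utime - cpu_before, gc_runs: GC.count - gc_before }
end

with_gc = measure { allocate_nodes(200_000) }

GC.disable
begin
  without_gc = measure { allocate_nodes(200_000) }
ensure
  GC.enable # always turn the collector back on
end

puts format('GC on:  %.3fs user, %d GC runs', with_gc[:user], with_gc[:gc_runs])
puts format('GC off: %.3fs user, %d GC runs', without_gc[:user], without_gc[:gc_runs])
```

With GC disabled the process only ever grows, so this is a diagnostic trick rather than a fix: on a large input it trades CPU time for memory and, as above, can push the machine into swap.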

Hopefully, this is something that will be fixed in Ruby 2.0. The plans are to move to bytecode compilation and generational GC, both steps in the right direction. For now, though, 2.0 is quite decidedly vaporware.

So yes. As much as I love the Ruby language, I'm not sure that I'd trust it with too much heavy lifting just yet.

Posted to nerd at October 28, 2004 11:03 AM

Comments

Just out of curiosity, what sort of times do you get from the Ruby bindings of libxml2 (http://xmlsoft.org/python.html)?

Mark Pilgrim, for one, thinks it is suitably speedy - http://diveintomark.org/archives/2004/02/18/libxml2

Posted by: Andy Todd at October 28, 2004 11:27 AM (#link)

libxml is very fast, and I assume calling it from the Ruby interpreter wouldn't change that fact very much. The problem is, though, if you have to drop down to coding in C (or mess around with installing C shared libraries) to get reasonable performance, that's taking a lot of the fun out of using Ruby in the first place.

Posted by: Charles Miller at October 28, 2004 11:39 AM (#link)

Hm, I know that this comparison is unfair, but anyway: I downloaded the big.xml file from the xml-xerces/metrics project (4.5M), and these are my benchmark results:

java:

$ time java -cp dom4j-1.5.jar:. Test
root name: spec

real 0m3.135s
user 0m2.660s
sys 0m0.320s

ruby + libxml:

$ time ruby test.rb
root name: spec

real 0m1.278s
user 0m1.040s
sys 0m0.180s

Posted by: Kent at October 28, 2004 01:48 PM (#link)

"The problem is, though, if you have to drop down to coding in C (or mess around with installing C shared libraries) to get reasonable performance, that's taking a lot of the fun out of using Ruby in the first place."

Hmm... I don't know. No matter how good the algorithm is, I think XML parsing in a dynamic language such as Ruby or Python will always be slow. I've been using libxml2 in Python for a long time now, and I haven't had any problems with it yet.

Posted by: Jonas Galvez at October 28, 2004 02:55 PM (#link)

A factor of 50 is reasonable. Matz said that if you want extreme performance you should resort to C (or Fortran, fwiw).
It's the same for every non-JITted dynamic language (Squeak is fast but still really slow compared to VW).

Anyway, I don't really care about dropping down to C (and sometimes you can avoid coding C bindings by staying in pure Ruby with the DL module), but I obviously agree that having a faster engine would be nice.
In the end, feel free to help with YARV development; it is already 50%-300% faster than ruby1.8 for the things it does :)

Posted by: gabriele at October 28, 2004 07:37 PM (#link)

I meant "reasonable" as in "expected", sorry.

Posted by: gabriele at October 29, 2004 12:34 AM (#link)
