October 28, 2004
Ruby Performance
If you read my blog with any regularity, you'll know that I really like the Ruby programming language. It's the language I feel most fits the way my brain works, and when I'm writing Ruby code, I feel happier than when I'm writing code in other languages.
That said, I have some pretty serious doubts about the ability of the Ruby interpreter to do real work in its current form.
Take, for example, a 25MB XML file whose contents I wanted to investigate. I thought it would be cool to load it up inside Ruby, because then I could use the interpreter to give me an interactive shell for playing around with the contents of the document.
Anyway, first here's my baseline: loading the file into a DOM Document object using dom4j on my 1.5GHz PowerBook:
Epiphany:/tmp cmiller$ time java -Xmx256M DomTest
real 0m19.413s
user 0m17.270s
sys 0m1.070s
Now, it's Ruby's turn, using REXML to load a DOM tree:
Epiphany:/tmp cmiller$ ruby -v
ruby 1.8.2 (2004-07-16) [powerpc-darwin]
Epiphany:/tmp cmiller$ time ruby read.rb
real 33m22.680s
user 14m44.710s
sys 1m3.670s
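The post doesn't show read.rb. A minimal sketch of what such a script might look like, assuming it simply hands the whole file to `REXML::Document.new` to build an in-memory DOM tree; the helper name `load_dom` is hypothetical:

```ruby
require 'rexml/document'

# Hypothetical stand-in for read.rb: parse an entire XML file into an
# in-memory REXML DOM tree. REXML::Document.new accepts an IO or a String.
def load_dom(path)
  File.open(path) { |f| REXML::Document.new(f) }
end

# Run as `ruby read.rb some-file.xml` to time a full parse with `time`.
if __FILE__ == $0 && ARGV[0]
  doc = load_dom(ARGV[0])
  puts "root name: #{doc.root.name}"
end
```

Because the whole tree is materialized before anything else happens, every element, attribute and text node becomes a live Ruby object, which is exactly the kind of allocation load that stresses the garbage collector.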
OK, that's a pretty huge difference. Measured in user CPU time, Ruby is about fifty times slower at parsing XML (and about a hundred times slower in wall-clock time). At first I thought this might just be REXML's fault: its homepage admits only to being "reasonably fast". Maybe there's just some pathological algorithm going on inside that particular library.
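The "factor of fifty" comes from comparing user CPU time. A small sketch that converts the `time` figures from the two runs above into seconds and divides; the helper name `to_seconds` is hypothetical, and the inputs are copied from the output shown:

```ruby
# Convert a `time`-style duration such as "14m44.710s" into seconds.
def to_seconds(str)
  m = str.match(/\A(\d+)m([\d.]+)s\z/) or raise ArgumentError, "bad duration: #{str}"
  m[1].to_i * 60 + m[2].to_f
end

java = { real: to_seconds('0m19.413s'), user: to_seconds('0m17.270s') }
ruby = { real: to_seconds('33m22.680s'), user: to_seconds('14m44.710s') }

user_ratio = ruby[:user] / java[:user]  # CPU time spent in user code
real_ratio = ruby[:real] / java[:real]  # wall-clock time
printf("user: %.1fx  real: %.1fx\n", user_ratio, real_ratio)
# prints "user: 51.2x  real: 103.2x"
```

User time is the fairer comparison here, since wall-clock time also counts whatever else the machine was doing during a half-hour run.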
Then, on a hunch, I turned off Ruby's garbage collection with GC.disable and ran the test again.
Epiphany:/tmp cmiller$ time ruby read-nogc.rb
real 45m20.425s
user 5m52.830s
sys 2m37.110s
The "real" time went up, partly because I was busy doing other stuff at the time, and partly because the memory that was eaten up pushed my PowerBook into swap. But user CPU time dropped from 14m45s to 5m53s, which suggests that in the first run Ruby was spending more than half of its user CPU time doing memory management.
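read-nogc.rb isn't shown either. A minimal sketch of the same experiment, assuming it amounts to calling `GC.disable` before parsing; it uses the standard `Benchmark` library and a hypothetical helper, `parse_times`, to time one REXML parse with the collector running and one with it switched off. It measures CPU time in-process rather than with the shell's `time`, so its numbers aren't directly comparable to those above, and with GC off memory only grows, so it should only be run on input that comfortably fits in RAM:

```ruby
require 'benchmark'
require 'rexml/document'

# Parse the same XML string twice: once with the garbage collector running
# normally and once with it disabled. Returns CPU time (user + system) for each.
def parse_times(xml)
  with_gc = Benchmark.measure { REXML::Document.new(xml) }.total
  GC.start          # begin the second run from a freshly collected heap
  GC.disable
  begin
    without_gc = Benchmark.measure { REXML::Document.new(xml) }.total
  ensure
    GC.enable       # always turn collection back on, even if parsing fails
  end
  { with_gc: with_gc, without_gc: without_gc }
end

# Usage: a synthetic document with a few thousand small elements.
xml = '<root>' + '<item a="1">text</item>' * 2_000 + '</root>'
t = parse_times(xml)
printf("with GC: %.2fs  without GC: %.2fs\n", t[:with_gc], t[:without_gc])
```

The `ensure` clause matters: leaving GC disabled by accident in an interactive session is an easy way to reproduce the swapping described above.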
Hopefully, this is something that will be fixed in Ruby 2.0. The plans are to move to bytecode compilation and generational GC, both steps in the right direction. For now, though, 2.0 is quite decidedly vaporware.
So yes. As much as I love the Ruby language, I'm not sure that I'd trust it with too much heavy lifting just yet.
Posted to nerd at October 28, 2004 11:03 AM
Just from curiosity, what sort of times do you get from the Ruby bindings of libxml2 (http://xmlsoft.org/python.html)?
Mark Pilgrim, for one, thinks it is suitably speedy: http://diveintomark.org/archives/2004/02/18/libxml2
Posted by: Andy Todd at October 28, 2004 11:27 AM
libxml is very fast, and I assume calling it from the Ruby interpreter wouldn't change that fact very much. The problem is, though, if you have to drop down to coding in C (or mess around with installing C shared libraries) to get reasonable performance, that's taking a lot of the fun out of using Ruby in the first place.
Posted by: Charles Miller at October 28, 2004 11:39 AM
Hm, I know this comparison is unfair, but anyway: I downloaded the big.xml file from the xml-xerces/metrics project (4.5 MB), and these are my benchmark results:
java:
$ time java -cp dom4j-1.5.jar:. Test
root name: spec
real 0m3.135s
user 0m2.660s
sys 0m0.320s
ruby + libxml:
$ time ruby test.rb
root name: spec
real 0m1.278s
user 0m1.040s
sys 0m0.180s
"The problem is, though, if you have to drop down to coding in C (or mess around with installing C shared libraries) to get reasonable performance, that's taking a lot of the fun out of using Ruby in the first place."
Hmm... I don't know. No matter how good the algorithm is, I think XML parsing written in a dynamic language such as Ruby or Python will always be slow. I've been using libxml2 from Python for a long time now, and I haven't had any problems with it yet.
Posted by: Jonas Galvez at October 28, 2004 02:55 PM
A factor of 50 is reasonable. Matz said that if you want extreme performance you should resort to C (or Fortran, FWIW).
It's the same for every non-JITted dynamic language (Squeak is fast, but still really slow compared to VisualWorks).
Anyway, I don't really care about dropping down to C (and sometimes you can avoid writing C bindings at all by doing it in pure Ruby with the DL module), but I obviously agree that having a faster engine would be nice.
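The DL module mentioned here was Ruby 1.8's way of calling into a C shared library without compiling an extension; it was deprecated in Ruby 2.0 and removed in 2.2 in favour of Fiddle. A minimal sketch of the same idea using Fiddle rather than DL, calling the C library's `strlen` directly. It assumes `strlen` is already visible in the running process, which holds on typical Linux and macOS builds where Ruby links against libc; the names `STRLEN` and `c_strlen` are illustrative:

```ruby
require 'fiddle'

# Look up strlen among the symbols already loaded into the Ruby process
# and wrap it as a callable function: size_t strlen(const char *).
STRLEN = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['strlen'],
  [Fiddle::TYPE_VOIDP],
  Fiddle::TYPE_SIZE_T
)

# Fiddle passes a Ruby String as a pointer to its bytes; the result is an Integer.
def c_strlen(str)
  STRLEN.call(str)
end

puts c_strlen('hello, libc')  # prints 11
```

No C is written or compiled here, but the trade-off is that nothing checks the declared signature against the real one: get the argument or return types wrong and the process can crash rather than raise an exception.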
In the end, feel free to help with YARV development; it is already 50-300% faster than ruby 1.8 for the things it does :)