I was experimenting yesterday with the Rome syndication feed parser, so I plugged it into my development copy of Javablogs, and ran an update against the 900-or-so registered feeds. Despite the claims on the Rome site that Informa's API is "too complicated to grasp and use", replacing Informa with Rome was almost entirely a matter of replacing class-names and renaming one or two method calls. Maybe we just don't use the complicated stuff.
Anyway, Rome worked fine. On one hand, I think there were two or three feeds that Rome parsed but Informa barfed on; on the other hand, the Informa version we use in Javablogs is six months old.
What I did run into again was a problem I'd reported to the Informa guys six months ago (after hacking in a quick and dirty fix locally), and which they subsequently fixed.
Once upon a time, when RSS 2.0 was young and the Internet was wild and free, there was a lot of discussion about adding an XML namespace to RSS 2.0. A URI was proposed (http://backend.userland.com/rss2), and for a while, even, Dave Winer's Scripting News RSS feed was published in this namespace.
Soon after, though, the namespace disappeared. The namespace proposal never showed up in the official spec. Nevertheless, it existed. For one brief shining moment, it existed; and that which is made can never truly be unmade. The RSS 2.0 namespace survives in the wild to this day, mostly (as far as I could determine) in various old versions of Blojsom templates.
you only live a day
but it's brilliant anyway.
—Elliott Smith, Independence Day
Which means that RSS parsers expecting RSS 2.0 elements to be in an empty namespace will die in weird and interesting ways.
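To make the failure concrete, here is a minimal sketch in Python using the standard library's ElementTree. It is not how Rome or Informa are written; the `strict_title` helper and the two sample feeds are made up for illustration. The point is only that a lookup for un-namespaced element names quietly finds nothing once the feed declares the RSS 2.0 namespace.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds, identical except for the default namespace.
PLAIN = (
    '<rss version="2.0"><channel>'
    '<title>Plain feed</title>'
    '</channel></rss>'
)
NAMESPACED = (
    '<rss version="2.0" xmlns="http://backend.userland.com/rss2"><channel>'
    '<title>Namespaced feed</title>'
    '</channel></rss>'
)

def strict_title(xml_text):
    """Look up channel/title, assuming elements live in no namespace."""
    root = ET.fromstring(xml_text)
    node = root.find("channel/title")  # matches un-namespaced names only
    return None if node is None else node.text

print(strict_title(PLAIN))       # Plain feed
print(strict_title(NAMESPACED))  # None
```

A parser that then goes on to dereference that missing title is where the weird and interesting deaths come from.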
This is why I think the Pilgrim/Ruby[1] approach to RSS parsing is the most effective. Don't write separate parsers for each RSS version. Throw the whole mess of tags into a pot and sieve out the ones that look like they might be meaningful.
It's ugly, but at least it works in the wild.
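A rough sketch of the pot-and-sieve idea, again in Python with ElementTree. This is my own illustration, not code from any of the parsers mentioned here; the `local_name` and `sieve_items` helpers, the `INTERESTING` set, and the sample feeds are all assumptions. It ignores namespaces and RSS versions entirely and keeps whatever looks like an item with a familiar-looking child tag. It still needs well-formed XML, which real-world feeds don't always manage.

```python
import xml.etree.ElementTree as ET

# Tag names worth keeping, whatever namespace or RSS version they came from.
INTERESTING = {"title", "link", "description"}

def local_name(tag):
    """Drop any '{namespace}' prefix ElementTree adds, and lowercase the rest."""
    return tag.rsplit("}", 1)[-1].lower()

def sieve_items(xml_text):
    """Collect the interesting children of anything that looks like an item."""
    items = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) not in ("item", "entry"):
            continue
        fields = {}
        for child in elem:
            name = local_name(child.tag)
            if name in INTERESTING and name not in fields:
                fields[name] = (child.text or "").strip()
        items.append(fields)
    return items

# Hypothetical samples: namespaced RSS 2.0, and RSS 1.0 with items outside the channel.
RSS2_NAMESPACED = (
    '<rss version="2.0" xmlns="http://backend.userland.com/rss2"><channel>'
    '<title>Chan</title>'
    '<item><title>Hello</title><link>http://example.com/1</link></item>'
    '</channel></rss>'
)
RSS1 = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns="http://purl.org/rss/1.0/">'
    '<channel><title>Chan</title></channel>'
    '<item><title>Hello</title><link>http://example.com/1</link></item>'
    '</rdf:RDF>'
)

print(sieve_items(RSS2_NAMESPACED))
print(sieve_items(RSS1))
```

Both feeds come out as the same list of one item, which is the whole attraction: the sieve never had to know which flavour of RSS it was looking at.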
[1] Oops. I got the feed parser and feed validator confused. Sorry, Mark.