What's Wrong with OPML

October 2, 2005 11:08 PM

OPML is the Outline Processor Markup Language, an XML dialect invented by Dave Winer as a serialization format for outlines created in UserLand applications such as Frontier. Winer continues to evangelize the format, and it has made its way into a number of applications: for example, many RSS readers use it as an export format for feed subscriptions.

Occasionally, someone will come up with a problem that looks vaguely outline-like: "I need to store data nested inside other data", and suggest OPML as a possible solution. Predictably, the next thing you will hear is the pained cry of a large number of developers shouting "Please, God no."

The reason for this is that OPML, as specified, is a non-format. It's the alluring vapor of a specification that isn't there. Here's a simple demonstration.

The OPML specification can be found here. It defines the following:

  • A top-level XML element: <opml>
  • A <head> section, containing a title, and a number of presentational elements [1]
  • A <body> section, containing one or more <outline> elements. <outline> elements can be nested
  • Four "standard" attributes of the <outline> element, being:
    • text, containing arbitrary text: the 'content' of the outline node
    • type, containing arbitrary text describing, in some way that isn't defined in the specification, how a processor should interpret the node
    • isComment and isBreakpoint, which describe functionality specific to Frontier.

To allow the format to be flexible and extensible, OPML producers can add arbitrary attributes to outline elements. While types and attributes are arbitrary, the specification does not provide implementors a mechanism for finding out the meaning of either.
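To make that concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample document, its 'recipe' and 'step' types, the 'servings' and 'minutes' attributes and the function name survey_outlines are all made up for illustration; none of them come from the spec. The point is that a processor can list the types and extra attributes it meets, but has nowhere to look up what they mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the types and extra attributes are invented here.
SAMPLE = """<opml version="1.0">
  <head><title>Hypothetical example</title></head>
  <body>
    <outline text="Dinner" type="recipe" servings="4">
      <outline text="Chop onions" type="step" minutes="5"/>
      <outline text="A plain note"/>
    </outline>
  </body>
</opml>"""

# The four attributes the specification names for <outline>.
STANDARD_ATTRS = {"text", "type", "isComment", "isBreakpoint"}


def survey_outlines(opml_text):
    """Map each 'type' value to the non-standard attribute names seen with it."""
    root = ET.fromstring(opml_text)
    seen = {}
    for node in root.iter("outline"):
        node_type = node.get("type", "")  # a missing type is recorded as ""
        extras = set(node.attrib) - STANDARD_ATTRS
        seen.setdefault(node_type, set()).update(extras)
    return seen


if __name__ == "__main__":
    for node_type, extras in survey_outlines(SAMPLE).items():
        print(repr(node_type), sorted(extras))
```

Everything the survey returns is just a bag of strings. Whether 'servings' is a number, a label or a URL is something the code can only guess at.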

Here is a sample <body> section of an OPML document, cribbed from various sources to show different 'types' of outline.

    <outline text="Here is a Podcast that I found today"
      created="Tue, 10 May 2005 17:30:20 GMT" type="link"/>
    <outline type="heading" text="This is my blogroll"
      created="Sun, 02 Oct 2005 04:18:09 GMT">
      <outline text="Bruce's Weblog" type="rss"
        xmlUrl="http://www.example.com/rss.xml"/>
    </outline>

In the fragment above you'll see three different 'type' values (remember, types are arbitrary strings) and three different sets of attributes (also arbitrary). Let's play a game, and make a very simple transformation to that document fragment:

    <link created="Tue, 10 May 2005 17:30:20 GMT">
      <text>Here is a Podcast that I found today</text>
    </link>
    <heading created="Sun, 02 Oct 2005 04:18:09 GMT">
      <text>This is my blogroll</text>
      <rss xmlUrl="http://www.example.com/rss.xml">
        <text>Bruce's Weblog</text>
      </rss>
    </heading>

Given the OPML spec, and the above examples, we can now ask ourselves a simple question: What is the difference between accepting OPML, and accepting arbitrary XML documents of unknown formats?

Answer: An OPML document limits where you can put text nodes.

That's pretty much the only difference. Semantically speaking, there's no difference between <outline type="blah"> and just <blah>, and given the complete lack of specification or limitations on element attributes, that's all OPML is: an arbitrary XML document with limitations on where text nodes can go. The supposed value of OPML — that it defines an outline — is an illusion. An outline is stuff nested inside other stuff. So's XML.
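The transformation is mechanical enough to write down. The following is a rough sketch in Python with xml.etree.ElementTree; the function name outline_to_element is my own, and keeping the tag name "outline" for untyped nodes is an assumption, not anything the spec says. It also assumes each type value happens to be a legal XML element name, which ElementTree does not check.

```python
import xml.etree.ElementTree as ET


def outline_to_element(outline):
    """Rewrite <outline type="T" text="..."> as <T><text>...</text></T>.

    All other attributes are carried over unchanged, and nested
    outlines are converted recursively.
    """
    attrs = dict(outline.attrib)
    tag = attrs.pop("type", "outline")  # assumption: untyped nodes stay "outline"
    text = attrs.pop("text", None)
    element = ET.Element(tag, attrs)
    if text is not None:
        ET.SubElement(element, "text").text = text
    for child in outline.findall("outline"):
        element.append(outline_to_element(child))
    return element


if __name__ == "__main__":
    source = ET.fromstring(
        '<outline type="heading" text="This is my blogroll" '
        'created="Sun, 02 Oct 2005 04:18:09 GMT">'
        '<outline text="Bruce\'s Weblog" type="rss" '
        'xmlUrl="http://www.example.com/rss.xml"/>'
        "</outline>"
    )
    print(ET.tostring(outline_to_element(source), encoding="unicode"))
```

Nothing is lost or gained in the rewrite, which is the point: the 'type' attribute was an element name all along.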

Any interoperability between OPML documents is the result of largely undocumented conventions. Essentially, it comes down to the fact that a limited number of applications (mostly from the same set of vendors) produce OPML. So in order to process OPML, you just familiarise yourself with those vendors' conventions and choke as gracefully as possible on everything you don't recognise.
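In practice that approach tends to look something like the sketch below, again in Python with xml.etree.ElementTree. The function name read_outlines and the tuple shapes are hypothetical, and the two "known" conventions (type="rss" with an xmlUrl attribute, and type="heading") are simply the ones from the sample fragment earlier, not a list taken from any specification.

```python
import xml.etree.ElementTree as ET


def read_outlines(opml_text):
    """Interpret the conventions we happen to know; note everything else.

    Returns (items, unrecognised): items is a list of (kind, text, url)
    tuples in document order, unrecognised lists the type values we
    could not interpret.
    """
    root = ET.fromstring(opml_text)
    items, unrecognised = [], []
    for node in root.iter("outline"):
        node_type = node.get("type")
        if node_type == "rss" and node.get("xmlUrl"):
            # Convention from feed-reader exports: the feed address is in xmlUrl.
            items.append(("feed", node.get("text"), node.get("xmlUrl")))
        elif node_type == "heading":
            items.append(("heading", node.get("text"), None))
        else:
            # Choke gracefully: keep the text, record that we did not understand.
            unrecognised.append(node_type)
            items.append(("text", node.get("text"), None))
    return items, unrecognised


if __name__ == "__main__":
    doc = """<opml version="1.0"><body>
      <outline type="heading" text="This is my blogroll">
        <outline type="rss" text="Bruce's Weblog"
                 xmlUrl="http://www.example.com/rss.xml"/>
      </outline>
      <outline type="link" text="Here is a Podcast that I found today"/>
    </body></opml>"""
    items, unrecognised = read_outlines(doc)
    print(items)
    print("not understood:", unrecognised)
```

Every new producer means another branch in that if-chain, and the fallback branch silently throws away whatever the unknown attributes were trying to say.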

No wonder that potential implementors throw up their hands in despair. Imagine, if you will, the following conversation:

Manager: I want the product to accept XML documents.

Developer: You... what? But XML is just a format, how am I going to know what the documents mean?

Manager: Oh, that's easy. Here's some examples of some XML documents I've found on the web, work it out from them.

Developer: But how can I be sure I'm understanding them properly? What happens when someone gives me a document I don't understand? What if two people come up with documents that look similar, but follow different conventions?

Manager: Oh, we'll cross that bridge when we come to it. I'm sure you're clever enough to deal with these things.

[1] Many criticisms of OPML get sidetracked by how bad the presentational data in the header is [2]. However, given the larger problems with the OPML non-standard, all these complaints are trivial.

[2] And it's pretty bad. For example, to understand how to serialize node expansion states, you need to understand what "navigate flatdown X times and expand" means. And if it means what I think it does, then every time you expand or close (or move, add or delete) a node, you have to recalculate the expansion states for the whole document below that node.
