OPML is the Outline Processor Markup Language, an XML dialect invented by Dave Winer as a serialization format for outlines created in UserLand applications such as Frontier. Winer continues to evangelize the format, and it has made its way into a number of applications: for example, many RSS readers use it as an export format for feed subscriptions.
Occasionally, someone will come up with a problem that looks vaguely outline-like: "I need to store data nested inside other data", and suggest OPML as a possible solution. Predictably, the next thing you will hear is the pained cry of a large number of developers shouting "Please, God no."
The reason for this is that OPML, as specified, is a non-format. It's the alluring vapor of a specification that isn't there. Here's a simple demonstration.
The OPML specification can be found here. It defines the following:
- A top-level XML element: <opml>
- A <head> section, containing a title, and a number of presentational elements [1]
- A <body> section, containing one or more <outline> elements. <outline> elements can be nested
- Four "standard" attributes of the <outline> element, being:
  - text, containing arbitrary text: the 'content' of the outline node
  - type, containing arbitrary text describing, in some way that isn't defined in the specification, how a processor should interpret the node
  - isComment and isBreakpoint, which describe functionality specific to Frontier
To allow the format to be flexible and extensible, OPML producers can add arbitrary attributes to outline elements. While types and attributes are arbitrary, the specification does not provide implementors a mechanism for finding out the meaning of either.
Here is a sample <body> section of an OPML document, cribbed from various sources to show different 'types' of outline.
<body>
  <outline text="Here is a Podcast that I found today"
           created="Tue, 10 May 2005 17:30:20 GMT" type="link"
           url="http://www.example.com/blah.mp3"/>
  <outline type="heading" text="This is my blogroll"
           created="Sun, 02 Oct 2005 04:18:09 GMT">
    <outline text="Bruce's Weblog" type="rss"
             xmlUrl="http://www.example.com/rss.xml"/>
  </outline>
</body>
You'll see above: three different 'type' values (remember, types are arbitrary strings), three different sets of attributes (also arbitrary). Let's play a game, and make a very simple transformation to the above document fragment:
<body>
  <link created="Tue, 10 May 2005 17:30:20 GMT"
        url="http://www.example.com/blah.mp3">
    <text>Here is a Podcast that I found today</text>
  </link>
  <heading created="Sun, 02 Oct 2005 04:18:09 GMT">
    <text>This is my blogroll</text>
    <rss xmlUrl="http://www.example.com/rss.xml">
      <text>Bruce's Weblog</text>
    </rss>
  </heading>
</body>
Given the OPML spec, and the above examples, we can now ask ourselves a simple question: What is the difference between accepting OPML, and accepting arbitrary XML documents of unknown formats?
Answer: An OPML document limits where you can put text nodes.
That's pretty much the only difference. Semantically speaking, there's no difference between <outline type="blah"> and just <blah>, and given the complete lack of specification or limitations on element attributes, that's all OPML is: an arbitrary XML document with limitations on where text nodes can go. The supposed value of OPML — that it defines an outline — is an illusion. An outline is stuff nested inside other stuff. So's XML.
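To show how mechanical that transformation is, here is a minimal Python sketch of it. The function names are mine, and the fallback tag for an outline with no type attribute is an assumption; nothing here comes from the OPML spec beyond the outline, type and text names.

```python
import xml.etree.ElementTree as ET

# The sample <body> from earlier in the post, with line wraps tidied.
OPML_BODY = """<body>
  <outline text="Here is a Podcast that I found today"
           created="Tue, 10 May 2005 17:30:20 GMT" type="link"
           url="http://www.example.com/blah.mp3"/>
  <outline type="heading" text="This is my blogroll"
           created="Sun, 02 Oct 2005 04:18:09 GMT">
    <outline text="Bruce's Weblog" type="rss"
             xmlUrl="http://www.example.com/rss.xml"/>
  </outline>
</body>"""


def outline_to_element(outline):
    """Turn <outline type="T" text="X" a="b"> into <T a="b"><text>X</text></T>."""
    attrs = dict(outline.attrib)
    # Assumption: an outline without a type keeps the generic tag "outline".
    # Note that a type is an arbitrary string, so it may not even be a legal
    # XML element name; ElementTree does not check this for us.
    tag = attrs.pop("type", "outline")
    text = attrs.pop("text", None)
    element = ET.Element(tag, attrs)
    if text is not None:
        ET.SubElement(element, "text").text = text
    # Nested outlines become nested elements, recursively.
    for child in outline.findall("outline"):
        element.append(outline_to_element(child))
    return element


def transform_body(body):
    """Apply the outline-to-element rewrite to every top-level outline."""
    new_body = ET.Element("body")
    for outline in body.findall("outline"):
        new_body.append(outline_to_element(outline))
    return new_body


result = transform_body(ET.fromstring(OPML_BODY))
print(ET.tostring(result, encoding="unicode"))
```

The whole rewrite is a rename plus moving one attribute into a child element. No knowledge of what "link", "heading" or "rss" mean is needed to do it, which is rather the point.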
Any interoperability between OPML documents is the result of largely undocumented conventions. Essentially, it comes down to the fact that a limited number of applications (mostly from the same set of vendors) produce OPML. So in order to process OPML, you just familiarise yourself with those vendors' conventions and choke as gracefully as possible on everything you don't recognise.
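In practice, "familiarise yourself with the conventions and choke gracefully" tends to look something like the Python sketch below. Everything in it is hypothetical: the handler names, the dispatch table, and above all the conventions themselves (that a "link" carries url and an "rss" carries xmlUrl), which are guessed from the sample document and not defined by the spec.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body>
  <outline text="Here is a Podcast that I found today" type="link"
           url="http://www.example.com/blah.mp3"/>
  <outline type="heading" text="This is my blogroll">
    <outline text="Bruce's Weblog" type="rss"
             xmlUrl="http://www.example.com/rss.xml"/>
  </outline>
</body>"""


# Assumption: conventions inferred from examples found in the wild.
def handle_link(node):
    return ("link", node.get("text"), node.get("url"))


def handle_rss(node):
    return ("feed", node.get("text"), node.get("xmlUrl"))


# The only types this processor claims to understand.
KNOWN_TYPES = {"link": handle_link, "rss": handle_rss}


def process(node, unrecognised):
    """Walk outlines depth-first, dispatching on the 'type' attribute."""
    results = []
    for outline in node.findall("outline"):
        handler = KNOWN_TYPES.get(outline.get("type"))
        if handler is not None:
            results.append(handler(outline))
        else:
            # Choke gracefully: record the unknown type, but keep descending,
            # since recognisable nodes may be nested below it.
            unrecognised.append(outline.get("type"))
        results.extend(process(outline, unrecognised))
    return results


skipped = []
items = process(ET.fromstring(SAMPLE), skipped)
print(items)
print(skipped)
```

Here "heading" is deliberately left out of the table to show the fallback path. Every new producer with its own types and attributes means another entry in that table, learned by reading someone's output instead of a specification.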
No wonder that potential implementors throw up their hands in despair. Imagine, if you will, the following conversation:
Manager: I want the product to accept XML documents.
Developer: You... what? But XML is just a format, how am I going to know what the documents mean?
Manager: Oh, that's easy. Here are some examples of XML documents I've found on the web, work it out from them.
Developer: But how can I be sure I'm understanding them properly? What happens when someone gives me a document I don't understand? What if two people come up with documents that look similar, but follow different conventions?
Manager: Oh, we'll cross that bridge when we come to it. I'm sure you're clever enough to deal with these things.
[1] Many criticisms of OPML get sidetracked with how bad the presentational data in the header is [2]. However, given the larger problems with the OPML non-standard, all these complaints are trivial.
[2] And it's pretty bad. For example, to understand how to serialize node expansion states, you need to understand what "navigate flatdown X times and expand" means. And if it means what I think it does, then every time you expand or close (or move, add or delete) a node, you have to re-calculate the expansion states for the whole document below that node.
"It's the alluring vapour of a specification that isn't there"
The sheer number of other things that sprang to mind when I read this was saddening. MDA topped the list. In fact, you could pretty much just drop OPML and put in MDA and you’d have a post that hits all the exact same points (not to mention a ton of other things, although I’m primarily thinking about things in the industry I work in here, so when I spout XMSF I don’t expect people to know what I’m talking about, and with good reason).
Perhaps the template argument you have created is the single common thread that binds them all together? Perhaps you have just created a specification for CBDML (being Charles’ Bull**** Detection Markup Language)?
Tim: Mike Cannon-Brookes often talks about "boxes and lines architecture", which I think is a superset of most of these problems.
Pretty much any computing problem, given a sufficient level of abstraction, can be reduced to a diagram of boxes joined together with lines. At this level your solution will look startlingly simple, and you'll be able to sell it to someone.
FWIW, I’ve proposed elsewhere that aggregators should support XBEL for blogroll exchange. That’s the plan for mine, when it loses its vapourware status.
Am I the only one who can't help thinking OPML really stands for "Other People's Markup Language?"
Charles, I came to pretty much the same conclusion about 3 years ago ;-)
http://dannyayers.com/xmlns/om/OPML-tech.html
At the time I was contemplating making a more useful alternative, but having since realised that XHTML can do everything OPML can (see XOXO), it doesn't seem necessary. I have ranted into space numerous times since then about the format, but still some people choose to use it, and folks like Scoble evangelise it. So now when I see the stuff I feel more like "bah, let 'em get on with it, there's always XSLT".
Those guys must work in my office!
When talking of OPML's usage for blogrolls, I always thought that two more elements, "Author" and "Date of addition to the list", would have been valuable, but then OPML was never made for blogrolls, was it?
Debashish, OPML was made for everything and nothing.
Except, of course, for the presentational markup in the head. Which is application specific.
Of course.
As an end user, I don't get OPML at all. To me it seems like a cacky way to do something that ought to be relatively simple and which I find is adequately handled in BrainStorm (http://www.brainstormsw.com/), albeit from a different perspective. (Disclosure: I provided feedback to BrainStorm's inventors on usability.)
The other problem seems to be that Winer is reserving to himself the right to arbitrate what counts as 'OPML compliant'. Surely this is a stupid idea in an age where improvement comes from sharing ideas?
But then I may get branded as clueless...