Dear XML Programmers...

To everyone involved in writing XML-based programming languages, and by this I mean those languages where XML is the primary syntax, not useful things like ECMAScript that just happen to be applied to XML...

Please stop.

XML, when you get down to it, is a really verbose way to represent Lisp S-expressions. XML is well suited to marking up text, which is what it was designed for. Any programming language based on XML, however, will just end up looking like a really clumsy attempt to rewrite bits of Lisp with angle brackets. The world really doesn't need any more of these.

In contrast:

I have a personal, never-see-the-light-of-day project. One of the things I kept putting off writing for it was the configuration file: I just had a TestLauncher class that got all the right objects together and launched the application with some hard-coded defaults.

Last night, I decided that enough was enough, and started throwing together an example XML configuration file that I could use as a basis for building a real configuration framework for the application. It's a reflex, you see. We're trained to think XML whenever we have to put data anywhere. When I was halfway through, I started thinking about all the dependencies I was about to introduce: the chain of “Commons-Digester depends on Commons-BeanUtils” and so on. I'd much rather have something more lightweight.

I'd already been thinking about the connection between XML and Lisp. I was also realising that in converting my ‘launcher’ into a configuration file, I wasn't writing something that configured my application so much as something that programmed it. And you can get a self-contained, standards-compliant Scheme implementation for Java, SISC, in a 200k download. So as a thought experiment, I rewrote the configuration file in pseudo-Lisp.

Not only was it significantly less bloated and more readable, it opened up a number of doors that just weren't conveniently open before. SISC code can call out to Java objects, which meant that rather than having a configuration file and a Java framework to interpret it, turn it into objects and then use the objects to configure the application, the configuration file could configure the application directly.

Eventually I settled on a compromise. The configuration system would consist of two Scheme files: the configuration file itself, and a library file, loaded before it, that maps expressions in the configuration to calls on the Java objects themselves (plus a few helper functions along the way). That way, adding extra configuration options to the system simply means exposing them as Java methods, then writing the Scheme glue code that lets the config file set them. I suspect (without evidence so far) that this will be an order of magnitude faster than the alternative.
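The shape of that library file can be sketched in plain Java. This is not SISC and not the actual library file: ConfigGlue, CommandFramework and the handlers are hypothetical names, and the forms arrive already parsed as Java lists (form name first, as a String) rather than being read from Scheme source. What it tries to show is the split itself: each configuration form is a thin handler that calls a method on an ordinary Java object, so exposing a new option means registering one more handler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for the Scheme "library file": a table mapping
// configuration form names to calls on ordinary Java objects.
final class ConfigGlue {
    // Hypothetical application object being configured.
    static final class CommandFramework {
        final List<String> searchOrder = new ArrayList<>();
        final List<String> commands = new ArrayList<>();

        void addSearchPackage(String pkg) { searchOrder.add(pkg); }
        void addCommand(String name) { commands.add(name); }
    }

    private final Map<String, Consumer<List<?>>> forms = new HashMap<>();

    // Exposing a new option means registering one more handler here.
    void define(String name, Consumer<List<?>> handler) {
        forms.put(name, handler);
    }

    // Evaluates one already-parsed form: (name arg1 arg2 ...).
    void eval(List<?> form) {
        String name = (String) form.get(0);
        Consumer<List<?>> handler = forms.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("unknown configuration form: " + name);
        }
        // Pass only the arguments, not the form name itself.
        handler.accept(form.subList(1, form.size()));
    }

    // Wires up the glue for the (hypothetical) command framework.
    static ConfigGlue forCommands(CommandFramework fw) {
        ConfigGlue glue = new ConfigGlue();
        glue.define("add-command", args -> fw.addCommand((String) args.get(0)));
        glue.define("command-search-order", args -> {
            for (Object pkg : (List<?>) args.get(0)) {
                fw.addSearchPackage((String) pkg);
            }
        });
        return glue;
    }
}
```

Evaluating List.of("add-command", "LoginCommand") against the glue appends "LoginCommand" to the framework's command list; an unregistered form name fails loudly instead of being silently ignored.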

The two-file split means that the configuration file itself will not have to contain anything that looks like a program, just a series of assertions like...

; set up the user database
(add-module UserDatabase "com.example.HibernateUserDB")
(UserDatabase config-file "hibernate.cfg")

; initialise the command framework
(command-search-order '(
        "org.pastiche.commands"
        "org.pastiche.util.commands"))
(add-command "LoginCommand")
(add-command "SetPreferencesCommand")
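Before a configuration like the one above can drive anything, something has to read it. In this design that is SISC's job; purely as an illustration, here is a minimal hand-rolled s-expression reader in Java (a swapped-in stand-in, not SISC's reader) that turns the file's text into nested lists. It handles only what the snippet uses: parenthesised lists, double-quoted strings without escape sequences, bare symbols, ; comments and the ' quote prefix. SexpReader and Symbol are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal s-expression reader: lists, "strings", symbols, ; comments, 'quote.
final class SexpReader {
    // Symbols are kept distinct from string literals.
    record Symbol(String name) {
        @Override public String toString() { return name; }
    }

    private final String src;
    private int pos = 0;

    private SexpReader(String src) { this.src = src; }

    // Reads every top-level form in the source text.
    static List<Object> readAll(String src) {
        SexpReader r = new SexpReader(src);
        List<Object> forms = new ArrayList<>();
        r.skipBlank();
        while (r.pos < r.src.length()) {
            forms.add(r.read());
            r.skipBlank();
        }
        return forms;
    }

    // Skips whitespace and ; comments running to the end of the line.
    private void skipBlank() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == ';') {
                while (pos < src.length() && src.charAt(pos) != '\n') pos++;
            } else {
                return;
            }
        }
    }

    // Reads one form: a list, a quoted form, a string, or a symbol.
    private Object read() {
        skipBlank();
        if (pos >= src.length()) throw new IllegalStateException("unexpected end of input");
        char c = src.charAt(pos);
        if (c == '(') {
            pos++;
            List<Object> list = new ArrayList<>();
            skipBlank();
            while (pos < src.length() && src.charAt(pos) != ')') {
                list.add(read());
                skipBlank();
            }
            if (pos >= src.length()) throw new IllegalStateException("missing )");
            pos++; // consume ')'
            return list;
        }
        if (c == ')') throw new IllegalStateException("unexpected )");
        if (c == '\'') {
            pos++;
            // 'x is shorthand for (quote x)
            return List.of(new Symbol("quote"), read());
        }
        if (c == '"') {
            int end = src.indexOf('"', pos + 1); // no escape handling in this sketch
            if (end < 0) throw new IllegalStateException("unterminated string");
            String s = src.substring(pos + 1, end);
            pos = end + 1;
            return s;
        }
        // Anything else is a symbol, ending at whitespace, a paren or a comment.
        int start = pos;
        while (pos < src.length()) {
            char d = src.charAt(pos);
            if (Character.isWhitespace(d) || d == '(' || d == ')' || d == ';') break;
            pos++;
        }
        return new Symbol(src.substring(start, pos));
    }
}
```

Reading the add-module line, for example, yields a three-element list: the symbol add-module, the symbol UserDatabase and the string "com.example.HibernateUserDB". A real Scheme would then evaluate those lists against the definitions loaded from the library file.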

No, I'm aware this isn't a new idea. I'm just surprised it's not done more often.

The dynamic typing of the Lisp system helps as well: the configuration file is flexibly glued to the program instead of nailed on, which means less work maintaining it as more modules are added, whatever their types are.

Note: I haven't tried this yet; I'm just getting the idea down in ones and zeros before attempting it. If it turns out to be a total mess, I'll be sure to blog my failure.

The Fishbowl
