Maven: Broken By Design

by Charles Miller on December 20, 2007

Responsiveness

We have a joke around the office, when a new hire is setting up their development environment for the first time. "Now we go to lunch while Maven downloads the Internet."

Software should be responsive. When you tell it to do something, it should do it. Take a pristine Maven installation and tell it to build a Java application -- i.e. perform its most obvious, primary function -- and Maven will first say "Wait a moment, I have to download three dozen different components before I can even start doing what you asked me to do."

"Ah", you say. "That's how Maven works. It's a modular system; you're just using it to build a Java app."

To which I reply: "Then don't give me a spoon and tell me to use it to cut steak."

The fact that all these components are downloaded separately and serially means the overhead of not including them is far greater than if they'd put them all (plus even a bunch more you might not need) in the core installation. Worse, though, this happens not when you're installing Maven, but when you're trying to use it to do something else.

This process doesn't stop at installation. Far too often you'll run Maven and it will gleefully traipse off through various repositories doing its own internal housecleaning before actually performing the build. It's like walking up to a shop assistant and asking him to help you find a book, only to watch him dust shelves for fifteen minutes first.

Reliability

Consider two pieces of software. One uses Maven (and 1–n artifact repositories) to manage its dependencies; the other keeps all its dependencies in source control. How many potential points of failure are involved in checking out and building each product?

Repeatability

Builds must be repeatable. If you check out a particular version of your code and build it with particular versions of your tools, you should get a product that is binary-identical each time. (Modulo things like compiled-in build dates, obviously.)
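One way to make "binary-identical" concrete is to fingerprint the build output and compare the fingerprints of two builds. The following is a minimal sketch, not anything Maven provides: the class name `BuildFingerprint` and its methods are made up for illustration. It also ignores the "modulo build dates" caveat, so archives with embedded timestamps will differ even when the code is the same.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Maven or any build tool.
final class BuildFingerprint {

    /** Returns the SHA-256 digest of the given bytes as lowercase hex. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** True if the two files have the same SHA-256 fingerprint. */
    static boolean sameArtifact(Path first, Path second) {
        try {
            return sha256Hex(Files.readAllBytes(first))
                    .equals(sha256Hex(Files.readAllBytes(second)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the output of two builds of the same checkout.
        byte[] buildOne = "same bytes".getBytes(StandardCharsets.UTF_8);
        byte[] buildTwo = "same bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha256Hex(buildOne).equals(sha256Hex(buildTwo)));
    }
}
```

Recording the fingerprint alongside a release gives you something to check against when you rebuild that release years later.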

Maven seems to try as hard as it can to prevent this. Files go missing from public Maven repositories and suddenly a whole swathe of historical versions of open source projects can't be built without hacking. ibiblio reorganises its directory layout and chaos ensues. Imagine what happens in ten years' time when Maven has been superseded by some new tool, public Maven repository maintenance is an afterthought, and you desperately need to patch some legacy Java app?

For well-resourced projects, the solution is to maintain your own repository and ensure all your dependencies will be available from it, for all time, but even that won't help you if you suddenly need the compile-time dependencies of a project you previously only used as a binary.

(A number of responses to this blog post have assumed that all my problems could be solved by a locally maintained repository, and/or a repository proxy. We have both. They help to some extent, but they in no way solve the problem completely. They are a band-aid over the fundamental issue and, once again, additional potential points of failure.)

Most open-source projects just assume their dependencies will continue to exist in "the cloud" for eternity.

Plugins are the worst culprit. Since the core of Maven exists solely to download an Internet's-worth of plugins to do the heavy lifting, and Maven has a nasty habit of upgrading those plugins without any user prompting whatsoever, builds can be crippled by some well-meaning committer "fixing" some piece of functionality. I'm told this has been fixed recently (or will be fixed soon), but versions should not be a moving target. v. 2.0.1 of your build tool should be v. 2.0.1 of your build tool. Forever.

Case Study: Dependency Management

One of the big strengths of Maven 2 is supposed to be the way it manages dependencies, including transitive dependencies. So if jar A requires jar B which requires jar C, Maven will sort all this out for you.

Tracking down dependencies and sorting out their transitive relationships is a tricky task, but it's a tricky task you only ever have to do when you modify your dependencies. Maven, on the other hand, wants to do this job every time you build, which adds a huge responsiveness overhead, as the "pom" definition files of each dependency must be retrieved and analysed alongside their jars.

Dependencies may live in a number of different repositories, and these repositories are out of the control of the user, especially in the case of Maven-built open-source projects that almost universally rely on the public ibiblio, Apache and Codehaus repositories. This impacts both reliability, as all these repositories must be available, and repeatability, as changes to the repositories may have catastrophic effects on the build.
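The reliability point can be put as back-of-envelope arithmetic: if a build needs every one of several services to be up, the chance that they are all up is the product of their individual availabilities. The sketch below assumes outages are independent and uses a made-up 99% availability figure; neither the numbers nor the class name `BuildAvailability` come from any real measurement or tool.

```java
// Hypothetical illustration; the availability figures are invented.
final class BuildAvailability {

    /**
     * Probability that every required service is reachable at build time,
     * assuming outages are independent of each other.
     */
    static double allAvailable(double... availabilities) {
        double combined = 1.0;
        for (double availability : availabilities) {
            combined *= availability;
        }
        return combined;
    }

    public static void main(String[] args) {
        // Dependencies checked in: only source control has to be up.
        double checkedIn = allAvailable(0.99);
        // Source control plus three external repositories.
        double external = allAvailable(0.99, 0.99, 0.99, 0.99);
        System.out.println("checked in: " + checkedIn);
        System.out.println("external repositories: " + external);
    }
}
```

With those invented figures, four services at 99% each give a build that can start only about 96% of the time, and every extra repository or proxy pushes that number down further.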

Reliability problems also creep in because Maven, forced to do dependency resolution in each build, must hide a lot of what it's doing from view lest it overwhelm the user even more. Conflicting transitive dependencies are resolved implicitly, and you have to make a concerted effort (with clumsy tools) to manually find out what was going on.
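To see how implicit resolution can surprise you, here is a simplified model of the "nearest definition wins" rule Maven 2 uses to pick between conflicting versions: the version declared closest to your project in the dependency tree is chosen, regardless of which is newer. This is a toy sketch, not Maven's actual resolver. The class `NearestWins`, the `artifact:version` string format and the artifact names are invented for illustration, and ties at equal depth are broken here by declaration order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model for illustration; not Maven's real resolution code.
final class NearestWins {

    /**
     * Walks the graph breadth-first from root. The first version seen for an
     * artifact (the one nearest the root) wins; later versions are ignored,
     * along with everything only they depend on.
     *
     * @param graph maps "artifact:version" to its declared dependencies, in order
     * @return chosen version per artifact, in the order they were resolved
     */
    static Map<String, String> resolve(String root, Map<String, List<String>> graph) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String dep : graph.getOrDefault(node, List.of())) {
                int sep = dep.lastIndexOf(':');
                String artifact = dep.substring(0, sep);
                String version = dep.substring(sep + 1);
                // Only the first (nearest) version of an artifact is kept and expanded.
                if (chosen.putIfAbsent(artifact, version) == null) {
                    queue.add(dep);
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "app:1.0", List.of("a:1.0", "b:1.0"),
                "a:1.0", List.of("x:1.0"),
                "x:1.0", List.of("c:2.0"),  // c 2.0 sits three levels down
                "b:1.0", List.of("c:1.0")); // c 1.0 sits two levels down
        // c resolves to 1.0: the nearer declaration silently beats the newer one.
        System.out.println(resolve("app:1.0", graph));
    }
}
```

Nothing in the output tells you that `x` asked for `c` 2.0 and got 1.0 instead, which is the kind of hidden decision you end up digging for when the build breaks at runtime.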

Paradoxically, by trying to make dependency management easy, Maven makes it incredibly hard. It becomes dangerously easy for a project to accumulate dependency cruft (at best unnecessary, at worst conflicting) and excruciatingly painful to remove it.

Conclusion

Maven 2 performs a difficult task, and there are a lot of moving parts — plugins, proxies, repositories — between typing mvn install on the command-line and getting a working system. But there has to be something fundamentally wrong with any tool that, whenever I use it, seems to have at least a 50% chance of completely fucking up my day.
