Reverse Engineering Is Not Theft

by Charles Miller on September 4, 2003

Norman Richards has already blogged about the "Ethics of Decompilation" thread currently clogging up the Apple java-dev mailing-list. A programmer wrote to the list saying that he had decompiled some code to see how it worked, and wondered what the list-members thought were the ethics of the situation.

The list exploded: much of the response was outrage, and most of it completely failed to understand where copyright law ends and the owner's rights begin.

Copyright does not prohibit reverse-engineering, except (thanks to the DMCA) where the thing being reverse-engineered is a copy-prevention mechanism. This is because copyright law is all about the making and distributing of copies. Under regular property law, once you have bought something, you're perfectly within your rights to take it apart and see how it works.

Of course, most software doesn't subject itself to such things. Thanks to a massive land-grab early in our industry's existence, back before anyone really knew what "software" was, nobody actually buys software. Instead, you license the use of it under ludicrously draconian terms. Anyone who reads an EULA closely (and nobody really does) and looks at the masses of restrictions and arbitrary termination clauses ends up wondering whether they've bought anything substantial at all.

Hence what we normally call "IP" in the non-Free software industry really has nothing to do with IP law, and everything to do with contract law. It is these contracts that usually prohibit reverse-engineering. Some countries actually have laws that limit contracts so they can't prevent reverse-engineering. Your mileage may vary.

Regardless, reverse-engineering is not "theft", as theft implies a breach of property law. At worst, it's "breach of contract". That, however, is only the law, which is orthogonal to ethics. Is decompiling a program to see how it works unethical?

Novels are, mostly, copyrighted material. Every author, however, has in his[1] time read an enormous number of books, and incorporated their "source code" into his body of knowledge. When that author comes to write another book, he can't help but use the things he's learned about the craft of book-writing from reading the copyrighted works of others.

Does that mean any author must keep careful track of everyone he's read so that he can apportion royalties? Of course not. So long as he's not lifting the words directly from another book, we recognise that it's neither theft nor unethical. Even "homage" is acceptable. My domain, pastiche.org, is named after the technique of creating art deliberately in the style of another artist, one of the common techniques of the post-modernist era[2].

English departments at universities spend inordinate amounts of time examining the techniques that went into writing copyrighted works, and are even allowed to write their own books on the subject! And it's generally accepted that doing this benefits the art of writing.

My brother is a playwright. He has a couple of books on screenwriting on his shelf. Each is full of examples of various story-telling techniques that have been used in movies (which are copyrighted). This is perfectly legitimate, and further, if the books include snippets of movie "source code" (scripts) as examples, that's considered fair use for the purpose of education.

Frankly, I think developers should be encouraged to decompile code they find interesting. So long as they use that knowledge to learn the underlying techniques, and don't then do a cut-and-paste job into their own code, they're not doing anything unethical.

Ultimately, I think that the big hue and cry over decompilation (and the related fetish for bytecode obfuscators) is a case of "the lady doth protest too much, methinks." The fact is, most code doesn't do anything particularly interesting or original. Software is the result of an investment of time. It may contain some nifty techniques here or there, but overall it's not going to damage the value of the product if those techniques are publicly known.

For 99% of the code out there, the effort required to decompile it and then understand what you've just decompiled is greater than that which was required to write it in the first place.

There are a few, very few, exceptions. If you've come up with something that is truly valuable and innovative, copyright is not the right tool. Copyrights protect works, not ideas. Patents protect ideas, and the first premise of a patent is that you publish the idea, not hide it. Then again, the sad state of software patents is fodder for a lot more abuse than praise.

If I had my way, all software would come with source. Some forward-thinking companies do this already. Many programming environments pretty much force you to provide the source anyway. Having the source allows you more freedom to customise the product, fix it if it's going wrong, and offers you some slight protection if it ever becomes abandonware. Short of reaching this utopia, I'm going to hang on to my decompiler until you pry it from my cold, dead hands.

Dan would later learn that there was a time when anyone could have debugging tools. There were even free debugging tools available on CD or downloadable over the net. But ordinary users started using them to bypass copyright monitors, and eventually a judge ruled that this had become their principal use in actual practice. This meant they were illegal; the debuggers' developers were sent to prison. -- Richard Stallman: The Right to Read.

[1] Yes, I'm using the masculine pronoun. It's just convenient.
[2] I chose the domain name when I was at university, after having the word hammered into my brain in every lecture of a modern literature course.
