Yesterday this account of a serious vulnerability in most major Java application servers crossed my Twitter feed a few times. The description, while thorough, is written in security researcher, so since it’s an important thing for developers to understand, I thought I would rewrite the important bits in developer.
What is the immediate bug?
A custom deserialization method in Apache commons-collections contains reflection logic that can be manipulated to execute arbitrary code. Because of the way Java serialization works, this means that any application that accepts untrusted data to deserialize, and that has commons-collections in its classpath, can be exploited to run arbitrary code.
The immediate fix is to patch commons-collections so that it that does not contain the exploitable code, a process made more difficult by just how many different libraries and applications use how many different versions of commons.
The immediate fix is also utterly insufficient. It’s like finding your first XSS bug in a program that has never cared about XSS before, patching it, and then thinking “Phew, I’m safe.”
So what is the real problem?
The problem, described in the talk the exploit was first raised in — Marshalling Pickles — is that arbitrary object deserialization (or marshalling, or un-pickling, whatever your language calls it) is inherently unsafe, and should never be performed on untrusted data.
This is in no way unique to Java. Any language that allows the “un-pickling” of arbitrary object types can fall victim to this class of vulnerability. For example, the same issue with YAML was used as a vector to exploit Ruby on Rails.
The way this kind of serialization works, the serialization format describes the objects that it contains, and the raw data that needs to be pushed into those objects. Because this happens at read time, before the surrounding program gets a chance to verify these are actually the objects it is looking for, this means that a stream of serialized objects could cause the environment to load any object that is serializable, and populate it with any data that is valid for that object.
This means that if there is any object reachable from your runtime that declares itself serializable and could be fooled into doing something bad by malicious data, then it can be exploited through deserialization. This is a mind-bogglingly enormous amount of potentially vulnerable and mostly un-audited code.
Deserialization vulnerabilities are a class of bug like XSS or SQL Injection. It just takes one careless bit of code to ruin your day, and far too many people writing that code aren’t even aware of the problem. Combine this with the fact that the code being exploited could be hiding inside any of the probably millions of third-party classes in your application, and you’re in for a bad time.
Your best fix is just not to risk it in the first place. Don’t deserialize untrusted data.
Mitigations
The mitigation for this class of vulnerability is to reduce the
surface area available to attack. If only a limited number of objects can be
reached from deserialization, those objects can be carefully audited to make
sure they’re safe, and adding a new random library to your system won’t
unexpectedly make you vulnerable. For example, Python’s YAML implementation
has a safe_load
method that limits object deserialization to
a small set of known objects, essentially reducing it to a JSON-like format.
Your best bet
in Java is not to use Java serialization unless you absolutely trust whoever
is producing the data. If you really want to use serialization, you can
limit
the objects available to be deserialized by overriding the
resolveClass
method on objectInputStream
.
This way you can ensure only objects you have verified are safe will be
populated during deserialization.
Or just don't use serialization for data transfer. Nine times out of ten, tightly coupling your wire format with your object model isn’t something future maintainers of your system are going to thank you for.
Edited November 9 to add the reference to the developerWorks Look-Ahead Deserialization article, after it was pointed out to me by a couple of different people.