11 Jan

Douglas Adams, 1978

Ford: They make a big thing of the ship’s cybernetics. “A new generation of Sirius Cybernetics Corporation robots and computers, with the new GPP feature.”

Arthur: GPP? What’s that?

Ford: Er… It says Genuine People Personalities.

Arthur: Sounds ghastly.

F/X: DOOR HUMS OPEN WITH A SORT OF OPTIMISTIC SOUND.

Marvin: It is.

Arthur: W… What?

Marvin: Ghastly. It all is — absolutely ghastly. Just don’t even talk about it. Look at this door. “All the doors on this spacecraft have a cheerful and sunny disposition. It is their pleasure to open for you, and their satisfaction to close again with the knowledge of a job well done!”

F/X: DOOR CLOSES WITH A SATISFIED SIGH

Marvin: Hateful, isn't it?

Facebook, 2015

Facebook now inserts a jaunty “Good afternoon, Charles!” in my timeline.

Everybody knows Facebook is creepy. Nonetheless, all this time it never occurred to me to delete my account until it began doing this: Trying to act like a person. Pretending we are on a first-name basis. — Leigh Alexander, The New Intimacy Economy

To get “software with a personality” right, the personality has to be recognisably human. It needs to be the people who made the software shining through their creation, not painting themselves on top of it.

The bigger and more impersonal the software, the more subversive the personality needs to be. It needs to be something a manager would have said no to if they’d known about it before it shipped, not something they figured might make the product play better to Millennials. A spreadsheet that asks if you’ve had a nice day feels like a creepy marketing ploy. A flight simulator Easter Egg is a human being trying to reach you from behind the code.

Turning 101000

  • 7:00 PM

Happy birthday to me
Happy birthday to me
I've been beating this joke to death since 2002
Fuck fuck fuck fuck fuck fuck.

Sithsplaining

  • 6:48 PM

Leia suspects there's a tracking device on the Millennium Falcon and yet they fly straight to Yavin 4 anyway... — @msharp

Many things in Star Wars don’t make sense, but this one turns out to be pretty straightforward. Leia is badass.

Senator/Princess Leia Organa doesn’t know she is going to be rescued from captivity on the Death Star moments before she is scheduled to be executed, but when it happens, and when the ship she is rescued in is allowed to get away suspiciously easily, she thinks on her feet.

She knows the moons of Yavin have no sentient inhabitants outside the rebel base, and after seeing Alderaan blown up she wants the Death Star as far away from civilian populations as she can get it.

OK, there were two primitive pre-spaceflight species on Yavin 13, but the Empire was unlikely to pay them any notice.

She knows the clock is ticking on the value of the plans she stashed away in R2-D2. The Empire knows exactly what was stolen, and it is only a matter of time before they do exactly the same analysis that the Rebellion wants to do, leaving the Rebellion with the embarrassing prospect of showing up to bomb an exhaust port that was already closed for emergency maintenance.

She has seen that Grand Moff Tarkin is drunk on the power trip he gets from being in charge of a moon-sized death machine. She knows that given the choice between sending a couple of Star Destroyers to take out the Alliance base, and blowing them up personally with his planet-killer, he’s going to choose the Big Round Fucking Laser.

But she knows she doesn’t want to give them too much time to think and maybe come up with a proportional, sensible response.

So Leia figures either they’ll find something useful in the Death Star plans or they won’t. If they don’t, they’ve got a pretty hairy evacuation in their future and they’ll need to find a new base, but they’ve at least got advance warning the Empire is on its way. If they find something though, this is their best and possibly only chance to get the Death Star to come to where they are tactically strongest, without the rest of the Imperial fleet getting in the way.

And she thinks this through in the time it takes to tell Han what course to plot. Fuck yeah Leia.

Yesterday this account of a serious vulnerability in most major Java application servers crossed my Twitter feed a few times. The description, while thorough, is written in security researcher, so since it’s an important thing for developers to understand, I thought I would rewrite the important bits in developer.

What is the immediate bug?

A custom deserialization method in Apache commons-collections contains reflection logic that can be manipulated to execute arbitrary code. Because of the way Java serialization works, this means that any application that accepts untrusted data to deserialize, and that has commons-collections in its classpath, can be exploited to run arbitrary code.

The immediate fix is to patch commons-collections so that it does not contain the exploitable code, a process made more difficult by just how many different libraries and applications use how many different versions of commons.

The immediate fix is also utterly insufficient. It’s like finding your first XSS bug in a program that has never cared about XSS before, patching it, and then thinking “Phew, I’m safe.”

So what is the real problem?

The problem, described in the talk the exploit was first raised in — Marshalling Pickles — is that arbitrary object deserialization (or marshalling, or un-pickling, whatever your language calls it) is inherently unsafe, and should never be performed on untrusted data.

This is in no way unique to Java. Any language that allows the “un-pickling” of arbitrary object types can fall victim to this class of vulnerability. For example, the same issue with YAML was used as a vector to exploit Ruby on Rails.

The way this kind of serialization works, the serialization format describes the objects that it contains, and the raw data that needs to be pushed into those objects. Because this happens at read time, before the surrounding program gets a chance to verify these are actually the objects it is looking for, this means that a stream of serialized objects could cause the environment to load any object that is serializable, and populate it with any data that is valid for that object.

This means that if there is any object reachable from your runtime that declares itself serializable and could be fooled into doing something bad by malicious data, then it can be exploited through deserialization. This is a mind-bogglingly enormous amount of potentially vulnerable and mostly un-audited code.
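To make that concrete, here is a deliberately harmless sketch using Python's pickle module. It is an illustration only: the vulnerability described above is in Java, where the mechanics differ, but the shape of the problem is the same. The class name `Innocent` and the payload expression are hypothetical.

```python
import pickle


class Innocent:
    """Hypothetical class standing in for attacker-crafted data."""

    def __reduce__(self):
        # pickle records "call this function with these arguments" in the
        # stream. A real attacker would choose something nastier than
        # evaluating a harmless arithmetic expression.
        return (eval, ("6 * 7",))


# What the attacker sends over the wire.
payload = pickle.dumps(Innocent())

# What the victim does. The callable named in the stream runs during
# loading, before the program gets a chance to inspect what it received.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The victim asked for "an object" and got back the result of a function call it never wrote. Nothing in the victim's own code mentions `eval`; the stream chose it.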

Deserialization vulnerabilities are a class of bug like XSS or SQL Injection. It just takes one careless bit of code to ruin your day, and far too many people writing that code aren’t even aware of the problem. Combine this with the fact that the code being exploited could be hiding inside any of the probably millions of third-party classes in your application, and you’re in for a bad time.

Your best fix is just not to risk it in the first place. Don’t deserialize untrusted data.

Mitigations

The mitigation for this class of vulnerability is to reduce the surface area available to attack. If only a limited number of objects can be reached from deserialization, those objects can be carefully audited to make sure they’re safe, and adding a new random library to your system won’t unexpectedly make you vulnerable. For example, Python’s YAML implementation has a safe_load method that limits object deserialization to a small set of known objects, essentially reducing it to a JSON-like format.

Your best bet in Java is not to use Java serialization unless you absolutely trust whoever is producing the data. If you really want to use serialization, you can limit the objects available to be deserialized by overriding the resolveClass method on ObjectInputStream. This way you can ensure only objects you have verified are safe will be populated during deserialization.
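The same allow-list technique can be sketched in Python rather than Java, since pickle exposes a comparable hook: `Unpickler.find_class` is called whenever the stream names a class or function to look up. This is a minimal sketch, not the Java API described above. The allow-list contents, `AllowListUnpickler`, `safe_loads` and `Sneaky` are all hypothetical names chosen for the example.

```python
import io
import pickle
from collections import OrderedDict

# Assumption for this sketch: the application has audited exactly one
# class and expects nothing else to be named in a stream.
ALLOWED = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    """Refuses to resolve any class or function not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not allowed")


def safe_loads(data: bytes):
    """Deserialize bytes using the allow-list unpickler."""
    return AllowListUnpickler(io.BytesIO(data)).load()


class Sneaky:
    """Hypothetical stand-in for attacker-crafted data."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# Plain containers and the audited class load as usual.
print(safe_loads(pickle.dumps({"a": [1, 2, 3]})))
print(safe_loads(pickle.dumps(OrderedDict(a=1))))

# Anything else is refused before it gets a chance to run.
try:
    safe_loads(pickle.dumps(Sneaky()))
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

The important design choice is that the list says what is allowed, not what is forbidden. A deny-list has to anticipate every dangerous class on the classpath, which is exactly the audit nobody has done.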

Or just don't use serialization for data transfer. Nine times out of ten, tightly coupling your wire format with your object model isn’t something future maintainers of your system are going to thank you for.

Edited November 9 to add the reference to the developerWorks Look-Ahead Deserialization article, after it was pointed out to me by a couple of different people.

My friends on Facebook are generally a tech-literate and cynical bunch, so the ratio of people who fell for the recent spate of “Repost this legalese to regain control of your content” chain-mail hoaxes vs the people who have posted sarcastic reactions to it is about one to twelve.

And that bugs me.

We (the tech industry, but more broadly society) have created these Internet agoras. To members, these sites are vital means of maintaining contact with friends and loved ones, of not feeling left out of important parts of their lives. But the same people will grasp at the most tenuous of straws if it gives them a slight hope that they might claw back some sense of ownership, safety and control.

Every time a social media site changes its defaults, loosens its privacy settings or tightens its licensing, we tend to take lack of action by its members as tacit acceptance that privacy and ownership just don't matter. Hoaxes like this tell us otherwise. People feel trapped and helpless in a complex, baffling system. They want a way to assert control over their online lives, and they don't understand why it's not as simple and obvious as saying “I wrote this. I took these photos. They are mine.”

  1. Write first draft
  2. Publish
  3. Find a dozen things wrong with published post, frantically fix them before too many people read the article.
  4. Re-publish
  5. GOTO 3

Number of post-publication edits for this post: 4

Remember back in 2003 when blogging was going to take over the world? When we were writing odes to blogging, building popular tools to map the blogosphere, actually using the word blogosphere with a mostly straight face, and wringing our hands over every new entrant in the field and every Google index update?

Sure, the component parts of blogging are everywhere now. The Internet is drowning in self-publishing, link-sharing, articles scrolling by in reverse-chronological order. It's no coincidence that the most popular CMS on the public Internet, by a pretty ridiculous margin, is a blogging platform.

But somewhere around a decade ago, the soul of blogging died. The heterogeneous community using syndication technologies to create collaboratively-filtered networks of trust and attention between personally-curated websites, forming spontaneous micro-communities in the negative space between them? That’s the thing we were all saying would take over the world, and instead “blogging” dwindled back to being a feature of corporate websites, a format for online journalism, and a hobby of techies who like running their own web pages.

Going back over fourteen years of my own blog history was an interesting lesson in how this blog changed over the years. There are entire classes of post that filled the pages of this site in 2002, but that were not to be seen five years later. Some of this was due to me changing behind the blog. Much of it was due to the Internet changing around it.

So what happened to blogging?

Digg stole its community.

And then reddit and Hacker News, but Digg did it first.

There were popular public link aggregators before Digg, but they were either heavily curated (Slashdot was, more than anything, a blogging pioneer) or deafeningly self-important.

Kuro5hin demanded users share substantial things they wrote themselves; everything else was “Mindless Link Propagation”. Digg took MLP and changed the shape of the Internet with it.

In doing so, Digg created a devoted platform for one of the core activities, and most common entry points, of blogging: holding conversations about things written elsewhere. Their platform was far easier to get involved in, far easier to set up, and solved that one big question of blogging newbies, “How do I get anyone to even read what I’m writing?”, with centralisation and gamification.

Bloggers didn't jump ship for Digg, but equally Digg didn't contribute to blogging. Visitors from aggregation sites notoriously never looked deeper into the sites they were visiting than the single article that was linked, and the burst of syndication subscribers a blogger would normally get if one of the hubs of their community linked to them just never came from aggregation sites.

Bloggers did, however, find themselves having to take part in these communities. At first because more often than not aggregators were where the conversation was happening about the things they were writing, and writing about. Later, because they’re where readers come from. For many people trying to make money writing on the Internet today, links from reddit are how you survive.

For their part, aggregation site users tend to hold bloggers in the lowest of low esteem, even when linking to them. Blogging is narcissistic. Who are they to remain aloof from the community like that, to share links and posts on their own website instead of contributing them to the centralised collective?

It is this sense of community that even turned some aggregators into creators, beyond the surfacing of links or crowdsourced comments about them. Like “Ask Slashdot” before it, some of the most popular communities on reddit are built around user-contributed posts. Overall, though, links still rule the site.

Users of aggregators tend to reserve their greatest vitriol for sites that aggregate or republish things from their website, whether it be something that was original to the site, or even if it’s just a link they found “first”. For sites built around monetising other sites’ labour, aggregator users get mighty tetchy when the same thing is done to them.

Twitter stole its small-talk.

Bloggers might not have jumped ship for aggregators, but they dove into Twitter head first.

It takes a lot of time and inspiration to write a long-form article, so most blogs filled the gaps between with links, funny pictures they had found around the Internet, short pithy commentary, snippets of conversation, interesting quotes, jokes, and in one case from a blogger now worth more money than you can count, an enthusiastic two sentence review of the porn site “Bang Bus”.

With Twitter you could do that on your phone, have it pushed to your friends/subscribers in real time, and have the same done back to you with equal ease. It wasn't even a competition.

Twitter still has the “How do I get people to notice me?” problem, and later developed the even more disturbing “How do I get people to stop noticing me?” problem, but that didn't stop it sucking the remaining air out of the blogosphere in the course of surprisingly few months.

What about Facebook, Instagram, Pinterest and the like? Well, from my perspective they weren't so much the successors to blogging as they were the successors to LiveJournal.

Tumblr stole its future.

A curmudgeon might say I should also file Tumblr under “successors to LiveJournal”, but I disagree. Tumblr sites tend far less towards being amorphous personal diaries aimed squarely at the author’s existing social network, and far more towards expressing the author’s interests in public, and joining the larger community that arises around them.

From one perspective, Tumblr is blogging. At today’s count they host 244 million blogs making a total of 81 million posts per day. That’s about four posts per year for every human being on Earth. Users can contribute their own posts, but just as importantly they can reblog and comment, forming spontaneous, distributed communities of interest around (and in the spaces between) the things they share from others.

From another perspective, Tumblr stole blogging. The syndication and sharing tools, the communities built within Tumblr, everything stops dead at the website's border. The tools seem almost contemptuous of the web as it exists outside Tumblr. To quote JWZ:

[Tumblr pioneered] showing the entire thread of attributions by default, and emphasizing the first and last -- but stopping cold at the walls of the Tumblr garden. To link to an actual creator, you have to take an extra step, so nobody bothers.

These may seem like small glitches, but the aggregate effect is huge. They’re what makes the “Tumblr Community” a real thing people talk about in a way you'd never hear about, say, people who happen to host their sites with WordPress.

Centralisation and lock-in won.

In the end, the distributed, do-it-yourself web was just too hard. Not just for newcomers facing a mountainous barrier to entry, but even for incumbents looking to shave a few sources of frustration from their day. Just ask anyone who excitedly built RSS/Atom syndication into their product in the early 2000s, only to deprecate the feature gradually into the power-user margin over the ensuing decade.

In every case, a closed, proprietary system took some ingredient of the self-publishing crack bloggers discovered in the early 2000s and distilled it into a product that was easier to use, and that people were willing to adopt even though it meant losing the freedom of openness, interoperability and owning your own words.

Leaving behind a landscape of those for whom that sacrifice either was not commercially attractive, or those of us who are just sufficiently set in our ways that the idea of not running our own website feels alien.

Deletionism

  • 1:58 PM

Ask me ten years ago, and I'd say a blog entry, once published, should remain that way. Oh wait, I actually did say that:

I try never to delete anything substantive. Attempting to un-say something by deleting it is really just a case of hiding the evidence. I'd much rather correct myself out in the open than pretend I was never wrong in the first place.

The reasons not to delete come down to:

  • Not wanting to break the web by 404-ing a page
  • Wanting to be honest about what you’ve said in public
  • Keeping a record of who you were at some moment in time.

The counter-arguments are:

  • The web was designed to break. And anyway, the stuff worth deleting is usually the stuff nobody’s linking to.
  • Just how long does a mea culpa have to stand before it becomes self-indulgent?
  • Unless you’re noteworthy and dead, or celebrity and alive, the audience for your years-old personal diaries is particularly limited.
  • Publishing on the web isn’t just something you do, and then have done. It’s an ongoing process. A website isn’t just a collection of pages, it’s a work that is both always complete, and always evolving. And every work can do with the occasional read-through with red pen in hand.

That last point is the most compelling one. I was publishing a website full of things that, however apt they were at the time to the audience they were published for, just aren’t worth reading today.

So to cut a long story short, last weekend I un-published about 700 of the previously 1800 posts on this blog; things that were no longer correct, things that were no longer relevant, things that were no longer interesting even as moments in time, and things that I no longer feel comfortable being associated with. I don't think anything that was removed will be particularly missed, and as a whole the blog is a better experience for readers without them.

The weirdest thing about deleting 700 blog posts is realising you had 1800 to start with. Although to be fair, 1750 of them were Cure lyrics drunk-posted to LiveJournal.

Under the hood

It's a testament to the resilience of Movable Type that in the eleven years since I first installed it to run this blog, I've upgraded it exactly twice. If I’d tried that with the competition, I doubt I’d have had nearly as smooth a ride.

Movable Type got me through multiple front-page appearances on Digg, reddit, Hacker News and Daring Fireball without a hitch, or at least would have if I hadn't turned out to be woefully incompetent at configuring Apache for the simple task of serving static files.

But as they say, all good things must come to an end. Preferably with Q showing up in a time travel episode.

I replaced Movable Type with a couple of scripts that publish a static site from a git repo, fully aware that I’m doing this at least five years after it became trendy. The site should look mostly identical, except comments and trackbacks haven't been migrated. They’re in the repo, but I'm inclined to let them stay there.

Look, bad things happen to people in fiction just like bad things happen in real life. And at least the people in fiction aren't real so it didn't really happen to them.

I get that.

And you can have great entertainment where bad things happen to bad people, or bad things happen to good people, or bad things happen to indifferent people who just happened to be in the wrong place at the wrong time.

I get that too.

But at some point you find yourself sitting on a couch watching a drawn-out scene where a child is burned alive screaming over and over for her parents to save her, and you think “Why the fuck am I still watching this show?”

Bad things happen in real life. Bad things have happened throughout history. So what, I'm watching television. If I wanted to experience the reality of a brutal, lawless campaign for supremacy between tribal warlords, there are plenty of places in the world I could go to see that today. I wouldn't survive very long, but at least I'd get what I deserved for my attempt at misery tourism.

Bad things happen in good drama, too. But drama comes with a contract. The bad things are there because they are contributing to something greater. Something that can let you learn, or understand, or experience something you otherwise wouldn't have; leading you out the other side glad that you put yourself through the ordeal, albeit sometimes begrudgingly.

To refresh our memories, here's how George R. R. Martin explained the Red Wedding:

I killed Ned in the first book and it shocked a lot of people. I killed Ned because everybody thinks he's the hero and that, sure, he's going to get into trouble, but then he'll somehow get out of it. The next predictable thing is to think his eldest son is going to rise up and avenge his father. And everybody is going to expect that. So immediately [killing Robb] became the next thing I had to do.

There are increasingly flimsy justifications for the horrors of Game of Thrones. They motivate character A. Or they open up space for character B. But in the end it's obvious that it's really about providing the now-mandated quota of shock, and giving the writers some hipster cred for subverting fantasy tropes.

I did not enjoy watching Sansa Stark’s rape. I did not enjoy watching Shireen Baratheon burned at the stake.

If that's what you want to watch TV for, go for it. But I'm out.

Seen on Twitter:

Either and Promises/Futures are useful and I’ll use them next time they’re appropriate. But outside Haskell does their monad-ness matter?

All code below is written in some made-up Java-like syntax, and inevitably contains bugs/typos. I'm also saying "point/flatMap" instead of "pure/return/bind" because that's my audience. I also use "is a" with reckless abandon. Any correspondence with anything that would be either programmatically or mathematically useful is coincidental.

What is a monad? A refresher.

A monad is something that implements "point" and "flatMap" correctly.

I just made a mathematician scream in pain, but bear with me on this one. Most definitions of monads in programming start with the stuff they can do: sequence computations, thread state through a purely functional program, allow functional IO. This is like explaining the Rubik's Cube by working backwards from how to solve one.

A monad is something that implements "point" and "flatMap" correctly.

So if this thing implements point and flatMap correctly, why do I care it's a monad?

Because "correctly" is defined by the monad laws.

  1. If you put something in a monad with point, that's what comes out in flatMap. point(a).flatMap(f) === f(a)
  2. If you pass flatMap a function that just points the same value into another monad instance, nothing happens. m.flatMap(a -> point(a)) === m
  3. You can compose multiple flatMaps into a single function without changing their behaviour. m.flatMap(f).flatMap(g) === m.flatMap(a -> f(a).flatMap(g))
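For anyone who prefers something runnable to squint at, here is a minimal sketch of those three laws in Python rather than the made-up syntax used here. `Maybe`, `NOTHING`, `half`, `pred` and `check_laws` are hypothetical names for this example, and `flat_map` is just flatMap in Python spelling. Checking a handful of sample values demonstrates the laws; it does not prove them.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Maybe:
    """A tiny Option-like type: either holds a value or is empty."""
    value: Any = None
    empty: bool = False

    def flat_map(self, f: Callable[[Any], "Maybe"]) -> "Maybe":
        # An empty Maybe ignores f; a full one hands its value to f.
        return self if self.empty else f(self.value)


def point(a: Any) -> Maybe:
    """Put a plain value into the monad."""
    return Maybe(a)


NOTHING = Maybe(empty=True)


def half(n: int) -> Maybe:
    return point(n // 2) if n % 2 == 0 else NOTHING


def pred(n: int) -> Maybe:
    return point(n - 1) if n > 0 else NOTHING


def check_laws() -> bool:
    # 1. point(a).flatMap(f) === f(a)
    for a in [0, 3, 8]:
        assert point(a).flat_map(half) == half(a)
    for m in [point(8), point(3), NOTHING]:
        # 2. m.flatMap(a -> point(a)) === m
        assert m.flat_map(point) == m
        # 3. m.flatMap(f).flatMap(g) === m.flatMap(a -> f(a).flatMap(g))
        chained = m.flat_map(half).flat_map(pred)
        nested = m.flat_map(lambda a: half(a).flat_map(pred))
        assert chained == nested
    return True


print(check_laws())
```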

If you don't understand these laws, you don't understand what flatMap does. If you understand these laws, you already understand what a monad is. Saying "Foo implements flatMap correctly" is the same as saying "Foo is a monad", except you're using eighteen extra characters to avoid the five that scare you.

Because being a monad gives you stuff for free.

If you have something with a working point and flatMap (i.e. a monad), then you know that at least one correct implementation of map() is map(f) = flatMap(a -> point(f(a))), because the monad laws don't allow that function to do anything else.

You also get join(), which flattens out nested monads: join(m) = m.flatMap(a -> a) will turn Some(Some(3)) into Some(3).

You get sequence(), which takes a list of monads of A, and returns you a monad of a list of A's: sequence(l) = l.foldRight(point(List()))((m, ml) -> m.flatMap(x -> ml.flatMap(y -> point(x :: y)))) will turn [Future(x), Future(y)] into Future([x, y]).

And so on.
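Here is a rough Python sketch of those three freebies, written once against nothing but point and flat_map and then used on two unrelated monads. `Maybe`, `Many`, `fmap`, `join` and `sequence` are hypothetical names for this example. `fmap` is called that only to avoid shadowing Python's built-in `map`, and `sequence` takes `point` as an argument because Python has no way to infer which monad an empty list was meant for.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Maybe:
    """Option-like monad: zero or one value."""
    value: Any = None
    empty: bool = False

    @staticmethod
    def point(a):
        return Maybe(a)

    def flat_map(self, f):
        return self if self.empty else f(self.value)


@dataclass(frozen=True)
class Many:
    """List-like monad: flat_map concatenates the results."""
    items: tuple = ()

    @staticmethod
    def point(a):
        return Many((a,))

    def flat_map(self, f):
        return Many(tuple(x for item in self.items for x in f(item).items))


def fmap(m, f):
    # map(f) = flatMap(a -> point(f(a)))
    return m.flat_map(lambda a: type(m).point(f(a)))


def join(m):
    # join(m) = m.flatMap(a -> a)
    return m.flat_map(lambda inner: inner)


def sequence(ms, point):
    # Fold from the right, prepending each unwrapped value to the result.
    result = point(())
    for m in reversed(ms):
        result = m.flat_map(
            lambda x, rest=result: rest.flat_map(lambda xs: point((x,) + xs))
        )
    return result


NOTHING = Maybe(empty=True)

print(fmap(Maybe(3), lambda n: n + 1))
print(join(Maybe(Maybe(3))))
print(sequence([Maybe(1), Maybe(2)], Maybe.point))
print(sequence([Maybe(1), NOTHING], Maybe.point))
print(sequence([Many((1, 2)), Many((3,))], Many.point))
```

None of the three helper functions knows or cares which monad it was handed. That is the whole point of having a name for the thing.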

Knowing that Either is a monad means knowing that all the tools that work on a monad will work on Either. And when you learn that Future is a monad too, you'll know that everything that worked on Either because it's a monad will work on Future as well.

Because how do you know it implements flatMap correctly?

If something has a flatMap() but doesn't obey the monad laws, developers no longer get the assurance that any of the things you'd normally do with flatMap() (like the functions above) will work.

There are plenty of law-breaking implementations of flatMap out there, possibly because people shy away from the M-word. Calling things what they are (is a monad, isn't a monad) gives us a vocabulary to explain why one of these things is not like the other. If you're implementing a flatMap() or its equivalent, you'd better understand what it means to be a monad or you'll be lying to the consumers of your API.
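As a sketch of what law-breaking looks like in practice, here is a hypothetical Python `NullableBox` that decides a wrapped None means "empty". It looks like a perfectly reasonable Option type, and its flat_map has the right shape, but it fails the first law. The class and the `describe` function are invented for this example and are not taken from any particular library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NullableBox:
    """Hypothetical Option-like type that treats a wrapped None as empty."""
    value: Any = None

    @staticmethod
    def point(a):
        return NullableBox(a)

    def flat_map(self, f):
        # The shortcut that breaks the law: f is never called on None,
        # even though None was the value put in with point.
        return self if self.value is None else f(self.value)


def describe(a):
    """A function that is perfectly happy to be given None."""
    return NullableBox.point("missing" if a is None else f"got {a}")


# Law 1 says these two must be the same: point(a).flatMap(f) === f(a)
via_flat_map = NullableBox.point(None).flat_map(describe)
direct = describe(None)

print(via_flat_map)
print(direct)
print("law 1 holds for None:", via_flat_map == direct)
print("law 1 holds for 3:", NullableBox.point(3).flat_map(describe) == describe(3))
```

Code like the derived map() relies on law 1 holding for every value, so a helper built on this flat_map quietly changes behaviour the moment a None passes through it.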

But Monad is an opaque term of art!

So, kind of like "Scrum", "ORM" or "Thread"?

Or, for that matter, "Object"?

In summary:

As developers, we do a better job when we understand the abstractions we're working with, how they function, and how they can be reused in different contexts.

Think of the most obvious monads that have started showing up in every language¹ over the last few years: List, Future, Option, Either. They feel similar, but what do they all have in common? Option and Either kind of do similar things, but not really. An Option is kind of like a zero-or-one element list, but not really. And even though Option and Either are kind of similar, and Option and List are kind of similar, that doesn't make Either and List similar in the same way at all! And a Future, well, er…

The thing they have in common is they're monads.


1 Well, most languages. After finding great branding success with Goroutines, Go's developers realised they had to do everything possible to block any proposed enhancement of the type system that would allow the introduction of "Gonads".