Referer Spam: A message to writers of web tools

by Charles Miller on March 26, 2003

The Referer request-header allows a server to generate lists of back-links to resources for interest, logging, optimized caching, etc. It also allows obsolete or mistyped links to be traced for maintenance. The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard. — RFC2616: HTTP/1.1

I've written about this before. So has Mark Pilgrim, and he's got a much bigger audience than me. If you are writing a news aggregator, web robot, blog indexing tool, whatever, you MUST NOT advertise your product's website in the Referer header. If you do, your product does not comply with the HTTP standard. More importantly, you're being incredibly annoying.

I'm not just being a standards-nazi here. Referer logs are an important tool in running a weblog. Only about one in ten people who link to me use Trackback, so keeping an eye on my logs is really the only way I have to notice discussion elsewhere of things I've said on my site. Referer logs are the link to the wider community. Spamming these logs with irrelevant links to tool sites or portal pages increases the time I need to spend maintaining my weblog.

It's just like e-mail spam. Sure, it gets you a few click-throughs, but by subverting the intention of the medium, you're imposing your advertising on people by getting in the way of what they're really looking for.

The biggest culprits by volume are RSS newsreaders (although many have stopped since Mark Pilgrim brought it up). They're only a minor annoyance, though: they only ever hit the RSS feeds, and since there can't really be any legitimate referers for an RSS feed, you can ignore them all. In fact, I rather like the idea of an RSS reader leaving its user's own site in the Referer header, so you can see who's reading you. That's useful information, and almost related to the purpose of the header.
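If you process your own logs, the point about feeds having no legitimate referers is easy to act on. Here is a rough sketch in Python, assuming you've already parsed your access log into (path, referer) pairs. The feed paths and the name `interesting_referers` are made up for illustration; substitute whatever your site actually serves.

```python
from collections import Counter

# Hypothetical feed paths, for illustration only.
FEED_PATHS = {"/index.rss", "/index.xml"}


def interesting_referers(hits):
    """Count referers from (path, referer) pairs, skipping feed requests."""
    counts = Counter()
    for path, referer in hits:
        if path in FEED_PATHS:
            continue  # nothing legitimately links *to* a feed, so ignore it
        if not referer or referer == "-":
            continue  # no Referer header was sent with this request
        counts[referer] += 1
    return counts


# Made-up sample data: one newsreader hit on the feed, two hits on a post.
hits = [
    ("/index.rss", "http://newsreader.example/product"),
    ("/2003/03/26/referer-spam", "http://blog.example/some-post"),
    ("/2003/03/26/referer-spam", "-"),
]
print(interesting_referers(hits))
```

This only deals with the newsreaders. Spiders that hit ordinary pages sail straight through a filter like this, which is exactly why they're the bigger problem.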

The biggest culprits by annoyance value are the spiders that hit random pages of the site and leave their advertisements behind. You generally have to follow the link, just in case it's somebody pointing to a bunch of your stuff at once. It almost always turns out to be some portal page that might be quite useful but isn't what you're looking for at that moment, which puts you off ever going back.

The place for advertising is the User-Agent header. Use that instead.
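For tool writers, doing this properly is about one line of code. Here is a minimal sketch using Python's standard `urllib.request`. The product name, version, and URLs are placeholders, and putting the product's URL in a parenthesised comment with a leading `+` is a common convention rather than anything the HTTP spec requires.

```python
import urllib.request


def build_request(url, product="ExampleReader", version="1.0",
                  info_url="http://reader.example/"):
    """Build a request that identifies the tool via User-Agent only."""
    # Product token plus a comment carrying the product's URL.
    user_agent = f"{product}/{version} (+{info_url})"
    # Deliberately no Referer header: this URL came from the user's
    # subscription list, not from a link on some other page.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://weblog.example/index.rss")
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would send the request. The site owner still sees your product's name and URL in the logs, just in the column where it belongs.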
