October 21, 2002

HTTP Conditional Get for RSS Hackers

Given the massive confusion exhibited here, I've written a nice, simple guide on how to implement HTTP's Conditional GET mechanism, for both producers and consumers of RSS feeds.

This article presumes you are familiar with the mechanics of an HTTP query, and understand the layout of request, response, header and body.

What is a conditional get?

My full-length RSS feed is about 24,000 bytes long. It probably gets updated on average twice a day, but given the current tools, people still download the whole thing every hour to see if it's changed yet. This is obviously a waste of bandwidth. What they really should do is first ask whether it's changed or not, and only download it if it has.

The people who invented HTTP came up with something even better. HTTP allows you to say to a server, in a single query: “If this document has changed since I last looked at it, give me the new version. If it hasn't, just tell me it hasn't changed and give me nothing.” This mechanism is called “Conditional GET”, and it would turn 90% of those hefty 24,000-byte queries into really trivial 200-byte queries.
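To put rough numbers on that, here's a quick back-of-the-envelope calculation in Python. It assumes one reader polling hourly, two real updates a day, and about 200 bytes for each “Not Modified” reply; these are the ballpark figures above, not measurements.

```python
# Rough daily bandwidth for ONE subscriber polling a feed every hour.
# Assumed figures (from the text, not measured): 24,000-byte feed,
# 24 polls a day, 2 real updates a day, ~200 bytes per 304 reply.
FEED_BYTES = 24_000
NOT_MODIFIED_BYTES = 200
POLLS_PER_DAY = 24
UPDATES_PER_DAY = 2


def daily_bytes(conditional: bool) -> int:
    """Bytes transferred per day, with or without conditional GET."""
    if not conditional:
        # Every poll downloads the whole feed.
        return POLLS_PER_DAY * FEED_BYTES
    # Only the polls that follow an update get the full feed;
    # every other poll gets a tiny 304 reply.
    full = UPDATES_PER_DAY * FEED_BYTES
    short = (POLLS_PER_DAY - UPDATES_PER_DAY) * NOT_MODIFIED_BYTES
    return full + short


plain = daily_bytes(conditional=False)
cond = daily_bytes(conditional=True)
print(plain, cond)                      # 576000 52400
print(f"{1 - cond / plain:.0%} saved")  # 91% saved
```

Under those assumptions 22 of the 24 daily polls become 304s, which is where the “90%” comes from.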

Client implementation

The mechanism for performing a conditional get has changed slightly between HTTP versions 1.0 and 1.1. Like many things that changed between 1.0 and 1.1, you really have to do both to make sure you're satisfying everybody.

When you receive the RSS file from the webserver, check the response header for two fields: Last-Modified and ETag. You don't have to care what is in these headers, you just have to store them somewhere with the RSS file.

Next time you request the RSS file, include two headers in your request. Your If-Modified-Since header should contain the value you snagged from the Last-Modified header earlier. The If-None-Match header should contain the value you snagged from the ETag header.
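As a sketch of the header-building step in Python (the `conditional_headers` helper is hypothetical), the important bit is that the stored values go back exactly as they arrived:

```python
from typing import Dict, Optional


def conditional_headers(last_modified: Optional[str],
                        etag: Optional[str]) -> Dict[str, str]:
    """Request headers for a conditional GET.

    `last_modified` and `etag` are the Last-Modified and ETag values the
    server sent along with the copy we already have (None if it sent none).
    They are copied back unchanged, never rebuilt from our own clock.
    """
    headers: Dict[str, str] = {}
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    if etag is not None:
        headers["If-None-Match"] = etag
    return headers


print(conditional_headers("Mon, 17 Sep 2001 11:54:29 GMT", '"abc123"'))
# {'If-Modified-Since': 'Mon, 17 Sep 2001 11:54:29 GMT', 'If-None-Match': '"abc123"'}
```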

If the RSS file has changed since you last requested it, the server will send you back the new RSS file in the perfectly normal way. However, if the RSS file has not changed, the server will respond with a ‘304’ response code (instead of the usual 200), where 304 means ‘Not Modified’. In the case of a 304, the response will have an empty body and the RSS file won't be sent back to you at all.
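Putting the client side together, here's a minimal Python sketch using only the standard library's `urllib.request`. The `fetch_feed` function and its `cache` dictionary are made up for illustration; a real aggregator would keep the body and validators somewhere more permanent. Note that `urlopen` reports a 304 by raising `HTTPError`, so that's where the “nothing changed” case ends up.

```python
import urllib.error
import urllib.request
from typing import Any, Dict


def fetch_feed(url: str, cache: Dict[str, Any]) -> bytes:
    """Fetch `url` with a conditional GET and return the feed's bytes.

    `cache` is a hypothetical per-feed store; after a successful download
    it holds "body", "last_modified" and "etag".
    """
    request = urllib.request.Request(url)
    # Send back exactly what the server gave us last time, if anything.
    if cache.get("last_modified"):
        request.add_header("If-Modified-Since", cache["last_modified"])
    if cache.get("etag"):
        request.add_header("If-None-Match", cache["etag"])
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            # 200: remember the new body and its validators verbatim.
            cache["body"] = body
            cache["last_modified"] = response.headers.get("Last-Modified")
            cache["etag"] = response.headers.get("ETag")
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304 and "body" in cache:
            # Not Modified: nothing was sent, so reuse the stored copy.
            return cache["body"]
        raise
```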

There's a temptation for clients to put their own date in the If-Modified-Since header, instead of just copying the one the server sent. This is a bad thing: what you should be sending back is exactly the same date the server sent you when you received the file. There are two reasons for this. Firstly, your computer's clock is unlikely to be exactly synchronised with the webserver's, so the server could still send you files by mistake. Secondly, if the server programmer has followed this guide (see below), it'll only work if you send back exactly what you received.

Server Implementation for Static Files

If you are using one of those weblogging tools that just sticks regular files on a regular webserver (Movable Type, for example), your webserver will almost certainly already support conditional GET. HTTP 1.1 has been around 3 years now¹, and there's really not much of an excuse for anyone to not be following it.

One thing you'll have to watch out for, though, is your site's RSS file being regenerated frequently even when it hasn't changed. If that happens, the server won't be able to keep track of the last-modified time properly, and people will download the file even when it hasn't changed. The solution is for the writers of weblogging tools to make sure their software only rewrites files that have actually changed in some way (i.e. have it generate the new file, compare it with the old one, and if they're the same, leave the old one untouched).
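The compare-before-writing step might look like this in Python; `write_if_changed` is a hypothetical helper, not part of any particular weblogging tool. Leaving an identical file untouched keeps its modification time, which is what the webserver reports as Last-Modified for a static file.

```python
from pathlib import Path


def write_if_changed(path: Path, new_content: bytes) -> bool:
    """Write `new_content` to `path` only if it differs from what's there.

    Returns True if the file was (re)written, False if it was left alone.
    """
    try:
        if path.read_bytes() == new_content:
            return False  # identical: keep the old file and its timestamp
    except FileNotFoundError:
        pass  # first run: nothing to compare against yet
    path.write_bytes(new_content)
    return True
```

A tool would call it each time it rebuilds the feed, e.g. `write_if_changed(Path("index.rss"), new_feed_bytes)`, where the filename and variable are only examples.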

Server Implementation for Dynamic Content

If you've got a weblogging tool that re-generates the RSS file every time a request is made, there's a little more work to do. This section is aimed more at the writers of the tools than at the user, because it's the tool writers that need to fix their software so that it follows the specs.

I'll concentrate purely on RSS files, but the concepts used here can be applied to any page in the weblog, and may further reduce the bandwidth usage for your users.

In your RSS feed generator, you'll have to keep track of two values: the time the file was last modified (converted to Greenwich Mean Time), and an “etag”. According to RFC 2616, the etag is an “opaque value”, which means you can put anything you like in it, provided you stick double-quotes around the whole lot. The time in the Last-Modified header needs to be formatted in a particular way, though: the same format used in email headers, for example ‘Mon, 17 Sep 2001 11:54:29 GMT’.
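For the formatting, Python's standard `email.utils.formatdate` already produces the email-style date when asked for GMT. The two helper names below are hypothetical:

```python
import calendar
import email.utils


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an email-style date in GMT."""
    return email.utils.formatdate(timestamp, usegmt=True)


def quote_etag(value: str) -> str:
    """Wrap an opaque value in double quotes (it must not contain any)."""
    return '"%s"' % value


# The example date from the text, built from its UTC fields.
ts = calendar.timegm((2001, 9, 17, 11, 54, 29, 0, 0, 0))
print(http_date(ts))      # Mon, 17 Sep 2001 11:54:29 GMT
print(quote_etag("v42"))  # "v42"
```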

Whenever someone requests your RSS file, send those values in the Last-Modified and ETag headers. Every web scripting language lets you add and remove headers like that at will; just check the manual if you don't know how.
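As one example of adding those headers, here's a sketch using Python's built-in `http.server`. `FEED_LAST_MODIFIED`, `FEED_ETAG` and `render_feed` are placeholders for whatever your tool actually stores and generates, and this version always answers 200; it doesn't handle the conditional part yet.

```python
from http.server import BaseHTTPRequestHandler

# Placeholder values: a real tool would load the feed's stored
# last-modified time (already formatted, in GMT) and its etag.
FEED_LAST_MODIFIED = "Mon, 17 Sep 2001 11:54:29 GMT"
FEED_ETAG = '"v42"'


def render_feed() -> bytes:
    """Placeholder for the weblogging tool's RSS generator."""
    return b'<?xml version="1.0"?>\n<rss version="2.0"></rss>\n'


class FeedHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = render_feed()
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(body)))
        # The two validators go out with every copy of the feed.
        self.send_header("Last-Modified", FEED_LAST_MODIFIED)
        self.send_header("ETag", FEED_ETAG)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, import `HTTPServer` from `http.server` and run `HTTPServer(("", 8000), FeedHandler).serve_forever()`.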

Now for the other bit. Whenever someone requests your RSS file, check their request for an If-Modified-Since header or an If-None-Match header. If either of them is there, and every one that is there matches the value you were planning to send out with the file, then don't send the file. Once again, consult your manual to see how to send back a "304 Not Modified" reply instead of the "200 OK" that you normally would. If you send back the 304 reply, you don't have to generate the RSS file at all. Just send out the headers, followed by the blank line that marks the end of the headers, and the client will know there's nothing else coming.
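The “don't send the file” decision fits in a small function. This Python sketch (the name `not_modified` is hypothetical) does plain string comparisons and treats If-None-Match as a single value, ignoring the lists of etags and the `*` form that HTTP also allows:

```python
from typing import Mapping


def not_modified(request_headers: Mapping[str, str],
                 last_modified: str, etag: str) -> bool:
    """True if we should answer "304 Not Modified" instead of the feed.

    At least one conditional header must be present, and every one that
    is present must exactly match the value we'd send with the feed.
    """
    since = request_headers.get("If-Modified-Since")
    match = request_headers.get("If-None-Match")
    if since is None and match is None:
        return False  # not a conditional request at all
    if since is not None and since != last_modified:
        return False
    if match is not None and match != etag:
        return False
    return True
```

Inside an `http.server` handler you'd pass it `self.headers`, which looks header names up case-insensitively (a plain dict, as in the tests, does not). When it returns True, call `send_response(304)`, send the ETag header again, call `end_headers()`, and return without writing a body.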

Technically, what you should do with an If-Modified-Since header is convert it to a date, and compare it with your stored date. However, 90% of the time you can get away with just doing a straight match, so it's probably not worth the effort.
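If you do want to do it properly, Python's `email.utils.parsedate_to_datetime` will parse the header. In this sketch the hypothetical `modified_since` function treats an unparseable date as “changed”, so the client just gets the full file:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def modified_since(if_modified_since: str, last_modified: datetime) -> bool:
    """True if the feed changed after the date the client sent.

    `last_modified` must be timezone-aware. HTTP dates only have
    one-second resolution, so microseconds are dropped before comparing.
    """
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on junk.
        return True
    return last_modified.replace(microsecond=0) > client_time
```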

How do I calculate the Last-Modified date?

Easy. It's the time that the most-recently-changed item in the RSS file was modified. Something like that should be pretty easy to store and fetch.
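In Python that might look like the following, where `item_times` is a hypothetical collection of timezone-aware modification times for the items currently in the feed:

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Iterable


def feed_last_modified(item_times: Iterable[datetime]) -> str:
    """Last-Modified for the feed: the newest item's modification time.

    Times must be timezone-aware; the newest is converted to GMT and
    formatted for the header. Raises ValueError if there are no items.
    """
    newest = max(item_times)
    return format_datetime(newest.astimezone(timezone.utc), usegmt=True)
```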

What should I put in an etag?

Apache builds its etags from details of the file such as its size and modification time. You don't need anything that elaborate, though. All the etag has to be is something that changes every time the file changes, so it could be a hash of the contents, a version number, or even exactly the same as the Last-Modified date, just in double-quotes.
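Here are those three options as Python sketches (all function names are hypothetical, and SHA-256 is an arbitrary choice of hash). One thing to weigh: the content hash means rendering the feed just to work out the etag, which gives up the “you don't have to generate the RSS file at all” saving on a 304, whereas a version number or the Last-Modified date can be looked up without building anything.

```python
import hashlib


def etag_from_content(body: bytes) -> str:
    """Etag as a hash of the feed's bytes: changes whenever they do."""
    return '"%s"' % hashlib.sha256(body).hexdigest()


def etag_from_version(version: int) -> str:
    """Etag from a revision counter bumped on every change to the feed."""
    return '"v%d"' % version


def etag_from_last_modified(last_modified: str) -> str:
    """Reuse the Last-Modified string itself, just in double quotes."""
    return '"%s"' % last_modified
```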

2002-11-11 Update: A number of people have written to remind me of HTTP's gzip Content-Encoding (compressing the files during transfer). This is a little beyond the scope of this essay. The worst thing you can do when suggesting a solution to a problem is to provide alternatives: people end up arguing over the alternatives instead of implementing the fix.

¹ In the original version of this document, this read ‘13 years’, because the author cannot count. Mea culpa.

Posted to nerd, stories at October 21, 2002 02:23 PM
Trackbacks

If-Modified-Since: whenever: http://markpasc.org/blog/2002/10/21.html#i004828

From: at October 21, 2002 04:57 PM

Push: the once future king: Mark Pilgrim's problem with bandwidth is related to the fact that content aggregators are pulling his data too frequently. I'm not quite sure why he titled his entry "Push".... No reason for the aggregator to pull. So why is there pull here? Because...

From: Diamond Blog at October 22, 2002 03:35 PM

http://gibolin.dnsalias.org/archives/000009.html: The Fishbowl: HTTP Conditional Get for RSS Hackers "Given the massive confusion exhibited here, I've written a nice, simple guide

From: Mike @ Home at October 22, 2002 08:59 PM

If-Modified-Since and dynamic content: Brent added support for Etags and If-Modified-Since headers to the latest NetNewsWire beta. It's very cool. He added it after hints and pressure from among others Joel, Phil, Sam and Mark; I'll refrain from pointing out that I suggested it to him sever...

From: Ask Bjørn Hansen at October 23, 2002 09:17 PM

There Has Got To Be A Better Way: So I've got this nifty little RSS parser doohickey, Magpie. In the name of lowering the curve, and weaning

From: LaughingMeme at October 24, 2002 01:50 PM

Linux RSS aggregator search: I'm looking for an RSS aggregator. I've been using HotSheet recently, until I discovered it can't handle Dive Into Mark's

From: Petroglyphs at October 24, 2002 04:16 PM

MagpieRSS 0.3 is out!: MagpieRSS is the PHP RSS parser I wrote, because I was unhappy with all the existing solutions.(which you already

From: LaughingMeme at October 27, 2002 04:00 PM

Hacking: I hacked HTTP 1.1 Conditional Get support into phpicalendar today. Thank Zeus for NetNewsWire Lite's Bandwidth Statistics view. It made

From: Vertical Hold at November 21, 2002 12:15 PM

conditional HTTP get for RSS: http://fishbowl.pastiche.org/archives/001132.html

From: anil dash's daily links at January 21, 2003 02:04 PM

Conditional GET for RSS: This is a good summary of how to implement HTTP's conditional GET mechanism for RSS which I touched on here

From: From the Orient at January 22, 2003 08:39 PM

MagpieRSS 0.3 is out!: MagpieRSS is the PHP RSS parser I wrote, because I was unhappy with all the existing solutions.(which you already

From: LaughingMeme at February 12, 2003 03:53 PM

Using HTTP conditional GET in java for efficient downloading: If you're going to download a resource over HTTP from a URL more than once, there are a couple of features of HTTP you should make sure you're using. By giving the server some metadata about what you saw when you last downloaded the resource, it can gi...

From: hackdiary at April 10, 2003 07:49 AM

Using HTTP conditional GET in java for efficient polling: Includes code using Jakarta Commons HttpClient . hackdiary: Using HTTP conditional GET in java for efficient polling seeAlso: the fishbowl

From: Development Notebook at April 10, 2003 07:41 PM

Bandwidth-saving tip of the day: Bandwidth a problem? Try gzip compression.

From: dive into mark at July 10, 2003 02:28 AM

How to save bandwidth fetching RSS: Bandwidth-saving tip of the day [dive into mark] How to save bandwidth fetching RSS... Conditional GET...

From: The CTO Speaks at July 10, 2003 03:41 AM

Clever Cactus: Among the projects I currently follow, there is CleverCactus. It seems that new features are added fast. rss performance improvements. New cactus beta2 feature of the day: RSS-download performance improvements! Aside from tweaking the parsing a bit, I wa...

From: Misc... at July 11, 2003 11:23 PM

Ultra-liberal RSS parser 2.0: Jason Diamond did an incredible job adding ETag and Last-Modified support to my ultra-liberal RSS parser. This allows the parser to avoid redownloading feeds that haven't changed. This is a topic of much debate at the moment. I wish I could say I pl...

From: dive into mark at August 21, 2003 01:52 PM

Now Supporting If-Modified-Since: Articles like this make my life simple....

From: Pineapple blog at August 29, 2003 02:35 PM

Quick Links - October 02: RSS/ATOM Jeremy Allaire: RSS-Data: A Proposed Format and More discussion on RSS/XML-Data decafbad: RSS-Data: XML-RPC encoding in RSS 2.0...

From: hebig.org/blog at October 2, 2003 09:06 PM

Every time I think I'm out, they pull me back in...: Just can't get away from it.

From: Panopticon Central at April 5, 2004 04:53 AM

Conditional GET Plugin: Description: The Conditional GET plugin reduces the bandwidth you send out from your blog for the flavors for which it is enabled. It works by examining various HTTP headers which indicate when the requesting client last polled your blog....

From: Confluence: blojsom at April 22, 2004 05:19 AM

Full-feed RSS and bandwidth: Eric Meyer on RSS and bandwidth: “All feeds will continue to use excerpts; I will not be publishing full-content feeds,...

From: JayAllen - The Daily Journey at May 4, 2004 11:20 PM

RSS melts the web: There has been some discussion of late about RSS readers causing the downfall of mankind. This has been discussed before on many occasions. Being that I like participating in the great world wide web as a responsible netizen I activated gzip encoding o...

From: randomthoughts at May 5, 2004 03:35 AM

RSS in CF: Supporting Conditional GET: One thing that's missing from most CF-powered, dynamically-generated syndication feeds is support for Conditional GET. Here's what you need to know. *Why Conditional GET?* When aggregators (both client- and server-side) retrieve your feed from the

From: Big Damn Heroes (MXBlogspace) at June 4, 2004 08:55 AM

RSS DDOS:

From: Matt's Blog at July 22, 2004 07:05 AM

Wish list for a CMS: Kyrre is writing his CMS wish list, and I thought I'd write mine

From: spdyvkng at August 30, 2004 07:12 AM

Conditional HTTP GET is Working: Robert Scoble says that MSDN is turning off full-text RSS feeds because they eat too much bandwidth. I left a comment suggesting they turn on support for conditional HTTP GET, and I wanted to give some real-world numbers for the...

From: Abe Fettig's Weblog at September 9, 2004 11:43 AM

Conditional GET Plugin: Description: The Conditional GET plugin reduces the bandwidth you send out from your blog for the flavors for which it is enabled. It works by examining various HTTP headers which indicate when the requesting client last polled your blog....

From: Confluence: blojsom at October 23, 2004 01:57 AM

last.fm script: A very crude and simple script to show your recent last.fm songs on your blog.

From: chaotic intransient prose bursts at December 5, 2004 11:19 PM

Interesting Concept About HTTP Conditional Gets for RSS Feeds: Very interesting article discussing the benefits of only retrieving partial data using an HTTP Get when consuming RSS Feeds. Especially interesting as RSS Feeds with special media enclosures (Pod Casting) begin to strain the bandwidth of the average bl...

From: MSDNReport.com at December 23, 2004 03:04 AM

What RSS Bandwidth Problem?: The so-called RSS Bandwidth Problem is a meme that just won't frickin' die. I think Joel Spolsky started it way...

From: Bryce Yehl at February 2, 2005 04:20 PM

Conditional Get: Conditional Get support has been added back to ewe. Mainly, even in an University where bandwidth is relatively cheap, because serving out 12-19MB a day (at lea...

From: Eos Web Editor Project at February 11, 2005 01:44 AM

RSS in the Mainstream, Coping with Rising Bandwidth: Support for RSS in Firefox, then Safari, are mostly encouraging developments to slowly further bring RSS into the mainstream. Both browsers are implementing the concept of "Live Bookmarks". Bookmarking a site now offers us the ability to not only save ...

From: The Apple Blog at May 3, 2005 06:15 PM