Everyone's noticed a bunch of posts appearing two or three times on Javablogs, and a few of the smarter ones amongst you have noticed that they're all JRoller blogs.
The major problem, it turns out, is JRoller suffering from an acute case of schizophrenia. Sometimes, it believes it's www.jroller.com, sometimes it believes it's just jroller.com, and every so often it has an acid flashback, and is convinced it's really the old-school freeroller.net.
Roller RSS feeds identify and link their posts using the <guid/> RSS element, like this:
<guid isPermaLink="true">http://www.jroller.com/page/username/20040131#my_post_here</guid>
So when JRoller has one of its schizophrenic moments, the GUID of the post changes to contain a different domain name, and Javablogs can't help but think it's a totally new post.
Hopefully, it's something that can be easily fixed from the other end. :)
While the freeroller.net thing sucks, www.javablogs.com and javablogs.com should be treated the same. It's tedious, but it's the correct thing to do in the long run.
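Treating host variants as the same post on the aggregator side could look roughly like the minimal Python sketch below. It builds a deduplication key from a permalink GUID by lower-casing the host, dropping a leading `www.` and applying an alias table. The names `canonical_host` and `dedup_key`, the contents of `HOST_ALIASES`, and the choice to strip `www.` are all hypothetical assumptions for illustration. This is not how Javablogs actually works.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table (an assumption, not real Javablogs configuration):
# hosts that an aggregator might decide to treat as the same site.
HOST_ALIASES = {
    "freeroller.net": "jroller.com",
    "roller.anthonyeden.com": "jroller.com",
}


def canonical_host(host: str) -> str:
    """Lower-case the host, drop a leading 'www.', then apply any alias."""
    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return HOST_ALIASES.get(host, host)


def dedup_key(guid: str) -> str:
    """Build a comparison key from a permalink GUID.

    Only the host is canonicalised (any port is dropped because the key is
    rebuilt from .hostname). Path, query and fragment are kept unchanged,
    so genuinely different posts still get different keys.
    """
    parts = urlsplit(guid.strip())
    host = canonical_host(parts.hostname or "")
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    seen = set()
    for guid in [
        "http://www.jroller.com/page/username/20040131#my_post_here",
        "http://jroller.com/page/username/20040131#my_post_here",
        "http://freeroller.net/page/username/20040131#my_post_here",
    ]:
        key = dedup_key(guid)
        if key not in seen:
            seen.add(key)
            print("new post:", key)  # printed once for all three variants
```

This only makes sense for GUIDs marked `isPermaLink="true"`; under the RSS 2.0 spec any other GUID is an opaque string and shouldn't be parsed as a URL. Stripping `www.` is also a judgment call: for most blogs both hosts serve the same content, but nothing guarantees it.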
The problem arises because most of these RSS and RDF files are generated by a script that looks up the current hostname dynamically to build the URLs. This makes them usable across domains without any configuration. I saw the same thing happen with my blog, http://blog.taragana.com/ (which runs on WordPress), when I changed its URL from http://www.taragana.com/people/angsuman/blog/. Both URLs are still valid, but I preferred the newer, shorter one. When I removed the old URL from javablogs.com and added the new one, the old feed entries were not removed, so for a couple of days I could see both the old and the new ones.
It's been doing this since the good old freeroller days. Back then it used to alternate between www.freeroller.net, freeroller.net and roller.anthonyeden.com. It drove (and still drives) me nuts, because for a given blog my aggregator sometimes displays every post since the blog was started.
I filed a bug report on this a while back. I was told that it was a configuration issue, not a software issue. Here's the link:
http://opensource.atlassian.com/projects/roller/secure/ViewIssue.jspa?key=ROL-236
I've just changed the JRoller "absolute URL to site" setting to force the domain name to jroller.com. The setting was blank before.
Without this setting, the JRoller feeds used whatever hostname was requested at cache-refresh time. So if the first request after the cache timeout was for freeroller.net, the GUIDs in the RSS feed would read freeroller.net until the next cache timeout.
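To make that caching behaviour concrete, here is a minimal Python sketch of a cached feed builder. `FeedCache`, its parameters and the injected clock are hypothetical; Roller itself is written in Java and its real code almost certainly looks different. The sketch only shows the mechanism: with no absolute URL configured, whichever host triggers a refresh gets baked into every GUID until the cache expires.

```python
import time
from typing import Callable, List, Optional


class FeedCache:
    """Hypothetical cached feed whose GUIDs embed a site base URL.

    absolute_url plays the role of an "absolute URL to site" setting. When it
    is None, the host of the request that triggers a refresh is used instead,
    and that host sticks until the cache entry expires.
    """

    def __init__(self, absolute_url: Optional[str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.absolute_url = absolute_url
        self.ttl = ttl_seconds
        self.clock = clock
        self._cached: Optional[List[str]] = None
        self._expires = 0.0

    def _build(self, request_host: str, post_paths: List[str]) -> List[str]:
        # Fall back to the requesting host only when no absolute URL is set.
        base = self.absolute_url or f"http://{request_host}"
        return [base.rstrip("/") + path for path in post_paths]

    def guids(self, request_host: str, post_paths: List[str]) -> List[str]:
        now = self.clock()
        if self._cached is None or now >= self._expires:
            # Cache miss or timeout: this request decides the host for everyone.
            self._cached = self._build(request_host, post_paths)
            self._expires = now + self.ttl
        return self._cached


if __name__ == "__main__":
    t = [0.0]
    posts = ["/page/username/20040131#my_post_here"]

    dynamic = FeedCache(absolute_url=None, ttl_seconds=60, clock=lambda: t[0])
    print(dynamic.guids("freeroller.net", posts))   # freeroller.net wins the refresh
    print(dynamic.guids("www.jroller.com", posts))  # still freeroller.net (cached)
    t[0] = 61
    print(dynamic.guids("www.jroller.com", posts))  # refreshed: now www.jroller.com

    fixed = FeedCache(absolute_url="http://jroller.com", ttl_seconds=60, clock=lambda: t[0])
    print(fixed.guids("freeroller.net", posts))     # always jroller.com
```

Setting the absolute URL removes the dependence on request order entirely, which is why filling in that one setting should be enough to stop the duplicate GUIDs.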