glenn re-raises the rss bandwidth issue. i’ll admit i’m lazy, and haven’t implemented if-modified-since handling for my own aggregator (which only polls hourly, and only news sites — i don’t read blogs via an aggregator), or if-modified-since handling for the rss feeds i produce (particularly the scraped feeds).
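for illustration, here is a minimal sketch of what the server side of if-modified-since handling could look like. it is not code from the feeds described here. the function name `conditional_status` and its arguments are hypothetical, and it assumes the feed's last-modified time is available as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    if_modified_since: raw header value sent by the client, or None.
    last_modified: timezone-aware datetime of the feed's last change (assumed).
    """
    if not if_modified_since:
        return 200  # no conditional header: serve the full feed
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        # treat a date without a zone as UTC so the comparison is well defined
        since = since.replace(tzinfo=timezone.utc)
    # http dates have one-second resolution, so drop microseconds first
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

a 304 response carries no body, which is where the bandwidth saving comes from.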

but one thing i have done is implement a system that serves up 403 responses to people who poll the scraped feeds more than once an hour. it blocks over ten thousand requests a day. what’s amazing to me is the number of persistent attempts in the face of repeated errors. for my polling, i am emailed errors when they occur, so i would know pretty quickly when i’ve been blocked. does make if-modified-since requests. of course, since it is only making requests when there should be changes (it only does so in response to pings, it doesn’t poll), it doesn’t make much difference.
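the once-an-hour rule can be sketched roughly as follows. this is an assumption-laden illustration, not the system actually running here: the class name `PollThrottle`, the in-memory dictionary, and the choice to key on an arbitrary client string (an ip address, say) are all hypothetical.

```python
import time


class PollThrottle:
    """Serve each client at most once per `min_interval` seconds."""

    def __init__(self, min_interval=3600, clock=time.time):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the logic can be tested
        self.last_served = {}       # client key -> time of last served request

    def status_for(self, client):
        """Return the http status to send: 200 to serve, 403 to refuse."""
        now = self.clock()
        last = self.last_served.get(client)
        if last is not None and now - last < self.min_interval:
            # too soon: refuse, and leave the timestamp untouched so that
            # repeated refused attempts do not extend the block
            return 403
        self.last_served[client] = now
        return 200
```

in this sketch a refused request does not reset the timer, so a client that keeps hammering the feed is still served again an hour after its last successful fetch. a stricter policy could update the timestamp on every request instead.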


I wonder if you'd see better compliance if you were to serve an item in your RSS feed explicitly stating that they're being blocked because they're scraping more than once per hour (or redirecting to a static feed that contains the blocked msg in its sole item), since their RSS aggregator may not indicate a 403 error. IIRC, slashdot was doing this a couple of months ago, though I'm not sure if they still are or not.

» Jason Perkins (link) » november 14, 2004 9:35am

What Jason said.

I use the jabrss service as my primary weblog aggregator, and it's not very good about reporting errors (to put it mildly), so I'd never notice if it was behaving badly.

» Ask Bjørn Hansen (link) » november 14, 2004 7:14pm

these jerks now get fed a small entry directing them here. let the whining begin!

» jim (link) » december 15, 2004 9:35am
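a feed with a single explanatory entry, as suggested above, could be generated along these lines. this is only a sketch: the function name `blocked_feed`, the item title, and whatever url and message get passed in are hypothetical, not the entry actually being served.

```python
import xml.etree.ElementTree as ET


def blocked_feed(title, notice_url, message):
    """Build a one-item RSS 2.0 document telling the reader why they are blocked."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = notice_url
    ET.SubElement(channel, "description").text = message

    # the sole item: it shows up in the aggregator like any other entry
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = "you are polling this feed too often"
    ET.SubElement(item, "link").text = notice_url
    ET.SubElement(item, "description").text = message

    return ET.tostring(rss, encoding="unicode")
```

the point of serving this with a normal 200 response is that an aggregator which swallows a 403 will still display the entry to its user.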


» |ntel » december 15, 2004 1:06pm
