unfriendly robots
there's a nice long discussion following up on mark pilgrim's robots.txt support for über-aggregators. it's never fun dealing with abusive robots, or even worse, software that encourages abusive behavior. it always happens when you have better things to do, but it never seems to happen often enough to make dealing with it in advance a big priority.
i like mark's step of making rssfinder.py observe the robots.txt standard. i'll try to find time to finally get my own robots (scraping news feeds and polling a few sites for blo.gs) to do the same.
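for what it's worth, python ships with a robots.txt parser in the standard library, so teaching a scraper to behave takes only a few lines. a minimal sketch (the module is urllib.robotparser in python 3; the user-agent string and feed url here are made up for illustration):

    import urllib.robotparser

    USER_AGENT = "blogs-feed-poller"  # hypothetical user-agent name

    # fetch and parse the site's robots.txt once per host
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()

    # check before every request; skip urls the site disallows
    url = "http://example.com/index.rdf"  # hypothetical feed url
    if rp.can_fetch(USER_AGENT, url):
        print("ok to fetch", url)
    else:
        print("robots.txt disallows", url)

in a real robot you'd also want to cache the parsed robots.txt per host instead of re-fetching it on every poll.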
(and some day i'll start crunching my own logs again, to figure out which bozos need to get blocked for bad behavior, since i haven't been paying attention for the last couple of months.)
Comments
> i'll try to find time to finally get my own robots (scraping news feeds and polling a few sites for blo.gs) to do the same.
If you used Perl and LWP you could just use the right module (LWP::RobotUA, which subclasses the LWP user agent) and get that "for free". ;-)