February 21, 2003 archives
unfriendly robots
there's a nice long discussion following up on mark pilgrim's robots.txt support for über-aggregators. it's never fun dealing with abusive robots, or even worse, software which encourages abusive behavior. it always happens when you have better things to do, but it never seems to happen often enough to make dealing with it in advance a big priority.
i like mark's step of making rssfinder.py observe the robots.txt standard. i'll try to find time to finally get my own robots (scraping news feeds and polling a few sites for blo.gs) to do the same.
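for the curious, here's a rough sketch of what observing robots.txt can look like in python, using the standard library's `urllib.robotparser`. this is not mark's rssfinder.py code or anything blo.gs runs; the `allowed` helper, the `example-poller` user-agent, and the sample rules are all made up for illustration, and the rules are parsed from a string so nothing touches the network.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; a real poller would fetch and
    cache each site's /robots.txt instead of taking it as a string.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines from the robots.txt body
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# made-up rules: one named robot may fetch everything outside /private/,
# every other robot is shut out of the whole site
EXAMPLE_ROBOTS = """\
User-agent: example-poller
Disallow: /private/

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    for agent, url in [
        ("example-poller", "http://example.com/index.rdf"),
        ("example-poller", "http://example.com/private/feed.xml"),
        ("some-other-bot", "http://example.com/index.rdf"),
    ]:
        verdict = "ok" if allowed(EXAMPLE_ROBOTS, agent, url) else "blocked"
        print(agent, url, verdict)
```

a well-behaved robot would run a check like this before every fetch and skip anything that comes back blocked.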
(and some day i'll start crunching my own logs again, to figure out what bozos need to get blocked for bad behavior now that i haven't been paying attention for the last couple of months.)
news to me
mysql> SELECT COUNT(*) FROM my_news;
+----------+
| count(*) |
+----------+
|    59995 |
+----------+
that's about ten months' worth of data for all the feeds i collect. i should figure out something interesting to do with the data.