i’m getting too old for this
blo.gs appears to be under some sort of strange indirect attack. it’s being hammered by various robots trying to index search results, so there are requests from the feedster bot, the technorati bot, and all sorts of bots i’ve never even heard of. oh, and the yahoo!, google, and msn bots have all decided to get in on the action.
it is extremely strange. perhaps it is supposed to be some sort of clever distributed denial of service?
Comments
Hi Jim,
I'd be very surprised if it's our bot that's hurting you, but if it is, my cell is 617 470 9654. Call anytime day or night and we'll get it addressed.
Scott
blo.gs had a fairly loose robots.txt, which i’ve tightened up (considerably — it will get loosened back up again). that doesn’t help with all of the poorly-written robots out there, of course.
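to give an idea of the sort of tightening i mean, rules along these lines would keep well-behaved robots out of the search pages (this is just a sketch, not the exact file, and the query-string pattern and the Crawl-delay directive are extensions that not every robot honors):

    User-agent: *
    # search results are expensive to generate, so keep robots away from them
    Disallow: /?q=
    Crawl-delay: 60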
here’s an instance of feedster adding to the trouble: it tried to access http://blo.gs/?q=knubble and nine variations over the span of 16 seconds earlier today. searching happens to be fairly heavyweight on blo.gs. a quick grep of the logs for today and yesterday shows about 1500 similar search requests.
feedster isn’t necessarily doing anything wrong (although i note it has not tried to access a robots.txt file from blo.gs in the last two weeks); it’s just an unfortunate combination of two behaviors: feedster appending path info to a request that already has a query string in order to find an rss feed, and the expensive nature of searching on blo.gs.
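one way to blunt this on the server side, independent of robots.txt, is to throttle how often any one client can run a search. here is a rough python sketch of the idea (python and the names here are just for illustration, not how blo.gs actually does it): remember the last few searches per client and refuse to run another one if they arrive too quickly.

    import time
    from collections import defaultdict, deque

    WINDOW = 16        # seconds of search history to remember per client
    MAX_SEARCHES = 3   # searches allowed per client within the window

    recent_searches = defaultdict(deque)

    def allow_search(client_ip, now=None):
        # return True if this client may run another search right now
        now = time.time() if now is None else now
        hits = recent_searches[client_ip]
        # forget searches that have fallen outside the window
        while hits and now - hits[0] > WINDOW:
            hits.popleft()
        if len(hits) >= MAX_SEARCHES:
            return False
        hits.append(now)
        return True

with limits like these, a robot firing ten search requests in sixteen seconds would get the first three and a refusal for the rest, which is a lot cheaper than running the queries.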
this has also brought to my mind how blo.gs could be used as a vector for attack on third parties, and i’ll be working to tighten that up over the weekend, too.
Have you tried limiting the robots using a robots.txt file?
Something like the following disallows the robots from grabbing certain pages:
    User-agent: *
    Disallow: /account/
    Disallow: /general/
    Disallow: /login/
Works wonders for Google, Yahoo! and co's webcrawlers.