August 15, 2004 archives
spamming the blog listing services
so as i’m cleaning up and optimizing blo.gs, i’ve stumbled across a huge set of spammed entries that came in via weblogs.com. this particular bunch will be easy to block going forward, and it inspired yet another couple of ping-spamming defenses.
one of the things i’m working on is better logging, so that i can more easily run reports that show the statistical outliers. for example, most IP addresses only ping blo.gs a few times a day, at most. so addresses that do more than that are ripe for investigation.
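to make the outlier idea concrete, here is a minimal sketch in python. it is not the actual blo.gs code: the `(ip, day)` log format, the `find_ping_outliers` name, and the threshold of 10 pings a day are all assumptions made up for illustration.

```python
from collections import Counter

def find_ping_outliers(ping_log, threshold=10):
    """Return (ip, day, count) rows for addresses that pinged more than
    `threshold` times in a single day, busiest first.

    `ping_log` is assumed to be an iterable of (ip, day) tuples, one per
    ping; the real log format is not described in the post.
    """
    counts = Counter(ping_log)  # counts pings per (ip, day) pair
    outliers = [
        (ip, day, n) for (ip, day), n in counts.items() if n > threshold
    ]
    # sort by ping count, highest first, so the worst offenders lead
    return sorted(outliers, key=lambda row: row[2], reverse=True)

# hypothetical sample data: one quiet address and one noisy one
sample_log = (
    [("192.0.2.1", "2004-08-15")] * 3
    + [("198.51.100.7", "2004-08-15")] * 40
)
suspects = find_ping_outliers(sample_log)
```

the threshold is a judgment call. a report like this only points at addresses worth a closer look, it doesn’t prove they are spammers.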
also, since all updates are now going through a common function, i don’t need to worry about updating rules to handle direct pings as well as information imported from weblogs.com and blogger.com.
i’m taking advantage of some nifty new features of mysql 4.1 to optimize some bits of the site, like b-tree indexes on in-memory tables. i’ve also converted everything over so that the database realizes that it is storing utf-8 data.
so the site is back up, and hopefully won’t need to be taken down for as long a period as it was today, at least for a while. i hope i’m done with the schema changes to the main blogs table, though. it takes about 20 minutes to rebuild the table each time i change the schema.
spammer fallout
one fallout effect of the ping spam i eliminated yesterday is that various blo.gs search queries appear as search results on the major search engines. so now when the search page gets a referral from one of those search engines, it redirects over to the fbi. originally, i was just making them submit the search manually. but these are obviously not bright people, and they just went ahead and did the search on blo.gs anyway. they really are that desperate for their free porn.
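the referral check could look something like the sketch below. this is a guess at the general shape, not the code blo.gs runs: the domain list and the `is_search_engine_referral` name are hypothetical, and the referer header is easy to forge or omit, so it only catches the casual case.

```python
from urllib.parse import urlparse

# hypothetical list; the post does not say which engines were matched
SEARCH_ENGINE_DOMAINS = ("google.com", "yahoo.com", "msn.com")

def is_search_engine_referral(referer):
    """Return True if the Referer header value points at one of the
    listed search engine domains or any of their subdomains."""
    if not referer:
        return False
    host = (urlparse(referer).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in SEARCH_ENGINE_DOMAINS
    )
```

the search page would call this on the incoming referer and send a redirect instead of results when it returns true. matching on the exact domain or a dot-prefixed suffix keeps a host like `notgoogle.com` from matching by accident.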
here’s a fun search that brought up a blo.gs search result: pictures of nude men in fishing waders. in fact, it’s the top listing on google right now. (soon to be supplanted by this entry, i suppose.)
needless to say, the search engines are now excluded from the search page, and this will be a non-issue once the entries drop out of the search databases.