
SPAM or: Lesbian robots kissing Learningremix

Sunday, January 28, 2007; 3:14 pm. An email arrives. It’s from Barbara, with the subject “[Tom's Blog] New Comment Posted to 'Some mashups'”. Wow! Somebody is reading my blog; let’s check and publish the comment… Oh, but what happened?

Lesbian Sistas say never planned her tryst lesbian kissing [...]

Barbara, please! Not here!

As Bud pointed out in one of the first lectures, the robots and spiders have finally found us and are trying to put SPAM onto Learningremix. But a defense against these web crawlers has existed for a long time, at least against Google’s long arms: the robots.txt file. Here is an excerpt from Wikipedia’s description:

The robots exclusion standard or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website. The information specifying the parts that should not be accessed is specified in a file called robots.txt in the top-level directory of the website.

However, robots.txt doesn’t seem to fit very well into Web 2.0, where the emphasis is on sharing and spreading knowledge, including through search engines. Another problem is that today’s crawlers may simply ignore these files and violate the politeness policy. Anyway, to protect my intellectual property (often referred to as bullshit ;-) from being cached by Google or any other crawler, I just uploaded a robots.txt forbidding all of them from entering my directory. Let’s see if it works.
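For the curious, here is a rough sketch of what such a file might look like and how a cooperating crawler would read it. The directory name `/tom/` and the `example.org` URLs are made up for illustration (my real path is not shown here), and the helper `is_allowed` is just a hypothetical wrapper around Python's standard `urllib.robotparser`. Note that this only models a polite crawler that actually asks before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/tom/" is an assumed directory name,
# not the real path on the server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tom/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a cooperating crawler may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Both URLs are placeholders on an example domain.
    for url in (
        "http://example.org/tom/some-mashups.html",
        "http://example.org/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

With a file like this, anything under the blocked directory is off limits to every user agent that honours the rules, while the rest of the site stays crawlable. A robot that never asks is not stopped by it at all.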

200701300039

Comments (1)

Hate to say it, but spammers don't respect robots.txt. It's voluntary.
