Mon 22 Oct 2007
Every other day we see a new bot crawling our site, but most of them are of no use to us. Some are sent out for a specific purpose, such as spamming or content theft, so you have to be careful and be able to identify these unwanted guests with bad intentions.
What are bad robots?
Bots that ignore the robots.txt file.
Bots that follow links through CGI scripts.
Bots that traverse the whole website in seconds, which slows the site down.
Bots that revisit the website too often, even when nothing on the site has changed.
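The warning signs above can be checked against a server access log. Below is a minimal Python sketch. It assumes the log is in Apache's Common Log Format and that you have the text of your robots.txt. Because that log format carries no User-Agent, the sketch also assumes that only the "*" record applies. The function name suspicious_clients, the 60-requests-per-minute threshold and the /cgi-bin/ prefix are illustrative choices, not fixed rules.

```python
import re
from collections import defaultdict
from datetime import datetime
from urllib.robotparser import RobotFileParser

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" \d{3} \S+')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"


def suspicious_clients(log_lines, robots_txt, max_per_minute=60, cgi_prefix="/cgi-bin/"):
    """Return {host: [reasons]} for clients showing the bad-bot signs.

    Heuristics only (hypothetical helper): a fast human, a shared proxy
    or a legitimate crawler can trip these checks too.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    times = defaultdict(list)   # host -> request timestamps
    ignored = defaultdict(int)  # host -> requests for paths robots.txt disallows
    cgi = defaultdict(int)      # host -> requests under cgi_prefix
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in Common Log Format
        host, stamp, path = m.groups()
        times[host].append(datetime.strptime(stamp, TIME_FMT))
        # No User-Agent in this log format, so check against the "*" rules.
        if not robots.can_fetch("*", path):
            ignored[host] += 1
        if path.startswith(cgi_prefix):
            cgi[host] += 1

    report = {}
    for host, stamps in times.items():
        stamps.sort()
        minutes = (stamps[-1] - stamps[0]).total_seconds() / 60
        rate = len(stamps) / max(minutes, 1.0)  # count short bursts as one minute
        reasons = []
        if rate > max_per_minute:
            reasons.append("too-fast")
        if ignored[host]:
            reasons.append("ignored-robots")
        if cgi[host]:
            reasons.append("cgi")
        if reasons:
            report[host] = reasons
    return report
```

Each reason on its own is only a hint. A host flagged for several reasons at once is a stronger candidate for banning. "Ignored robots" here means the client requested a disallowed path, which is a closer test than checking whether it ever fetched robots.txt.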
How do you prevent them from crawling your site?
You can ban all identified bad robots from your site by adding a few lines to your .htaccess file.
There are two ways to ban such bad bots:
*) by banning all access from a particular site (an IP address or network)
*) by banning all access that uses a specific identifier (such as the User-Agent string) to reach the server.
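Both ways of banning can be expressed as .htaccess directives. The Python sketch below builds those lines from two lists. It assumes Apache 2.2-era syntax, meaning mod_setenvif's SetEnvIfNoCase together with Order/Allow/Deny. Apache 2.4 uses Require directives instead. The function name htaccess_ban_rules and the bot names BadBot and EmailCollector are hypothetical placeholders.

```python
import ipaddress
import re


def htaccess_ban_rules(banned_ips, banned_agents):
    """Build Apache 2.2-style .htaccess lines (hypothetical helper).

    Assumes mod_setenvif and Order/Allow/Deny are available on the server.
    """
    agents = list(banned_agents)
    lines = []
    # Way 2: tag requests whose User-Agent contains a banned id (case-insensitive regex).
    for agent in agents:
        lines.append(f'SetEnvIfNoCase User-Agent "{re.escape(agent)}" bad_bot')
    # Allow everyone by default; any matching Deny below overrides the Allow.
    lines += ["Order Allow,Deny", "Allow from all"]
    if agents:
        lines.append("Deny from env=bad_bot")
    # Way 1: deny every request from a banned address or CIDR network.
    for entry in banned_ips:
        net = ipaddress.ip_network(entry, strict=False)  # raises ValueError on typos
        target = net.network_address if net.num_addresses == 1 else net
        lines.append(f"Deny from {target}")
    return "\n".join(lines) + "\n"


print(htaccess_ban_rules(["192.0.2.10", "198.51.100.0/24"], ["BadBot", "EmailCollector"]))
```

The printed lines go into the .htaccess file in the site's root directory. Requests that match either rule get a 403 Forbidden response. Validating addresses in Python catches typos before they reach the server configuration.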
Having search engine bots visit is usually good for your site, as they crawl and index your website quickly, but take care not to let bad bots crawl it.
4 Responses to “Bad Bot are nasty spiders.”
October 25th, 2007 at 12:32 pm
Thanks for the post. To keep bots away from email addresses, I added a line like this to the robots.txt file:
Disallow: /email-addresses/
November 6th, 2007 at 12:01 pm
Good article, and thanks for the information on bad bots, but the big question here is how one tells which bots are good and which are intruders. Any solution for this?
November 6th, 2007 at 2:09 pm
Chris, intruder bots generally ignore robots.txt, revisit the website more often and follow links through CGI scripts. If you keep a close watch, you can tell which bots are bad and which are good.
November 7th, 2007 at 11:45 am
Yeah Paul, thanks for the reply, but what I’m asking is how good bots and bad bots can be told apart by keeping a close watch. How can one spot the difference, since I’m sure they don’t visit the site wearing a tag?