Monthly Archives: April 2010

Baiduspider, Twiceler, and Yeti – Bad Robots!

One of the websites I manage had been experiencing a tremendous increase in bandwidth use. “Sweet!” we thought; a solid boost in traffic is a good thing. But as time went by, it kept climbing at an unrealistic rate, so something was awry. After checking the log files, I saw that the three bots named above were hammering the website. After some Googling, I found that some or all of these robots have run amok on other websites too, and that they generally supply little or no valuable traffic. So I blocked them all by placing the following in my .htaccess file:

Options +FollowSymLinks
RewriteEngine on

# block bad bots

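# Baiduspider (Baidu's crawler): temporary (302) redirect to Google
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC]
RewriteRule ^.*$ http://google.com/ [R,L]

# Twiceler (Cuil's crawler): permanent (301) redirect back to its own site.
# ^.*$ also matches the site root, and L stops further rule processing.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5(.*)Twiceler
RewriteRule ^.*$ http://www.cuill.com/your_bot_sucks [R=301,L]

# Yeti (Naver's crawler): temporary (302) redirect to Google
RewriteCond %{HTTP_USER_AGENT} ^Yeti [NC]
RewriteRule ^.*$ http://google.com/ [R,L]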

# end bad bots

The above is pieced together from various Google searches (I wish I had kept the links so I could thank the fine folks who helped me address this). You can send the bots to any site you like: in the first and last rules I just kicked them to Google, and in the second I sent the robot back to its own site. My bandwidth now seems to be back down to normal levels. Try the rules out, monitor your bandwidth and log files, and tweak as needed. Good luck!
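As a rough aid for the "monitor your log files" step, here is a minimal Python sketch that totals response bytes per user agent, so the heaviest visitors stand out. It assumes the server writes Apache's combined log format. The function name `bytes_by_agent` and the sample lines are made up for illustration and are not taken from the site's real logs.

```python
import re
from collections import Counter

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def bytes_by_agent(lines):
    """Sum response bytes per user agent; lines that do not parse are skipped."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue
        size = match.group("size")
        # Apache logs "-" when no body was sent (for example on a 304).
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [10/Apr/2010:12:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Baiduspider+(sample)"',
    '192.0.2.2 - - [10/Apr/2010:12:00:01 +0000] "GET /a HTTP/1.1" 200 2048 "-" "Yeti/1.0 (sample)"',
    '192.0.2.1 - - [10/Apr/2010:12:00:02 +0000] "GET /b HTTP/1.1" 304 - "-" "Baiduspider+(sample)"',
]

# Print the heaviest user agents first.
for agent, total in bytes_by_agent(SAMPLE).most_common(10):
    print(f"{total:>12}  {agent}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the function only needs an iterable of lines.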

Update – December 2010

I’ve consolidated the code into a single rule and added two more user agents, Java and Mail.Ru:

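# block bad bots
# [OR] chains the conditions, so a single RewriteRule covers every bot.
# None of these conditions carries [NC], so the matches are case-sensitive.
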
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5(.*)Twiceler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Yeti [OR]
RewriteCond %{HTTP_USER_AGENT} ^Java.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mail.Ru.*
RewriteRule ^(.*)$ http://help.naver.com/robots/ [R,L]
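Before deploying rules like these, it can help to check which user-agent strings the patterns actually catch. The sketch below is a rough Python approximation, not Apache itself: it copies the five patterns from the RewriteCond lines above and tests them with `re.search`, on the assumption that Python's regex engine agrees with Apache's PCRE for patterns this simple. The helper name `is_blocked` and all sample user-agent strings are invented for illustration.

```python
import re

# Patterns copied from the RewriteCond lines above. None carries [NC],
# so they are compiled without re.IGNORECASE.
PATTERNS = [
    r"^Baiduspider",
    r"^Mozilla/5(.*)Twiceler",
    r"^Yeti",
    r"^Java.*",
    r"^Mail.Ru.*",
]
COMPILED = [re.compile(pattern) for pattern in PATTERNS]


def is_blocked(user_agent):
    """Return True if any pattern matches, mirroring the [OR] chain."""
    return any(regex.search(user_agent) for regex in COMPILED)


# Invented sample strings, for illustration only.
SAMPLES = [
    "Baiduspider+(sample)",
    "Mozilla/5.0 (Twiceler-sample)",
    "Yeti/1.0",
    "Java/1.6.0",
    "Mail.Ru/1.0",
    "Mozilla/5.0 (compatible; Baiduspider/2.0)",  # bot name is not at the start
    "baiduspider",                                # different case
    "Mozilla/5.0 (Windows; U; Windows NT 6.1)",   # ordinary browser
]

for user_agent in SAMPLES:
    verdict = "redirect" if is_blocked(user_agent) else "allow"
    print(f"{verdict:8}  {user_agent}")
```

Because of the `^` anchors and the case-sensitive matching, a string that merely contains a bot's name, or spells it in a different case, slips through. That is worth keeping in mind when tweaking the patterns against your own log files.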