#1  
10th November 2012, 14:31
blinky
Member

Blocking bots

What's the best way to block bots from crawling my website?

I have created a robots.txt file which looks like this:
Code:
User-agent: *
Disallow: /
Disallow: /cgi-bin/
I have included the following in my index.html file:
Code:
<meta name="robots" content="NOINDEX, NOFOLLOW">
And I have also included an .htaccess file in my root which looks like this:
Code:
SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
Order Deny,Allow
Deny from env=bad_bot
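The meta tag works differently from the other two measures. A crawler can only see it after it has downloaded the page, so it asks compliant search engines not to index the page or follow its links, but it cannot stop the request itself from appearing in access.log. Here is a minimal sketch of how a crawler might read the tag, using Python's standard-library html.parser (the class name RobotsMetaParser and the sample page are hypothetical):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

# The page has to be fetched (and logged) before the tag can be read at all.
page = '<html><head><meta name="robots" content="NOINDEX, NOFOLLOW"></head><body></body></html>'
meta = RobotsMetaParser()
meta.feed(page)
print(sorted(meta.directives))  # ['nofollow', 'noindex']
```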
Yet I'm still seeing entries in Apache's access.log:
Code:
178.154.164.251 - - [10/Nov/2012:04:33:14 -0500] "GET /robots.txt HTTP/1.1" 200 324 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
178.154.164.251 - - [10/Nov/2012:04:33:14 -0500] "GET /phpbb/search.php?search_id=active_topics&sid=3a033d745efebc4ace615dd64e8f63f7 HTTP/1.1" 200 3735 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
178.154.164.251 - - [10/Nov/2012:04:33:17 -0500] "GET /phpbb/ucp.php?mode=login&sid=3a033d745efebc4ace615dd64e8f63f7 HTTP/1.1" 200 3513 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
66.249.76.173 - - [10/Nov/2012:06:05:11 -0500] "GET /robots.txt HTTP/1.1" 200 368 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
178.154.164.251 - - [10/Nov/2012:06:32:14 -0500] "GET /phpbb/index.php?sid=3a033d745efebc4ace615dd64e8f63f7 HTTP/1.1" 200 3908 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
123.125.71.74 - - [10/Nov/2012:06:35:02 -0500] "GET /robots.txt HTTP/1.1" 200 331 "-" "Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"
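One likely reason the .htaccess rule never fires: every User-Agent in the log above begins with "Mozilla/5.0", while the pattern "^Yandex*" is anchored with ^, so it only matches headers that start with "Yandex" (the * merely repeats the final "x"). SetEnvIfNoCase performs a case-insensitive, unanchored regular-expression search. The sketch below approximates that with Python's re module. Apache itself uses PCRE, but for a pattern this simple the two should behave the same.

```python
import re

# User-Agent copied from the access.log lines above.
YANDEX_UA = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

# Pattern from the .htaccess file: "^" pins the match to the start of the header.
original = re.compile(r"^Yandex*", re.IGNORECASE)

# Unanchored alternative: "Yandex" anywhere in the header.
unanchored = re.compile(r"Yandex", re.IGNORECASE)

print(bool(original.search(YANDEX_UA)))    # False: the header starts with "Mozilla"
print(bool(unanchored.search(YANDEX_UA)))  # True
```

If that is the cause, an unanchored pattern such as `SetEnvIfNoCase User-Agent "Yandex" bad_bot` should let the Deny line take effect, and Yandex would then get 403 responses instead of the 200s above. Note that Order/Deny are Apache 2.2 directives (on 2.4 they need mod_access_compat), and an .htaccess file is only honoured if AllowOverride permits it for that directory.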
I have even added the IP address 178.154.164.251 to the inbound filter list on my router. (The fact that I still see that address in my Apache logs suggests, at least to me, that Yandex isn't coming from that address.)


Thoughts anyone?

Last edited by blinky; 10th November 2012 at 22:43.
  #2  
11th November 2012, 12:28
falko
Super Moderator
Location: Lüneburg, Germany

Try
Code:
User-agent: Yandex
Disallow: /
in your robots.txt. See http://help.yandex.com/webmaster/?id=1113851
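To see what a rule like this means to a crawler that honours robots.txt, Python's standard-library urllib.robotparser can evaluate it offline. This is a minimal sketch: robotparser expects the bot's name token (e.g. YandexBot), not the full User-Agent header, and its matching is its own simplification; real crawlers apply their own rules.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested above: shut out Yandex, leave everyone else alone.
YANDEX_ONLY = """\
User-agent: Yandex
Disallow: /
"""

def parser_for(text: str) -> RobotFileParser:
    """Build a parser from robots.txt content held in a string (no network)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = parser_for(YANDEX_ONLY)

# robotparser lower-cases the name before the first "/" and accepts a
# substring match, so the "Yandex" group also applies to "YandexBot".
print(rules.can_fetch("YandexBot", "/phpbb/index.php"))  # False
print(rules.can_fetch("Googlebot", "/phpbb/index.php"))  # True
```

This only describes what a well-behaved crawler should do; it does not stop one that ignores the file.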
__________________
Falko
--
Download the ISPConfig 3 Manual! | Check out the ISPConfig 3 Billing Module!

FB: http://www.facebook.com/howtoforge

nginx-Webhosting: Timme Hosting | Follow me on:
  #3  
11th November 2012, 19:05
blinky
Member

Quote:
Originally Posted by falko View Post
Try
Code:
User-agent: Yandex
Disallow: /
in your robots.txt. See http://help.yandex.com/webmaster/?id=1113851
I had tried that, but it didn't seem to make any difference. Adding a series of IP address blocks (a bit overboard) seems to have worked.
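Blocking whole address ranges, rather than a single address, is one way to deal with a crawler that arrives from many addresses. Here is a small sketch of the range test using Python's ipaddress module. The CIDR blocks below are hypothetical examples drawn around addresses from the access.log excerpt in the first post, not a published list of crawler ranges.

```python
import ipaddress

# Hypothetical block list built around two addresses seen in the log.
BLOCKED_RANGES = [
    ipaddress.ip_network("178.154.164.0/24"),
    ipaddress.ip_network("123.125.71.0/24"),
]

def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_RANGES)

for address in ("178.154.164.251", "66.249.76.173", "123.125.71.74"):
    print(address, "blocked" if is_blocked(address) else "allowed")
```

A router or firewall applies the same idea with its own rule syntax. The trade-off is that a wide range can also catch legitimate visitors who share it.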

The:
Code:
User-agent: *
Disallow: /
seems to have stopped the vast majority of activity I'm not interested in having.

I believe I had a more serious problem, though, which I'll address in a separate thread.

Sheesh, I'm getting more traffic than a free bordello beside a Naval dock!
  #4  
12th November 2012, 13:13
falko
Super Moderator

Quote:
Originally Posted by blinky View Post
The:
Code:
User-agent: *
Disallow: /
seems to have stopped the vast majority of activity I'm not interested in having.
You should be aware that this will also block the Google and Bing bots.
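The effect of the blanket rule, and of one possible alternative, can be checked with Python's urllib.robotparser. The alternative robots.txt below is a hypothetical example: an empty "Disallow:" line permits everything, so Googlebot and bingbot keep access, while any other crawler that honours robots.txt falls through to the catch-all * group.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule from this thread: every compliant crawler is shut out.
BLANKET = """\
User-agent: *
Disallow: /
"""

# Hypothetical alternative: exempt the two big engines, block everyone else.
EXEMPT_MAJOR_ENGINES = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""

def parser_for(text: str) -> RobotFileParser:
    """Build a parser from robots.txt content held in a string (no network)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

blanket = parser_for(BLANKET)
exempt = parser_for(EXEMPT_MAJOR_ENGINES)

for bot in ("Googlebot", "bingbot", "YandexBot"):
    print(bot, "blanket:", blanket.can_fetch(bot, "/index.html"),
          "exempt:", exempt.can_fetch(bot, "/index.html"))
```

Under the blanket rule all three bots are refused. Under the alternative, only YandexBot is.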
  #5  
12th November 2012, 15:29
blinky
Member

Quote:
Originally Posted by falko View Post
You should be aware that this will also block the Google and Bing bots.
Yes, I'm aware that it should stop ALL bots. And it does, if they observe the rules in robots.txt. But if they don't, they'll keep knocking away with the zeal of a vacuum cleaner salesman pounding on my front door.

Quick question you might know the answer to...

When I see an entry in my Apache log file that says "GET /robots.txt", does that mean the robot has tried to do a search and has then received my robots.txt file? I guess what I'm really asking is: must a robot search at least once from 999.999.999.999 to receive the robots.txt file, after which searches from 999.999.999.999 will stop?
  #6  
12th November 2012, 16:52
webguyz
Member
Location: Earth

Quote:
Originally Posted by blinky View Post
...they'll keep knocking away with the zeal of a vacuum cleaner salesman pounding on my front door...

Do vacuum cleaner salesmen who make house calls still exist?
__________________
= WebGuyz.Net =
VPS and Web Hosting
  #7  
13th November 2012, 16:49
falko
Super Moderator

Quote:
Originally Posted by blinky View Post
When I see an entry in my Apache log file that says "GET /robots.txt", does that mean the robot has tried to do a search and has then received my robots.txt file?
Bots explicitly request that file to learn what URLs they are allowed to index.
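One way to check this against your own access.log is to group requests by client IP and see whether /robots.txt comes first. The sketch below assumes Apache's Combined Log Format, which the lines in the first post appear to use, and runs on three of those lines. The regex and the function name paths_by_client are illustrative and not part of any tool.

```python
import re
from collections import defaultdict

# Combined Log Format:
# ip ident user [time] "request line" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Three lines copied from the access.log excerpt in the first post.
SAMPLE = [
    '178.154.164.251 - - [10/Nov/2012:04:33:14 -0500] "GET /robots.txt HTTP/1.1" 200 324 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '178.154.164.251 - - [10/Nov/2012:04:33:17 -0500] "GET /phpbb/ucp.php?mode=login&sid=3a033d745efebc4ace615dd64e8f63f7 HTTP/1.1" 200 3513 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '66.249.76.173 - - [10/Nov/2012:06:05:11 -0500] "GET /robots.txt HTTP/1.1" 200 368 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

def paths_by_client(lines):
    """Map each client IP to the paths it requested, in log order."""
    seen = defaultdict(list)
    for line in lines:
        match = LOG_RE.match(line)
        if match:  # silently skip lines in any other format
            seen[match["ip"]].append(match["path"])
    return dict(seen)

for ip, paths in paths_by_client(SAMPLE).items():
    print(f"{ip}: {len(paths)} request(s), robots.txt first: {paths[0] == '/robots.txt'}")
```

Keep in mind that a request for robots.txt only shows the crawler read the rules. Whether it then obeys them is up to the crawler, which is why the server-side blocks discussed in this thread are the ones that actually enforce anything.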
All times are GMT +2.