I have had a lot of bandwidth usage lately, and checking the Apache log it appears that Yandex, AhrefsBot, and linkdexbot are hitting my sites like there is no tomorrow. It also seems as if the latter two are managing to bypass the .htaccess files on my sites. Is there another option to block these bots other than .htaccess and robots.txt?
See the solution here: https://serverfault.com/questions/251988/blocking-apache-access-via-user-agent-string

1. Block user agents
2. Make a custom log entry
3. Have fail2ban parse that log and ban the IP
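A minimal sketch of the first two steps in an Apache 2.4 vhost, assuming Debian-style paths; the log filename and the `bad_bot` variable name are examples, not from the linked answer:

```apache
# Match the offending user agents (case-insensitive)
SetEnvIfNoCase User-Agent "AhrefsBot|linkdexbot|Yandex" bad_bot

# Log matching requests to a dedicated file for fail2ban to watch
CustomLog /var/log/apache2/badbots.log combined env=bad_bot

# Deny them outright (Apache 2.4 authorization syntax)
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

This goes inside the `<VirtualHost>` block (or a `<Directory>` section within it); reload Apache afterwards for it to take effect.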
Working on it now, but I'm wondering why a bot should be able to access the Apache HTTP Server Version 2.4 manual when it isn't in the web directory. Sure enough, I went to http://scm-rpg.com.au/manual/zh-cn/mod/module-dict.html and there it is?
I'll be honest and admit that I really don't understand that page at all. Can someone set out the steps for me in a simpler way, using (and I'm guessing here) /etc/apache2/sites-enabled (vhost)? At the moment, at the top of my .htaccess files just under RewriteEngine On, I have
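For the fail2ban side of the approach described above, a sketch might look like the following, assuming the bot traffic is being written to a dedicated log file (the filter name, log path, and ban time are illustrative, not from the thread; fail2ban also ships a stock `apache-badbots` filter you may be able to use instead):

```ini
; /etc/fail2ban/filter.d/badbots.local (hypothetical filter name)
[Definition]
; Match any log line whose user-agent field contains one of the bots
failregex = ^<HOST> .*"[^"]*(AhrefsBot|linkdexbot|Yandex)[^"]*"$
```

```ini
; Jail entry, e.g. in /etc/fail2ban/jail.local
[badbots]
enabled  = true
port     = http,https
filter   = badbots
logpath  = /var/log/apache2/badbots.log
maxretry = 1
bantime  = 86400
```

After restarting fail2ban, `fail2ban-client status badbots` should show the jail and any banned IPs.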