I believe this has always been going on, but I have started wondering what these bots (?) are doing accessing the forum. I do not understand what the purpose is. It may be that the actual users of the forum cause less than 10% of the traffic hitting it. I examined the Apache access log to find interesting stuff:
Code:
IP-address        Count
. . .
193.141.60.217     1675
92.51.2.24         1813
85.76.164.138      2431
88.99.240.224      5523
162.55.85.222      7400
136.243.212.93     9600
95.217.149.110    14120
88.192.16.204     21505
52.70.240.171     22964
185.83.118.127    23992
80.66.87.136      24828
23.22.35.162     132027
3.224.220.101    133014

Why read the forum a hundred thousand times? There might be that many articles in the forum, but the next day the same bunch accesses the forum again. Are these indexing bots that cannot read only the new entries and resort to indexing every article every day? If they are copying forum content for some use elsewhere, again, why do it every day?

The summary by logwatch is full of lines like these:
Code:
/download/file.php?id=1317&t=1&sid=144ebd6 ... d03c272f6868207: 1 Time(s)
/download/file.php?id=1317&t=1&sid=14933da ... 1f017b34234955e: 1 Time(s)
/download/file.php?id=1317&t=1&sid=1533ca2 ... 981dfe96ccd1d0e: 1 Time(s)
/download/file.php?id=1317&t=1&sid=155c57d ... b3b875f25f5d49f: 1 Time(s)
/download/file.php?id=1317&t=1&sid=15d6f55 ... 2747412ad98d20d: 1 Time(s)
/download/file.php?id=1317&t=1&sid=169e318 ... c3789a412a78f07: 1 Time(s)
/download/file.php?id=1317&t=1&sid=171c2f3 ... 34824f15d5a0a9b: 1 Time(s)
/download/file.php?id=1317&t=1&sid=176afcb ... 49274ab2760f787: 1 Time(s)
/download/file.php?id=1317&t=1&sid=17d4cbe ... ac1401f4d82efb0: 1 Time(s)
/download/file.php?id=1317&t=1&sid=182fde0 ... 819a34034703e63: 1 Time(s)
/download/file.php?id=1317&t=1&sid=1845e96 ... 054c768c3136088: 1 Time(s)
/download/file.php?id=1317&t=1&sid=19fa326 ... eb1188a342c8b80: 1 Time(s)
/download/file.php?id=1317&t=1&sid=1a9d8d3 ... fc10bc91ed683f4: 1 Time(s)

The following are the most numerous client strings; some of them say they are a bot or spider:
Code:
 Count  Client string
  5687  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
  7649  Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
  7844  Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Mobile Safari/537.36
  8215  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36
 11424  Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)
 15123  serpstatbot/2.1 (advanced backlink tracking bot; https://serpstatbot.com/; [email protected])
 20694  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
 21505  Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
 22403  Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
 24443  Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0
 32779  Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
 62844  Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])
288005  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
633751  Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; [email protected])

I tried blocking IPs, but it does not really work since there are so many. Even blocking /16 subnets helps little: it seems to quiet things for a minute, then traffic starts coming from somewhere else. I started blocking the most prolific scrapers yesterday, when the forum software (phpBB) seemed overloaded and (real) users started complaining that the forum is very slow.
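The per-IP and per-client counts above can be reproduced with a short script. This is a minimal sketch in Python, assuming Apache's standard `combined` LogFormat (check the `LogFormat`/`CustomLog` directives on your server). Besides per-IP and per-user-agent totals, it groups IPv4 addresses by /16 (IPv6 by /48, an arbitrary choice) to show how much a subnet block would actually cover, and it drops the `sid` query parameter so the phpBB session IDs that make every logwatch line unique collapse into one URL. `summarize` and `strip_sid` are hypothetical helper names.

```python
import ipaddress
import re
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed Apache "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def strip_sid(path: str) -> str:
    """Remove the phpBB 'sid' parameter so the same page counts as one URL."""
    parts = urlsplit(path)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "sid"]
    return urlunsplit(parts._replace(query=urlencode(query)))


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter, Counter, Counter]:
    """Count requests per IP, per /16 (or /48) network, per user agent and per URL."""
    by_ip, by_net, by_agent, by_path = Counter(), Counter(), Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not in combined format (or contains escaped quotes)
        ip = m["ip"]
        by_ip[ip] += 1
        try:
            prefix = 16 if ipaddress.ip_address(ip).version == 4 else 48
            by_net[str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))] += 1
        except ValueError:
            pass  # %h is a hostname when HostnameLookups is enabled
        request = m["request"].split()
        if len(request) >= 2:
            by_path[strip_sid(request[1])] += 1
        by_agent[m["agent"]] += 1
    return by_ip, by_net, by_agent, by_path
```

Run it over the log, e.g. `summarize(open("/var/log/apache2/access.log"))` (the path varies by distribution), and print `by_ip.most_common(20)` or `by_net.most_common(20)`. If the top /16 networks hold only a small share of the total, that matches the experience above that subnet blocking barely helps. Lines whose request or user agent contains escaped quotes may be skipped or truncated; this is a quick diagnostic, not a full log parser.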
It can be anything; my sites are also constantly downloaded by bots. If you can, put your forum behind Cloudflare. Cloudflare offers options to ban bad bots and to cache content.
They even seem to ignore standards like robots.txt in some cases, just for the sake of more data, more money, more profit. Welcome to the modern web, I guess. Sadly, that's how things came to be, and in the example above we see more and more companies that want a piece of the cake from the AI hype, so I think we will see more and more bot traffic in the next few years. On a personal note, I am tired of the AI hype and hope it settles down soon, but I guess it will take some time for companies to realize that this is not an endless money-making trick.
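To see what a well-behaved crawler is supposed to do with robots.txt, Python's standard `urllib.robotparser` can evaluate a file locally. The robots.txt below is a hypothetical example, not the forum's file: it shuts out Bytespider entirely and asks everyone else to skip `/download/` and `/memberlist.php` and to wait 10 seconds between requests. Keep in mind that the whole mechanism is voluntary (RFC 9309 states that the rules are not a form of access authorization), that `Crawl-delay` is a non-standard extension not every crawler honors, and that `robotparser` matches group names only against the text before the first `/` of the agent string you pass in, so pass the product token (`Bytespider`, `AhrefsBot`) rather than the full User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /download/
Disallow: /memberlist.php
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


rp = build_parser(ROBOTS_TXT)
for agent, path in [
    ("Bytespider", "/viewtopic.php?t=5"),
    ("AhrefsBot", "/viewtopic.php?t=5"),
    ("AhrefsBot", "/download/file.php?id=1317"),
]:
    # can_fetch() expects the bot's product token, not the full User-Agent header
    print(agent, path, rp.can_fetch(agent, path))
print("AhrefsBot crawl delay:", rp.crawl_delay("AhrefsBot"))
```

A compliant Bytespider would stop at the first `False`, and a compliant AhrefsBot would never request `/download/file.php`; the logwatch output above is exactly what it looks like when such rules are not followed.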
Yep, bots are a nightmare. I get a load of crawling attempts from various IPs for MJ12Bot, or at least from clients claiming to be MJ12Bot, and as far as I can tell, they repeatedly and deliberately ignore anything in the robots.txt file.
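Whether a client that only *claims* to be a given crawler really is one can sometimes be settled with a forward-confirmed reverse DNS check: reverse-resolve the IP, require the hostname to lie under the operator's domain, then resolve that hostname forward and require the original IP among the results. This only works when the crawler operator publishes the hostnames its crawlers use (Google, for example, documents this method for Googlebot); whether MJ12bot or the other bots listed above do is not something I can confirm here, so `allowed_suffixes` has to come from the operator's own documentation. Below is a minimal sketch using Python's `socket` module; `verify_crawler` is a hypothetical name, the lookup functions are injectable so the logic can be tested offline, and `socket.gethostbyname_ex` returns IPv4 addresses only, so IPv6 clients would need `socket.getaddrinfo` instead.

```python
import socket
from typing import Callable, Iterable

# Both lookups return (hostname, aliases, addresses), like
# socket.gethostbyaddr and socket.gethostbyname_ex.
Lookup = Callable[[str], tuple]


def verify_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Lookup = socket.gethostbyaddr,
    forward: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for a client claiming to be a crawler."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False  # no PTR record, so the claim cannot be confirmed
    hostname = hostname.rstrip(".").lower()
    suffixes = [s.strip(".").lower() for s in allowed_suffixes]
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False  # PTR points outside the operator's domain
    try:
        forward_ips = forward(hostname)[2]
    except OSError:
        return False
    # Anyone can set a PTR record for their own IP; the forward lookup
    # is what ties the name back to this address.
    return ip in forward_ips
```

Each check costs two DNS lookups, so in practice the result would be cached per IP and run only for clients whose user agent names a crawler, e.g. the IPs behind the MJ12bot line in the access log.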