What are bots reading on forum?

Discussion in 'ISPConfig 3 Priority Support' started by Taleman, Apr 25, 2024.

  1. Taleman

    Taleman Well-Known Member HowtoForge Supporter

    I believe this has always been going on, but I have started wondering what these bots (?) are doing on the forum. I do not understand the purpose. The actual users of the forum may account for less than 10% of the traffic hitting it.
    I have examined the Apache access log to find interesting stuff:
    Code:
          IP-address      Count
    . . .
      193.141.60.217       1675
          92.51.2.24       1813
       85.76.164.138       2431
       88.99.240.224       5523
       162.55.85.222       7400
      136.243.212.93       9600
      95.217.149.110      14120
       88.192.16.204      21505
       52.70.240.171      22964
      185.83.118.127      23992
        80.66.87.136      24828
        23.22.35.162     132027
       3.224.220.101     133014
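
A per-IP tally like the one above can be produced with a few lines of Python. This is only a minimal sketch, assuming the client IP is the first whitespace-separated field of each log line (as in Apache's common and combined formats). The sample lines use made-up documentation addresses, not entries from the real log.

```python
from collections import Counter

def count_requests_by_ip(lines):
    """Count requests per client IP (assumed to be the first field of each line)."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # skip empty lines
            counts[fields[0]] += 1
    return counts

def format_report(counts, top=13):
    """Return the `top` busiest IPs, sorted ascending by request count."""
    rows = sorted(counts.items(), key=lambda kv: kv[1])[-top:]
    return "\n".join(f"{ip:>20} {n:>10}" for ip, n in rows)

# Hypothetical sample lines; in practice iterate over the open log file instead.
sample_log = [
    '192.0.2.10 - - [25/Apr/2024:10:00:00 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 512',
    '198.51.100.7 - - [25/Apr/2024:10:00:01 +0000] "GET /index.php HTTP/1.1" 200 1024',
    '192.0.2.10 - - [25/Apr/2024:10:00:02 +0000] "GET /viewtopic.php?t=2 HTTP/1.1" 200 512',
    '203.0.113.5 - - [25/Apr/2024:10:00:03 +0000] "GET /index.php HTTP/1.1" 200 1024',
    '203.0.113.5 - - [25/Apr/2024:10:00:04 +0000] "GET /viewforum.php?f=3 HTTP/1.1" 200 2048',
    '192.0.2.10 - - [25/Apr/2024:10:00:05 +0000] "GET /viewtopic.php?t=3 HTTP/1.1" 200 512',
]

counts = count_requests_by_ip(sample_log)
print(format_report(counts))
```

For a real log, passing the open file object to `count_requests_by_ip` avoids loading the whole file into memory.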
    
    Why read the forum a hundred thousand times? There might be that many articles in the forum, but the next day the same bunch accesses the forum again. Are these indexing bots that cannot read only new entries and resort to indexing every article every day? If they are copying forum content for use elsewhere, again, why do it every day?
    The summary by logwatch is full of lines like these:
    Code:
           /download/file.php?id=1317&t=1&sid=144ebd6 ... d03c272f6868207: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=14933da ... 1f017b34234955e: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=1533ca2 ... 981dfe96ccd1d0e: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=155c57d ... b3b875f25f5d49f: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=15d6f55 ... 2747412ad98d20d: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=169e318 ... c3789a412a78f07: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=171c2f3 ... 34824f15d5a0a9b: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=176afcb ... 49274ab2760f787: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=17d4cbe ... ac1401f4d82efb0: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=182fde0 ... 819a34034703e63: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=1845e96 ... 054c768c3136088: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=19fa326 ... eb1188a342c8b80: 1 Time(s)
           /download/file.php?id=1317&t=1&sid=1a9d8d3 ... fc10bc91ed683f4: 1 Time(s)
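
Each of those lines differs only in the `sid` session parameter, so every request is counted as a separate URL. A rough sketch of collapsing them before counting is shown here. It assumes the session ID is always carried in a query parameter named `sid`, and the `sid` values in the sample are made up, since the real ones are truncated in the report.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_sid(url):
    """Return the URL with any `sid` query parameter removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "sid"]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical request paths with invented session IDs.
requests = [
    "/download/file.php?id=1317&t=1&sid=aaa111",
    "/download/file.php?id=1317&t=1&sid=bbb222",
    "/download/file.php?id=1317&t=1&sid=ccc333",
]

collapsed = Counter(strip_sid(u) for u in requests)
for url, n in collapsed.items():
    print(f"{url}: {n} Time(s)")
```

With the session ID removed, the three sample requests show up as one URL fetched three times, which makes repeated downloads of the same attachment much easier to spot.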
    The following are the most numerous client strings; some of them say they are a bot or spider:
    Code:
         Count                                                                             Client string
    
          5687 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
          7649                        Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
          7844 Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Mobile Safari/537.36
          8215 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36
         11424                Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)
         15123 serpstatbot/2.1 (advanced backlink tracking bot; https://serpstatbot.com/; [email protected])
         20694 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
         21505 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
         22403                         Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
         24443           Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0
         32779                              Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
         62844 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])
        288005 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
        633751 Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; [email protected])
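
A client-string tally of this kind, split into clients that announce themselves as bots and everything else, could look roughly like the sketch below. It assumes Apache's combined log format, where the user agent is the last quoted field, and uses a simple keyword heuristic (`bot`, `spider`, `crawl`) that is my own assumption; it only catches clients that identify themselves honestly. The log lines are invented, with user agent strings taken from the list above.

```python
import re
from collections import Counter

UA_RE = re.compile(r'"([^"]*)"\s*$')                        # last quoted field on the line
BOT_HINT = re.compile(r"bot|spider|crawl", re.IGNORECASE)   # heuristic, not exhaustive

def user_agent(line):
    """Extract the user agent from a combined-format log line, or None."""
    match = UA_RE.search(line)
    return match.group(1) if match else None

def split_bot_traffic(lines):
    """Return (declared_bots, other) counters keyed by user agent string."""
    declared, other = Counter(), Counter()
    for line in lines:
        ua = user_agent(line)
        if ua is None:
            continue
        (declared if BOT_HINT.search(ua) else other)[ua] += 1
    return declared, other

MJ12 = "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
CHROME = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")

# Hypothetical log lines using documentation IP addresses.
sample_log = [
    f'192.0.2.10 - - [25/Apr/2024:10:00:00 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 512 "-" "{MJ12}"',
    f'192.0.2.11 - - [25/Apr/2024:10:00:01 +0000] "GET /viewtopic.php?t=2 HTTP/1.1" 200 512 "-" "{MJ12}"',
    f'198.51.100.7 - - [25/Apr/2024:10:00:02 +0000] "GET /index.php HTTP/1.1" 200 1024 "-" "{CHROME}"',
]

declared, other = split_bot_traffic(sample_log)
print("declared bots:", sum(declared.values()), "other:", sum(other.values()))
```

Browser-like strings in the "other" group are not necessarily human visitors; a scraper can send any user agent it likes.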
    
    I tried blocking IP addresses, but that does not really work since there are so many. Even blocking /16 subnets helps little; it seems to quiet things for one minute, then traffic starts coming from somewhere else.
    I started blocking the most prolific scrapers yesterday, when the forum software (phpBB) seemed overloaded and (real) users started complaining that the forum was very slow.
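
To see why /16 blocks help so little, the per-IP counts can be grouped by network. The sketch below uses Python's standard `ipaddress` module and the thirteen addresses and counts from the first listing in this post; the function name and the choice of a /16 prefix as the default are just illustrative.

```python
import ipaddress
from collections import Counter

def requests_per_network(ip_counts, prefix=16):
    """Sum request counts per enclosing IPv4 network of the given prefix length."""
    totals = Counter()
    for ip, n in ip_counts.items():
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        totals[str(net)] += n
    return totals

# Counts copied from the per-IP listing earlier in this post.
top_ips = {
    "193.141.60.217": 1675, "92.51.2.24": 1813, "85.76.164.138": 2431,
    "88.99.240.224": 5523, "162.55.85.222": 7400, "136.243.212.93": 9600,
    "95.217.149.110": 14120, "88.192.16.204": 21505, "52.70.240.171": 22964,
    "185.83.118.127": 23992, "80.66.87.136": 24828, "23.22.35.162": 132027,
    "3.224.220.101": 133014,
}

nets = requests_per_network(top_ips)
for net, n in sorted(nets.items(), key=lambda kv: kv[1]):
    print(f"{net:>20} {n:>10}")
```

All thirteen addresses fall into different /16 networks, so even among the busiest clients each /16 block covers only one of them.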
     
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    It can be anything; my sites are also constantly downloaded by bots. If you can, put your forum behind Cloudflare. Cloudflare offers options to ban bad bots and to cache content.
     
    Taleman likes this.
  3. pyte

    pyte Well-Known Member HowtoForge Supporter

    They even seem to ignore standards like robots.txt in some cases, just for the sake of more data, more money, more profit.

    Welcome to the modern web, I guess. Sadly, that's how things came to be, and in the example above we see more and more companies that want a piece of the cake from the AI hype, so I think we will see more and more bot traffic in the next few years. On a personal note, I am tired of the AI hype and hope it settles down soon, but I guess it will take some time for companies to realize that this is not an endless money-generating trick.
     
  4. nhybgtvfr

    nhybgtvfr Well-Known Member HowtoForge Supporter

    Yep, bots are a nightmare.

    I get a load of crawling attempts from various IPs for MJ12Bot, or at least claiming to be MJ12Bot, and as far as I can tell, they repeatedly and deliberately ignore anything in the robots.txt file.
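
For comparison, this is roughly what a well-behaved crawler is supposed to do before each request. The sketch uses Python's standard `urllib.robotparser`; the robots.txt rules shown are a hypothetical example, not the rules of any real forum. Requests in the access log that a check like this would have refused are a sign that the client is ignoring robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /download/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching a URL

# A compliant MJ12bot may fetch nothing at all under these rules.
print(rp.can_fetch("MJ12bot", "/viewtopic.php?t=1"))
# Other agents may read topics but not attachments under /download/.
print(rp.can_fetch("ExampleBrowser", "/viewtopic.php?t=1"))
print(rp.can_fetch("ExampleBrowser", "/download/file.php?id=1317"))
```

Compliance is voluntary, so robots.txt only describes what crawlers should do; actually stopping one requires blocking at the web server or firewall.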
     
