Good Search spiders but high usage

Discussion in 'Server Operation' started by concept21, Sep 17, 2024.

  1. concept21

    concept21 Active Member

    Dear Expert Web Masters,
    I have finally submitted my site successfully to Google and MS Clarity.

    Good news is - they crawl my site very frequently. I confirmed this from the ISPConfig3 Nginx web1 access.log.
    Bad news is - their crawling uses up more than 50% of my VPS's performance. htop shows the web1 user process at 50% CPU or more. That slows down visitors' browsing experience very much.
    What should I do to limit good search engines? :(
    My VPS has 4 shared CPU cores and 8 GB RAM.
     
  2. concept21

    concept21 Active Member

    Hello Friend,
    Really? That sounds good. How do I set a crawl delay in robots.txt? :)
     
  3. till

    till Super Moderator Staff Member ISPConfig Developer

    Example:

    Code:
    User-agent: *
    Crawl-delay: 10
    
    tells crawlers to wait 10 seconds between requests. But not all web crawlers honor that setting.
     
    concept21 and ahrasis like this.
  4. concept21

    concept21 Active Member

    My VPS seems smoother. :rolleyes:
     
    till likes this.
  5. Alex Mamatuik

    Alex Mamatuik Member

    Maybe also use Redis as a rate limiter, with some kind of snippet in index.php?
    PHP:
    <?php

    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->auth('REDIS_PASSWORD');

    $max_calls_limit  = 10; // max requests allowed per client...
    $time_period      = 10; // ...within this many seconds
    $total_user_calls = 0;

    // Determine the client IP. Note: these headers can be spoofed,
    // so only trust them if they are set by your own proxy.
    if (!empty($_SERVER['HTTP_CLIENT_IP'])) {
        $user_ip_address = $_SERVER['HTTP_CLIENT_IP'];
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $user_ip_address = $_SERVER['HTTP_X_FORWARDED_FOR'];
    } else {
        $user_ip_address = $_SERVER['REMOTE_ADDR'];
    }

    if (!$redis->exists($user_ip_address)) {
        // First request from this IP: start a counter that
        // expires after the time period.
        $redis->set($user_ip_address, 1);
        $redis->expire($user_ip_address, $time_period);
        $total_user_calls = 1;
    } else {
        $redis->incr($user_ip_address);
        $total_user_calls = $redis->get($user_ip_address);
        if ($total_user_calls > $max_calls_limit) {
            exit(); // over the limit: stop serving this request
        }
    }
     
  6. Freda Koch

    Freda Koch New Member

    To reduce high CPU usage on your VPS from frequent search engine crawling, consider these steps:

    1. Set Crawl Delay: Add a crawl delay in your robots.txt file:

      User-agent: *
      Crawl-delay: 10

    2. Nginx Rate Limiting: Configure rate limiting in Nginx to restrict requests from a single IP.

    3. Webmaster Tools: Adjust crawl rates in Google Search Console and Bing Webmaster Tools.

    4. Optimize or Upgrade VPS: Consider optimizing your current setup or upgrading your VPS plan.

    5. Redis Rate Limiting: Implement a Redis-based rate limiter in your PHP code to control access based on IP addresses.
    These measures can help lower server load and improve visitor experience.
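    For step 2, a minimal Nginx sketch of per-IP rate limiting (the zone name perip, the 2 r/s rate, the burst size, and example.com are all placeholder values to adjust for your own traffic):

    Code:
    # Shared-memory zone keyed by client IP (10 MB holds roughly
    # 160k addresses), allowing on average 2 requests/second per IP.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

    server {
        listen 80;
        server_name example.com;  # placeholder domain

        location / {
            # Allow short bursts of up to 20 queued requests;
            # excess requests get status 429 instead of the
            # default 503.
            limit_req zone=perip burst=20 nodelay;
            limit_req_status 429;
        }
    }

    The limit_req_zone directive goes in the http block; limit_req can sit in server or location. Well-behaved crawlers back off when they receive 429 responses.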
     
  7. Taleman

    Taleman Well-Known Member HowtoForge Supporter

    Is @Freda Koch another account pasting AI bot answers to already solved threads?
     
    Alex Mamatuik and ahrasis like this.
