Good Search spiders but high usage

Discussion in 'Server Operation' started by concept21, Sep 17, 2024.

  1. concept21

    concept21 Active Member

    Dear Expert Web Masters,
    I have finally submitted my site successfully to Google and MS Clarity.

    Good news is - they crawl my site very frequently. I confirmed this from the ISPConfig3 Nginx web1 access.log.
    Bad news is - their crawling uses up more than 50% of my VPS's performance. htop shows the web1 user processes at 50% CPU or more, which slows down visitors' browsing experience very much.
    What should I do to limit the good search engines? :(
    My VPS has 4 shared CPU cores and 8 GB RAM.
     
  2. lukafred

    lukafred New Member

    To limit search engine crawlers' impact on your VPS, use robots.txt to set a crawl delay, configure Nginx rate-limiting, adjust crawl rates in Google Search Console and Bing Webmaster Tools, and consider optimizing or upgrading your VPS. These steps will help reduce the load on your server.
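
    For the Nginx rate limiting, a minimal sketch might look like this (the zone name, rate, and burst values here are just assumptions to tune for your own traffic):

    Code:
    # In the http {} block: one shared zone keyed by client IP, ~2 requests/second
    limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

    server {
        location / {
            # Allow short bursts; excess requests are rejected with HTTP 503
            limit_req zone=perip burst=10 nodelay;
        }
    }
    
    By default rejected requests get a 503; setting limit_req_status 429 is arguably more appropriate for crawlers.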
     
  3. concept21

    concept21 Active Member

    Hello Friend,
    Really? It sounds good. How do I set crawl delay in robots.txt? :)
     
  4. till

    till Super Moderator Staff Member ISPConfig Developer

    Example:

    Code:
    User-agent: *
    Crawl-delay: 10
    
    means the crawler should wait 10 seconds between requests. But not all web crawlers support that setting; Googlebot, for example, ignores Crawl-delay in robots.txt.
     
    concept21 and ahrasis like this.
  5. concept21

    concept21 Active Member

    My VPS seems smoother. :rolleyes:
     
    till likes this.
  6. Alex Mamatuik

    Alex Mamatuik Member

    Maybe also use Redis as a rate limiter, with some kind of snippet in index.php?
    PHP:
    <?php

    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->auth('REDIS_PASSWORD');

    $max_calls_limit  = 10; // max requests allowed per window
    $time_period      = 10; // window length in seconds
    $total_user_calls = 0;

    // Determine the client IP (these headers can be spoofed; only trust them behind your own proxy)
    if (!empty($_SERVER['HTTP_CLIENT_IP'])) {
        $user_ip_address = $_SERVER['HTTP_CLIENT_IP'];
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $user_ip_address = $_SERVER['HTTP_X_FORWARDED_FOR'];
    } else {
        $user_ip_address = $_SERVER['REMOTE_ADDR'];
    }

    if (!$redis->exists($user_ip_address)) {
        // First request from this IP: start a counter that expires after the window
        $redis->set($user_ip_address, 1);
        $redis->expire($user_ip_address, $time_period);
        $total_user_calls = 1;
    } else {
        $redis->incr($user_ip_address);
        $total_user_calls = $redis->get($user_ip_address);
        if ($total_user_calls > $max_calls_limit) {
            // Over the limit: stop serving the page
            exit();
        }
    }
     
  7. Freda Koch

    Freda Koch New Member

    To reduce high CPU usage on your VPS from frequent search engine crawling, consider these steps:

    1. Set Crawl Delay: Add a crawl delay in your robots.txt file:

      Code:
      User-agent: *
      Crawl-delay: 10

    2. Nginx Rate Limiting: Configure rate limiting in Nginx to restrict requests from a single IP.

    3. Webmaster Tools: Adjust crawl rates in Google Search Console and Bing Webmaster Tools.

    4. Optimize or Upgrade VPS: Consider optimizing your current setup or upgrading your VPS plan.

    5. Redis Rate Limiting: Implement a Redis-based rate limiter in your PHP code to control access based on IP addresses.
    These measures can help lower server load and improve visitor experience.
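
    If you want to throttle only the crawlers and leave normal visitors alone, Nginx's map module can key the limit on the User-Agent. A sketch (the bot names and rate are assumptions to adjust):

    Code:
    # Limit only requests whose User-Agent matches a known crawler
    map $http_user_agent $limited_bot {
        default                                   "";
        ~*(googlebot|bingbot|yandex|baiduspider)  $binary_remote_addr;
    }

    # Requests with an empty key are not rate-limited at all
    limit_req_zone $limited_bot zone=bots:10m rate=1r/s;

    server {
        location / {
            limit_req zone=bots burst=5;
        }
    }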
     
  8. Taleman

    Taleman Well-Known Member HowtoForge Supporter

    Is @Freda Koch another account pasting AI bot answers to already solved threads?
     
    ahrasis likes this.
