VPS - High CPU Load - Debian 8.9

Discussion in 'ISPConfig 3 Priority Support' started by fatmike, Jul 22, 2017.

  1. fatmike

    fatmike Member

    Hi.
    For the last couple of days I've been experiencing very high CPU load in a VPS container that I manage.
    The VPS is controlled via ISPConfig 3.1.
    The VPS serves only one website (WordPress), a couple of mailboxes and one MySQL database. No DNS on that container.

    The problem started after a sudden reboot of the server, after which fail2ban failed to start.
    I noticed this a couple of hours later, after a lot of brute-force attacks on the xmlrpc.php file of the WordPress website (a known WP issue).
    After a stop/start of fail2ban, the jails started and banned the attackers as they should.

    The issue appeared again, though, and the situation is getting worse hour by hour.
    All the services seem to be active, but the web server is down from time to time or very slow.
    All other services (ssh, mail, ftp etc.) work as they are supposed to.

    I ran chkrootkit and no issues were found (except the known false-positive bindshell).

    I ran rkhunter and it reported:
    Code:
    System checks summary
    =====================
    File properties checks...
        Files checked: 146
        Suspect files: 0
    Rootkit checks...
        Rootkits checked : 376
        Possible rootkits: 0
    Applications checks...
        Applications checked: 6
        Suspect applications: 0
    The system checks took: 18 minutes and 32 seconds
    All results have been written to the log file: /var/log/rkhunter.log
    One or more warnings have been found while checking the system.
    Please check the log file (/var/log/rkhunter.log)
    
    /var/log/rkhunter.log file is included in this post.

    top command shows high load of mysqld (CPU 122%) and is included as a txt file in this post.

    ps aux command output is attached to this post too.

    I noticed the "nobody" user near the end running "PassengerLoggingAgent", which has a warning in the rkhunter log too.
    Code:
    USER  PID %CPU %MEM  VSZ  RSS TTY  STAT START  TIME COMMAND
    root  32065  0.0  0.0 223192  1988 ?  Ssl  Jul22  0:00 PassengerWatchdog
    root  32068  0.0  0.0 512892  2428 ?  Sl  Jul22  0:00 PassengerHelperAgent
    nobody  32074  0.0  0.0 226616  4692 ?  Sl  Jul22  0:00 PassengerLoggingAgent
    
    Is that normal?

    uptime reports:
    Code:
    00:07:32 up 1 day, 21:19,  1 user,  load average: 7.44, 8.81, 9.37
    
    The server is up to date with the latest non-vulnerable apache2 version, 2.4.10-10+deb8u10.
    See here: https://www.debian.org/security/2017/dsa-3913

    UPDATE:
    There is one issue after running apt-get upgrade:
    Code:
    The following packages have been kept back:
      hhvm
    0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
    
    Sorry for the long post. I tried to include as much information as I could think of.

    Any help is much appreciated.

    UPDATE 2:
    ISPProtect full version found no issues in /var/www folder

    Kind Regards
     

    Attached Files:

    Last edited: Jul 23, 2017
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    The high load in MySQL is most likely caused by a large number of database connections, probably coming from the web2 site. So your server is probably not hacked, which is why the scan tools don't find anything; it is just some kind of DoS. Take a look at the access.log of the web2 website, e.g. with the tail -f command; you will probably see a lot of traffic there. If you see many connections from the same IP or the same IP subnet, then you should consider banning them with the iptables or route command.
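
    The check described above can also be done on a whole log file instead of watching tail -f. A minimal sketch, assuming Apache's common/combined log format where the client address is the first field; the function names are illustrative and not part of ISPConfig:

```shell
#!/bin/sh
# top_clients LOGFILE [N]: print the N busiest client IPs as "count ip".
# Assumes the client address is the first whitespace-separated field.
top_clients() {
    awk '{print $1}' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}" |
        awk '{print $1, $2}'
}

# top_subnets LOGFILE [N]: the same tally, grouped by IPv4 /24 network.
# Addresses that do not split into four dotted parts (e.g. IPv6) are skipped.
top_subnets() {
    awk '{split($1, o, "."); if (o[4] != "") print o[1] "." o[2] "." o[3] ".0/24"}' "$1" |
        sort | uniq -c | sort -rn | head -n "${2:-10}" | awk '{print $1, $2}'
}
```

    For example, top_clients access.log 20 lists the twenty busiest addresses. An address that stands out could then be blocked with something like iptables -I INPUT -s 203.0.113.7 -j DROP (203.0.113.7 is a placeholder), provided iptables is usable inside the container.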
     
  3. fatmike

    fatmike Member

    The website already has medium to high traffic (10k users/day), so it was kind of difficult to identify the bots. Anyway, you were right.
    I ran:
    Code:
    cat 20170723-access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -n
    and found 3 (bad?) bots that were abusing the website:
    AhrefsBot with over 17000 requests
    MJ12bot with over 12000 requests
    GrapeshotCrawler with over 9000 requests
    while Googlebot made 5000 requests and bingbot 4300

    AhrefsBot, MJ12bot and GrapeshotCrawler seemed to ignore Crawl-Delay and/or Disallow in robots.txt.
    Additionally, they use a wide range of IPs, so I created a jail (rule) in fail2ban to ban those IPs for one day each.

    I'm in a Virtuozzo container, so the iptables command throws an error (???).

    In the access log I now see 403 responses for those bots, and the load is much better.
    Is this normal?
    Additionally, after a fail2ban restart (maybe because of the iptables error) the IPs get unbanned.
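
    To see how those bots are actually being answered, the access log can be tallied by status code for a given user agent. A minimal sketch, assuming Apache's combined log format (status code right after the quoted request, user agent in the last quoted field); the function name is illustrative:

```shell
#!/bin/sh
# status_by_agent LOGFILE PATTERN: count HTTP status codes for requests whose
# user agent matches the extended regex PATTERN, printed as "count status".
# With '"' as field separator, $3 holds " status bytes " and $6 the user agent.
status_by_agent() {
    awk -F'"' -v pat="$2" '$6 ~ pat {split($3, f, " "); print f[1]}' "$1" |
        sort | uniq -c | sort -rn | awk '{print $1, $2}'
}
```

    For example, status_by_agent 20170723-access.log 'AhrefsBot|MJ12bot|GrapeshotCrawler'. A result dominated by 403 would suggest the requests still reach Apache and are refused there, rather than being dropped at the firewall.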

    I also ran:
    Code:
    awk -F\" '($2 ~ /\.(jpg|gif)/ && $4 !~ /^https:\/\/www\.mydomain\.com/){print $4}' 20170723-access.log | sort | uniq -c | sort -n
    and found thousands of requests from content-copying websites hot-linking to the images on my server.
    I added a redirect in .htaccess.

    I checked the slow queries in the MySQL log but found nothing extreme except the search functionality, which seems normal given the large number of articles (~30k).
    Code:
    # Query_time: 10.092863  Lock_time: 0.000106 Rows_sent: 34  Rows_examined: 28062
    
    I found this error in /var/log/upstart/php5-fpm.log
    Code:
    [24-Jul-2017 18:39:00] WARNING: [pool web2] server reached pm.max_children setting (50), consider raising it
    [24-Jul-2017 18:39:31] WARNING: [pool web2] child 21785 exited on signal 9 (SIGKILL) after 396.086924 seconds from start
    [24-Jul-2017 18:39:31] NOTICE: [pool web2] child 22829 started
    [24-Jul-2017 18:40:20] WARNING: [pool web2] server reached pm.max_children setting (50), consider raising it
    [24-Jul-2017 18:42:05] WARNING: [pool web2] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 18 idle, and 38 total children
    [24-Jul-2017 18:42:06] WARNING: [pool web2] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 19 idle, and 40 total children
    [24-Jul-2017 18:42:07] WARNING: [pool web2] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 18 idle, and 41 total children
    [24-Jul-2017 18:42:08] WARNING: [pool web2] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 19 idle, and 43 total children
    [24-Jul-2017 18:43:54] WARNING: [pool web2] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 19 idle, and 39 total children
    [24-Jul-2017 18:45:54] WARNING: [pool web2] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 19 idle, and 45 total children
    [24-Jul-2017 18:45:55] WARNING: [pool web2] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 19 idle, and 46 total children
    [24-Jul-2017 18:46:07] WARNING: [pool web2] server reached pm.max_children setting (50), consider raising it
    [24-Jul-2017 18:47:16] WARNING: [pool web2] server reached pm.max_children setting (50), consider raising it
    
    Kind Regards
     
  4. till

    till Super Moderator Staff Member ISPConfig Developer

    If you have more RAM, then you should consider increasing the PHP-FPM limits of that site (Options tab). Also check the FPM mode that you use. Personally, I prefer the ondemand mode, if PHP is new enough to support it.
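
    Whether there is room to raise the limits can be estimated with a common rule of thumb: divide the memory you are willing to give PHP by the average size of one FPM worker. A minimal sketch, assuming procps ps and that the workers run under the process name php5-fpm (suggested by the log path earlier in the thread, but an assumption); the function names and the 2048 MB budget are illustrative:

```shell
#!/bin/sh
# avg_rss_kb PROCESS_NAME: mean resident set size in KiB of all running
# processes with that command name (prints nothing if none are running).
avg_rss_kb() {
    ps -o rss= -C "$1" | awk '{s += $1; n++} END {if (n) printf "%d\n", s / n}'
}

# max_children_for BUDGET_MB AVG_RSS_KB: how many workers of the given
# average size fit into the memory budget (integer division, rounds down).
max_children_for() {
    if [ "$2" -gt 0 ]; then
        echo $(( $1 * 1024 / $2 ))
    else
        echo "average RSS must be greater than 0" >&2
        return 1
    fi
}
```

    For example, max_children_for 2048 "$(avg_rss_kb php5-fpm)" gives a starting value for pm.max_children. RSS counts shared memory once per worker, so the estimate tends to be conservative.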
     
  5. fatmike

    fatmike Member

    Thanks for responding.

    FPM with CGI and SuEXEC enabled.

    FPM in dynamic mode:
    PHP-FPM pm.max_children: 50
    PHP-FPM pm.start_servers: 25
    PHP-FPM pm.min_spare_servers: 20
    PHP-FPM pm.max_spare_servers: 30
    PHP-FPM pm.max_requests: 300

    If I raise them I get this kind of error:
    Code:
    [24-Jul-2017 19:29:27] WARNING: [pool web2] child 29605 exited on signal 9 (SIGKILL) after 296.005956 seconds from start
    [24-Jul-2017 19:29:27] NOTICE: [pool web2] child 30501 started
    [24-Jul-2017 19:29:28] WARNING: [pool web2] child 29618 exited on signal 9 (SIGKILL) after 295.664036 seconds from start
    [24-Jul-2017 19:29:28] NOTICE: [pool web2] child 30508 started
    
    which seems to be RAM overload.
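
    One way to check that suspicion inside a Virtuozzo/OpenVZ container is to look for resource limits that have been hit. A minimal sketch, assuming the container exposes its counters in /proc/user_beancounters (usually readable by root only) with failcnt as the last column; the function name is illustrative:

```shell
#!/bin/sh
# failed_counters [FILE]: list resources whose failcnt (last column) is
# non-zero, printed as "resource failcnt".
# FILE defaults to /proc/user_beancounters. The resource name is always the
# sixth field from the end, because the first data row carries an extra
# "uid:" column. Header lines are skipped since their last field is not a
# positive number.
failed_counters() {
    awk 'NF >= 6 && $NF + 0 > 0 {print $(NF - 5), $NF}' \
        "${1:-/proc/user_beancounters}"
}
```

    Non-zero fail counts on memory-related rows such as privvmpages or physpages would be consistent with workers being killed for lack of memory; an empty result points elsewhere.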


    What about:
    Kind regards
     
  6. till

    till Super Moderator Staff Member ISPConfig Developer

  7. fatmike

    fatmike Member

    Thanks for your assistance and suggestions.
    Now the server's load is much lighter.

    Kind Regards
     
