Hello, one of my production servers has recently begun showing high load. When I run the 'uptime' command, the load average is 30 or even 40. The server hosts two websites (WordPress and a custom app) and ownCloud, and it serves 90 mailboxes. The hardware is: CPU Q6600, 8 GB RAM, 250 GB HDD (160 GB used). When users send mail (if it goes through at all), 3 or 4 copies of the message end up in their Sent folder. Database sync takes forever, and the server is generally unresponsive. What should I do? Please help.
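For context, a load average only means something relative to the number of CPU cores. The Q6600 is a quad-core CPU (iostat later reports 4 CPUs), and on Linux the load also counts tasks blocked in uninterruptible (usually disk) I/O. The helper below is a minimal sketch; the function name `load_per_core` is hypothetical.

```shell
# Divide a load average by the number of CPU cores (hypothetical helper).
# A result well above 1.0 means tasks are queuing for CPU time or, on
# Linux, are blocked in uninterruptible (usually disk) I/O.
load_per_core() {
    # $1 = load average (e.g. 40), $2 = number of cores (e.g. 4)
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Example on a live system (1-minute load from /proc/loadavg):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

With a load of 40 on 4 cores this gives 10.00, i.e. roughly ten runnable or blocked tasks per core.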
You should first check the output of top, iotop and ps aux to see which processes are using so many resources. Often this is caused by hacked websites or mail accounts that are sending spam or attacking other servers. It might also be a hardware problem (e.g. a failing HDD); if you see high CPU wait time, check the disk with smartctl.
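A non-interactive variant of the ps aux check, sorted so the heaviest processes come first, can be sketched like this. It assumes the procps version of ps (standard on Debian); `top_cpu` and `top_mem` are hypothetical names.

```shell
# Print the header plus the N processes using the most CPU, with their
# user, state and elapsed run time (procps ps syntax).
top_cpu() {
    n="${1:-10}"
    ps -eo pid,user,pcpu,pmem,stat,etime,comm --sort=-pcpu | head -n "$((n + 1))"
}

# Same list, ordered by memory use instead of CPU.
top_mem() {
    n="${1:-10}"
    ps -eo pid,user,pcpu,pmem,stat,etime,comm --sort=-pmem | head -n "$((n + 1))"
}

# Example: top_cpu 15
# For the disk itself, `smartctl -H /dev/sda` prints the overall health verdict.
```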
Thank you very much for the fast response. I just ran 'smartctl --all /dev/sda' and this is what it returned: http://paste.ofcode.org/TfeL7T6AigEJGe8ZUg2ytS Could you please help me interpret it? Does this mean I have to replace the HDD? If so, how do I migrate the data, and should I use a 250 GB HDD or could I go with a 1 TB one?
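When reading smartctl output, the attributes most commonly watched for failing sectors are Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable; non-zero raw values there are generally treated as a warning sign. The filter below is a sketch that pulls just those rows out of `smartctl -A` output (the function name is hypothetical); it does not interpret the pasted report above.

```shell
# Print the raw values of the SMART attributes most often watched for
# failing sectors. Reads `smartctl -A` output on stdin, e.g.:
#   smartctl -A /dev/sda | smart_sector_report
# In the attribute table, column 2 is the name and column 10 the raw value.
smart_sector_report() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```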
iostat shows:
Code:
root@vps3:~# iostat
Linux 3.2.0-4-686-pae (vps3.goinfobl.com)   05/15/2015   _i686_   (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.83    0.53    0.28   23.47    0.00   74.89

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              15.69       158.30        98.29    1452663     901992

root@vps3:~#
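Note that iostat run without an interval reports averages since boot, so the 23.47% iowait above is a long-term average; `iostat -x 5` gives per-interval figures instead. The same number can be computed for a chosen window from two snapshots of the aggregate `cpu` line in /proc/stat. This is a sketch; `iowait_percent` is a hypothetical helper.

```shell
# Percentage of CPU time spent in iowait between two snapshots of the
# aggregate "cpu" line from /proc/stat. Fields after the "cpu" label:
# user nice system idle iowait irq softirq steal guest guest_nice.
# Only fields 2-9 (user..steal) are summed, because guest time is
# already included in user/nice.
iowait_percent() {
    # $1 = earlier "cpu ..." line, $2 = later "cpu ..." line
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (NR == 1) { t1 = total; w1 = $6 } else { t2 = total; w2 = $6 }
        }
        END {
            dt = t2 - t1
            if (dt > 0) printf "%.2f\n", 100 * (w2 - w1) / dt
            else print "0.00"
        }'
}

# Example on a live system, sampling over 5 seconds:
#   a="$(grep '^cpu ' /proc/stat)"; sleep 5; b="$(grep '^cpu ' /proc/stat)"
#   iowait_percent "$a" "$b"
```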
What about running processes with high CPU wait time and long run time? Check for processes running as the web user, or as any non-root user in general.
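Both checks can be sketched with procps ps. Processes in state D (uninterruptible sleep) are usually blocked on disk I/O and count towards the load average; the second listing shows everything not owned by root, where a compromised web user (www-data on Debian) or mail account would show up. Function names are hypothetical.

```shell
# Processes in uninterruptible sleep (state D), usually blocked on disk I/O.
blocked_procs() {
    ps -eo state,pid,user,etime,comm | awk 'NR == 1 || $1 == "D"'
}

# Non-root processes sorted by CPU use, with elapsed and accumulated CPU time.
nonroot_procs() {
    ps -eo user,pid,pcpu,etime,time,comm --sort=-pcpu | awk 'NR == 1 || $1 != "root"'
}
```

Running `blocked_procs` a few times in a row is more telling than a single snapshot, since D states are often short-lived.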
Okay, to me that seems like far too much load and CPU wait for the listed processes. Could you please install iotop and watch its output for processes with high I/O? Additionally, check syslog for error messages.
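iotop can also run non-interactively as root, e.g. `iotop -b -o -n 3` (batch mode, only processes doing I/O, three iterations). For the syslog part, a rough filter for typical kernel disk/ATA error lines could look like the sketch below; the patterns are an assumption and will not catch every message format, and `disk_errors` is a hypothetical name.

```shell
# Print log lines that look like disk/ATA errors. Reads the file given as
# $1, or /var/log/syslog by default (the Debian location).
# Heuristic patterns only: generic I/O errors, SCSI medium errors and
# libata "exception" / "failed command" / "error" lines.
disk_errors() {
    grep -Ei 'I/O error|medium error|ata[0-9.]+: (exception|failed command|error)' "${1:-/var/log/syslog}"
}
```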
chkrootkit: http://paste.ofcode.org/bQTSZd5dxeSHFPD5EibDvr
rkhunter: http://paste.ofcode.org/GLEQtd5qkHuQiXvgNCQbW3
rkhunter summary:
Code:
System checks summary
=====================

File properties checks...
    Required commands check failed
    Files checked: 137
    Suspect files: 13

Rootkit checks...
    Rootkits checked : 307
    Possible rootkits: 0

Applications checks...
    All checks skipped

The system checks took: 26 minutes and 54 seconds

All results have been written to the log file: /var/log/rkhunter.log

One or more warnings have been found while checking the system.
Please check the log file (/var/log/rkhunter.log)
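With 13 suspect files reported, it is easier to review only the warning lines than the whole log. This sketch assumes rkhunter marks them with "Warning:" in the log file, as the versions I have seen do; `rkhunter_warnings` is a hypothetical helper.

```shell
# Print only the warning lines from an rkhunter log.
# Default path is the one named in the rkhunter summary.
rkhunter_warnings() {
    grep 'Warning:' "${1:-/var/log/rkhunter.log}"
}
```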
That's both OK, as far as I can see. The BINDSHELL infection is a known false positive in chkrootkit, so there's nothing to worry about. Is this a physical server, or is it virtualized?
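One quick heuristic for the physical-or-virtual question: most hypervisors expose a "hypervisor" flag in /proc/cpuinfo. A missing flag does not prove the machine is physical (container-based VPSes, for example, share the host kernel); tools such as virt-what give a more thorough answer. The function below is a hypothetical sketch.

```shell
# Heuristic virtualization check based on the "hypervisor" CPU flag.
# Reads /proc/cpuinfo by default, or the file given as $1.
has_hypervisor_flag() {
    if grep -qw hypervisor "${1:-/proc/cpuinfo}"; then
        echo "virtualized (hypervisor CPU flag present)"
    else
        echo "no hypervisor flag found"
    fi
}
```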
Just a note: I had put ownCloud in an inactive state before writing the first post in this thread, so all the measurements shown were taken with ownCloud disabled.
Try to find out which service is causing the load. Does the load go back to normal when you stop apache, postfix or dovecot, or does none of these services have a real impact?
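A simple way to run that experiment is to stop one service, log the load for a few minutes, restart it, and move on to the next. The 1-minute load average is smoothed, so it needs a minute or two to react. This sketch assumes Debian with sysvinit-style service scripts (the 3.2.0-4 kernel suggests Debian 7); `sample_load` is a hypothetical name.

```shell
# Print a timestamp and the 1-minute load average COUNT times,
# INTERVAL seconds apart.
sample_load() {
    count="${1:-6}"
    interval="${2:-10}"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%T)" "$(cut -d' ' -f1 /proc/loadavg)"
        i=$((i + 1))
        # No sleep after the last sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (as root), one service at a time:
#   service apache2 stop; sample_load 12 10; service apache2 start
```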
Till, I didn't get the chance to find that out. This morning I couldn't log on to the server, so I went to my client's server room. The message on the display was 'Oh no! Something has gone wrong.' So I got a new HDD and started cloning with CloneZilla, since I'm pretty sure it's due to a faulty HDD. All of this started when ownCloud filled the HDD to 92%. Then I freed up some space (down to 72%), put ownCloud in maintenance mode and started this thread. I just hope the cloning goes well. I'll post an update when it's done.
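Since the trouble started when ownCloud pushed the disk to 92%, a small usage check (for example from cron) could catch that earlier. This is a sketch using POSIX `df -P` output; the function names and the 85% default threshold are assumptions, not a recommendation from this thread.

```shell
# Use percentage (without the % sign) of the filesystem holding $1.
fs_use_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches the limit (hypothetical default: 85%).
check_fs_usage() {
    path="${1:-/}"
    limit="${2:-85}"
    used="$(fs_use_percent "$path")"
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
    else
        echo "OK: $path is ${used}% full"
    fi
}

# To see which top-level directories use the space (GNU du/sort):
#   du -xh --max-depth=1 / | sort -h
```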
Update: CloneZilla finished (with the -rescue option) and everything works fine. Now I have a 1 TB HDD with these partitions:
Code:
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 931.5G  0 disk
├─sda1   8:1    0 223.4G  0 part /
├─sda2   8:2    0     1K  0 part
└─sda5   8:5    0   9.5G  0 part [SWAP]
root@vps3:~#
Should I enlarge only sda1, or both sda1 and sda5?