Last night my Debian lenny server came under very high load and went offline for most of the night. This morning I can SSH into it again, but I'm not having much luck restoring it to a normal condition. The load is currently about 21. The server is a VDS located in a data centre, so I'm having difficulty getting a hard reboot, and a soft reboot doesn't do anything: I issue the reboot command and it broadcasts the shutdown message, but nothing happens.

This is what the processes look like:

Code:
badbison:~# ps -aux
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 10316 420 ? Ss Nov06 0:01 init [2]
root 2 0.0 0.0 0 0 ? S< Nov06 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S< Nov06 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S< Nov06 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< Nov06 0:12 [watchdog/0]
root 6 0.0 0.0 0 0 ? S< Nov06 0:01 [events/0]
root 7 0.0 0.0 0 0 ? S< Nov06 0:00 [khelper]
root 41 0.0 0.0 0 0 ? S< Nov06 0:00 [kblockd/0]
root 43 0.0 0.0 0 0 ? S< Nov06 0:00 [kacpid]
root 44 0.0 0.0 0 0 ? S< Nov06 0:00 [kacpi_notify]
root 170 0.0 0.0 0 0 ? S< Nov06 0:00 [ksuspend_usbd]
root 176 0.0 0.0 0 0 ? S< Nov06 0:00 [khubd]
root 179 0.0 0.0 0 0 ? S< Nov06 0:00 [kseriod]
root 222 0.0 0.0 0 0 ? S< Nov06 0:00 [kswapd0]
root 223 0.0 0.0 0 0 ? S< Nov06 0:00 [aio/0]
root 775 0.0 0.0 0 0 ? S< Nov06 0:00 [ata/0]
root 776 0.0 0.0 0 0 ? S< Nov06 0:00 [ata_aux]
root 961 0.0 0.0 0 0 ? S< Nov06 0:00 [scsi_eh_0]
root 1051 0.0 0.0 0 0 ? S< Nov06 0:02 [kjournald]
root 1132 0.0 0.0 16836 340 ? S<s Nov06 0:00 udevd --daemon
root 1606 0.0 0.0 0 0 ? S< Nov06 0:00 [kpsmoused]
root 1895 0.0 0.0 0 0 ? R< Nov06 0:00 [kjournald]
daemon 2086 0.0 0.0 8024 312 ? Ss Nov06 0:00 /sbin/portmap
root 2415 0.0 0.0 12204 792 ? D 08:07 0:00 shutdown -r 0 w
root 2447 0.0 0.0 0 0 ? S< Nov06 0:00 [vmmemctl]
root 2541 0.0 0.1 67552 1620 ? Sl Nov06 0:31 /usr/sbin/vmtoo
root 2562 0.0 0.1 122264 1352 ? Sl Nov06 0:01 /usr/sbin/rsysl
root 2578 0.0 0.0 3800 332 ? Ss Nov06 0:00 /usr/sbin/acpid
root 2589 0.0 0.0 48872 652 ? Ss Nov06 0:00 /usr/sbin/sshd
root 2631 0.0 0.0 17320 404 ? S Nov06 0:00 /bin/sh /usr/bi
mysql 2668 0.0 2.1 650620 22328 ? Sl Nov06 0:34 /usr/sbin/mysql
root 2669 0.0 0.0 3784 368 ? R Nov06 0:00 logger -p daemo
root 2836 0.0 0.0 6064 312 ? S Nov06 0:00 /usr/sbin/couri
root 2837 0.0 0.0 29600 420 ? R Nov06 0:00 /usr/lib/courie
root 2845 0.0 0.0 29600 96 ? R Nov06 0:00 /usr/lib/courie
root 2846 0.0 0.0 29600 96 ? R Nov06 0:00 /usr/lib/courie
root 2847 0.0 0.0 29600 96 ? R Nov06 0:00 /usr/lib/courie
root 2848 0.0 0.0 29600 96 ? R Nov06 0:00 /usr/lib/courie
root 2849 0.0 0.0 29600 96 ? R Nov06 0:00 /usr/lib/courie
root 2852 0.0 0.0 6064 400 ? S Nov06 0:00 /usr/sbin/couri
root 2853 0.0 0.0 9236 424 ? S Nov06 0:00 /usr/sbin/couri
root 2864 0.0 0.0 6064 248 ? S Nov06 0:00 /usr/sbin/couri
root 2865 0.0 0.0 9236 340 ? S Nov06 0:00 /usr/sbin/couri
root 2870 0.0 0.0 6064 400 ? S Nov06 0:00 /usr/sbin/couri
root 2871 0.0 0.0 9236 424 ? S Nov06 0:00 /usr/sbin/couri
root 2876 0.0 0.0 12204 796 ? D 08:13 0:00 shutdown -r 0 w
root 2882 0.0 0.0 6064 248 ? S Nov06 0:00 /usr/sbin/couri
root 2883 0.0 0.0 9236 340 ? S Nov06 0:00 /usr/sbin/couri
nobody 2887 0.0 0.0 26072 640 ? Ss Nov06 0:00 /usr/local/sbin
postfix 2970 0.0 0.0 52064 596 ? R Nov06 0:00 qmgr -l -t fifo
root 2972 0.0 0.0 31992 316 ? Rs Nov06 0:00 pure-ftpd (SERV
root 2995 0.0 0.0 56460 556 ? Ss Nov06 0:00 /usr/sbin/sasla
root 2996 0.0 0.0 56940 544 ? S Nov06 0:00 /usr/sbin/sasla
root 2997 0.0 0.0 56700 544 ? S Nov06 0:00 /usr/sbin/sasla
root 2998 0.0 0.0 56820 544 ? S Nov06 0:00 /usr/sbin/sasla
root 2999 0.0 0.0 56340 544 ? S Nov06 0:00 /usr/sbin/sasla
root 3008 0.0 0.0 24428 268 ? Ss Nov06 0:00 /usr/sbin/famd
ntp 3023 0.0 0.0 22388 816 ? Ss Nov06 0:00 /usr/sbin/ntpd
root 3044 0.0 0.0 19836 560 ? Ss Nov06 0:00 /usr/sbin/cron
root 3059 0.0 0.5 92640 5320 ? Ss Nov06 0:02 /usr/sbin/apach
root 3113 0.0 0.2 40368 2252 ? Ss Nov06 0:02 /usr/sbin/munin
root 3185 0.0 0.2 171980 3072 ? Sl Nov06 0:54 /usr/bin/python
root 3273 0.0 0.0 3800 328 tty1 Ss+ Nov06 0:00 /sbin/getty 384
root 3275 0.0 0.0 3800 328 tty2 Ss+ Nov06 0:00 /sbin/getty 384
root 3277 0.0 0.0 3800 328 tty3 Ss+ Nov06 0:00 /sbin/getty 384
root 3279 0.0 0.0 3800 328 tty4 Ss+ Nov06 0:00 /sbin/getty 384
root 3281 0.0 0.0 3800 328 tty5 Ss+ Nov06 0:00 /sbin/getty 384
root 3283 0.0 0.0 3800 328 tty6 Ss+ Nov06 0:00 /sbin/getty 384
root 3775 0.0 0.2 65936 3088 ? Ss 08:21 0:00 sshd: root@pts/
root 3777 0.0 0.1 18812 1872 pts/2 Ss 08:21 0:00 -bash
root 3833 0.0 0.0 12204 788 pts/2 D+ 08:22 0:00 shutdown -r 0 w
root 3915 0.0 0.2 65936 3084 ? Ss 08:24 0:00 sshd: root@pts/
root 3917 0.0 0.1 18812 1884 pts/3 Ss 08:25 0:00 -bash
root 4387 0.0 0.0 10300 724 pts/3 S 08:29 0:00 /bin/bash /etc/
root 4388 0.0 0.1 26728 2044 pts/3 S 08:29 0:00 /usr/bin/mysqla
root 4699 0.0 0.0 3652 408 pts/3 D+ 08:31 0:00 reboot -f
root 4735 0.0 0.3 66076 3100 ? Ss 08:32 0:00 sshd: root@pts/
root 4737 0.0 0.1 18820 1908 pts/4 Ss 08:32 0:00 -bash
root 5916 0.0 0.0 5052 532 pts/4 D+ 08:47 0:00 sync
root 5944 0.0 0.2 65936 3088 ? Rs 08:48 0:00 sshd: root@pts/
root 5952 0.0 0.1 18812 1880 pts/5 Rs 08:48 0:00 -bash
root 6237 0.0 0.1 16016 1112 pts/5 R+ 08:50 0:00 ps -aux
root 6398 0.0 0.0 0 0 ? S 01:38 0:00 [pdflush]
www-data 9966 82.3 0.1 41072 1568 ? Ssl Nov06 572:57 ./m64 -o stratu
root 11352 0.0 0.0 28384 624 ? S 04:41 0:00 /USR/SBIN/CRON
root 11353 0.0 0.0 8832 716 ? Ss 04:41 0:00 /bin/sh -c /
root 11354 0.0 0.0 9060 740 ? S 04:41 0:00 /bin/bash /etc/
root 11462 0.0 0.0 21440 752 ? S 04:41 0:00 mysqlcheck -uro
www-data 14231 0.0 0.2 29180 2220 ? S 06:08 0:00 sysprot
www-data 15414 0.0 0.1 165944 1832 ? S 06:50 0:00 /usr/sbin/apach
www-data 15435 0.0 0.1 249996 1428 ? R 06:50 0:00 /usr/sbin/apach
www-data 15437 0.0 0.1 250112 1308 ? R 06:50 0:00 /usr/sbin/apach
web137 15527 0.0 4.0 201256 42044 ? R 06:54 0:00 /usr/bin/php-cg
www-data 15658 49.2 0.1 29180 1484 ? R 06:56 56:08 watchdog
www-data 15718 49.2 0.1 29180 1808 ? R 06:56 56:07 watchdog
nobody 15823 0.0 0.1 26304 1036 ? S 06:56 0:00 /usr/local/sbin
root 17885 0.0 0.0 0 0 ? S 06:58 0:00 [pdflush]
www-data 27449 0.0 0.1 29172 1200 ? S Nov06 0:03 -bash

I notice watchdog is listed twice, with each instance taking up about 49% of the CPU. Could I have some help getting this back to normal, please? If possible, I'd rather not have the data centre hard reboot it.

Thanks
Could you install "iotop" and check whether any processes are using a lot of disk I/O? Also have a look at "top" to see how much swap is in use.
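
While iotop is being installed, the same questions can be roughed out from "ps aux" and /proc/meminfo. The following is only a sketch: the helper names (d_state_procs, busy_procs, swap_used_kb) are made up for illustration, and the column positions assume the "ps aux" layout shown in the post (STAT in column 8, %CPU in column 3).

```shell
#!/bin/sh
# Hypothetical helpers; each reads text on stdin so it can be tried on saved output.

# Processes in uninterruptible sleep (STAT starts with D), which usually
# means they are blocked waiting on I/O. Prints: PID STAT COMMAND
d_state_procs() {
    awk '$2 ~ /^[0-9]+$/ && $8 ~ /^D/ { print $2, $8, $11 }'
}

# Processes at or above a %CPU threshold given as $1.
# Prints: PID USER %CPU COMMAND
busy_procs() {
    awk -v min="$1" '$2 ~ /^[0-9]+$/ && $3 + 0 >= min + 0 { print $2, $1, $3, $11 }'
}

# Swap in use, in kB, computed as SwapTotal - SwapFree from
# /proc/meminfo-style input.
swap_used_kb() {
    awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }'
}
```

Typical use would be "ps aux | d_state_procs", "ps aux | busy_procs 20" and "swap_used_kb < /proc/meminfo". Note the BSD-style "ps aux" without the dash, which avoids the "bad ps syntax" warning. Once iotop is available, "iotop -o -b -n 3" (run as root) prints three batch-mode samples listing only the processes that are actually doing I/O.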