I've had my server set up for almost a week now and things have been pretty wonderful; it's running on a quad-core with 2 GB of RAM. That said, this morning I woke up and it was... down? Well, not exactly. It's primarily a web server. I tried to access its pages and the requests didn't time out, but it was impossible to connect. Sometimes I was able to get little bits of the web pages. SSH was unable to connect. Eventually, it got back up to normal speed. How do I determine what it was doing during this time? I checked the Apache error log, but there was nothing relevant there. When I did connect, I ran top and found that 'yum-updatesd' was using 6% of MEM and had large values in the other columns. I stopped this service and removed yum-updatesd. Did I solve the issue?
I wasn't able to see anything of note in the daily crons. I think removing updatesd stopped it. I haven't had an issue until today. It seemed it was completely down (not taking a long time to do things, but frozen) and I had to reset the box. How do I go about finding out what locked up the server? In the apache logs, I see that the server was spammed by robots for a little while, but beyond that, ... thanks.
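One way to have something to look at after the next freeze is to record the load average at regular intervals. Below is a minimal sketch, assuming a Linux box where /proc/loadavg is readable. The function names check_load and log_load, the threshold of 4 and the log path in the comment are all made up for illustration.

```shell
#!/bin/sh
# Classify a /proc/loadavg-style line read from stdin.
# $1 is a hypothetical threshold for the 1-minute load average.
check_load() {
    threshold="$1"
    awk -v t="$threshold" '{
        status = ($1 + 0 > t + 0) ? "HIGH" : "ok"
        print status " load1=" $1 " load5=" $2 " load15=" $3
    }'
}

# Append one timestamped line to a log file.
# Example (path is an assumption): log_load /var/tmp/load.log 4
log_load() {
    logfile="$1"
    threshold="${2:-4}"
    printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
    check_load "$threshold" < /proc/loadavg >> "$logfile"
}
```

Calling log_load from a cron entry every minute or so would leave a trail showing whether the load climbed before the box stopped responding. It will not say which process was responsible, only when the trouble started.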
Code:
no crontab for root

I guess I need to figure out where the system logs reside and look for any error messages. This is my typical top output... I guess mysql runs indefinitely and I haven't figured out how to stop gdm yet; but as you can see, not much is going on. A crontab task does seem like a good theory.

Code:
top - 16:50:29 up 1 day, 7:11, 3 users, load average: 0.04, 0.02, 0.00
Tasks: 158 total, 1 running, 156 sleeping, 1 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.4%id, 0.0%wa, 0.2%hi, 0.2%si, 0.0%st
Mem: 2062324k total, 914924k used, 1147400k free, 151492k buffers
Swap: 2031608k total, 0k used, 2031608k free, 519088k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6877 mysql 20 0 279m 37m 4432 S 0 1.9 6:14.42 mysqld
7287 gdm 20 0 270m 33m 16m S 0 1.6 0:00.41 gdmgreeter
3375 apache 20 0 269m 16m 3836 S 0 0.8 0:13.19 httpd
3373 apache 20 0 268m 15m 3876 S 0 0.8 0:13.62 httpd
3378 apache 20 0 268m 15m 4052 S 0 0.8 0:14.53 httpd
3376 apache 20 0 268m 15m 3832 S 0 0.8 0:13.23 httpd
3377 apache 20 0 268m 15m 3724 S 0 0.7 0:13.68 httpd
1990 apache 20 0 268m 14m 3704 S 0 0.7 0:09.71 httpd
3374 apache 20 0 265m 13m 4252 S 0 0.7 0:14.19 httpd
3369 root 20 0 263m 12m 6524 S 0 0.6 0:00.14 httpd
3371 apache 20 0 265m 12m 4060 S 0 0.6 0:15.49 httpd
3372 apache 20 0 265m 12m 3824 S 0 0.6 0:13.30 httpd
13357 apache 20 0 265m 12m 3308 S 0 0.6 0:00.52 httpd
7261 root 20 0 82360 8808 5436 S 0 0.4 0:02.43 Xorg
2419 root 20 0 166m 7020 796 S 0 0.3 0:00.01 python
7223 root 20 0 185m 4884 4012 S 0 0.2 0:00.03 gdm-binary
7289 gdm 20 0 98.9m 3440 2732 S 0 0.2 0:05.08 at-spi-registry
2878 haldaemo 20 0 24100 3364 2548 S 0 0.2 0:00.22 hald
3184 root 20 0 75884 2816 2180 S 0 0.1 0:00.34 sshd
3735 root 20 0 75884 2812 2180 S 0 0.1 0:00.25 sshd
3292 root 20 0 75880 2808 2180 S 0 0.1 0:00.35 sshd
7291 gdm 20 0 184m 2740 1908 S 0 0.1 0:00.11 bonobo-activati
7257 root 20 0 198m 2664 1832 S 0 0.1 0:00.00 gdm-binary
2431 root 20 0 134m 2296 1420 S 0 0.1 0:00.00 cupsd
2627 postfix 20 0 49924 2236 1720 S 0 0.1 0:00.02 qmgr
3291 postfix 20 0 49860 2204 1724 S 0 0.1 0:00.01 tlsmgr
2622 root 20 0 49800 2156 1668 S 0 0.1 0:00.03 master
13781 postfix 20 0 49868 2116 1648 S 0 0.1 0:00.00 pickup
2675 root 20 0 157m 1952 1200 S 0 0.1 0:00.00 console-kit-dae
I ran into similar issues yesterday. I'm not good enough with Linux to know where to look, but it seemed like eth0 just randomly went down. I restarted the network service and it came back up for a bit, but then died again right afterwards. Rebooting seemed to solve the issue. On rebooting, I noticed that it failed shutting down postfix; but perhaps that's unrelated. It's offline again this morning. Here's what /var/log/messages had:

Sep 12 04:04:57 localhost syslogd 1.4.2: restart.
Sep 12 08:26:40 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 12 08:26:40 localhost kernel: sky2 eth0: tx timeout
Sep 12 08:26:40 localhost kernel: sky2 eth0: disabling interface
Sep 12 08:26:40 localhost kernel: sky2 eth0: enabling interface
Sep 12 08:26:40 localhost kernel: sky2 eth0: ram buffer 0K
Sep 12 08:26:43 localhost kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
Sep 12 09:15:18 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 12 09:15:18 localhost kernel: sky2 eth0: tx timeout
Sep 12 09:15:18 localhost kernel: sky2 eth0: disabling interface
Sep 12 09:15:18 localhost kernel: sky2 eth0: enabling interface
Sep 12 09:15:18 localhost kernel: sky2 eth0: ram buffer 0K
Sep 12 09:15:21 localhost kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
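To see how often the interface is hitting these watchdog resets, the kernel lines can be counted per interface. This is only a rough sketch. The function name count_watchdog_timeouts is made up, and the default path /var/log/messages is an assumption that may differ on other systems.

```shell
#!/bin/sh
# Count "NETDEV WATCHDOG ... transmit timed out" lines per interface.
# $1 is the syslog file to scan; the default path is an assumption.
count_watchdog_timeouts() {
    logfile="${1:-/var/log/messages}"
    # Extract the interface name from each matching kernel line,
    # then tally how many times each interface appears.
    sed -n 's/.*NETDEV WATCHDOG: \([^:]*\): transmit timed out.*/\1/p' "$logfile" \
        | sort | uniq -c \
        | awk '{print $2 ": " $1 " timeout(s)"}'
}
```

Running count_watchdog_timeouts with no argument (as root, since the log is usually not world-readable) prints one line per interface. Comparing the count before and after swapping the card, cable or router port should help narrow down whether the driver or the link is at fault.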
This is really frustrating, it just went offline again. I just upgraded my router to a 10/100/1000.. so I guess this is the cause, but I'm not sure of the solution.
That's normal on Fedora, nothing to worry about. Maybe it's a problem with the driver for your network card. Do you have another network card that you can try?
Same problem

I'm having the same problem. I've been running Fedora 6 for about 6 months without trouble, and since Sept 10th it's been a mess. I was checking the System Monitor, and in the process window netstat appears as "zombie". Is this normal?
That's no problem and might happen from time to time. Please try to kill the process with: kill -9 id_of_the_process
I tried to kill netstat but I got the message "arguments must be process or job IDs". However, it seems that netstat is not the problem: I have seen that processor and memory usage are at about 98% to 100%. Is this normal too?
You must find out the process ID of the netstat process, e.g. with

Code:
ps aux

and then you use this process ID (e.g. 1234) to kill the process:

Code:
kill -9 1234
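One caveat worth adding: a process that shows up as a zombie has already exited, so signals sent to it (including kill -9) have no effect. The entry stays in the process table until the parent process collects its exit status, so the parent is usually the thing to look at. Here is a small sketch for listing zombies together with their parent PIDs, assuming the procps version of ps found on Fedora (the stat column is not part of POSIX). The function names are made up for illustration.

```shell
#!/bin/sh
# Keep only rows whose state column (3rd field) starts with Z.
# Expects lines of the form: PID PPID STAT COMMAND
filter_zombies() {
    awk '$3 ~ /^Z/ {print "zombie pid=" $1 " parent=" $2 " cmd=" $4}'
}

# List zombies on the running system (assumes procps ps on Linux).
list_zombies() {
    ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies
}
```

With the parent PID in hand, `ps -p <parent_pid>` shows which program is leaving the zombies behind. Restarting or fixing that parent is what actually clears them.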
We increased the spam filtering and it is already working perfectly again. It seems that we had a spammer trying to send a lot of mail through us. Thanks for the help.
I'm having a very similar issue. FC7 internet servers seem to randomly go down. Ping works, but SSH, HTTP, and Dovecot all refuse to respond. A short time later, they work again. Restarting the services seems to get them back up as well.