I have three HP ProLiant servers with identical hardware. I was having trouble getting Remote Desktop to work on them, and all three were hanging quite often. So I took a disk image from a working server and installed it on all three. That worked for about two weeks. Now they hang again. When they hang, the screen is blank, the keyboard and mouse don't respond, and only a hard reboot fixes it.

I work for a radio station. These servers run TeamSpeak, which we use for remote broadcasts. That and GNOME are the only things these machines run. CPU usage is 67%, with TeamSpeak using 30%.

Here are the logs (both /var/log/messages and /var/log/syslog) from around the times the machines crash (all three appear to lock up at the same time).

///////////////////////////////////////
messages
Jul 20 10:51:58 kkbjam kernel: [ 952.692211] hrtimer: interrupt took 244719 ns
Jul 21 17:31:03 kkbjam kernel: [ 4920.314196] hrtimer: interrupt took 253936 ns

syslog
Jul 21 18:17:01 river /USR/SBIN/CRON[2339]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)

messages
Jul 20 10:37:01 realcountry kernel: [ 51.227233] 3:1:1: cannot get freq at ep 0x1
Jul 21 16:14:07 realcountry kernel: [ 49.828382] 3:1:1: cannot get freq at ep 0x1

syslog
Jul 20 12:17:01 realcountry /USR/SBIN/CRON[2111]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 21 17:17:01 realcountry /USR/SBIN/CRON[2059]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)

messages
Jul 20 10:36:57 mix kernel: [ 50.837271] 3:1:1: cannot get freq at ep 0x1
Jul 21 16:08:40 mix kernel: [ 175.159294] hrtimer: interrupt took 310925 ns
Jul 21 16:38:59 mix rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="926" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.

syslog
Jul 20 12:17:01 mix /USR/SBIN/CRON[2114]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 21 18:17:01 mix /USR/SBIN/CRON[2310]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
////////////////////////////////////////

Any ideas what's causing these machines to hang at the same time?
Honestly, I don't think anyone will find much of interest in these log snippets. You could also try enabling auditd (see e.g. http://doc.opensuse.org/products/dr...d_draft/art.auditquick.html#sec.audit.qs.conf for more info) and check whether it records anything interesting after the next forced reboot.
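One thing the kernel lines can still tell you is roughly when each machine booted, since the number in square brackets is seconds since boot. Here is a rough Python sketch of that idea, not a definitive tool: it subtracts the bracketed uptime from the syslog timestamp and starts a new boot entry whenever the estimate moves by more than a tolerance. The function name `estimate_boot_times`, the two-minute tolerance, and the `year` argument are my own assumptions (syslog lines carry no year, so you have to supply one). The sample lines are copied from the post above.

```python
import re
from datetime import datetime, timedelta

# Matches kernel lines such as:
#   Jul 21 16:14:07 realcountry kernel: [ 49.828382] 3:1:1: cannot get freq at ep 0x1
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"\[\s*(?P<uptime>\d+\.\d+)\]"
)

# Kernel lines taken from the original post.
SAMPLE = [
    "Jul 20 10:51:58 kkbjam kernel: [ 952.692211] hrtimer: interrupt took 244719 ns",
    "Jul 21 17:31:03 kkbjam kernel: [ 4920.314196] hrtimer: interrupt took 253936 ns",
    "Jul 20 10:37:01 realcountry kernel: [ 51.227233] 3:1:1: cannot get freq at ep 0x1",
    "Jul 21 16:14:07 realcountry kernel: [ 49.828382] 3:1:1: cannot get freq at ep 0x1",
    "Jul 20 10:36:57 mix kernel: [ 50.837271] 3:1:1: cannot get freq at ep 0x1",
    "Jul 21 16:08:40 mix kernel: [ 175.159294] hrtimer: interrupt took 310925 ns",
]


def estimate_boot_times(lines, year, tolerance=timedelta(minutes=2)):
    """Return {host: [estimated boot datetimes]} from kernel log lines.

    `year` must be supplied by the caller because syslog omits it.
    `tolerance` (an arbitrary choice) absorbs drift between the kernel
    timer and the wall clock.
    """
    boots = {}
    for line in lines:
        match = KERNEL_LINE.match(line)
        if not match:
            continue  # not a kernel line with an uptime stamp
        seen = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S")
        booted = seen - timedelta(seconds=float(match["uptime"]))
        known = boots.setdefault(match["host"], [])
        # A boot estimate far from the last one means the machine restarted.
        if not known or abs(booted - known[-1]) > tolerance:
            known.append(booted)
    return boots


if __name__ == "__main__":
    # The year here is only a placeholder; use the real one for your logs.
    for host, times in estimate_boot_times(SAMPLE, year=2000).items():
        for booted in times:
            print(host, booted.strftime("%b %d %H:%M:%S"))
```

If the estimated boot times for all three hosts cluster within a few minutes of each other, that points at something the machines share (power, network, a common job) rather than at each box failing on its own.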
Our electric provider "ripples" us when demand is high. That means they cut our power, so that they have enough to meet the demand. For this, we receive a discount on our monthly bill. We have a generator that kicks in to power the building (including these servers) when we are rippled. There is also a UPS hooked up to the server rack that these servers reside in. I was next to the rack looking through the logs this morning...and the generator kicked in...and the servers all hung! Turns out, I have a bad UPS!