monit: cpu wait usage of 99.9% matches resource limit [cpu wait usage>20.0%]

bobpit · Mar 10, 2014

Sometimes I get these messages from monit and I can't understand why they occur or what they mean. I checked munin and I do not see anything abnormal in cpu usage.
Code:
Date:        Mon, 10 Mar 2014 13:05:27
Action:      alert
Host:        server1
Description: cpu wait usage of 99.9% matches resource limit [cpu wait usage>20.0%]
Code:
Date:        Mon, 10 Mar 2014 13:06:29
Action:      alert
Host:        server1
Description: 'xxxxxx' cpu wait usage check succeeded [current cpu wait usage=0.0%]
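
For readers wondering what a "cpu wait" figure like the one in these alerts represents: on Linux it is commonly derived from the iowait counter in the aggregate `cpu` line of /proc/stat, as the share of CPU ticks spent waiting on I/O between two samples. The sketch below illustrates that calculation only. It is an assumption that monit computes its value in a similar way, and the function names are made up for illustration.

```python
import time


def cpu_counters(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already included in user, so it is left out of the total.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Return the percentage of CPU time spent in iowait between two samples."""
    old, new = cpu_counters(before), cpu_counters(after)
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no ticks elapsed between the samples
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample(interval=1.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart and return the iowait percentage."""
    with open(path) as fh:
        before = fh.readline()
    time.sleep(interval)
    with open(path) as fh:
        after = fh.readline()
    return iowait_percent(before, after)
```

Because the value is a ratio over one sampling interval, a brief stall can produce a very high reading for a single check while a graph that averages over several minutes shows a much smaller bump.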

srijan · Mar 11, 2014

Hi

Did you check which process was consuming the resource? Run top, press Shift+F, then choose m or n to sort the processes by usage.

Further check

lsof -p (PID of max CPU utilization)

Br//
Srijan

bobpit · Mar 11, 2014

I received the message in my inbox hours after the incident, so I had no chance to investigate the CPU usage while it was happening.

MUNIN graphs of cpu revealed nothing extraordinary. Generally low cpu usage.

This is from /etc/monit/monitrc:
Code:
  check system server1.surf-anonymous.info
    if loadavg (1min) > 70 then alert
    if loadavg (5min) > 40 then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert 
Does it help?

srijan · Mar 11, 2014

Hi


This will not help. We cannot investigate an incident that has already passed.
I suggest identifying the responsible PID while the incident is occurring. Use the iotop program to see how much time processes spend waiting on hard disk reads and writes.

Br//
Srijan

bobpit · Mar 11, 2014

From what I understand, the incident lasts no more than 1-2 minutes; see the times of the emails. Even if I were constantly online to catch something that happens once every 3 weeks, I would not have time to use iotop or anything else to extract something meaningful. All I can do is look at the logs and the munin charts. This is how we have troubleshot all the problems up to now.
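
Since nobody can be at the console for an event that lasts a minute or two every few weeks, one option is to leave a small recorder running so the evidence ends up in a log. The following is a minimal sketch, assuming a Linux /proc filesystem: it periodically notes processes in state D (uninterruptible sleep, which usually means waiting on disk I/O). The function names and the log path are hypothetical and not part of monit or munin.

```python
import os
import time


def parse_stat_line(line):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the last ')'.
    left, right = line.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def blocked_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # the process exited while we were reading it
        if state == "D":
            found.append((pid, comm))
    return found


def watch(interval=5, logfile="/var/log/iowait-watch.log"):
    """Append a timestamped line whenever blocked processes are seen."""
    while True:
        blocked = blocked_processes()
        if blocked:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(logfile, "a") as fh:
                fh.write("%s %s\n" % (stamp, blocked))
        time.sleep(interval)
```

Calling `watch()` from a long-running session or a service would leave timestamped entries that can be compared with the times in the monit emails. It only names the processes that were blocked; it does not measure how much I/O each one performed.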
