monit alerts -- connection failed

Discussion in 'General' started by Jaom, Oct 29, 2014.

  1. Jaom

    Jaom New Member

    I'm running debian 7.7 and I've used "the perfect server" tutorial to set up the server (with apache).

    My problem is that monit is constantly (it's 3-5 times a day) giving me alerts like this:
    Code:
    Connection failed Service amavisd
    
            Date:        Wed, 29 Oct 2014 12:41:22
    
            Action:      restart
            Host:        jaom.pl
            Description: failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    
    
    Your faithful employee,
    Monit
    
    usually followed by another mail informing that connection succeeded.
    This is mostly about amavisd however sometimes I'm also given same note regarding sshd.

    First I was sure this was resource issue on the server, so I've moved from 1 core and 1GB RAM to 2 cores with 2GB of RAM (my VPS is in fact an OpenVZ container). However, little did it help.
     
    Last edited: Oct 29, 2014
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    Check your log files especially the syslog for errors at the timepoints were you receive the alerts.
     
  3. Jaom

    Jaom New Member

    Error timings:
    http://i.imgur.com/tmIAz7J.png

    Syslog entries:
    Code:
    root@jaom:/var/log# cat syslog | grep 'amavisd'
    Oct 29 02:17:30 jaom monit[26207]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    Oct 29 02:17:31 jaom monit[26207]: 'amavisd' trying to restart
    Oct 29 02:17:31 jaom monit[26207]: 'amavisd' stop: /etc/init.d/amavis
    Oct 29 02:17:31 jaom monit[26207]: 'amavisd' start: /etc/init.d/amavis
    Oct 29 02:18:37 jaom monit[26207]: 'amavisd' connection succeeded to INET[localhost:10024] via TCP
    Oct 29 03:18:02 jaom /USR/SBIN/CRON[26890]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync)
    Oct 29 06:18:02 jaom /USR/SBIN/CRON[1498]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync)
    Oct 29 09:18:05 jaom /USR/SBIN/CRON[8610]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync)
    Oct 29 12:18:02 jaom /USR/SBIN/CRON[15791]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync)
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' trying to restart
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' stop: /etc/init.d/amavis
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' start: /etc/init.d/amavis
    Oct 29 12:42:26 jaom monit[26207]: 'amavisd' connection succeeded to INET[localhost:10024] via TCP
    
    It's pretty much everything in the log (nothing interesting with similar timing):
    Code:
    Oct 29 12:40:11 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1
    Oct 29 12:40:11 jaom pure-ftpd: (?@::1) [INFO] Logout.
    Oct 29 12:40:16 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1
    Oct 29 12:40:16 jaom pure-ftpd: ([email protected]) [INFO] Logout.
    Oct 29 12:41:02 jaom /USR/SBIN/CRON[16893]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
    Oct 29 12:41:04 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1
    Oct 29 12:41:04 jaom pure-ftpd: (?@::1) [INFO] Logout.
    Oct 29 12:41:17 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1
    Oct 29 12:41:17 jaom pure-ftpd: ([email protected]) [INFO] Logout.
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' trying to restart
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' stop: /etc/init.d/amavis
    Oct 29 12:41:22 jaom monit[26207]: 'amavisd' start: /etc/init.d/amavis
    Oct 29 12:42:01 jaom /USR/SBIN/CRON[16964]: (web18) CMD (/var/www/xxxxxxxxxxxxx/private/digitalocean/checkBBB.sh #xxxxxxxxxxxxx)
    Oct 29 12:42:01 jaom /USR/SBIN/CRON[16963]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
    Oct 29 12:42:04 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1
    Oct 29 12:42:04 jaom pure-ftpd: (?@::1) [INFO] Logout.
    Oct 29 12:42:26 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1
    Oct 29 12:42:26 jaom pure-ftpd: ([email protected]) [INFO] Logout.
    Oct 29 12:42:26 jaom monit[26207]: 'amavisd' connection succeeded to INET[localhost:10024] via TCP
    Oct 29 12:43:01 jaom /USR/SBIN/CRON[17004]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
    Oct 29 12:43:04 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1
    Oct 29 12:43:04 jaom pure-ftpd: (?@::1) [INFO] Logout.
    Oct 29 12:43:30 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1
    Oct 29 12:43:30 jaom pure-ftpd: ([email protected]) [INFO] Logout.
    Oct 29 12:44:01 jaom /USR/SBIN/CRON[17029]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
    Oct 29 12:44:01 jaom /USR/SBIN/CRON[17030]: (web18) CMD (/var/www/xxxxxxxxxxxxx/private/digitalocean/checkBBB.sh #xxxxxxxxxxxxx)
    Oct 29 12:44:05 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1
    Oct 29 12:44:05 jaom pure-ftpd: (?@::1) [INFO] Logout.
    Oct 29 12:44:31 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1
    Oct 29 12:44:31 jaom pure-ftpd: ([email protected]) [INFO] Logout.
    Oct 29 12:45:01 jaom /USR/SBIN/CRON[17064]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
    Oct 29 12:45:01 jaom /USR/SBIN/CRON[17065]: (getmail) CMD (/usr/local/bin/run-getmail.sh > /dev/null 2>> /dev/null)
    
     
  4. bubaweb

    bubaweb New Member

    Dear Till,
    i didnt opened a new thread but i have similar problem.
    This problem in 2 times help me to fight spam some1 was sending from my server, but now there are only 40 mails in queue and I cant send them.

    I received same alert by mail:
    Code:
    Connection failed Service amavisd 
    
    	Date:        Wed, 17 Dec 2014 16:42:01
    	Action:      restart
    	Host:        xxx.yyyyyy.it
    	Description: failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    
    
    Your faithful employee,
    After 1 min new mail:
    Code:
    Connection succeeded Service amavisd 
    
    	Date:        Wed, 17 Dec 2014 16:43:02
    	Action:      alert
    	Host:        xxx.yyyyyy.it
    	Description: connection succeeded to INET[localhost:10024] via TCP
    
    Your faithful employee,
    Monit
    Now I check syslog same 16.40.00

    Code:
    Dec 17 16:42:00 xxx postfix/smtp[16598]: D4448FC08BE: to=<[email protected]>, relay=mx.disegnioriginali.net[62.149.128.212]:25, delay=0.3, delays=0.06/0/0.23/0.01, dsn=2.0.0, status=sent (250 2.0.0 Ufhz1p01j1SSKAu01fhzLP mail accepted for delivery)
    Dec 17 16:42:00 xxx postfix/qmgr[12400]: D4448FC08BE: removed
    Dec 17 16:42:01 xxx /USR/SBIN/CRON[16612]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
    Dec 17 16:42:01 xxx monit[4297]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable#012
    Dec 17 16:42:01 xxx postfix/smtpd[13524]: connect from localhost[127.0.0.1]
    Dec 17 16:42:01 xxx postfix/smtpd[13524]: 7CE46FC08BE: client=localhost[127.0.0.1]
    Dec 17 16:42:01 xxx postfix/cleanup[16482]: 7CE46FC08BE: message-id=<[email protected]>
    Dec 17 16:42:01 xxx postfix/smtpd[13524]: disconnect from localhost[127.0.0.1]
    Dec 17 16:42:01 xxx postfix/qmgr[12400]: 7CE46FC08BE: from=<[email protected]>, size=813, nrcpt=1 (queue active)
    Dec 17 16:42:01 xxx monit[4297]: 'amavisd' trying to restart
    Dec 17 16:42:01 xxx monit[4297]: 'amavisd' stop: /etc/init.d/amavis
    Dec 17 16:42:01 xxx postfix/smtpd[16481]: disconnect from localhost[127.0.0.1]
    Dec 17 16:42:01 xxx postfix/smtpd[16491]: disconnect from localhost[127.0.0.1]
    Dec 17 16:42:01 xxx monit[4297]: 'amavisd' start: /etc/init.d/amavis
    Dec 17 16:42:01 xxx postfix/smtp[16468]: C9CADFC2307: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10024, conn_use=8, delay=23385, delays=23335/47/0/2.9, dsn=4.3.2, status=deferred (host 127.0.0.1[127.0.0.1] said: 421 4.3.2 Service shutting down, closing channel (in reply to end of DATA command))
    Dec 17 16:42:01 xxx postfix/smtp[16474]: C516BFC2381: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10024, conn_use=8, delay=21071, delays=21021/48/0/1.7, dsn=4.3.2, status=deferred (host 127.0.0.1[127.0.0.1] said: 421 4.3.2 Service shutting down, closing channel (in reply to end of DATA command))
    Dec 17 16:42:01 xxx postfix/smtp[16468]: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused
    Dec 17 16:42:01 xxx postfix/smtp[16474]: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused
    I dont know what check now.
    Any suggest ?
    Thx
     
  5. Jaom

    Jaom New Member

    Sad to say I haven't solved my problem yet. :/
    And yes it was typical for postfix to queue up mails for days and with large amount of queued up mails - amavisd was crushing over and over again when restarted.

    I moved from VPS 1GB RAM 1 core to 2 cores and 2 GB suspecting that lack of resources might be the problem. Helped a bit - first of all postfix is running just fine now and everything seems to be working (well, not crushing is what i mean).

    Sad to say I still get fair amount of connection amavisd succeeded / connection failed pair of mails (always in this order - first one I receive is succeeded). Occasionally I'm getting also sshd / pureftpd / apache connection failed emails. Especially when server is more loaded (still really far from using even 3/4 of the resources...).

    I suspect that this might be some kind of fault on the OpenVZ side (my VPS is an OpenVZ container).

    The fact is, hoverer, that amavisd seems to crash every time when server is overloaded, even for a short time, and if it crashes - it doesn't feel like going up. If there is a postfix queue for, let's say, 300 mails - amavisd will check ~ 10-150 mails after manual restart and crash again. Kind of pain in the back.
     
  6. till

    till Super Moderator Staff Member ISPConfig Developer

    Check with:

    cat /proc/user_beancounters

    if any limits of your vm are reached.
     
  7. Jaom

    Jaom New Member

    I'm not exactly sure if I met any:
    Code:
    Version: 2.5
           uid  resource                     held              maxheld              barrier                limit              failcnt
        72265:  kmemsize                 84142864            231481344  9223372036854775807  9223372036854775807                    0
                lockedpages                     0                 4175               524288               524288                    0
                privvmpages                834176              1283849  9223372036854775807  9223372036854775807                    0
                shmpages                    17705                22137  9223372036854775807  9223372036854775807                    0
                dummy                           0                    0  9223372036854775807  9223372036854775807                    0
                numproc                       166                  252  9223372036854775807  9223372036854775807                    0
                physpages                  342777               536765                    0               524288                    0
                vmguarpages                     0                    0               557056  9223372036854775807                    0
                oomguarpages               345527               416786               524288  9223372036854775807                    0
                numtcpsock                     39                  142  9223372036854775807  9223372036854775807                    0
                numflock                       11                   36  9223372036854775807  9223372036854775807                    0
                numpty                          2                   10  9223372036854775807  9223372036854775807                    0
                numsiginfo                      1                  123  9223372036854775807  9223372036854775807                    0
                tcpsndbuf                  682472              3085624  9223372036854775807  9223372036854775807                    0
                tcprcvbuf                  638976              2791704  9223372036854775807  9223372036854775807                    0
                othersockbuf               333176              3110024  9223372036854775807  9223372036854775807                    0
                dgramrcvbuf                     0               313920  9223372036854775807  9223372036854775807                    0
                numothersock                  221                  500                  500                  500                   26
                dcachesize               13742546            163168231  9223372036854775807  9223372036854775807                    0
                numfile                      2503                 4415  9223372036854775807  9223372036854775807                    0
                dummy                           0                    0  9223372036854775807  9223372036854775807                    0
                dummy                           0                    0  9223372036854775807  9223372036854775807                    0
                dummy                           0                    0  9223372036854775807  9223372036854775807                    0
                numiptent                      40                   47  9223372036854775807  9223372036854775807                    0
    
    This numothersock is kind of disturbing - it surely sounds like it might be the root of every evil.
     
  8. bubaweb

    bubaweb New Member

    so it is "only" a lack of resources ?
    If so it's ok.
    How fix it ? Restatart server ?
    Wait to delivery all queue ?
    Thx for any other onfo
     
  9. till

    till Super Moderator Staff Member ISPConfig Developer

    Yes, the numothersock is most likely the cause. When it is reached, then amavis can not create any new sockets and the process dies. A amavis restart should fix that, maybe postfix needs to be restarted too. The mails in the queue will processed then automatically.
     

Share This Page