I'm running debian 7.7 and I've used "the perfect server" tutorial to set up the server (with apache). My problem is that monit is constantly (it's 3-5 times a day) giving me alerts like this: Code: Connection failed Service amavisd Date: Wed, 29 Oct 2014 12:41:22 Action: restart Host: jaom.pl Description: failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable Your faithful employee, Monit usually followed by another mail informing that connection succeeded. This is mostly about amavisd however sometimes I'm also given same note regarding sshd. First I was sure this was resource issue on the server, so I've moved from 1 core and 1GB RAM to 2 cores with 2GB of RAM (my VPS is in fact an OpenVZ container). However, little did it help.
Error timings: http://i.imgur.com/tmIAz7J.png Syslog entries: Code: root@jaom:/var/log# cat syslog | grep 'amavisd' Oct 29 02:17:30 jaom monit[26207]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable Oct 29 02:17:31 jaom monit[26207]: 'amavisd' trying to restart Oct 29 02:17:31 jaom monit[26207]: 'amavisd' stop: /etc/init.d/amavis Oct 29 02:17:31 jaom monit[26207]: 'amavisd' start: /etc/init.d/amavis Oct 29 02:18:37 jaom monit[26207]: 'amavisd' connection succeeded to INET[localhost:10024] via TCP Oct 29 03:18:02 jaom /USR/SBIN/CRON[26890]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync) Oct 29 06:18:02 jaom /USR/SBIN/CRON[1498]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync) Oct 29 09:18:05 jaom /USR/SBIN/CRON[8610]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync) Oct 29 12:18:02 jaom /USR/SBIN/CRON[15791]: (amavis) CMD (test -e /usr/sbin/amavisd-new-cronjob && /usr/sbin/amavisd-new-cronjob sa-sync) Oct 29 12:41:22 jaom monit[26207]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable Oct 29 12:41:22 jaom monit[26207]: 'amavisd' trying to restart Oct 29 12:41:22 jaom monit[26207]: 'amavisd' stop: /etc/init.d/amavis Oct 29 12:41:22 jaom monit[26207]: 'amavisd' start: /etc/init.d/amavis Oct 29 12:42:26 jaom monit[26207]: 'amavisd' connection succeeded to INET[localhost:10024] via TCP It's pretty much everything in the log (nothing interesting with similar timing): Code: Oct 29 12:40:11 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1 Oct 29 12:40:11 jaom pure-ftpd: (?@::1) [INFO] Logout. Oct 29 12:40:16 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1 Oct 29 12:40:16 jaom pure-ftpd: ([email protected]) [INFO] Logout. Oct 29 12:41:02 jaom /USR/SBIN/CRON[16893]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done) Oct 29 12:41:04 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1 Oct 29 12:41:04 jaom pure-ftpd: (?@::1) [INFO] Logout. Oct 29 12:41:17 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1 Oct 29 12:41:17 jaom pure-ftpd: ([email protected]) [INFO] Logout. Oct 29 12:41:22 jaom monit[26207]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable Oct 29 12:41:22 jaom monit[26207]: 'amavisd' trying to restart Oct 29 12:41:22 jaom monit[26207]: 'amavisd' stop: /etc/init.d/amavis Oct 29 12:41:22 jaom monit[26207]: 'amavisd' start: /etc/init.d/amavis Oct 29 12:42:01 jaom /USR/SBIN/CRON[16964]: (web18) CMD (/var/www/xxxxxxxxxxxxx/private/digitalocean/checkBBB.sh #xxxxxxxxxxxxx) Oct 29 12:42:01 jaom /USR/SBIN/CRON[16963]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done) Oct 29 12:42:04 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1 Oct 29 12:42:04 jaom pure-ftpd: (?@::1) [INFO] Logout. Oct 29 12:42:26 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1 Oct 29 12:42:26 jaom pure-ftpd: ([email protected]) [INFO] Logout. Oct 29 12:42:26 jaom monit[26207]: 'amavisd' connection succeeded to INET[localhost:10024] via TCP Oct 29 12:43:01 jaom /USR/SBIN/CRON[17004]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done) Oct 29 12:43:04 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1 Oct 29 12:43:04 jaom pure-ftpd: (?@::1) [INFO] Logout. Oct 29 12:43:30 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1 Oct 29 12:43:30 jaom pure-ftpd: ([email protected]) [INFO] Logout. Oct 29 12:44:01 jaom /USR/SBIN/CRON[17029]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done) Oct 29 12:44:01 jaom /USR/SBIN/CRON[17030]: (web18) CMD (/var/www/xxxxxxxxxxxxx/private/digitalocean/checkBBB.sh #xxxxxxxxxxxxx) Oct 29 12:44:05 jaom pure-ftpd: (?@::1) [INFO] New connection from ::1 Oct 29 12:44:05 jaom pure-ftpd: (?@::1) [INFO] Logout. Oct 29 12:44:31 jaom pure-ftpd: ([email protected]) [INFO] New connection from 127.0.0.1 Oct 29 12:44:31 jaom pure-ftpd: ([email protected]) [INFO] Logout. Oct 29 12:45:01 jaom /USR/SBIN/CRON[17064]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done) Oct 29 12:45:01 jaom /USR/SBIN/CRON[17065]: (getmail) CMD (/usr/local/bin/run-getmail.sh > /dev/null 2>> /dev/null)
Dear Till, i didnt opened a new thread but i have similar problem. This problem in 2 times help me to fight spam some1 was sending from my server, but now there are only 40 mails in queue and I cant send them. I received same alert by mail: Code: Connection failed Service amavisd Date: Wed, 17 Dec 2014 16:42:01 Action: restart Host: xxx.yyyyyy.it Description: failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable Your faithful employee, After 1 min new mail: Code: Connection succeeded Service amavisd Date: Wed, 17 Dec 2014 16:43:02 Action: alert Host: xxx.yyyyyy.it Description: connection succeeded to INET[localhost:10024] via TCP Your faithful employee, Monit Now I check syslog same 16.40.00 Code: Dec 17 16:42:00 xxx postfix/smtp[16598]: D4448FC08BE: to=<[email protected]>, relay=mx.disegnioriginali.net[62.149.128.212]:25, delay=0.3, delays=0.06/0/0.23/0.01, dsn=2.0.0, status=sent (250 2.0.0 Ufhz1p01j1SSKAu01fhzLP mail accepted for delivery) Dec 17 16:42:00 xxx postfix/qmgr[12400]: D4448FC08BE: removed Dec 17 16:42:01 xxx /USR/SBIN/CRON[16612]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done) Dec 17 16:42:01 xxx monit[4297]: 'amavisd' failed protocol test [SMTP] at INET[localhost:10024] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable#012 Dec 17 16:42:01 xxx postfix/smtpd[13524]: connect from localhost[127.0.0.1] Dec 17 16:42:01 xxx postfix/smtpd[13524]: 7CE46FC08BE: client=localhost[127.0.0.1] Dec 17 16:42:01 xxx postfix/cleanup[16482]: 7CE46FC08BE: message-id=<[email protected]> Dec 17 16:42:01 xxx postfix/smtpd[13524]: disconnect from localhost[127.0.0.1] Dec 17 16:42:01 xxx postfix/qmgr[12400]: 7CE46FC08BE: from=<[email protected]>, size=813, nrcpt=1 (queue active) Dec 17 16:42:01 xxx monit[4297]: 'amavisd' trying to restart Dec 17 16:42:01 xxx monit[4297]: 'amavisd' stop: /etc/init.d/amavis Dec 17 16:42:01 xxx postfix/smtpd[16481]: disconnect from localhost[127.0.0.1] Dec 17 16:42:01 xxx postfix/smtpd[16491]: disconnect from localhost[127.0.0.1] Dec 17 16:42:01 xxx monit[4297]: 'amavisd' start: /etc/init.d/amavis Dec 17 16:42:01 xxx postfix/smtp[16468]: C9CADFC2307: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10024, conn_use=8, delay=23385, delays=23335/47/0/2.9, dsn=4.3.2, status=deferred (host 127.0.0.1[127.0.0.1] said: 421 4.3.2 Service shutting down, closing channel (in reply to end of DATA command)) Dec 17 16:42:01 xxx postfix/smtp[16474]: C516BFC2381: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10024, conn_use=8, delay=21071, delays=21021/48/0/1.7, dsn=4.3.2, status=deferred (host 127.0.0.1[127.0.0.1] said: 421 4.3.2 Service shutting down, closing channel (in reply to end of DATA command)) Dec 17 16:42:01 xxx postfix/smtp[16468]: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused Dec 17 16:42:01 xxx postfix/smtp[16474]: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused I dont know what check now. Any suggest ? Thx
Sad to say I haven't solved my problem yet. :/ And yes it was typical for postfix to queue up mails for days and with large amount of queued up mails - amavisd was crushing over and over again when restarted. I moved from VPS 1GB RAM 1 core to 2 cores and 2 GB suspecting that lack of resources might be the problem. Helped a bit - first of all postfix is running just fine now and everything seems to be working (well, not crushing is what i mean). Sad to say I still get fair amount of connection amavisd succeeded / connection failed pair of mails (always in this order - first one I receive is succeeded). Occasionally I'm getting also sshd / pureftpd / apache connection failed emails. Especially when server is more loaded (still really far from using even 3/4 of the resources...). I suspect that this might be some kind of fault on the OpenVZ side (my VPS is an OpenVZ container). The fact is, hoverer, that amavisd seems to crash every time when server is overloaded, even for a short time, and if it crashes - it doesn't feel like going up. If there is a postfix queue for, let's say, 300 mails - amavisd will check ~ 10-150 mails after manual restart and crash again. Kind of pain in the back.
I'm not exactly sure if I met any: Code: Version: 2.5 uid resource held maxheld barrier limit failcnt 72265: kmemsize 84142864 231481344 9223372036854775807 9223372036854775807 0 lockedpages 0 4175 524288 524288 0 privvmpages 834176 1283849 9223372036854775807 9223372036854775807 0 shmpages 17705 22137 9223372036854775807 9223372036854775807 0 dummy 0 0 9223372036854775807 9223372036854775807 0 numproc 166 252 9223372036854775807 9223372036854775807 0 physpages 342777 536765 0 524288 0 vmguarpages 0 0 557056 9223372036854775807 0 oomguarpages 345527 416786 524288 9223372036854775807 0 numtcpsock 39 142 9223372036854775807 9223372036854775807 0 numflock 11 36 9223372036854775807 9223372036854775807 0 numpty 2 10 9223372036854775807 9223372036854775807 0 numsiginfo 1 123 9223372036854775807 9223372036854775807 0 tcpsndbuf 682472 3085624 9223372036854775807 9223372036854775807 0 tcprcvbuf 638976 2791704 9223372036854775807 9223372036854775807 0 othersockbuf 333176 3110024 9223372036854775807 9223372036854775807 0 dgramrcvbuf 0 313920 9223372036854775807 9223372036854775807 0 numothersock 221 500 500 500 26 dcachesize 13742546 163168231 9223372036854775807 9223372036854775807 0 numfile 2503 4415 9223372036854775807 9223372036854775807 0 dummy 0 0 9223372036854775807 9223372036854775807 0 dummy 0 0 9223372036854775807 9223372036854775807 0 dummy 0 0 9223372036854775807 9223372036854775807 0 numiptent 40 47 9223372036854775807 9223372036854775807 0 This numothersock is kind of disturbing - it surely sounds like it might be the root of every evil.
so it is "only" a lack of resources ? If so it's ok. How fix it ? Restatart server ? Wait to delivery all queue ? Thx for any other onfo
Yes, the numothersock is most likely the cause. When it is reached, then amavis can not create any new sockets and the process dies. A amavis restart should fix that, maybe postfix needs to be restarted too. The mails in the queue will processed then automatically.