SOLVED: amavisd-new connections timing out

Ovidiu · Apr 13, 2016

This is a perfect server Debian Jessie system running ISPCFG3 + nginx with the usual postfix dovecote amavisd-new spamassassin setup

After an upgrade from mysql 5.6 to 5.7 which went slightly wrong but seems fixed now I am now experiencing timeouts with amavis meaning every email is being deferred for minutes up to hours. I'd be grateful for any pointers. Funny enough, nothing was changed when it comes to the mailing system so it must be related to the mysql upgrade.
I have changed the log level for amavis to 5 and couldn't spot any errors so its down to 4 now but still nothing jumps out at me.

Code:

message deferral detail
-----------------------
  error (total: 110)
        86   lost connection with 127.0.0.1[127.0.0.1] while sending end of...
        11   lost connection with 127.0.0.1[127.0.0.1] while sending RCPT TO
         9   conversation with 127.0.0.1[127.0.0.1] timed out while sending...
         2   address resolver failure
         2   conversation with 127.0.0.1[127.0.0.1] timed out while receivi...
  smtp (total: 590)
       349   lost connection with 127.0.0.1[127.0.0.1] while sending RCPT TO
       171   lost connection with 127.0.0.1[127.0.0.1] while sending end of...
        33   Host not found, try again
        14   conversation with 127.0.0.1[127.0.0.1] timed out while sending...
         9   conversation with 127.0.0.1[127.0.0.1] timed out while receivi...

Syslog also notes some of these DB issues but I have tested the user/password for dbispconfig inside that file and they are ok. This coudl explain the deferred emails but how to fix or debug?

Code:

Apr 13 07:43:00 alfred postfix/smtp[30110]: warning: mysql:/etc/postfix/mysql-virtual_relaydomains.cf: table lookup problem

And this is what I did when upgrading mysql:

Code:

dpkg status changes

Installed:
  libio-socket-ip-perl:all 0.32-1
  libmecab2:amd64 0.996-1.1
  mysql-server:amd64 5.7.12-1debian8
  mysqltuner:all 1.3.0+git20140812-1
Reinstalled:
  mysql-apt-config:all 0.7.0-1
Upgraded:
  libmysqlclient18:amd64 5.6.29-1debian8 => 5.6.30-1debian8
  libmysqlclient18:amd64 5.6.30-1debian8 => 5.7.5-m15-2debian8
  linux-libc-dev:amd64 3.16.7-ckt25-1 => 3.16.7-ckt25-2
  mysql-apt-config:all 0.5.3-1 => 0.7.0-1
  mysql-client:amd64 5.6.29-1debian8 => 5.6.30-1debian8
  mysql-client:amd64 5.6.30-1debian8 => 5.7.12-1debian8
  mysql-common:amd64 5.6.29-1debian8 => 5.6.30-1debian8
  mysql-common:amd64 5.6.30-1debian8 => 5.7.12-1debian8
  mysql-community-client:amd64 5.6.29-1debian8 => 5.6.30-1debian8
  mysql-community-client:amd64 5.6.30-1debian8 => 5.7.12-1debian8
  mysql-community-server:amd64 5.6.29-1debian8 => 5.6.30-1debian8
  mysql-community-server:amd64 5.6.29-1debian8 => 5.7.12-1debian8
  mysql-server:amd64 5.6.29-1debian8 => 5.6.30-1debian8
Removed:
  mysql-community-server:amd64 5.6.29-1debian8
  mysql-server:amd64 5.6.30-1debian8

till · Apr 13, 2016

Please try the mysql settings that I mentioned in your other thread. without that setting (or with strict_trans_tables), ispconfig 3.0.5. will fail as well with mysql 5.7.

Ovidiu · Apr 13, 2016

You lost me a bit with " strict_trans_tables" - I googled and apparently its default with mysql starting 5.7.5. I run MySQL on 127.0.0.1 (5.7.12)

Any more advice how and where to check this?
###edit###
double checked this is in my my.cnf
Code:
[mysqld]
# added after upgrade to mysql 5.7
sql_mode=NO_ENGINE_SUBSTITUTION
so no strict_trans_tables there.

Ovidiu · Apr 13, 2016

any other advice to sort out this amavis / mysql issue or at least properly debug it?

till · Apr 13, 2016

I did not had this issue yet and I switched all newer servers to mariadb, so I dont use MySQL 5.7. Maybe the new MySQL version has too short timeouts, did you try to set longer timeouts? Another possibility is that this is a false error or in other words, one that dont matter. Do you have actaul mail problems or just log messages about timeouts, it might be that amavis tries to keep the connection open but mysql closes it after a longer idle period, in this case, amavis should create a new connection automatically, but you may see such errors in the log without a real issue behind them.

Ovidiu · Apr 13, 2016

hm, I'll look into the timeouts, thanks but the server isn't really busy at all...

shall I look at mysql timeouts (current settings)
wait_timeout = 60
interactive_timeout = 180

or amavis timeouts? I found these 2 so far:
$child_timeout
$smtpd_timeout

THERE ARE REAL issues as every incoming and outgoing email is deferred for a while. I've been restarting amavis and doing postqueue -f regularly today, otherwise emails got stuck for minutes up to hours.

till · Apr 13, 2016

Ovidiu said: ↑

hm, I'll look into the timeouts, thanks but the server isn't really busy at all...
Click to expand...

The timeout problem is an issue on servers with low load (it happens when no new mail comes in within the specified timeout period), so when the timeout is e.g. 1 minute and you don't receive a message every minute, then you might get such an error. At a higher load, the connection will not time out as it don't gets idle too long.

The timeouts are in MySQL, not amavis. Try to increase both to e.g. 5 minutes and check if this changes the behaviour or if you get less timeouts.

Ovidiu · Apr 13, 2016

Ah, I understand. I have now changed the 2 mysql timeouts to 5 minutes as advised and will observe and report back here.

Ovidiu · Apr 13, 2016

ok, so those timeouts didn't help at all.
additionally I noticed that inside /etc/postfix/master.cf I had

Code:

amavis unix - - - - 2 smtp
        -o smtp_data_done_timeout=2400
        -o smtp_send_xforward_command=yes

but inside /etc/amavis/conf.d/50-user was

Code:

$max_servers = 20;

and I read that those must match so I adjusted it to

Code:

$max_servers = 2;

and yet I see mails being deferred, here is one example:

Code:

root@alfred:/# cat /var/log/mail.log | grep 40BE9520D3A
Apr 13 17:13:03 alfred postfix/smtpd[26559]: 40BE9520D3A: client=za-smtp-delivery-153.mimecast.co.za[41.74.201.153]
Apr 13 17:13:03 alfred postfix/cleanup[26677]: 40BE9520D3A: message-id=<[email protected]>
Apr 13 17:13:04 alfred postfix/qmgr[8500]: 40BE9520D3A: from=<[email protected]>, size=20857, nrcpt=1 (queue active)
Apr 13 17:13:04 alfred postfix/smtp[26683]: 40BE9520D3A: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.5, delays=1.5/0/0/0.01, dsn=4.4.2, status=deferred (lost connection with 127.0.0.1[127.0.0.1] while sending RCPT TO)
Apr 13 17:23:00 alfred postfix/qmgr[8500]: 40BE9520D3A: from=<[email protected]>, size=20857, nrcpt=1 (queue active)
Apr 13 17:23:00 alfred postfix/smtp[29541]: 40BE9520D3A: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10024, delay=598, delays=598/0/0/0.01, dsn=4.4.2, status=deferred (lost connection with 127.0.0.1[127.0.0.1] while sending RCPT TO)
Apr 13 17:33:00 alfred postfix/qmgr[8500]: 40BE9520D3A: from=<[email protected]>, size=20857, nrcpt=1 (queue active)
Apr 13 17:33:00 alfred postfix/smtp[32666]: 40BE9520D3A: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10024, delay=1197, delays=1197/0/0/0.01, dsn=4.4.2, status=deferred (lost connection with 127.0.0.1[127.0.0.1] while sending RCPT TO)

Any more advice?

###edit###
Checking for listening amavis processes, shouldn't they be "listening" instead of close_wait?

Code:

root@alfred:~# netstat -a | grep 10024
tcp        0      0 localhost.localdo:10024 *:*                     LISTEN
tcp        1      0 localhost.localdo:10024 localhost.localdo:39738 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:43624 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:43344 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:34184 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:38856 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:59782 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:35630 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:50686 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:58866 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:53602 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:32970 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:44712 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:59774 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:36006 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:36554 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:58864 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:48692 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:39708 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:39054 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:38282 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:50688 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:33054 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:34992 CLOSE_WAIT
tcp    16740      0 localhost.localdo:10024 localhost.localdo:41910 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:39058 CLOSE_WAIT
tcp        1      0 localhost.localdo:10024 localhost.localdo:38272 CLOSE_WAIT
tcp6       0      0 localhost:10024         [::]:*                  LISTEN

Ovidiu · Apr 14, 2016

Sorry to insist but I'm stuck here.
Any more ideas?
Shall I up the timeouts of MYSQl even more?
Will this not negatively influence its performance? (I don't really care about RAM right now, the machine has plenty and handles very low email/web volumes)

Ovidiu · Apr 15, 2016

FIXED. I got this under control by slowly increasing the wait times for mysql until I hit the sweet spot with these settings:
Code:
wait_timeout = 3600
interactive_timeout = 3600
Next I'll probably slowly decrease the interactive_timeout as an amavis connection will definitely not take that long
I don't think this will be any problem in the future as this is a very low-volume mail and web server.

Any further input would be appreciated.

Ovidiu · May 26, 2016

The problem is back, I've been restarting amavis once every half an hour and then doing a postqueue -f the whole day.

Any more pointers are appreciated. This problem only surfaced today, not aware of anythign that could have triggered it :-(

Ovidiu · Jun 20, 2016

as a last resort I found this thread: http://www.iredmail.org/forum/post41574.html#p41574 recommending:
Code:
$max_servers = 1;
$max_requests = 1;
lets see how that goes.

till · Jun 20, 2016

ensure that you set the connection count in postfix master.cf to 1 as well.

Ovidiu · Aug 18, 2016

I'm still having this problem, especially Monday morning as there are hardly any emails coming through on the weekends so I have a cron job restarting amavis and flushing the postfix queue Monday morning. But if there is a public holiday I have to manually do this :-/

So I was thinking, could this be "monitored" with monit?
If it can, what should I monitor, amavisd or mysql?
Any help with a monitor setup to keep this in check?

Ovidiu · Sep 7, 2016

so I implemented checking of amavis with monit:
Code:
check process amavisd with pidfile /var/run/amavis/amavisd.pid
group mail
start program = "/usr/sbin/service amavis start"
stop program = "/usr/sbin/service amavis stop"
if failed port 10024 protocol smtp then exec /root/restart_mailserver.sh
if 5 restarts within 25 cycles then timeout
then I found a slightly more expanded version to do this here: https://www.howtoforge.com/communit...guration-for-debian-ispconfig-3-server.43177/
but I'm not sure if that is needed?

And yet, this morning, I had many, many emails in my deferred queue so obviously this is not working :-(

till · Sep 7, 2016

So amavis was not running again? Or was it running but stopped to respond on its ports?

Ovidiu · Sep 7, 2016

embarrassingly enough, I totally panicked and simply restarted amavis and postfix so I cannot answer your question.

I had put the solution with monit monitoring into place about 2 weeks ago and have since received emails from monit when amavis was not responding and was restarted by monit. It seemed all was working for 2 weeks.

I'm at a total loss with this issue. I fear my email hosting customers will soon walk. Do you guys do paid issue solving? Or do you have any recommendations?
Or more ideas for solving this myself?

Ovidiu · Sep 7, 2016

and on top of this, after restarting amavis and flushing the postfix queue the email from my biggest client are still stuck in the deferred queue:
T 5 10 20 40 80 160 320 640 1280 1280+
TOTAL 17 0 0 0 0 3 8 3 3 0 0

All other mails were sent out. All new incoming and outgoing emails are working now except these which are still stuck in the queue. All of them are saying:
Code:
(delivery temporarily suspended: lost connection with 127.0.0.1[127.0.0.1] while sending end of data -- message may be sent more than once)
###edit###
using postqueue -i xxxxxxx I was able to send out most of these for now.

Ovidiu · Sep 10, 2016

@till:

got the exact same problem right now. Stuff in the deferred queue. Amavis running. Even did a manual amavis restart. It seems the problem is between monit and amavis. also monit monitoring only port 10024 and not 10026.

The log files now actually point to a problem with amavis on port 10026 not 10024:
Code:
[LIST]
[*]Sep 10 12:01:43 alfred postfix/smtpd: 1FDEC52076E: client=jacob.ict-consult.co.za[51.254.252.92]
[*]Sep 10 12:01:43 alfred postfix/cleanup: 1FDEC52076E: message-id=<[email protected]>
[*]Sep 10 12:01:43 alfred postfix/qmgr: 1FDEC52076E: from=<whatsnew-bounces+denis=localrecipientdomain.co.za@list.gcis.gov.za>, size=14113, nrcpt=1 (queue active)
[*]Sep 10 12:01:43 alfred postfix/smtp: 1FDEC52076E: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10026, delay=0.09, delays=0.05/0.03/0/0.01, dsn=4.4.2, status=deferred (lost connection with 127.0.0.1[127.0.0.1] while sending RCPT TO)
[*]Sep 10 12:04:51 alfred postfix/qmgr: 1FDEC52076E: from=<whatsnew-bounces+denis=localrecipientdomain.co.za@list.gcis.gov.za>, size=14113, nrcpt=1 (queue active)
[*]Sep 10 12:06:52 alfred postfix/smtp: 1FDEC52076E: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10026, delay=309, delays=189/0/0/120, dsn=4.4.2, status=deferred (conversation with 127.0.0.1[127.0.0.1] timed out while sending DATA command)
[*]Sep 10 12:14:02 alfred postfix/qmgr: 1FDEC52076E: from=<whatsnew-bounces+denis=localrecipientdomain.co.za@list.gcis.gov.za>, size=14113, nrcpt=1 (queue active)
[*]Sep 10 12:17:07 alfred postfix/error: 1FDEC52076E: to=<[email protected]>, relay=none, delay=924, delays=739/185/0/0, dsn=4.4.2, status=deferred (delivery temporarily suspended: conversation with 127.0.0.1[127.0.0.1] timed out while receiving the initial server greeting)
[*]Sep 10 12:21:29 alfred postfix/qmgr: 1FDEC52076E: from=<whatsnew-bounces+denis=localrecipientdomain.co.za@list.gcis.gov.za>, size=14113, nrcpt=1 (queue active)
[/LIST]
###edit###
And only a few minutes after restarting amavis, I realize the deferred queue might be empty but the active one is now full and stuck apparently.
Code:
Sep 10 12:33:53 alfred postfix/pickup:  E59F852097B: uid=0 from=<root>
Sep 10 12:33:53 alfred postfix/cleanup:  E59F852097B: message-id=<[email protected]>
Sep 10 12:33:53 alfred postfix/qmgr:  E59F852097B: from=<[email protected]>, size=31473, nrcpt=1 (queue active)
Sep 10 12:38:53 alfred postfix/smtp:  E59F852097B: to=<[email protected]>, orig_to=<root>, relay=127.0.0.1[127.0.0.1]:10024, delay=300, delays=0.02/0/300/0, dsn=4.4.2, status=deferred (conversation with 127.0.0.1[127.0.0.1] timed out while receiving the initial server greeting)
It looks like I have an amavis problem now:
Code:
root@alfred:~# /etc/init.d/amavis restart
Stopping amavisd: (not running).
Starting amavisd: amavisd-new.
root@alfred:~# /etc/init.d/amavis status
[FAIL] amavisd is not running ... failed!
monit keeps saying: amavisd Execution failed
I tried both: start program = "/usr/sbin/service amavis start" and start program = "/etc/init.d/amavis start" with the same result. It seems stopping and starting monit helped. I enable monitoring of amavisd in monit, then I see amavisd Initializing for a while, then monit says: amavisd Execution failed and I realized there is no /var/run/amavis/amavisd.pid

all I can find is a /var/run/amavis/amavisd.lock

Any help in properly monitoring both ports of amavis and figuring out the PID location?
Please also see this thread where I tried debugging amavis: https://www.howtoforge.com/community/threads/debugging-amavis.73925/
in the end I fixed thigns for now by restarting the server.

Log in or Sign up

SOLVED: amavisd-new connections timing out

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Share This Page

Log in or Sign up

SOLVED: amavisd-new connections timing out

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

till Super Moderator Staff Member ISPConfig Developer

Ovidiu Active Member

Ovidiu Active Member

Ovidiu Active Member

Share This Page

Useful Searches