Hi all,

After months of having to reboot my server every week because the mail queue kept getting blocked, I've finally worked out what is causing the problem. Every week or so, postfix would get into a tangle and refuse all deliveries, holding them in the queue with "delivery temporarily suspended". The only fix I knew was rebooting the server, so I accepted that and did it whenever the problem occurred.

I now know that this is caused by amavisd cutting out, so if I restart amavisd and then postfix over SSH, the problem eventually clears up. But it keeps happening and it's driving me nuts. I usually wouldn't notice until I realised I hadn't had any email for 24 hours, and then I'd check the mail queue and, surprise, everything was held there. It now seems to be getting worse, occurring roughly every 48 hours rather than every week.

Does anybody know how to fix this infuriating problem? It only started about 6 months ago; I'm pretty sure it was when I upgraded Debian on my server. Do I need to reinstall or reconfigure amavis?
Thanks so much in advance,
Alex

EDIT: Had a look at mail.log, and this is concerning me. I'm getting repeats of these messages throughout the postfix outage, right up until I restart amavisd and postfix:

Mar 21 20:51:31 server1 amavis[27645]: (27645-04) (!)connect to /var/run/clamav/clamd.ctl failed, attempt #1: Can't connect to UNIX socket /var/run/clamav/clamd.ctl: Connection refused
Mar 21 20:51:31 server1 amavis[27645]: (27645-04) (!)ClamAV-clamd: All attempts (1) failed connecting to /var/run/clamav/clamd.ctl, retrying (2)
Mar 21 20:51:37 server1 amavis[27645]: (27645-04) (!)connect to /var/run/clamav/clamd.ctl failed, attempt #1: Can't connect to UNIX socket /var/run/clamav/clamd.ctl: Connection refused
Mar 21 20:51:37 server1 amavis[27645]: (27645-04) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamav/clamd.ctl (All attempts (1) failed connecting to /var/run/clamav/clamd.ctl) at (eval 113) line 603.\n
Mar 21 20:51:37 server1 amavis[27645]: (27645-04) (!)WARN: all primary virus scanners failed, considering backups
Haven't found the solution to this yet *sigh*. In the meantime, I've created a shell script as a workaround. Essentially, it checks whether amavis is running, and if it has crashed for whatever reason, it restarts amavis and then postfix to resume deliveries. It's not a real fix, but if you're having similar problems, it may help.
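A workaround along these lines could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not the exact script from the post: the process pattern `amavisd`, the service names `amavis` and `postfix`, the use of the `service` command, and the `--run` flag are all assumptions and may need adjusting for your setup.

```shell
#!/bin/sh
# Watchdog sketch: if amavis is not running, restart amavis and then postfix.
# Assumptions (not from the original post): the process pattern "amavisd",
# the service names "amavis" and "postfix", and the `service` command.
# Adjust them for your system.

AMAVIS_PATTERN="${AMAVIS_PATTERN:-amavisd}"
AMAVIS_SERVICE="${AMAVIS_SERVICE:-amavis}"
POSTFIX_SERVICE="${POSTFIX_SERVICE:-postfix}"

# Succeeds if any process command line matches the given pattern.
# With -f, a script whose own file name contains the pattern would
# match itself, so name the script file accordingly.
is_running() {
    pgrep -f "$1" >/dev/null 2>&1
}

# Restart one service by name.
restart_service() {
    service "$1" restart
}

# Log to syslog when logger works, otherwise fall back to stderr.
log_msg() {
    logger -t amavis-watchdog "$1" 2>/dev/null || echo "amavis-watchdog: $1" >&2
}

# Do nothing if amavis is up; otherwise restart amavis, then postfix.
check_and_recover() {
    if is_running "$AMAVIS_PATTERN"; then
        return 0
    fi
    log_msg "amavis not running; restarting $AMAVIS_SERVICE, then $POSTFIX_SERVICE"
    restart_service "$AMAVIS_SERVICE" || return 1
    restart_service "$POSTFIX_SERVICE"
}

# Only act when called with --run, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    check_and_recover
fi
```

Run as root from cron, it would check every few minutes instead of waiting for someone to notice missing mail, for example `*/5 * * * * /usr/local/sbin/amavis-watchdog.sh --run` (the path is hypothetical). Like the original workaround, this only restarts things after a crash; it does nothing about whatever is causing the crash.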