amavis issue

Discussion in 'Server Operation' started by pyte, Mar 22, 2024.

  1. pyte

    pyte Well-Known Member HowtoForge Supporter

    Hi,
    I'm currently facing an issue with amavisd-new on our old mail server. We noticed that a lot of SMTP connections to our system timed out. After debugging, I realized that amavis was the cause: after removing amavis as a milter, Postfix handles connections as usual. The only relevant messages that I can find are these:

    Mar 22 11:21:03 mailserver amavis[75290]: (75290) (!!)TROUBLE in process_request: Read from client socket FAILED: Connection reset by peer at (eval 132) line 126, <GEN52> line 12.
    Mar 22 11:21:03 mailserver amavis[75290]: (75290) (!)Requesting process rundown after fatal error

    Oddly, this happens for a lot of connections but not all of them, and it results in Postfix not handling new connections in a timely manner: the server greeting on connect either takes very long or, most of the time, times out. As soon as I removed amavis as a milter and restarted Postfix, the server greeting was there instantly.
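
    For anyone who wants to put a number on the greeting delay, a minimal Python sketch could look like the following. It only connects and waits for the first banner line. The function name, the default port 25 and the 60-second timeout are assumptions for illustration, not anything taken from this setup.

```python
import socket
import time


def greeting_delay(host, port=25, timeout=60.0):
    """Return (seconds_until_banner, banner_text) for an SMTP server."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # In SMTP the server speaks first, so just read until a line ends.
        data = b""
        while not data.endswith(b"\n"):
            chunk = sock.recv(1024)
            if not chunk:  # peer closed the connection before a full line
                break
            data += chunk
    elapsed = time.monotonic() - start
    return elapsed, data.decode("ascii", errors="replace").strip()
```

    Calling `greeting_delay("mail.example.com")` (a placeholder host name) every minute or so and logging the result would show whether the delay builds up gradually or comes in bursts.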

    Any ideas?
     
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    Have you tried to increase the number of amavis processes?
     
  3. pyte

    pyte Well-Known Member HowtoForge Supporter

    Just increased max_servers from 10 to 50. Let's see what happens now :) Thank you for the hint.
     
  4. pyte

    pyte Well-Known Member HowtoForge Supporter

    Well, it took way longer to cause an issue this time, but after a while the initial server greeting still takes about 20 seconds to appear. Sadly, I don't have any metrics to check whether the incoming connection count is higher than usual for this system.

    Any idea what may cause this? There were no configuration changes yesterday and the system started to act up yesterday around 10am.

    //Edit: OK, and it seems normal again. I'll keep checking it and see if it gets completely unresponsive at some point.
     
    Last edited: Mar 22, 2024
  5. till

    till Super Moderator Staff Member ISPConfig Developer

    Not really. The only situation where I have seen such behaviour was when the number of servers was too low. Please check if clamd is running; if it's not running, amavis will fall back to clamscan, which is much slower. Also check if there are any issues with DNS resolving; maybe amavis tries to query RBLs and waits until it gets a timeout. And try to get more verbose logging from amavis.
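
    A rough way to spot slow DNS from the mail server itself is to time a few lookups. This is only a sketch: it goes through the system resolver via Python's `socket.getaddrinfo`, which is not necessarily the same path amavis and its spam checks use, and the helper name `time_lookup` is made up for this example.

```python
import socket
import time


def time_lookup(hostname):
    """Return (seconds, resolved_ok) for one lookup via the system resolver."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, None)
        ok = True
    except socket.gaierror:
        # Resolution failed (NXDOMAIN, resolver timeout, ...); still report the time.
        ok = False
    return time.monotonic() - start, ok
```

    Running it against a handful of names you expect to resolve quickly, for example `time_lookup("example.com")`, and seeing times of several seconds would point towards a resolver problem rather than amavis itself.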
     
  6. pyte

    pyte Well-Known Member HowtoForge Supporter

    It seems to have stabilized for now. I've already checked clamd, and it appears to be fine. I'll increase the log level if the issue occurs again. For now, I think this was due to a traffic spike and the server not being able to handle that many requests in a timely manner. I'll post an update if there is any news on the situation.

    Thank you :)
     
  7. pyte

    pyte Well-Known Member HowtoForge Supporter

    Coming back to this with some information. Although the situation got better after increasing max_servers, the issue still persisted to some degree. The amavis scans seemed to take unusually long, and I couldn't put my finger on the cause. There was no sign of unusually high traffic on the mail server.
    However, when checking the hypervisor platform where this VM was located at the time, I soon realized that the hypervisor itself had issues. The CPU was only clocking at 0.3 GHz, which resulted in a dramatic performance loss on all VMs on this node. Sadly, no monitoring alerted us to the issue, as the system was operating well enough not to trigger any major alarms.

    An unusual fault, with no indication of it inside the VM, but fortunately it was found promptly. I am amazed that the mail server still functioned reasonably well with so little CPU power.
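
    A check along these lines could flag such a clock drop on the hypervisor host. It is a sketch that assumes a Linux x86 host where `/proc/cpuinfo` reports `cpu MHz` per core. Inside a guest the reported value may not reflect the real host clock, which matches the lack of any hint in the VM. The 1000 MHz threshold and both function names are arbitrary choices for this example.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Extract the per-core 'cpu MHz' readings from /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(value) for value in re.findall(pattern, cpuinfo_text, re.MULTILINE)]


def slow_cores(cpuinfo_text, min_mhz=1000.0):
    """Return the readings below min_mhz (an assumed alert threshold)."""
    return [mhz for mhz in parse_cpu_mhz(cpuinfo_text) if mhz < min_mhz]
```

    On the host, something like `slow_cores(open("/proc/cpuinfo").read())` returning a non-empty list could then be wired into whatever monitoring is already in place.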
     
