ISPConfig SpamAssassin Training on Large Inbox

Discussion in 'General' started by FeraTechInc, Sep 11, 2014.

  1. FeraTechInc

    FeraTechInc ISPConfig Developer

    So I'm trying to help eliminate spam. However, every time I run "sa_learn" I get the following error:

    /usr/bin/sa-learn: Argument list too long

    Here are the contents of my /bin/sa_learn file:

    Code:
    #!/bin/bash
    /usr/bin/sa-learn --spam /var/vmail/*/*/*/.Junk/*/*
    /usr/bin/sa-learn --ham /var/vmail/*/*/*/cur
    
    Seems like I have too many e-mails in the directory. Is there any way around this? Otherwise SpamAssassin will only learn spam e-mails and become biased.
     
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    I guess the problem is that you are trying to learn all mailboxes at once (wildcards on several directory levels). You could, for example, add a loop to your shell script and feed the maildirs to sa-learn one by one.
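
One way to read the loop suggestion above, as a minimal sketch: iterate over the Junk folders and hand sa-learn one directory per run, so the shell never has to expand one argument per message. The `VMAIL_ROOT` and `SA_LEARN` variables and the function name are illustrative assumptions; the directory layout is taken from the paths used in this thread.

```shell
#!/bin/bash
# Assumed defaults; override them to match the actual installation.
VMAIL_ROOT="${VMAIL_ROOT:-/var/vmail}"
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

# Learn spam one Junk folder at a time.
# A "for" loop over a glob runs inside the shell, so the glob itself
# is not subject to the kernel's argument-length limit.
learn_junk_per_mailbox() {
    local junk
    for junk in "$VMAIL_ROOT"/*/*/*/.Junk; do
        # Skip patterns that matched nothing and folders without a cur directory.
        [ -d "$junk/cur" ] || continue
        # Pass the directory itself; sa-learn reads the messages inside it.
        "$SA_LEARN" --spam "$junk/cur"
    done
}
```

Calling `learn_junk_per_mailbox` from the script then starts one sa-learn run per mailbox.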
     
  3. FeraTechInc

    FeraTechInc ISPConfig Developer

    Nope.... tried a couple mailboxes individually and got the same error. Just one of the mailboxes is too much...

    Is there any way to modify the search capacity of this script?
     
  4. till

    till Super Moderator Staff Member ISPConfig Developer

    That's quite specific; I guess you might have to ask on the SpamAssassin mailing list.

    Or you can feed the emails to the script one by one. Basically, run the find command on the maildir:

    find /var/vmail/domain.tld/user/Maildir/.Junk

    and pipe the output to the sa-learn command
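
A possible shape for the find-and-pipe approach, offered as a sketch rather than a recipe tested on this server: find lists the message files and xargs splits them into batches that fit within the system's argument limit, calling sa-learn once per batch. Piping file names straight into sa-learn would not pass them as arguments, which is why xargs sits in between. `SA_LEARN` and the function name are assumptions; `-print0`, `-0` and `-r` are GNU/BSD extensions rather than POSIX.

```shell
#!/bin/bash
# Assumed default; override to match the actual installation.
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

# learn_spam_in_batches DIR
# Feed every message file under DIR to sa-learn in argument-limit-sized batches.
learn_spam_in_batches() {
    local junk_dir="$1"
    # -print0 / -0 keep file names with spaces or newlines intact;
    # -r skips the sa-learn call entirely when find returns nothing.
    find "$junk_dir" -type f -print0 | xargs -0 -r "$SA_LEARN" --spam
}
```

For the path mentioned above this would be called as `learn_spam_in_batches /var/vmail/domain.tld/user/Maildir/.Junk`.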
     
  5. mattltm

    mattltm Member

    Can you provide an example of this for me Till?
     
  6. KalanVryce

    KalanVryce New Member

    I was having the same issue and the resolution is SO SIMPLE it evaded me for a while. I found your post while looking for an answer and figured that I would help out. The answer is just a matter of using quotes...

    #!/bin/bash
    /usr/bin/sa-learn --spam "/var/vmail/*/*/*/.Junk/*/*"
    /usr/bin/sa-learn --ham "/var/vmail/*/*/*/cur"

    I also have to give credit to "mikeserv", whose less direct answer to another question led me to this solution. Apparently I can't link to it on this site, so maybe I can outwit the filter...

    unix.stackexchange.com/questions/215530/argument-list-too-long-in-for-loop
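
Why the quotes help can be seen with a small demonstration. Unquoted, the shell expands the pattern into one argument per matching file before sa-learn starts, and the kernel rejects the call once the combined arguments exceed its limit. Quoted, sa-learn receives the pattern as a single argument; the fix reported above suggests that sa-learn then expands the pattern itself, which is an assumption here rather than something verified in this thread. `count_args` and the temporary directory are purely illustrative.

```shell
#!/bin/bash
# Print how many arguments the shell actually passed.
count_args() { echo "$#"; }

demo=$(mktemp -d)                   # throwaway directory for the demonstration
touch "$demo/a" "$demo/b" "$demo/c"

unquoted=$(count_args "$demo"/*)    # glob expanded by the shell: one argument per file
quoted=$(count_args "$demo/*")      # quotes keep the pattern as one literal argument

echo "unquoted: $unquoted"          # prints "unquoted: 3"
echo "quoted: $quoted"              # prints "quoted: 1"

getconf ARG_MAX                     # byte limit that a large unquoted expansion can exceed
rm -rf "$demo"
```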
     
    Last edited: Jan 28, 2016
    till likes this.
  7. KalanVryce

    KalanVryce New Member

    I also wanted to expand on this, as I have subfolders and my email client auto-sorts by rules to keep my mail tidy. Here is my whole script, plus a command line that runs the entire process and emails you the results. I only have a small server with a few users, so this may not work as well on a larger email system. You may also not want to email the results, because the ham learning echoes a result for each directory scanned; that may be changeable, but I haven't put the time into it yet.

    <EmailSA>
    /home/USER/LearnSpam > output ; mail -s "SA-Learn Output" "USER@DOMAIN" < output

    <LearnSpam>
    #!/bin/bash
    echo "Forcing Expire..."
    /usr/bin/sa-learn --force-expire
    echo "Learning from Junk folders..."
    /usr/bin/sa-learn --spam "/var/vmail/*/*/*/.Junk/*/*"
    echo "Cleaning Junk Folders..."
    /bin/rm -rf /var/vmail/*/*/*/.Junk/cur/*
    echo "Learning from Inbox folders..."
    find /var/vmail/*/*/Maildir/ -maxdepth 1 -type d -not -name .Junk -not -name .Spam -not -name .Trash -not -name tmp -not -name "." -not -name .Sent -not -name .Drafts -not -name new -not -name Maildir -exec /usr/bin/sa-learn --ham {} \;
    echo "Current Bayes Info..."
    /usr/bin/sa-learn --dump magic
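
Two parts of the script above could be adjusted, sketched here under assumptions. The `rm` line still expands one argument per junk message, so it can hit the same "Argument list too long" error, and the ham step starts sa-learn once per folder, which is what produces a result for each directory. Using find with `-exec ... {} +` batches the arguments in both cases. `VMAIL_ROOT`, `SA_LEARN` and the function names are hypothetical; the folder exclusions are copied from the find command above.

```shell
#!/bin/bash
# Assumed defaults; override them to match the actual installation.
VMAIL_ROOT="${VMAIL_ROOT:-/var/vmail}"
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

# Remove learned junk messages; find hands the file names to rm in batches,
# so the shell never builds one huge argument list.
clean_junk_folders() {
    find "$VMAIL_ROOT"/*/*/*/.Junk/cur -type f -exec rm -f {} +
}

# Learn ham from the inbox and the remaining subfolders.
# "{} +" passes many folders to each sa-learn run instead of one run per folder.
learn_ham_batched() {
    find "$VMAIL_ROOT"/*/*/Maildir -maxdepth 1 -type d \
        -not -name .Junk -not -name .Spam -not -name .Trash \
        -not -name .Sent -not -name .Drafts \
        -not -name tmp -not -name new -not -name Maildir \
        -exec "$SA_LEARN" --ham {} +
}
```

On a small server all folders should fit into a single sa-learn run, which keeps the emailed output short.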
     
