sa-learn seems not to work

Discussion in 'Installation/Configuration' started by viniciusmassuchetto, Apr 23, 2011.

  1. viniciusmassuchetto

    viniciusmassuchetto New Member

    I'm using ISPConfig 3 on Debian Lenny.

    I have a lot of messages separated in a "global Junk" mail folder. Things seem to go well when I run:

    Code:
    sa-learn --spam --dir .Junk/cur/
    But even after learning tons of messages that are supposed to be spam, the day after they reach our servers again without being marked, as if nothing had been learned.

    Also, the configuration in /etc/amavis/conf.d/50-user differs from the settings in the ISPConfig Panel. For example, I see from the amavis logs that "$sa_tag_level_deflt" and "$spam_quarantine_to" are completely ignored in that file, and that ISPConfig uses the values in its "spamfilter_policy" database table.

    Not sure if these things are related, but as it stands I can't figure out where ISPConfig tells amavis + sa to get the learned rules from.

    Many Thanks
     
    Last edited: Apr 23, 2011
  2. viniciusmassuchetto

    viniciusmassuchetto New Member

    Maybe I was doing it wrong. I was actually creating the bayes database in the /root/.spamassassin/ folder. As I have amavis integrated, the right folder seems to be /var/lib/amavis/.spamassassin.

    So I used the --dbpath option pointing to this folder in sa-learn, and it seemed to grow the database, as the bayes_toks file increased by almost 2MB.

    After this, I went to the ISPConfig Panel and added some spamfilter rules to the black/whitelist. The bayes_toks file in the amavis folder then went back to the size it had before I ran sa-learn, as I could see by the modification time.

    After all... what's the right way of learning spam with ISPConfig?
     
  3. cbj4074

    cbj4074 Member

    I have the exact same question:

    How is one supposed to train SpamAssassin, manually, using the "sa-learn" executable when using Dovecot + Amavis + SpamAssassin + ISPConfig?

    The original poster's attempts to flag spam were failing because he was executing the "sa-learn" executable as the "root" user, so the Bayesian tokens were not being added to the effective user's (amavis's) database. (The tokens were being added to the "root" user's database.)

    I have found this to be the case as well. How? By discovering that SpamAssassin's "bayes_path" directive is not defined anywhere on the system in question, and the relevant source code indicates that the default value is ~/.spamassassin/bayes, which should translate to /var/lib/amavis/.spamassassin in the normal course of events.

    I tried the following:

    Code:
    # su amavis -c 'sa-learn --spam "/var/vmail/example.com/sa-training/Maildir/.INBOX.Spam"'
    
    archive-iterator: no access to /var/vmail/example.com/user/Maildir/cur: 13 at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 539.
    archive-iterator: no access to /var/vmail/example.com/user/Maildir/cur: 13 at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 771.
    archive-iterator: unable to open /var/vmail/example.com/user/Maildir/cur: 13
    
    This does not work because the permissions on each user's "mail directory" (e.g., Maildir) are 700, with vmail:vmail ownership. Adding the "amavis" user to the "vmail" group will not solve the problem, due to the 700 permissions.

    Am I missing something obvious?

    Thank you.

    UPDATE:

    Indeed I was missing something "obvious".

    The solution is to include the --username switch to the 'sa-learn' executable, e.g.:

    Code:
    # sa-learn --username=amavis --spam /var/vmail/example.com/trainer/Maildir/.Spam/cur
    
    This enables one to execute the command as "root" or "vmail", which provides for the necessary permissions, while at the same time adding the tokens to the "amavis" user's database.
     
    Last edited: Aug 20, 2012
  4. cbj4074

    cbj4074 Member

    Also, I am curious to know if ISPConfig configures Amavis to maintain a separate Bayes token database for each virtual mail user (e.g., within a database).

    Or, does ISPConfig configure Amavis to use a single Bayes database, e.g., that in /var/lib/amavis/.spamassassin?
     
  5. till

    till Super Moderator Staff Member ISPConfig Developer

    A single bayes database is used as far as I know. There is no special configuration in ISPConfig about this, so the defaults of the Linux distribution where you installed the system are used.
     
  6. cbj4074

    cbj4074 Member

    Thanks, Till!

    Do you happen to know whether or not it is necessary to restart Amavis for changes to the Bayes database to be effective?

    I realize that SpamAssassin is accessed on-demand when used with Amavis, but it's not clear whether the Bayes values are loaded once when Amavis is started, or whether a look-up is performed against whatever data exists in the Bayes database with Amavis's every request to SpamAssassin.

    Thanks again.
     
  7. cbj4074

    cbj4074 Member

    Actually, this was not the solution; I was mistaken.

    Users on the SpamAssassin mailing list pointed out that the --username switch is intended for use with virtual user configurations, e.g., those tied to a SQL database of some kind. (It's worth noting that using an invalid --username doesn't throw a warning or error, and seems to use the current username instead.)

    The solution was to "hard-code" the SpamAssassin Bayes database location in the configuration file (typically /etc/spamassassin/local.cf on Debian/Ubuntu systems):

    Code:
    bayes_path /var/lib/amavis/.spamassassin/bayes
    
    With this directive in place, the sa-learn command will always use the specified database (unless the --username argument is provided [and is valid]).
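A quick way to confirm that the directive is actually present (it was missing on the system described above) is to search the config files for it. The sketch below is only an illustration: the function name `configured_bayes_path` is made up, and it assumes a plain `bayes_path <value>` line with no unusual quoting. It prints the last `bayes_path` value it finds and returns non-zero when none is set.

```shell
# Hypothetical helper: print the bayes_path value found in the given
# SpamAssassin config file(s). If the directive appears more than once,
# the last occurrence is printed. Returns non-zero when it is not set.
configured_bayes_path() {
  awk '
    $1 == "bayes_path" { path = $2 }   # commented-out lines do not match
    END { if (path == "") exit 1; print path }
  ' "$@"
}
```

On a Debian/Ubuntu system this could be called as `configured_bayes_path /etc/spamassassin/local.cf`; other config locations would need to be passed explicitly.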

    To ensure that the correct database is being used:

    Code:
    # spamassassin -D -t < /usr/share/doc/spamassassin/examples/sample-spam.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'
    
    [...]
    dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
    dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen
    [...]
    
    The directive is having the intended effect; even though the command is executed as "root", the Amavis user's database file is used.

    Now, when training SpamAssassin, the sa-learn executable can be called as the "root" user, which allows for access to the mailboxes in /var/vmail while at the same time populating the correct Bayes database (Amavis's).
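A training run as root might then look like the sketch below. The function name `train_maildir`, the `SA_LEARN` override (handy for a dry run with `echo`), and the choice to feed both `cur/` and `new/` are assumptions for illustration. `--spam` with a directory argument is the form used earlier in this thread, and `--ham` is the standard sa-learn counterpart for non-spam.

```shell
# Command used for learning; override with SA_LEARN=echo for a dry run.
SA_LEARN=${SA_LEARN:-sa-learn}

# Hypothetical helper: feed the cur/ and new/ subfolders of one Maildir
# folder to sa-learn. $1 is "spam" or "ham", $2 is the Maildir folder.
train_maildir() {
  case "$1" in
    spam|ham) ;;
    *) echo "usage: train_maildir spam|ham <maildir-folder>" >&2; return 2 ;;
  esac
  for sub in cur new; do
    if [ -d "$2/$sub" ]; then
      "$SA_LEARN" "--$1" "$2/$sub" || return 1
    fi
  done
}
```

For example, `train_maildir spam /var/vmail/example.com/trainer/Maildir/.Spam`, followed by `sa-learn --dump magic` to see whether the nspam counter grew.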
     
  8. mattltm

    mattltm Member

    Is there a way to check that bayes is actually working for incoming email?

    I have over 55000 example emails in my database but still get the same type of spam through.
     
  9. cbj4074

    cbj4074 Member

    Yes, there is.

    My recommendation is to read through the relevant bits of the thread at https://lists.gt.net/spamassassin/users/176514/?page=1;mh=-1; .

    The thread is long, but very worthwhile when it comes to understanding how SpamAssassin, Bayes, and AMaViS function together.

    Note in particular the link cited in the first post of the above thread; that also contains a wealth of relevant and useful information where this problem is concerned.

    If those resources don't lead you to the answer, let me know...
     
    Last edited: Jan 25, 2017
  10. mattltm

    mattltm Member

    I've taken a look through and it seems that the X-SPAM header is not getting added to any of my incoming emails.

    How do I add them?
     
  11. till

    till Super Moderator Staff Member ISPConfig Developer

    The X-Spam headers get added when the spam score of an email exceeds the spam tag level that you set in the amavis policy.
     
  12. mattltm

    mattltm Member

    Is that set in /etc/amavis/conf.d/50-user (Debian)?

    If so, I have mine set to -9999 to make sure it is added to every email.
     
  13. cbj4074

    cbj4074 Member

    That's one way to do it.

    You can also just control it via the ISPConfig interface; modify the "spamfilter policy" and set the SPAM tag level to -9999.

    Curious to see where you land with the Bayes. Until one fully understands how Bayes works, especially when so-called "glue" is involved, it can be a real mother****** to get working correctly.

    As you can see from the thread I cited in my initial reply to you (I started and ended that thread), I jumped through many hoops to "get it all working". Hopefully, my adventure is of use to you! Happy to answer any questions you may have when you hit the next roadblock.
     
  14. mattltm

    mattltm Member

    Changing it in ISPConfig did the trick.

    Just need to wait for some yummy spam to see what's going on now..
     
  15. mattltm

    mattltm Member

    That was quick!

    Code:
    X-Spam-Flag: NO
    X-Spam-Score: -0.008
    X-Spam-Level:
    X-Spam-Status: No, score=-0.008 tagged_above=-9999 required=5
    	tests=[HTML_MESSAGE=0.001, MIME_HTML_MOSTLY=0.001,
    	T_RP_MATCHES_RCVD=-0.01] autolearn=unavailable
    
    No mention of BAYES and should autolearn be available?
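To check a saved message for a Bayes hit without reading the header by eye, something like the following can be used. It is a sketch: `bayes_hit` is a made-up name, and it assumes the message carries an `X-Spam-Status` header in the format shown above, where a Bayes verdict would show up as a `BAYES_nn` test name in the tests list.

```shell
# Hypothetical helper: read a mail message on stdin and print the first
# BAYES_nn test listed in its X-Spam-Status header (including folded
# continuation lines). Returns non-zero when no Bayes test is listed.
bayes_hit() {
  hit=$(awk '
    /^X-Spam-Status:/ { in_hdr = 1; print; next }   # start of the header
    in_hdr && /^[ \t]/ { print; next }              # folded continuation
    { in_hdr = 0 }                                  # any other line ends it
    /^$/ { exit }                                   # blank line: headers done
  ' | grep -o 'BAYES_[0-9][0-9]*' | head -n 1)
  [ -n "$hit" ] && printf '%s\n' "$hit"
}
```

Called as `bayes_hit < message.eml` (the file name is a placeholder), it prints nothing for the header above, which matches the observation that Bayes was not consulted for that message.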
     
  16. cbj4074

    cbj4074 Member

    Bayes requires at least 200 spam and 200 ham messages to have been learned before it is used. This explains both issues (Bayes not being used, and autolearn being unavailable).

    Go back to that thread I cited earlier for explicit instructions regarding how to check your token database.
     
  17. mattltm

    mattltm Member

    I should have plenty in there...

    0.000 0 3 0 non-token data: bayes db version
    0.000 0 56062 0 non-token data: nspam
    0.000 0 497 0 non-token data: nham
    0.000 0 1220985 0 non-token data: ntokens
    0.000 0 1026091208 0 non-token data: oldest atime
    0.000 0 1398679531 0 non-token data: newest atime
    0.000 0 1381878605 0 non-token data: last journal sync atime
    0.000 0 1398726633 0 non-token data: last expiry atime
    0.000 0 0 0 non-token data: last expire atime delta
    0.000 0 0 0 non-token data: last expire reduction count
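The counters above (the output format of `sa-learn --dump magic`) can be compared against the 200 spam / 200 ham minimum mentioned earlier. A minimal sketch, assuming that output format; the function name `bayes_ready` and its optional minimum argument are illustrative, not part of SpamAssassin:

```shell
# Hypothetical helper: read `sa-learn --dump magic` style output on stdin
# and report whether both nspam and nham reach the minimum (default 200).
# Prints the counters and returns non-zero when either one is too low.
bayes_ready() {
  awk -v min="${1:-200}" '
    /non-token data: nspam$/ { nspam = $3 }   # third column is the count
    /non-token data: nham$/  { nham = $3 }
    END {
      printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
      exit !(nspam >= min && nham >= min)
    }
  '
}
```

Usage would be along the lines of `sa-learn --dump magic | bayes_ready`, run as a user that reads the same database amavis uses. With the counters shown here (56062 spam, 497 ham) both values are above the minimum.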

    From what I have read, autolearn=unavailable can mean almost anything.

    I have checked and it seems to be working OK now.

    Thanks.
     
  18. cbj4074

    cbj4074 Member

    Everything is sorted? Your messages are now being scored appropriately, using Bayes, and the X-Spam-Status header reflects this?

    Or were you referring to something else "working OK now"?
     
  19. mattltm

    mattltm Member

    Yes, all headers are present and correct and messages are being scored.

    I am seeing a mix of "autolearn=no", "autolearn=yes" and "autolearn=unavailable" messages in the headers and I have seen a marked reduction in spam.

    Happy days! :)
     
  20. cbj4074

    cbj4074 Member

    Happy days, indeed! If ever the Skype "dancing man" icon were relevant, it would be here.

    Nice work! Cheerio!
     
