sa-learn seems not to work

viniciusmassuchetto · Apr 23, 2011

I'm using ISPConfig 3 on Debian Lenny.

I have a lot of messages separated in a "global Junk" mail folder. Things seem to go well when I run:
Code:
sa-learn --spam --dir .Junk/cur/
But even after learning tons of messages that are supposed to be SPAM, the next day the same kind of messages reach our servers again without being marked, as if nothing had been learned.

Also, the configuration in /etc/amavis/conf.d/50-user differs from the settings in the ISPConfig panel. For example, I can see from the amavis logs that "$sa_tag_level_deflt" and "$spam_quarantine_to" in that file are completely ignored, and that ISPConfig uses the values in its "spamfilter_policy" database table instead.

I'm not sure whether these things are related, but as it stands I can't figure out where ISPConfig tells amavis + SpamAssassin to get the learned rules from.

Many Thanks

viniciusmassuchetto · Apr 24, 2011

Maybe I was doing it wrong. I was actually creating the Bayes database in the /root/.spamassassin/ folder. Since I'm using the amavis integration, the right folder seems to be /var/lib/amavis/.spamassassin.

So I used the --dbpath option of sa-learn, pointing to this folder, and the database seemed to grow: the bayes_toks file increased by almost 2MB.

After this, I went to the ISPConfig Panel and added some spamfilter rules to the black/whitelist. Then the bayes_toks file in the amavis folder went back to the size it had before I ran sa-learn, as I could tell from its modification time.
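The --dbpath approach described above can be written down as an argument list, for example for Python's subprocess.run. This is a minimal sketch, not ISPConfig's own method: `build_sa_learn_cmd` is a hypothetical helper, the directory assumes the Debian amavis home of /var/lib/amavis, and only sa-learn's --spam, --ham and --dbpath switches are used.

```python
"""Assemble an sa-learn command that trains the amavis Bayes database."""
from typing import List

# Assumption: Debian/Ubuntu amavis home directory; adjust for your system.
AMAVIS_BAYES_DIR = "/var/lib/amavis/.spamassassin"


def build_sa_learn_cmd(mail_dir: str, spam: bool = True,
                       dbpath: str = AMAVIS_BAYES_DIR) -> List[str]:
    """Return argv that trains `mail_dir` as spam (or ham) into `dbpath`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    mode = "--spam" if spam else "--ham"
    return ["sa-learn", mode, "--dbpath", dbpath, mail_dir]


if __name__ == "__main__":
    # Same Junk folder as in the original post.
    print(" ".join(build_sa_learn_cmd(".Junk/cur/")))
    # prints: sa-learn --spam --dbpath /var/lib/amavis/.spamassassin .Junk/cur/
```

Passing the list to subprocess.run(cmd, check=True) as root would perform the training. Because root is then writing into amavis's directory, it may be worth checking afterwards that the files there are still owned by the amavis user.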

So, after all this: what's the right way to train SpamAssassin on spam with ISPConfig?

cbj4074 · Aug 20, 2012

I have the exact same question:

How is one supposed to train SpamAssassin, manually, using the "sa-learn" executable when using Dovecot + Amavis + SpamAssassin + ISPConfig?

The original poster's attempts to flag spam were failing because he was executing the "sa-learn" executable as the "root" user, so the Bayesian tokens were not being added to the effective user's (amavis's) database. (The tokens were being added to the "root" user's database.)

viniciusmassuchetto said:
I'm not sure whether these things are related, but as it stands I can't figure out where ISPConfig tells amavis + SpamAssassin to get the learned rules from.

viniciusmassuchetto said:
Since I'm using the amavis integration, the right folder seems to be /var/lib/amavis/.spamassassin

I have found this to be the case as well. How? By discovering that SpamAssassin's "bayes_path" directive is not defined anywhere on the system in question, and the relevant source code indicates that the default value is ~/.spamassassin/bayes, which, for the amavis user (whose home directory on Debian is /var/lib/amavis), should translate to /var/lib/amavis/.spamassassin/bayes in the normal course of events.
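Note that bayes_path names a base file, not a directory: SpamAssassin appends suffixes such as _toks and _seen to it, which is why the debug output later in this thread lists bayes_toks and bayes_seen. A small sketch of how the default resolves for a given user; the helper is hypothetical, the /var/lib/amavis home is a Debian assumption, and only the two suffixes seen in this thread are modelled (others, such as the journal, are left out).

```python
"""Resolve a SpamAssassin bayes_path prefix into database file names."""
import os.path
from typing import Dict

DEFAULT_BAYES_PATH = "~/.spamassassin/bayes"  # default cited in this thread


def bayes_db_files(home: str, bayes_path: str = DEFAULT_BAYES_PATH) -> Dict[str, str]:
    """Map each suffix to a full path for a user whose home is `home`.

    A leading "~/" is expanded against `home` (the user SpamAssassin runs as,
    e.g. amavis), not against the caller's own $HOME.
    """
    if bayes_path.startswith("~/"):
        prefix = os.path.join(home, bayes_path[2:])
    else:
        prefix = bayes_path
    # bayes_path is a base name: suffixes are appended, not joined as a directory.
    return {suffix: f"{prefix}_{suffix}" for suffix in ("toks", "seen")}


if __name__ == "__main__":
    print(bayes_db_files("/var/lib/amavis")["toks"])
    # prints: /var/lib/amavis/.spamassassin/bayes_toks
```

Running the same lookup with "/root" as the home directory gives /root/.spamassassin/bayes_toks, which is exactly where the original poster's first training run ended up.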

I tried the following:
Code:
# su amavis -c 'sa-learn --spam "/var/vmail/example.com/sa-training/Maildir/.INBOX.Spam"'

archive-iterator: no access to /var/vmail/example.com/user/Maildir/cur: 13 at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 539.
archive-iterator: no access to /var/vmail/example.com/user/Maildir/cur: 13 at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 771.
archive-iterator: unable to open /var/vmail/example.com/user/Maildir/cur: 13
This does not work because the permissions on each user's "mail directory" (e.g., Maildir) are 700, with vmail:vmail ownership. Adding the "amavis" user to the "vmail" group will not solve the problem, due to the 700 permissions.
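The mode bits make this reasoning concrete: 700 grants the group nothing, so adding amavis to the vmail group changes nothing. A sketch using Python's standard stat module (the helper name is hypothetical; on a real system the mode would come from os.stat(path).st_mode). The 750 case is shown only for comparison, not as a recommendation to change the Maildir permissions.

```python
"""Why group membership cannot help with a mode-700 Maildir."""
import stat


def group_can_enter(mode: int) -> bool:
    """True if members of the owning group may list and traverse the directory."""
    return bool(mode & stat.S_IRGRP) and bool(mode & stat.S_IXGRP)


if __name__ == "__main__":
    maildir = stat.S_IFDIR | 0o700  # vmail:vmail, as described above
    print(stat.filemode(maildir), group_can_enter(maildir))  # drwx------ False
    relaxed = stat.S_IFDIR | 0o750
    print(stat.filemode(relaxed), group_can_enter(relaxed))  # drwxr-x--- True
```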

Am I missing something obvious?

Thank you.

UPDATE:

Indeed I was missing something "obvious".

The solution is to pass the --username switch to the 'sa-learn' executable, e.g.:
Code:
# sa-learn --username=amavis --spam /var/vmail/example.com/trainer/Maildir/.Spam/cur
This enables one to execute the command as "root" or "vmail", which provides for the necessary permissions, while at the same time adding the tokens to the "amavis" user's database.

cbj4074 · Aug 20, 2012

Also, I am curious to know if ISPConfig configures Amavis to maintain a separate Bayes token database for each virtual mail user (e.g., within a database).

Or, does ISPConfig configure Amavis to use a single Bayes database, e.g., that in /var/lib/amavis/.spamassassin?

till · Aug 21, 2012

A single Bayes database is used as far as I know. There is no special configuration in ISPConfig for this, so the defaults of the Linux distribution where you installed the system are used.

cbj4074 · Aug 21, 2012

Thanks, Till!

Do you happen to know whether or not it is necessary to restart Amavis for changes to the Bayes database to be effective?

I realize that SpamAssassin is accessed on-demand when used with Amavis, but it's not clear whether the Bayes values are loaded once when Amavis is started, or whether a look-up is performed against whatever data exists in the Bayes database with Amavis's every request to SpamAssassin.

Thanks again.

cbj4074 · Aug 27, 2012

cbj4074 said:
UPDATE:

Indeed I was missing something "obvious".

The solution is to pass the --username switch to the 'sa-learn' executable, e.g.:
Code:
# sa-learn --username=amavis --spam /var/vmail/example.com/trainer/Maildir/.Spam/cur
This enables one to execute the command as "root" or "vmail", which provides for the necessary permissions, while at the same time adding the tokens to the "amavis" user's database.
Actually, this was not the solution; I was mistaken.

Users on the SpamAssassin mailing list pointed out that the --username switch is intended for use with virtual user configurations, e.g., those tied to a SQL database of some kind. (It's worth noting that using an invalid --username doesn't throw a warning or error, and seems to use the current username instead.)

The solution was to "hard-code" the SpamAssassin Bayes database location in the configuration file (typically /etc/spamassassin/local.cf on Debian/Ubuntu systems):
Code:
bayes_path /var/lib/amavis/.spamassassin/bayes
With this directive in place, the sa-learn command will always use the specified database (unless the --username argument is provided [and is valid]).
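Since the original problem was that bayes_path was defined nowhere, it can help to check the configuration text for the directive before training. A rough sketch; the helper is hypothetical and its comment handling is a simplification of SpamAssassin's real config parser.

```python
"""Find the effective bayes_path in SpamAssassin configuration text."""
from typing import Optional


def find_bayes_path(cf_text: str) -> Optional[str]:
    """Return the last bayes_path value in `cf_text`, or None if it is unset.

    Simplified parsing (assumption): everything after '#' is a comment, and a
    later bayes_path line overrides an earlier one.
    """
    value = None
    for line in cf_text.splitlines():
        code = line.split("#", 1)[0].strip()
        parts = code.split(None, 1)
        if len(parts) == 2 and parts[0] == "bayes_path":
            value = parts[1].strip()
    return value


if __name__ == "__main__":
    local_cf = """
# bayes_path /root/.spamassassin/bayes
required_score 5
bayes_path /var/lib/amavis/.spamassassin/bayes
"""
    print(find_bayes_path(local_cf))  # /var/lib/amavis/.spamassassin/bayes
```

Passing in the contents of /etc/spamassassin/local.cf shows whether the directive is set there; other .cf files in that directory are not covered by this sketch.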

To ensure that the correct database is being used:
Code:
# spamassassin -D -t < /usr/share/doc/spamassassin/examples/sample-spam.txt 2>&1 | grep -E '(bayes:|whitelist:|AWL)'

[...]
dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen
[...]
The directive is having the intended effect; even though the command is executed as "root", the Amavis user's database file is used.
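When this check has to be repeated (for example after an upgrade), the relevant lines can be pulled out programmatically. A sketch that assumes the debug-line wording shown above; feed it the combined stdout/stderr of the spamassassin -D -t command. The function name is hypothetical.

```python
"""Extract the Bayes database files SpamAssassin opened, from -D output."""
import re
from typing import List

# Matches debug lines such as:
#   dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
TIE_RE = re.compile(r"bayes: tie-ing to DB file \S+ (\S+)")


def tied_bayes_files(debug_output: str) -> List[str]:
    """Return every database path mentioned in a 'tie-ing to DB file' line."""
    return TIE_RE.findall(debug_output)


if __name__ == "__main__":
    sample = (
        "dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks\n"
        "dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen\n"
    )
    files = tied_bayes_files(sample)
    print(all(f.startswith("/var/lib/amavis/.spamassassin/") for f in files))  # True
```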

Now, when training SpamAssassin, the sa-learn executable can be called as the "root" user, which allows access to the mailboxes in /var/vmail while at the same time populating the correct Bayes database (Amavis's).

mattltm · Oct 15, 2013

Is there a way to check that bayes is actually working for incoming email?

I have over 55000 example emails in my database but still get the same type of spam through.

cbj4074 · Jan 25, 2017

Yes, there is.

My recommendation is to read through the relevant parts of the thread at https://lists.gt.net/spamassassin/users/176514/?page=1;mh=-1; .

The thread is long, but very worthwhile when it comes to understanding how SpamAssassin, Bayes, and AMaViS function together.

Note in particular the link cited in the first post of the above thread; that also contains a wealth of relevant and useful information where this problem is concerned.

If those resources don't lead you to the answer, let me know...

mattltm · Apr 22, 2014

I've taken a look through and it seems that the X-SPAM header is not getting added to any of my incoming emails.

How do I add them?

till · Apr 22, 2014

The X-SPAM header gets added when the spam score of an email exceeds the spam score level that you set in the amavis policy.

mattltm · Apr 22, 2014

till said:

The X-SPAM header gets added when the spam score of an email exceeds the spam score level that you set in the amavis policy.

Is that set in /etc/amavis/conf.d/50-user (Debian)?

If so, I have mine set to -9999 to make sure it is added to every email.

cbj4074 · Apr 22, 2014

That's one way to do it.

You can also just control it via the ISPConfig interface; modify the "spamfilter policy" and set the SPAM tag level to -9999.

Curious to see where you land with the Bayes. If and until one fully understands how Bayes works, especially when so-called "glue" is involved, it can be a real mother****** to get working correctly.

As you can see from the thread I cited in my initial reply to you (I started and ended that thread), I jumped through many hoops to "get it all working". Hopefully, my adventure is of use to you! Happy to answer any questions you may have when you hit the next roadblock.

mattltm · Apr 22, 2014

cbj4074 said:

That's one way to do it.

You can also just control it via the ISPConfig interface; modify the "spamfilter policy" and set the SPAM tag level to -9999.

Curious to see where you land with the Bayes. If and until one fully understands how Bayes works, especially when so-called "glue" is involved, it can be a real mother****** to get working correctly.

As you can see from the thread I cited in my initial reply to you (I started and ended that thread), I jumped through many hoops to "get it all working". Hopefully, my adventure is of use to you! Happy to answer any questions you may have when you hit the next roadblock.

Changing it in ISPConfig did the trick.

Now I just need to wait for some yummy spam to see what's going on.

mattltm · Apr 22, 2014

That was quick!

Code:

X-Spam-Flag: NO
X-Spam-Score: -0.008
X-Spam-Level:
X-Spam-Status: No, score=-0.008 tagged_above=-9999 required=5
	tests=[HTML_MESSAGE=0.001, MIME_HTML_MOSTLY=0.001,
	T_RP_MATCHES_RCVD=-0.01] autolearn=unavailable

There is no mention of BAYES. Also, should autolearn be available?
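Checking headers by eye works for one message, but a small parser makes it easy to scan many messages for Bayes hits. A sketch assuming the amavis header layout shown above (folded lines, `tests=[RULE=score, ...]`, `autolearn=...`); Bayes rules are the ones whose names start with BAYES_ (e.g. BAYES_00, BAYES_99). The function name is hypothetical.

```python
"""Check an amavis X-Spam-Status header for Bayes rules and autolearn."""
import re
from typing import Dict, Optional, Tuple


def parse_spam_status(header: str) -> Tuple[Dict[str, float], Optional[str]]:
    """Return ({rule: score}, autolearn value) from an X-Spam-Status header."""
    unfolded = " ".join(header.split())  # join folded continuation lines
    tests: Dict[str, float] = {}
    match = re.search(r"tests=\[([^\]]*)\]", unfolded)
    if match:
        for item in match.group(1).split(","):
            name, _, score = item.strip().partition("=")
            if name:
                tests[name] = float(score) if score else 0.0
    auto = re.search(r"autolearn=(\w+)", unfolded)
    return tests, auto.group(1) if auto else None


if __name__ == "__main__":
    header = (
        "X-Spam-Status: No, score=-0.008 tagged_above=-9999 required=5\n"
        "\ttests=[HTML_MESSAGE=0.001, MIME_HTML_MOSTLY=0.001,\n"
        "\tT_RP_MATCHES_RCVD=-0.01] autolearn=unavailable"
    )
    tests, autolearn = parse_spam_status(header)
    bayes_rules = [rule for rule in tests if rule.startswith("BAYES_")]
    print(bayes_rules, autolearn)  # [] unavailable
```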

cbj4074 · Apr 29, 2014

Bayes requires at least 200 spam and 200 ham messages to be used. This explains both issues (Bayes not being used, and autolearn being unavailable).

Go back to that thread I cited earlier for explicit instructions regarding how to check your token database.

mattltm · Apr 29, 2014

I should have plenty in there...
Code:
0.000 0 3 0 non-token data: bayes db version
0.000 0 56062 0 non-token data: nspam
0.000 0 497 0 non-token data: nham
0.000 0 1220985 0 non-token data: ntokens
0.000 0 1026091208 0 non-token data: oldest atime
0.000 0 1398679531 0 non-token data: newest atime
0.000 0 1381878605 0 non-token data: last journal sync atime
0.000 0 1398726633 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
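The listing above matches the format of `sa-learn --dump magic`, where the third column holds the value. A sketch that parses it and applies the 200-spam/200-ham minimum mentioned earlier (SpamAssassin's bayes_min_spam_num and bayes_min_ham_num settings, both defaulting to 200); the helper names are hypothetical. The dump only reflects the database that sa-learn opened, so it should be run against the amavis database (for example as the amavis user or with --dbpath).

```python
"""Parse `sa-learn --dump magic` output and check the Bayes thresholds."""
from typing import Dict

# SpamAssassin defaults for bayes_min_spam_num and bayes_min_ham_num.
MIN_SPAM = 200
MIN_HAM = 200


def parse_dump_magic(text: str) -> Dict[str, int]:
    """Map each 'non-token data' name (nspam, nham, ...) to its 3rd-column value."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        head, sep, name = line.partition("non-token data:")
        fields = head.split()
        if sep and len(fields) >= 3:
            values[name.strip()] = int(fields[2])
    return values


def bayes_ready(magic: Dict[str, int]) -> bool:
    """True once enough spam and ham have been learned for Bayes to score mail."""
    return magic.get("nspam", 0) >= MIN_SPAM and magic.get("nham", 0) >= MIN_HAM


if __name__ == "__main__":
    dump = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 56062 0 non-token data: nspam
0.000 0 497 0 non-token data: nham
"""
    magic = parse_dump_magic(dump)
    print(magic["nspam"], magic["nham"], bayes_ready(magic))  # 56062 497 True
```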

From what I have read, autolearn=unavailable can mean almost anything.

I have checked and it seems to be working OK now.

Thanks.

cbj4074 · Apr 29, 2014

Everything is sorted? Your messages are now being scored appropriately, using Bayes, and the X-Spam-Status header reflects this?

Or were you referring to something else "working OK now"?

mattltm · Apr 29, 2014

Yes, all headers are present and correct and messages are being scored.

I am seeing a mix of "autolearn=no", "autolearn=yes" and "autolearn=unavailable" messages in the headers, and I have seen a marked reduction in spam.

Happy days!

cbj4074 · Apr 29, 2014

Happy days, indeed! If ever the Skype "dancing man" icon were relevant, it would be here.

Nice work! Cheerio!

viniciusmassuchetto New Member

viniciusmassuchetto New Member

cbj4074 Member

cbj4074 Member

till Super Moderator Staff Member ISPConfig Developer

cbj4074 Member

cbj4074 Member

mattltm Member

cbj4074 Member

mattltm Member

till Super Moderator Staff Member ISPConfig Developer

mattltm Member

cbj4074 Member

mattltm Member

mattltm Member

cbj4074 Member

mattltm Member

cbj4074 Member

mattltm Member

cbj4074 Member
