Comments on Learning Spam With SpamAssassin And All Of Your ISPConfig Clients [ISPConfig 3]
This is a quick way of learning spam from all of your ISPConfig clients by running a quick and simple command. Please note that this is for ISPConfig 3, not 2.
16 Comment(s)
What happens after running the sa-learn command from the command line? Does SpamAssassin continue monitoring the folders in the future, or does the command have to be run periodically to continue learning? If it is the latter, it should presumably go into a cron job, correct?
You need to put it into a cron job so that it processes the spam/ham automatically.
How should we set up this cron job?
The cron job already exists, at least if you followed the steps in the HowtoForge Perfect Server series, at /usr/sbin/amavisd-new-cronjob.
This only works for small mail servers.
On a large mail server, sa-learn fails because the command is too long.
Bye, admins
Simply remove the final * from all commands. If you use Maildir, it is sufficient to give the directory name, and sa-learn will examine all mails in the directory.
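As a rough sketch of that advice (not the article's own script), the helpers below find Junk folders and hand each directory to sa-learn, so the shell never has to expand a long list of individual mail files. The function names, the `SA_LEARN` override, and the `.Junk` folder name are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every directory named ".Junk" below a mail root.
list_junk_dirs() {
    # $1: mail root, e.g. /var/vmail (assumed layout: <domain>/<user>/Maildir/.Junk)
    find "$1" -type d -name '.Junk' | sort
}

# Hypothetical wrapper: call sa-learn once per Junk directory.
# Passing directories instead of files keeps the command line short.
learn_spam_dirs() {
    # SA_LEARN can be overridden, e.g. SA_LEARN=echo for a dry run.
    list_junk_dirs "$1" | while IFS= read -r dir; do
        "${SA_LEARN:-/usr/bin/sa-learn}" --spam "$dir"
    done
}
```

Running `SA_LEARN=echo; learn_spam_dirs /var/vmail` first shows which directories would be trained without touching the Bayes database.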
Thanks for this short guide. I have translated it into German. You can see it here:
http://www.howtoforge.de/uncategorized/ispconfig-3-clients-lernen-spam-mit-spamassassin/
Best thanks, PlaNet Fox
Hi,
the Maildir directory is missing from the paths in /bin/sa_learn, so change the script to:
#!/bin/bash
/usr/bin/sa-learn --spam /var/vmail/*/*/*/.Junk/*/*
/usr/bin/sa-learn --ham /var/vmail/*/*/*/cur/*
Thank you for your work!
Is it bad to run this script if a lot of the emails in the spam folder are already marked as spam by SpamAssassin (***SPAM*** in the subject)? Or doesn't it matter?
For the learned bayes tokens to actually be used while amavis calls spamassassin, the following line has to go in /etc/spamassassin/local.cf:
bayes_path /var/lib/amavis/.spamassassin/bayes
By default, the learned tokens go to ~/.spamassassin/ of the user under which sa-learn is run where it will never be read (since virtual mailboxes are used). Instead, all tokens have to go under the home directory of the amavis user which is /var/lib/amavis.
To verify that the correct directory is used by spamassassin, execute:
spamassassin -D -t < /usr/share/doc/spamassassin/examples/sample-spam.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'
You should see these lines:
[...]
dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen
[...]
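Before running the debug command, it can help to confirm what the config file actually sets. The following is a minimal sketch; the function name `get_bayes_path` is made up for illustration, and it assumes the option is written as `bayes_path <value>` on a line of its own.

```shell
#!/bin/sh
# Hypothetical helper: print the last bayes_path value set in a SpamAssassin
# config file. Commented-out lines are ignored; prints nothing if unset.
get_bayes_path() {
    # $1: path to local.cf
    awk '$1 == "bayes_path" { value = $2 } END { if (value != "") print value }' "$1"
}
```

With the line from this comment in place, `get_bayes_path /etc/spamassassin/local.cf` should print `/var/lib/amavis/.spamassassin/bayes`. Note that the value is a file prefix: the debug output above shows the actual files as `bayes_toks` and `bayes_seen`.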
Based on the suggestions by others in this thread,
this seems to be working for me.
To set the common (i.e. server-wide) folder for the training tokens:
vi /etc/spamassassin/local.cf
bayes_path /var/lib/amavis/.spamassassin/bayes
To verify that the correct directory is used by spamassassin, execute:
spamassassin -D -t < /usr/share/doc/spamassassin/examples/sample-spam.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'
You should see these lines:
dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen
Show the SpamAssassin Bayes status:
sa-learn --dump magic
Training:
/usr/bin/sa-learn --spam /var/vmail/*/*/*/.Junk/*/*
/usr/bin/sa-learn --ham /var/vmail/*/*/*/cur/*
It is better to train ham before spam, as we might have picked up spam in our ham, and it will be unlearned as ham during the spam run:
/usr/bin/find /var/vmail/*/*/Maildir -maxdepth 1 -not -ipath '*/*Junk*' -not -ipath '*/*Trash*' -not -ipath '*/Maildir' -not -ipath '*/*spam*' -type d -exec /usr/bin/sa-learn --ham {} \;
/usr/bin/find /var/vmail/*/*/Maildir/ -type d \( -iname "*Junk*" -o -iname "*spam*" \) -exec /usr/bin/sa-learn --spam {} \;
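The selection rule behind these two find commands can be spelled out as a small function. This is only a sketch of the name filter, not a replacement for the commands; `classify_folder` is a hypothetical name, and it looks at the folder name alone.

```shell
#!/bin/sh
# Hypothetical helper: decide how a Maildir folder should be trained,
# based on its name. Prints "spam", "ham", or "skip".
classify_folder() {
    # Lower-case the name so matching is case-insensitive, like -iname/-ipath.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *junk*|*spam*) echo spam ;;  # trained with sa-learn --spam
        *trash*)       echo skip ;;  # excluded from both runs
        *)             echo ham ;;   # everything else is trained as ham
    esac
}
```

For example, `classify_folder .Junk` prints `spam` and `classify_folder .Sent` prints `ham`, which matches what the filters above select.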
Crontab: the ham job runs once a day because it takes over an hour, and the spam job runs every 2 hours:
# this is how we train spamassassin spam vs ham
22 2 * * * /usr/bin/find /var/vmail/*/*/Maildir -maxdepth 1 -not -ipath '*/*Junk*' -not -ipath '*/*Trash*' -not -ipath '*/Maildir' -not -ipath '*/*spam*' -type d -exec /usr/bin/sa-learn --ham {} \; 2>&1 > /dev/null
02 */2 * * * /usr/bin/find /var/vmail/*/*/Maildir/ -type d \( -iname "*Junk*" -o -iname "*spam*" \) -exec /usr/bin/sa-learn --spam {} \; 2>&1 > /dev/null
59 5 * * * /usr/bin/sa-learn --dump magic
amavisd-new ships a cron job (/etc/cron.d/amavisd-new), which runs /usr/sbin/amavisd-new-cronjob every 3 hours as user 'amavis'. Changing the sa-sync and sa-clean actions to invoke sa-learn (with "--spam /var/vmail/*/*/Maildir/.Junk/*/*" and "--ham /var/vmail/*/*/Maildir/cur/*") is simple. However, /var/vmail is owned by vmail:vmail with mode 700, so the amavis user cannot read it.
What is the best way to integrate:
1) run sa-learn as vmail instead of amavis?
2) run sa-learn as root instead of amavis?
3) figure out how to make /var/vmail vmail:amavis 750 for all adds/changes done via ISPConfig?
After digging into sa-learn in an amavisd-new, SpamAssassin, Postfix, Dovecot system, I decided to move the discussion to the ISPConfig 3 -> Installation/Configuration community discussion [1].
[1] https://www.howtoforge.com/community/threads/sa-learn-how-to-resolve-permission-issue.73056/
My solution for the ham learning process:
find /var/vmail/*/* -type d -not -path "*.Spam*" -not -path "*.Junk*" -not -path "*.Trash*" -not -path "*new*" -not -path "*tmp*" -not -path "*.Sent*" -not -path "*.Archive*" -not -path "*Maildir/cur*" -not -path "*dovecot*" -not -path "*sieve*" -not -path "*quotausage*" -not -path "*courier*" -type d -exec /usr/bin/sa-learn --ham {} \;
For CentOS 7 the paths are as follows:
SpamAssassin local.cf: /etc/mail/spamassassin/local.cf
Bayes DB location: /var/spool/amavisd/.spamassassin/bayes
So update the file /etc/mail/spamassassin/local.cf and add "bayes_path /var/spool/amavisd/.spamassassin/bayes".
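Since the thread mentions two different locations for local.cf, a script that has to work on both layouts could probe for the file. This is a minimal sketch; `find_local_cf` and its optional prefix argument are hypothetical, and only the two paths named in this thread are checked.

```shell
#!/bin/sh
# Hypothetical helper: print the first existing local.cf among the two
# locations mentioned in this thread. Returns non-zero if neither exists.
find_local_cf() {
    # $1 (optional): root prefix, only useful for testing against a fake tree
    for candidate in /etc/spamassassin/local.cf /etc/mail/spamassassin/local.cf; do
        if [ -f "${1:-}$candidate" ]; then
            printf '%s\n' "${1:-}$candidate"
            return 0
        fi
    done
    return 1
}
```

Called without arguments, it checks the real filesystem; the printed path is the file to which the bayes_path line would be added.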
Hi there,
I'm trying to figure out if there is a way for ISPConfig 3 + SpamAssassin to isolate Bayesian filters at the domain level. What do I mean by that?
I have one customer who is pretty good at sorting false negatives (spam in the Inbox) and false positives (legit mail in the Spam folder), but say I have 10 other customers who are not doing a good job of sorting those false positives/negatives.
Now, if I run sa-learn on the genuine spam that the one good customer has collected in the Spam folder, and then run sa-learn --ham over the inboxes of the other 10 "bad" customers, I assume the latter will simply outweigh what was learned from the good one.
Would there be a way for per customer (domain) or even per user isolation of this?
Thanks!