Spam filtering in Niksula

How does the system work?

SpamAssassin is installed on the mailer, where the spamd daemon is running. SpamAssassin tries to detect spam by running various tests on each message. Any spam it finds is tagged with specific headers.

How do I use it?

The examples below show different ways to use the spam filtering. They serve different needs: the later ones require more knowledge and work but are more effective. Users who get lots of spam have more motivation to fight back with better weapons.

If you don't know what procmail is or can't understand the following examples, you should check 'man procmail' first.

1) The easy way - basic spam filtering

All incoming mail to Niksula is tagged using default settings. Therefore, all you have to do is add the following lines to your ~/.procmailrc:

MAILDIR=$HOME/mail
   
:0:
* ^X-Spam-Niksula: Yes
spambox

NOTE! Change the value of the MAILDIR variable to your own mail directory. It can be ~/mail (as above), ~/Mail or something else.

This saves the mail that SpamAssassin has tagged as spam to a folder called "spambox". No system is foolproof, so please check the "spambox" folder frequently.
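Checking the folder does not have to mean opening every message. The following is a minimal sketch, assuming "spambox" is a plain mbox file (which is what procmail writes when the target is a file name). The function name list_subjects is made up for this example.

```shell
# list_subjects is a hypothetical helper, not part of procmail or SpamAssassin.
# It prints the Subject: header line of every message in an mbox file.
list_subjects() {
    awk '
        /^From / { in_header = 1; next }     # mbox separator: a new message starts
        in_header && /^$/ { in_header = 0 }  # first blank line ends the header
        in_header && /^Subject:/ { print }   # only look at Subject: inside the header
    ' "$1"
}
```

For example, list_subjects "$HOME/mail/spambox" gives a quick overview for spotting good mail that was filed as spam. Messages without a Subject: header are not listed.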

2) The medium level - spam filtering reloaded

This example shows how to use some personal settings. Different users get different kinds of spam, so the default settings can't be the best for everyone.

As before, add some lines to your ~/.procmailrc:

MAILDIR=$HOME/mail

:0fw: spamassassin.lock
| /c/smtp/bin/spamc

:0:                   
* ^X-Spam-Niksula: Yes
spambox

For personal configuration, edit ~/.spamassassin/user_prefs.

You can set lots of variables to better suit your own mail. The default required score for spam is 5.0. If you still get spam with scores like 4.5, you can set the limit lower:

# How many hits before a mail is considered spam.
required_hits     4.0

Now messages with a score of 4.0 or higher also go to your spam folder.

See the Mail::SpamAssassin::Conf POD documentation for details of what can be tweaked.
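If you would rather not open an editor, the change can also be made from the shell. This is only a sketch: set_pref is a hypothetical helper, not a SpamAssassin tool, and it refuses to touch a key that is already present so that an existing setting is never duplicated.

```shell
# set_pref is a hypothetical helper, not part of SpamAssassin.
# Usage: set_pref FILE KEY VALUE
# Appends "KEY VALUE" to FILE unless FILE already has a line for KEY.
set_pref() {
    prefs_file=$1
    key=$2
    value=$3
    # Create ~/.spamassassin (or whatever directory holds FILE) if it is missing.
    mkdir -p "$(dirname "$prefs_file")"
    touch "$prefs_file"
    if grep -q "^[[:space:]]*${key}[[:space:]]" "$prefs_file"; then
        echo "${key} is already set in ${prefs_file}; edit it by hand" >&2
        return 1
    fi
    printf '%s %s\n' "$key" "$value" >> "$prefs_file"
}
```

With this, set_pref "$HOME/.spamassassin/user_prefs" required_hits 4.0 adds the line shown above.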

3) Last but far from least - Bayesian filtering

Even though SpamAssassin has very good rule-based filters, sometimes even those are not enough. Some people get hundreds of spam messages per day, which is very annoying. SpamAssassin includes a Bayesian module that learns from your own email which messages are spam and which are good (ham).

Again, you have to add some lines to your ~/.procmailrc:

MAILDIR=$HOME/mail
   
:0fw: spamassassin.lock
| /c/smtp/bin/spamc
                    
:0:
* ^X-Spam-Niksula: Yes
spambox

Now you should make two collections of training material, one of ham and one of spam. A suitable size for each is between 200 and 1000 messages: 200 is SpamAssassin's default minimum, and a very large collection is not a good idea either. Make sure that the ham collection contains only good mail and the spam collection contains only spam.
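Before training, it can help to check that each collection is of a sensible size. Below is a minimal sketch, assuming both collections are mbox files. check_corpus is a made-up name, and the default limits are simply the 200 and 1000 mentioned above.

```shell
# check_corpus is a hypothetical helper, not part of SpamAssassin.
# Usage: check_corpus MBOX [MIN] [MAX]
# Counts the "From " separator lines of an mbox file and compares the
# count with the suggested range (200 to 1000 by default).
check_corpus() {
    mbox=$1
    min=${2:-200}
    max=${3:-1000}
    # grep -c exits non-zero when nothing matches or the file is missing,
    # so fall back to a count of 0 in those cases.
    count=$(grep -c '^From ' "$mbox" 2>/dev/null) || count=0
    if [ "$count" -lt "$min" ]; then
        echo "$mbox: $count messages, fewer than $min"
        return 1
    fi
    if [ "$count" -gt "$max" ]; then
        echo "$mbox: $count messages, more than $max"
        return 1
    fi
    echo "$mbox: $count messages, ok"
}
```

Run it once for each collection, for example check_corpus spambox and check_corpus hambox in your mail directory.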

Once you have one mailbox full of spam and another full of ham ("ham" is the technical term for non-spam), ssh to a Solaris machine and run sa-learn on them. This may take some time, so it is nice to use the 'nice' command.

Please use a Solaris machine other than kekkonen.cs.hut.fi.

nice /p/bin/sa-learn --showdots --spam --mbox spambox
nice /p/bin/sa-learn --showdots --ham --mbox hambox

Once it is done, add the following line to your ~/.spamassassin/user_prefs file:

use_bayes 1

You can test that the new filter is working by checking the X-Spam-Status headers of emails that you receive; SpamAssassin tests that look like BAYES_00 should be visible even in ordinary emails.
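The check described above can be done for a whole folder at once. This is a sketch only, assuming the folder is an mbox file and that the test names appear in the X-Spam-Status header or its indented continuation lines. bayes_tests is a made-up name.

```shell
# bayes_tests is a hypothetical helper, not part of SpamAssassin.
# It prints every BAYES_nn test name found in the X-Spam-Status headers
# of an mbox file, including folded (indented) continuation lines.
# It uses POSIX awk features; on older Solaris systems nawk may be needed.
bayes_tests() {
    awk '
        # Print each BAYES_nn token that occurs on the given line.
        function scan(line) {
            while (match(line, /BAYES_[0-9]+/)) {
                print substr(line, RSTART, RLENGTH)
                line = substr(line, RSTART + RLENGTH)
            }
        }
        /^X-Spam-Status:/       { in_status = 1; scan($0); next }
        in_status && /^[ \t]/   { scan($0); next }   # folded header line
                                { in_status = 0 }    # any other line ends the header
    ' "$1"
}
```

Piping the output through sort | uniq -c shows how often each Bayes test fired. If nothing is printed for recently received mail, the Bayesian filter is probably not active yet.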

And as a reminder: no system is foolproof, so please check your spam folder frequently.


This page was last updated on October 22nd, 2004 by the feared URUG of Niksula.