Ask SIPB - January 7, 2004

Deluged with spam? In this column, we revisit the topic of spam filtering with SpamAssassin and discuss recent changes to MIT's SpamAssassin configuration.

What is SpamAssassin?

SpamAssassin is a mail filter that allows users to control the junk (spam) mail they receive; it has been available to the MIT community since February 2003. It uses a set of rules to give each incoming email message a numerical spam score. Messages whose scores reach a configurable threshold are marked as spam, so that users can deal with them as they see fit.
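
To make the scoring idea concrete, here is a toy Python sketch of rule-based scoring. The rule names, patterns, and weights are hypothetical and far simpler than SpamAssassin's real rule set, and the 5.0 threshold is SpamAssassin's stock default, not a statement about MIT's setting.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str      # hypothetical rule name
    pattern: str   # regular expression tested against the raw message
    weight: float  # points added to the score when the pattern matches


# Hypothetical rules and weights, for illustration only.
RULES = [
    Rule("SUBJECT_FREE", r"(?m)^Subject:.*\bFREE\b", 1.5),
    Rule("CLICK_HERE", r"(?i)click here", 1.2),
    Rule("MONEY_BACK", r"(?i)money[- ]back guarantee", 2.0),
    Rule("REMOVE_FROM_LIST", r"(?i)to be removed from (this|our) list", 1.0),
]

# SpamAssassin's stock default threshold; the threshold is configurable.
THRESHOLD = 5.0


def score_message(raw: str, rules: list[Rule] = RULES) -> tuple[float, list[str]]:
    """Sum the weights of all matching rules; also return the rules' names."""
    hits = [rule for rule in rules if re.search(rule.pattern, raw)]
    return sum(rule.weight for rule in hits), [rule.name for rule in hits]


def is_spam(raw: str, threshold: float = THRESHOLD) -> bool:
    """A message is marked as spam when its score reaches the threshold."""
    score, _ = score_message(raw)
    return score >= threshold
```

A message that trips all four example rules scores about 5.7 and is marked as spam; an ordinary note that trips none scores 0. Real rules also look at headers, message structure, and network blacklists, but the principle of adding up points is the same.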

SpamAssassin tags your email so that you can filter and delete messages that might be spam. While this service is optional and not enabled by default, we recommend that you use it if you get a lot of spam. Keep in mind, however, that the filter is not perfect, so you should do at least a cursory check of your suspected spam before deleting it.

How do I enable SpamAssassin?

If you are using an IMAP mail client, such as Evolution, Mozilla, Outlook, or Athena Pine, you can have all messages marked as spam filed into a separate folder automatically. Simply create a new folder in your INBOX named Spamscreen. Warning: If you create such a folder, you will not be able to use POP mail clients, such as Eudora, SIPB Pine, or nmh, to view email tagged as spam by the po servers, because POP clients can read only your INBOX.
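
The po servers do this filing for you. The Python sketch below models that decision under one assumption: that spam is marked with SpamAssassin's conventional X-Spam-Flag: YES header. The function name and its arguments are hypothetical; MIT's actual delivery code is not described here.

```python
from email import message_from_string


def choose_folder(raw_message: str, user_folders: set[str]) -> str:
    """Return the folder a newly arrived message should be filed into.

    Hypothetical model of the po servers. Assumes spam carries
    SpamAssassin's conventional "X-Spam-Flag: YES" header.
    """
    msg = message_from_string(raw_message)
    flagged = str(msg.get("X-Spam-Flag", "")).strip().upper() == "YES"
    # Tagged mail is diverted only if the user has created a Spamscreen folder.
    if flagged and "Spamscreen" in user_folders:
        return "Spamscreen"
    return "INBOX"
```

Because a POP client sees only what lands in INBOX, anything routed to Spamscreen by logic like this is invisible to it, which is why the warning above applies.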

For information on configuring SpamAssassin's settings, or on enabling spam filtering with non-IMAP mail clients, refer to our March 14, 2003 column at http://www.mit.edu/~asksipb/2003columns/2003-03-14-spamassassin/ and to the I/S Spam Screening web page at http://web.mit.edu/is/help/nospam/.

How do I get zephyr notification of non-spam mail only?

As you may know, you can get zephyr notification of incoming mail by subscribing to <mail, *, %me%>. If you've created the Spamscreen folder in your mailbox, and you'd like to get zephyr notification of non-spam mail only, you can subscribe instead to <mail, inbox, %me%>.

If you're using zwgc, the default zephyr client, you can do this by typing:

athena% zctl add mail inbox %me%

Then, to remove the old subscription (the backslash keeps the shell from expanding the *):

athena% zctl del mail \* %me%

For more information on zephyr, you can refer to our August 27, 2003 column at http://www.mit.edu/~asksipb/2003columns/2003-08-27-zephyr/.
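
Why does switching subscriptions help? A subscription is a triple of class, instance, and recipient, and * in the instance position matches any instance. The Python sketch below is a simplified model of that matching, not zephyr's actual code; it assumes, as this column implies, that mail notices use class mail and an instance naming the folder the message was delivered to. The username jdoe and the Spamscreen instance name are hypothetical.

```python
Subscription = tuple[str, str, str]  # (class, instance, recipient)


def matches(sub: Subscription, notice: Subscription) -> bool:
    """Simplified zephyr matching: '*' as the instance matches any instance."""
    sub_class, sub_instance, sub_recipient = sub
    notice_class, notice_instance, notice_recipient = notice
    return (
        sub_class == notice_class
        and sub_instance in ("*", notice_instance)
        and sub_recipient == notice_recipient
    )


# Hypothetical user "jdoe"; %me% would expand to your own zephyr identity.
old_sub = ("mail", "*", "jdoe")      # <mail, *, %me%>
new_sub = ("mail", "inbox", "jdoe")  # <mail, inbox, %me%>
inbox_notice = ("mail", "inbox", "jdoe")
spam_notice = ("mail", "Spamscreen", "jdoe")  # hypothetical instance name

for name, sub in (("old", old_sub), ("new", new_sub)):
    print(name, matches(sub, inbox_notice), matches(sub, spam_notice))
# old True True
# new True False
```

With the old wildcard subscription, notices for both folders get through; with the new one, only INBOX notices do.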

What's new in SpamAssassin?

In December 2003, a new version of SpamAssassin was installed on each of the MIT Post Office servers (po9, po10, po11, po12, and po14). The major new feature is word-by-word statistical filtering, also known as Bayesian filtering.

This method analyzes messages known to be spam and non-spam (called "ham") and records which words appear in each. When new mail comes in, it looks at the words in the message and uses the previously recorded statistics to estimate how likely the message is to be spam. Because the statistics can be updated continually as new examples are trained, the filter keeps up with changes in the spam people receive, and it is generally very effective.
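
The word-counting idea can be sketched in a few dozen lines of Python. This is a simplified textbook naive Bayes classifier with add-one smoothing, in which only the words present in a message count as evidence; it is not SpamAssassin's actual algorithm, which tokenizes messages and combines the evidence differently. The class and method names are our own.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Split text into distinct lowercase words (each counted once per message)."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


class WordStats:
    """Per-word counts of how many spam and ham messages contained each word."""

    def __init__(self) -> None:
        self.spam_words: Counter = Counter()
        self.ham_words: Counter = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Record the words of one message known to be spam or ham."""
        words = tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.ham_words.update(words)
            self.ham_msgs += 1

    def spam_probability(self, text: str) -> float:
        """Estimate how likely a new message is to be spam, from 0.0 to 1.0."""
        # Prior odds of spam vs. ham, smoothed so an empty database gives 0.5.
        log_odds = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        for word in tokenize(text):
            # Fraction of spam (ham) messages containing the word, with
            # add-one smoothing so unseen words do not yield zero.
            p_spam = (self.spam_words[word] + 1) / (self.spam_msgs + 2)
            p_ham = (self.ham_words[word] + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_spam / p_ham)
        # Convert log-odds back to a probability without overflowing exp().
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)
```

Words seen mostly in spam push the estimate toward 1, words seen mostly in ham push it toward 0, and further training simply updates the counts.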

For a more detailed look at this method, see Paul Graham's "A Plan for Spam" at http://www.paulgraham.com/spam.html and the presentation on the topic by MIT's Information Technology Architecture Group at http://web.mit.edu/itag/seminars/20031210.html.

How do I train the Bayesian filter?

You can train the Bayesian filter only with an IMAP mail client. If you have created a Spamscreen folder as described above but a piece of spam still arrives in your INBOX, copy the message to your Spamscreen folder. Each night, the filter will be trained on the mail found in that folder.

To do so, if you are using a graphical mail client, drag the message into the folder with your mouse. If you are using Athena Pine, press S (for Save), and then type Spamscreen.

Conversely, if a legitimate message ends up in the Spamscreen folder, you should train the filter so that it can avoid making the same mistake in the future. To do so, create a Hamscreen folder in your INBOX. Then, copy the legitimate message into the Hamscreen folder, and the filter will be trained with the message that night.

In both cases, once at least one night has passed, you can delete these messages from the Spamscreen and Hamscreen folders, since they are no longer needed. Note that all of MIT shares a common ham and spam training database, so you also benefit from other users' training.
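
The nightly training and the shared database can be sketched together. In this hypothetical Python model, each user's Spamscreen and Hamscreen folders are lists of (Message-ID, text) pairs, and every user's messages update one shared set of word counts. Remembering which messages have already been learned means a message left in a folder is not counted twice, and once learned, its words stay in the database whether or not you later delete it. SpamAssassin's own learner keeps a comparable record of messages it has seen, but the storage, function names, and job structure here are assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Distinct lowercase words of a message."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


class SharedDatabase:
    """One spam/ham word-count database shared by every user."""

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}  # True means spam
        self.seen: dict[str, bool] = {}  # Message-ID -> label it was learned as

    def learn(self, message_id: str, text: str, is_spam: bool) -> bool:
        """Add a message's words under a label; return True if anything changed."""
        previous = self.seen.get(message_id)
        if previous == is_spam:
            return False  # already learned with this label on an earlier night
        words = tokenize(text)
        if previous is not None:
            self.counts[previous].subtract(words)  # undo the old label first
        self.counts[is_spam].update(words)
        self.seen[message_id] = is_spam
        return True


def nightly_training(db: SharedDatabase, mailboxes: dict) -> int:
    """Train on every user's folders; return how many messages were learned.

    mailboxes maps each (hypothetical) user to a dict of folder name ->
    list of (message_id, text) pairs.
    """
    learned = 0
    for folders in mailboxes.values():
        ham = folders.get("Hamscreen", [])
        # A copy in Hamscreen is a correction, so it overrides Spamscreen.
        corrected = {message_id for message_id, _ in ham}
        for message_id, text in folders.get("Spamscreen", []):
            if message_id not in corrected:
                learned += db.learn(message_id, text, True)
        for message_id, text in ham:
            learned += db.learn(message_id, text, False)
    return learned
```

In this model, a message learned as spam one night and copied to Hamscreen the next is moved from the spam counts to the ham counts, and messages already learned are skipped on later nights, so deleting them afterward loses nothing.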


To ask us a question, send email to sipb@mit.edu. We'll try to answer you quickly, and we can address your question in our next column. You can also stop by our office in W20-557 or call us at x3-7788 if you need help. Copies of each column and pointers to additional information are posted on our website: http://www.mit.edu/~asksipb/