
SPAM filtering

Introduction

Round about mid 2002, the amount of SPAM in my email inbox got to the level where it was no longer an annoyance that could be easily overlooked. I was driven to look for technical help, and lo and behold, the net delivered. I now have email filtering that lets through only 2 or 3 spam messages a week, with a false alarm rate of maybe one message a month.

I'm using a combination of two filters:

Using SpamAssassin at EE.COLUMBIA.EDU

SpamAssassin has been installed on the main EE department mail server, and you can easily configure your mail client to take advantage of it. This involves two steps:

  1. Enable SpamAssassin tagging of your mail: SpamAssassin acts as a filter that all your incoming messages go through. It looks at the message headers and content, and assigns a "spamicity" score based on its internal rules. It then adds extra headers to the message indicating the spam score before passing it on to your normal mail program.

    To enable SpamAssassin tagging of your mail, you need to set up a file called ".procmailrc" in your Unix home directory. The file should contain the following:

    
    # SPAMASSASSIN
    # Version 2.43 - central client version
    :0fw
    | /usr/bin/spamc
    
  2. Configure your email client to redirect the Spam: You can now configure your local email reading program to handle the spam however you like, based on the headers added by SpamAssassin; the most common choice is to have messages marked spam immediately redirected to a separate SPAM folder, which you can scan periodically to check for erroneously tagged messages.

    The exact way to do this depends on the program you use to read mail; there are good instructions for a variety of programs here.
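For mail readers that can run a script, the redirect rule in step 2 might look roughly like the sketch below. It assumes SpamAssassin's usual "X-Spam-Flag: YES" header is present on tagged messages. The function name and the folder names are hypothetical, chosen only for illustration.

```python
from email import message_from_string


def target_folder(raw_message, spam_folder="SPAM", inbox="INBOX"):
    """Return the (hypothetical) folder name a message should be filed in."""
    msg = message_from_string(raw_message)
    # SpamAssassin normally adds "X-Spam-Flag: YES" to messages it tags as spam.
    flag = msg.get("X-Spam-Flag", "")
    return spam_folder if flag.strip().upper() == "YES" else inbox


tagged = "From: a@example.com\nSubject: Buy now\nX-Spam-Flag: YES\n\nbody\n"
clean = "From: b@example.com\nSubject: Lunch?\n\nbody\n"
print(target_folder(tagged))  # SPAM
print(target_folder(clean))   # INBOX
```

Filing tagged mail in a folder rather than deleting it keeps erroneously tagged messages recoverable.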

Spam Trends (2003-06-19 et seq.)

Something bad is happening with SPAM. I analyzed the dates of the messages that ended up in my SPAM folder - now a total of 18,000 in 20 months. But things have really picked up lately:

[Chart of spam messages per day]
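A per-day count like the one charted here can be produced along the following lines. This is only a sketch of the idea, not the script actually used for the chart: it assumes each message is available as raw text and takes the day from its "Date:" header, skipping messages whose date is missing or unparseable. The function name and sample messages are made up.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime


def messages_per_day(raw_messages):
    """Count messages by the calendar day given in their Date: header."""
    counts = Counter()
    for raw in raw_messages:
        date_header = message_from_string(raw).get("Date")
        if date_header is None:
            continue
        try:
            day = parsedate_to_datetime(date_header).date()
        except (TypeError, ValueError):
            continue  # malformed dates are skipped rather than guessed at
        counts[day] += 1
    return counts


sample = [
    "Date: Thu, 01 May 2003 09:15:00 -0400\nSubject: a\n\nx\n",
    "Date: Thu, 01 May 2003 17:40:00 -0400\nSubject: b\n\nx\n",
    "Date: Fri, 02 May 2003 08:05:00 -0400\nSubject: c\n\nx\n",
    "Date: not a real date\nSubject: d\n\nx\n",
]
for day, n in sorted(messages_per_day(sample).items()):
    print(day, n)
# 2003-05-01 2
# 2003-05-02 1
```

For a real folder, the standard-library mailbox module can read an mbox file and supply the messages.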

SPAM levels doubled in about 2 weeks around the beginning of May 2003. It may just be that somehow my address got onto some widely-distributed list - but I can't think of anything obvious that would account for it. It would be interesting to correlate this with other people's experiences. Certainly, everyone feels that SPAM levels are getting difficult to manage.

Spam dropped off in mid July 2003 because of a block that accidentally discarded all mail forwarded from my old address at Berkeley (which was also forwarding all the mail from my very old address at MIT). Hmm, maybe there's a useful lesson there...

Spam jumped massively in mid August 2003 because of the Sobig.F virus, and that's not even counting another 60 or so bounce messages I've received as a result of Sobig viruses masquerading as coming from me.

In June 2004, one machine (or a small group?) started sending all its virus messages using my address as the "From:"; as a result, I started getting several thousand bounce messages per day, and had to start automatically deleting spam messages with large SpamAssassin scores.
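Deleting on a score threshold can be sketched as follows. This assumes the score is read from the "X-Spam-Status" header, which SpamAssassin writes as "hits=N" in 2.x releases and "score=N" in later ones. The cutoff of 15.0 and the function names are illustrative assumptions, not the values actually used.

```python
import re
from email import message_from_string

# Matches "hits=12.3" (SpamAssassin 2.x) or "score=12.3" (later versions).
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message):
    """Return the SpamAssassin score as a float, or None if untagged."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def disposition(raw_message, delete_at=15.0):
    """Return "delete" for scores at or above delete_at (a made-up cutoff)."""
    score = spam_score(raw_message)
    if score is not None and score >= delete_at:
        return "delete"
    return "keep"


bounce = "X-Spam-Status: Yes, hits=23.4 required=5.0 tests=FOO\n\nx\n"
mild = "X-Spam-Status: Yes, score=6.1 required=5.0\n\nx\n"
print(disposition(bounce))  # delete
print(disposition(mild))    # keep
```

Messages that score above the spam threshold but below the cutoff are kept, so they can still be filed in the SPAM folder and checked by hand.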

I recently (2004-06-22) added the green trace showing the raw number of messages delivered to my inbox (i.e. getting past SpamAssassin). This number has stayed roughly constant at 20-80 messages/day over the past 2 years; it's only around May 2004 that the number of spam messages started consistently exceeding this.

In Sep 2005, our sysadmin Christian Gough installed a MIMEDefang/SpamAssassin setup centrally on our mail server, so any immediately obvious spam is bounced rather than being delivered. This drastically cut the spam that was left for my downstream SpamAssassin setup to catch -- and it dropped still further after a few weeks when he tuned the rules.

Interestingly, in the year following that original introduction, spam has been creeping back up - partly due to the ever-increasing total volume, but probably also because spammers have developed some immunity to that particular set of SpamAssassin rules.


Last updated: $Date: 2003/08/27 20:34:29 $

Dan Ellis <[email protected]>