Here are some interesting milestones in my personal efforts to keep my inbox free of the dreaded scourge, spam. This page was last updated at $Date: 2011/02/22 15:25:40 $.
For several years prior to October 2008, my email setup consisted of:
This setup was consistently achieving a spam-blocking success rate very close to 100%, with very few false positives. It was not unusual for bogofilter to reject 1,000 or even more spam messages per day.
In October 2008, I stopped using the NJABL blocklist, because it was being used in a political "dirty tricks" attack against Barack Obama's presidential campaign and its administrators weren't doing anything to put a stop to it.
Around March 10, 2010, the accuracy of my spam filter plummeted virtually overnight. For the next several weeks, its success rate averaged around 80% and at times was as low as 70%. Ouch!
Upon investigation, I discovered that there was a new breed of spam coming into my inbox which was quite effective at circumventing bogofilter. While spammers had been adding "hash busters" to their spam and engaging in other, similar "Bayesian poisoning" attacks for years, this new kind of spam was inserting several randomly selected passages of real text on widely varying topics at the end of each spam message. For example, one spam message of this sort contained excerpts from the following Web pages, among others:
While it is difficult to know for certain, it seems likely that the program generating this spam does so by simply fetching the Wikipedia Random article link repeatedly, grabbing a small excerpt from each returned page, and inserting it into the spam.
This seemingly new strategy, of using multiple text passages on different topics, was quite successful at hitting on at least a couple of "ham-heavy" keywords (i.e., keywords that my bogofilter database thinks are more likely to appear in non-spam than in spam), thus causing a much larger percentage of spam messages to make it through the filter.
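To illustrate why a handful of ham-heavy keywords can tip the balance, here is a minimal sketch of a naive Bayes combination of per-token probabilities. It is a simplification for illustration only: bogofilter's real scoring is more sophisticated, and the token table, the probabilities, and the `spam_score` function are all made-up assumptions, not values from my database.

```python
import math

# Hypothetical per-token spam probabilities, as a trained filter might store them.
# Values near 1.0 are "spam-heavy"; values near 0.0 are "ham-heavy".
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "discount": 0.95,
    "pills": 0.97,
    "meeting": 0.05,
    "kernel": 0.03,
    "thesis": 0.04,
    "compiler": 0.02,
}
UNKNOWN_PROB = 0.5  # tokens the filter has never seen carry no evidence


def spam_score(tokens):
    """Combine token probabilities with naive Bayes, summed in log-odds space."""
    log_odds = 0.0
    for token in set(tokens):
        p = TOKEN_SPAM_PROB.get(token, UNKNOWN_PROB)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Convert the summed log-odds back into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


plain_spam = ["viagra", "discount", "pills"]
# The same message with excerpts of unrelated real text appended.
padded_spam = plain_spam + ["meeting", "kernel", "thesis", "compiler"]

print(f"plain spam:  {spam_score(plain_spam):.4f}")
print(f"padded spam: {spam_score(padded_spam):.4f}")
```

In this toy model, the appended ham-heavy words contribute enough negative evidence to pull the padded message's score well below that of the plain one, which is the effect described above.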
On April 6, 2010, I was able to get my spam load under control once again by enhancing my spam filtering as follows:
On April 7, 2010, I released my new version of bogofilter-milter along with a new bogofilter-milter home page which describes how to use it and in particular describes in detail my own personal setup, including links to all of the scripts and tools that I use.
On April 9, 2010, I realized just how bad it was that although my primary mail server was filtering incoming mail against the Spamhaus ZEN blocklist, my MX server was not. Spammers often send email through MX servers to circumvent anti-spam mechanisms in place only on a site's primary mail server, and that was exactly what was happening to me. I got rid of the MX server, and the number of spam senders blocked by the Spamhaus list jumped from 8,200 on April 8 to 26,592 on April 9. Similarly, the number of spam messages that made it through to bogofilter dropped from 534 on April 8 to 381 on April 9, and dropped even lower, to 179, on April 10. Having an MX server is important, but I won't be turning mine back on until I can set one up that does blocklist filtering!
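For readers unfamiliar with how a DNS blocklist check works, here is a rough sketch of the general technique: the octets of the connecting IPv4 address are reversed, the blocklist zone is appended, and the resulting name is looked up in DNS. A mail server normally does this itself, so the function names here are hypothetical and this is not how my servers are actually configured. The sketch also treats any answer as "listed" and ignores the specific return codes a blocklist may send.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    # IPv4Address validates the input and raises ValueError on bad addresses.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the zone answers with an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such name (or the lookup failed): treated here as "not listed".
        return False
    return True


print(dnsbl_query_name("192.0.2.1"))
```

The point of the story above is that this check only helps if every server that accepts mail for the domain performs it, including the MX server.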
Late on April 14, 2010, I brought on-line my new MX server with Spamhaus ZEN blocklist support.
On May 24, 2010, I added SpamAssassin to my mail configuration. I do not use SpamAssassin to reject email directly, but rather to generate SpamAssassin header keywords which are then used by bogofilter to help distinguish spam from non-spam. Adding SpamAssassin to my configuration did not make a noticeable difference in the effectiveness of bogofilter.
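As a rough illustration of what such header keywords can look like, the sketch below pulls the rule names out of an `X-Spam-Status` header of the kind SpamAssassin commonly adds. The sample message and the `spamassassin_tests` helper are assumptions made up for this example. The exact headers my setup emits, and how bogofilter tokenizes them, are not shown here.

```python
import email
import re

# A made-up message with a folded X-Spam-Status header (continuation line
# starts with a tab), in the style SpamAssassin typically writes.
RAW_MESSAGE = """\
From: sender@example.com
Subject: hello
X-Spam-Status: Yes, score=7.2 required=5.0 tests=BAYES_99,HTML_MESSAGE,
\tRDNS_NONE autolearn=no

body text
"""


def spamassassin_tests(raw):
    """Return the rule names listed in the tests= field of X-Spam-Status."""
    msg = email.message_from_string(raw)
    status = msg.get("X-Spam-Status", "")
    # Header folding can leave whitespace after commas; remove it first.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S*)", status)
    if match is None:
        return []
    return [name for name in match.group(1).split(",") if name and name != "none"]


print(spamassassin_tests(RAW_MESSAGE))
```

Each rule name then behaves like one more token that a statistical filter can learn to associate with spam or non-spam.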
On June 30, 2010, I retuned bogofilter with my training script, and bogofilter's accuracy immediately increased dramatically. Gotta remember to retune when things start to look bad!
October 2, 2010: Something has definitely been going right in the past few months in the fight against spam, email viruses, etc. As noted above, I filter incoming SMTP connections with the Spamhaus ZEN blocklist. As of May 14, 2010, I was seeing a 30-day average of 29,184 bad connection attempts per day. In contrast, as of October 1, 2010, I am seeing a 30-day average of only 14,287 attempts per day, a reduction of 51%. Wow! Here's a graph:
I think some really significant bot networks must have been killed in the last few months. To whoever is responsible for this: kudos and keep up the good work!
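The 51% figure quoted above follows directly from the two 30-day averages. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def percent_reduction(before, after):
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


may_average = 29184      # 30-day average as of May 14, 2010
october_average = 14287  # 30-day average as of October 1, 2010

reduction = percent_reduction(may_average, october_average)
print(f"{reduction:.1f}% fewer bad connection attempts per day")
```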
On February 4, 2011, I retuned bogofilter with my training script.
On February 22, 2011, I retuned bogofilter again, since I was seeing more spam than I should have been. I've found that sometimes bogotune gets “tricked” by the particular corpus you give it and generates sub-optimal parameters, but if you train again after a few days or weeks it usually corrects itself.
Also, I noticed on February 22, 2011 that there has been a huge spike in spam over the past few days. In the 30 days prior to February 19, my average per-day spam count was 98, with a peak of 137. In contrast, for February 19-21, the spam counts have been 207, 253 and 315, for an average of about 258. I think somebody has brought a new spam botnet online!