Re: [OT] Tom's/Marc's spam filters? - Mailing list pgsql-general

From Joe Conway
Subject Re: [OT] Tom's/Marc's spam filters?
Msg-id 4086B6A6.5000306@joeconway.com
In response to Re: [OT] Tom's/Marc's spam filters?  (Michael Chaney <mdchaney@michaelchaney.com>)
List pgsql-general
Michael Chaney wrote:
> Make sure you have the latest SA and make sure that Bayesian filtering
> is turned on and working, and make sure to train the filter.  Reply to
> me offlist if you need a group of 5000 or so spams to help train it.

I've got the latest SA and I'm using Bayesian filtering, autolearn,
razor2, dcc, and pyzor. I'm also using relays.ordb.org,
sbl.spamhaus.org, bl.spamcop.net, and blackholes.five-ten-sg.com
(although I just added that last one yesterday). I've verified that
autolearn is working. I have lowered my threshold from the default of
5.0 to 2.5.

I get a comparable amount of spam (~600 to 1000 per day) and my setup
*was* about 98% effective until a month or so ago. These days it is more
like 80%. I've noticed that much of the spam getting through appears
specifically targeted at getting by SA -- no HTML, a paragraph of
nonsense (or sometimes text out of some public domain book), and a
one-liner trying to sell me a mortgage or something.

The one thing I had *not* been doing, but started to do as of last
night, is to use the false-negatives to explicitly train the Bayesian
filter.  It was easy enough to set up. I created an hourly cron job as
follows:

   /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox

Now I just drop all false negatives into that mailbox, and clean them
out periodically. Hopefully that will make a significant improvement.
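
For anyone who wants to wrap that command in a script rather than call
sa-learn directly from cron, here is a minimal sketch. Only the sa-learn
command line comes from my setup; the SA_LEARN variable, the function
name, and the empty-mailbox check are illustrative assumptions, and the
mailbox path is the same placeholder as above.

```shell
#!/bin/sh
# Sketch: train SpamAssassin's Bayes database from a false-negatives mbox.
# SA_LEARN and this wrapper are illustrative assumptions; only the
# sa-learn command line itself comes from the post.
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

train_false_negatives() {
    mbox="$1"
    # Nothing to learn if the mailbox is missing or empty.
    if [ ! -s "$mbox" ]; then
        echo "skip: $mbox is missing or empty"
        return 0
    fi
    # Same invocation as in the post: treat every message in the mbox as spam.
    "$SA_LEARN" --mbox --spam "$mbox"
}

# The mailbox path may be passed as the first argument; the default is
# the placeholder path from the post.
train_false_negatives "${1:-/path/to/false-neg.mbox}"
```

Saved under a name of your choosing and made executable, it could be run
hourly from a crontab entry whose schedule fields are "0 * * * *". The
only behavioral difference from the bare command is that the run is
skipped when the mailbox is missing or empty.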

Joe
