Thread: Tom Lane heads up

Tom Lane heads up

From
DeJuan Jackson
Date:
Just dropping a quick note for Tom Lane.  I sent a personal message
today, but I wasn't sure if you'd get it after I remembered all of the
spam filters you've got set up.

Sorry for the off topic post.

Re: Tom Lane heads up

From
Shachar Shemesh
Date:
DeJuan Jackson wrote:

> Just dropping a quick note for Tom Lane.  I sent a personal message
> today, but I wasn't sure if you'd get it after I remembered all of the
> spam filters you've got set up.
>
> Sorry for the off topic post.

That's ok. He is only filtering me :-)

Actually, you get a rejection notice if his spam filters catch you. If
you didn't get one, you're ok.

          Shachar

--
Shachar Shemesh
Lingnu Open Systems Consulting
http://www.lingnu.com/


[OT] Tom's/Marc's spam filters?

From
Will Trillich
Date:
On Fri, Mar 05, 2004 at 01:19:13PM +0200, Shachar Shemesh wrote:
> DeJuan Jackson wrote:
> >I sent a personal message today, but I wasn't sure if you'd
> >get it after I remembered all of the spam filters you've got
> >set up.
>
> Actually, you get a rejection notice if his spam filters catch
> you. If you didn't get one, you're ok.

is there some way of getting a look at tom's or marc's filters? i could
sure use a bit of help there. lordy, we're close to drowning in the
stuff!

if it's in the archives already, i apparently didn't hit the
right search string. a quickie pointer is all i need...

thanks in advance!

--
"Why did they hard code that value into the program?".
"My only guess would be to maximize suckage."
http://suso.suso.org/docs/apache_and_frontpage/htmldocs/part4-2.phtml

Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Mon, 19 Apr 2004, Will Trillich wrote:

> On Fri, Mar 05, 2004 at 01:19:13PM +0200, Shachar Shemesh wrote:
> > DeJuan Jackson wrote:
> > >I sent a personal message today, but I wasn't sure if you'd
> > >get it after I remembered all of the spam filters you've got
> > >set up.
> >
> > Actually, you get a rejection notice if his spam filters catch
> > you. If you didn't get one, you're ok.
>
> is there some way of getting a look at tom's or marc's filters? i could
> sure use a bit of help there. lordy, we're close to drowning in the
> stuff!

Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
enabled ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
Joe Conway
Date:
Marc G. Fournier wrote:
> Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> enabled ...

I use exactly the same setup. But recently I've noticed that the
spammers are getting smarter -- I think 20% of it is slipping by the
filters. I'm going to need something better.

Joe



Re: [OT] Tom's/Marc's spam filters?

From
Bruce Momjian
Date:
Joe Conway wrote:
> Marc G. Fournier wrote:
> > Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> > enabled ...
>
> I use exactly the same setup. But recently I've noticed that the
> spammers are getting smarter -- I think 20% of it is slipping by the
> filters. I'm going to need something better.

Here is what I use:

    http://candle.pha.pa.us/main/writings/spam/

I get 98% blockage with no false positives, or at least only 1-2 a year
(that folks tell me about).  :-)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [OT] Tom's/Marc's spam filters?

From
Robert Creager
Date:
When grilled further on (Mon, 19 Apr 2004 21:19:05 -0700),
Joe Conway <mail@joeconway.com> confessed:

> Marc G. Fournier wrote:
> > Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> > enabled ...
>
> I use exactly the same setup. But recently I've noticed that the
> spammers are getting smarter -- I think 20% of it is slipping by the
> filters. I'm going to need something better.
>

Have you played with the "spamassassin --report" feature?  Works fairly well if
you can integrate it into your e-mail client and report a bunch of
messages as spam.  It trains the Bayes filter and reports to Razor (at
the least).

Sylpheed Claws has actions (you use "spamassassin --report %F" as the action),
and it'll batch the report on all selected messages.

I find that after 10-20 messages, it starts finding the ones that were
slipping through.  Since February, I have 200 missed out of 4200.

Cheers,
Rob

--
 22:28:27 up 3 days,  2:06,  3 users,  load average: 3.24, 3.08, 3.45
Linux 2.6.5-01 #5 SMP Tue Apr 6 21:32:39 MDT 2004


Re: [OT] Tom's/Marc's spam filters?

From
Tom Lane
Date:
Will Trillich <will@serensoft.com> writes:
> is there some way of getting a look at tom's or marc's filters? i could
> sure use a bit of help there. lordy, we're close to drowning in the
> stuff!

Tell me about it :-(

I currently use four levels of filtering:

1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org
(there are others out there, but these seem to have a good impedance
match to my personal spam load).

2. Private blacklist of IP ranges that have sent me too much spam.
sendmail has a pretty easy mechanism to support this, although it
only seems to support /8 /16 or /24 ranges which is a bit coarse.
(If you've gotten a "Go away spammer" bounce from me, you were caught
by this filter --- let me know and I'll tighten the ranges.)

3. I have noticed that bouncing any machine that sends "HELO
sss.pgh.pa.us" gets rid of a ton of spam and viruses.  I don't know of
any real clean way to do this, but I have a sendmail.cf hack for it.

4. Very long list of procmail filters on header and body patterns.

#2 and #4 are fairly personal, in the sense that they have a decent
success/failure ratio for the junk mail I get.  I wouldn't recommend
that someone else try my lists, and in any case they take a heck of a
lot of hand maintenance.  I've been looking into more automated methods
such as CRM114 but haven't made the jump yet.
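For anyone unfamiliar with how the DNSBL lists in #1 are consulted: the
connecting client's IPv4 octets are reversed, the list's zone is appended,
and the resulting name is looked up in DNS; getting an answer back means the
address is listed. Below is a minimal Python sketch of the name construction
only. The function name is made up for illustration, the IP is a
documentation address, and no actual lookup is performed.

```python
import ipaddress

def dnsbl_query_name(client_ip, zone):
    """Build the DNS name that is looked up to test client_ip against zone.

    Hypothetical helper for illustration; it validates the address and
    reverses the octets, but does not query DNS.
    """
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

# Documentation address 192.0.2.99 checked against one of the zones above.
print(dnsbl_query_name("192.0.2.99", "bl.spamcop.net"))
```

In practice the MTA does this lookup itself once a DNSBL is configured; the
sketch only shows what is being asked of the list.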

            regards, tom lane

Re: [OT] Tom's/Marc's spam filters?

From
"Jim Wilson"
Date:
Tom Lane said:

> Will Trillich <will@serensoft.com> writes:
> > is there some way of getting a look at tom's or marc's filters? i could
> > sure use a bit of help there. lordy, we're close to drowning in the
> > stuff!
>
> Tell me about it :-(
>
> I currently use four levels of filtering:
>
> 1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org
> (there are others out there, but these seem to have a good impedance
> match to my personal spam load).
>
> 2. Private blacklist of IP ranges that have sent me too much spam.
> sendmail has a pretty easy mechanism to support this, although it
> only seems to support /8 /16 or /24 ranges which is a bit coarse.
> (If you've gotten a "Go away spammer" bounce from me, you were caught
> by this filter --- let me know and I'll tighten the ranges.)
>
> 3. I have noticed that bouncing any machine that sends "HELO
> sss.pgh.pa.us" gets rid of a ton of spam and viruses.  I don't know of
> any real clean way to do this, but I have a sendmail.cf hack for it.
>
> 4. Very long list of procmail filters on header and body patterns.
>
> #2 and #4 are fairly personal, in the sense that they have a decent
> success/failure ratio for the junk mail I get.  I wouldn't recommend
> that someone else try my lists, and in any case they take a heck of a
> lot of hand maintenance.  I've been looking into more automated methods
> such as CRM114 but haven't made the jump yet.

Yes they sure are.  I tried my personal blacklist on a client's server one
time after they complained of seeing dozens a minute slipping by.  It did just
about nothing, but it got them started on their own.  #3 looks interesting
though...

Best regards,

Jim Wilson


Re: [OT] Tom's/Marc's spam filters?

From
"Keith C. Perry"
Date:
Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> Will Trillich <will@serensoft.com> writes:
> > is there some way of getting a look at tom's or marc's filters? i could
> > sure use a bit of help there. lordy, we're close to drowning in the
> > stuff!
>
> Tell me about it :-(
>
> I currently use four levels of filtering:
>
> 1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org
> (there are others out there, but these seem to have a good impedance
> match to my personal spam load).
>
> 2. Private blacklist of IP ranges that have sent me too much spam.
> sendmail has a pretty easy mechanism to support this, although it
> only seems to support /8 /16 or /24 ranges which is a bit coarse.
> (If you've gotten a "Go away spammer" bounce from me, you were caught
> by this filter --- let me know and I'll tighten the ranges.)

There is a sendmail script for this called "cidrexpand" that allows you to put
in CIDR blocks, i.e. things like 216.185.96.0/19 can be put into the sendmail
access file.
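
To show the idea behind that kind of expansion (this is not the cidrexpand
script itself, and its real output format may differ), here is a rough
Python sketch that turns a CIDR block into the octet-boundary prefixes that
a /8, /16 or /24-only access list can express. The function name is
hypothetical.

```python
import ipaddress

def expand_cidr(cidr):
    """Split an IPv4 CIDR block into dotted prefixes on octet boundaries.

    Illustrative only: a /19 becomes the 32 /24 prefixes it covers.
    """
    net = ipaddress.ip_network(cidr)
    # Round the prefix length up to the next multiple of 8 (minimum /8).
    new_prefix = max(8, -(-net.prefixlen // 8) * 8)
    keep = new_prefix // 8  # number of leading octets to keep
    return [
        ".".join(str(sub.network_address).split(".")[:keep])
        for sub in net.subnets(new_prefix=new_prefix)
    ]

prefixes = expand_cidr("216.185.96.0/19")
print(len(prefixes), prefixes[0], prefixes[-1])
```

For the /19 above this yields 32 entries, from 216.185.96 through
216.185.127, each of which could carry its own action in the access file.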

<--stuff deleted-->


--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being hosted by:
VCSN, Inc : http://vcsn.com

Re: [OT] Tom's/Marc's spam filters?

From
Karel Zak
Date:
On Tue, Apr 20, 2004 at 01:06:18AM -0400, Tom Lane wrote:
> Will Trillich <will@serensoft.com> writes:
> > is there some way of getting a look at tom's or marc's filters? i could
> > sure use a bit of help there. lordy, we're close to drowning in the
> > stuff!
>
> Tell me about it :-(
>
> I currently use four levels of filtering:
>
> 1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org
> (there are others out there, but these seem to have a good impedance
> match to my personal spam load).
>
> 2. Private blacklist of IP ranges that have sent me too much spam.
> sendmail has a pretty easy mechanism to support this, although it
> only seems to support /8 /16 or /24 ranges which is a bit coarse.
> (If you've gotten a "Go away spammer" bounce from me, you were caught
> by this filter --- let me know and I'll tighten the ranges.)
>
> 3. I have noticed that bouncing any machine that sends "HELO
> sss.pgh.pa.us" gets rid of a ton of spam and viruses.  I don't know of
> any real clean way to do this, but I have a sendmail.cf hack for it.
>
> 4. Very long list of procmail filters on header and body patterns.

 It must be pretty difficult to maintain these header and body patterns
 and the other lists. I had the same problem and I resolved it with
 "spamassassin"; it can learn, and it's simpler than procmailrc
 coding. Now only about 5% of all spam reaches my INBOX.

    Karel
--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Mon, 19 Apr 2004, Joe Conway wrote:

> Marc G. Fournier wrote:
> > Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> > enabled ...
>
> I use exactly the same setup. But recently I've noticed that the
> spammers are getting smarter -- I think 20% of it is slipping by the
> filters. I'm going to need something better.

do you force learn those spam that get through the cracks?  I get about 20
or 30 messages that slip through the cracks, which I process through with
sa-learn nightly ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
Steve Manes
Date:
Karel Zak wrote:

>  It must be pretty difficult to maintain these header and body patterns
>  and the other lists. I had the same problem and I resolved it with
>  "spamassassin"; it can learn, and it's simpler than procmailrc
>  coding. Now only about 5% of all spam reaches my INBOX.

It's not that difficult here but I'm using Postfix, which has built in
pattern checking.  Because my mail server also hosts a bunch of topical
internet mailing lists (mainly motorcycle and bass player stuff) and all
of their admin addresses were harvested by spammers long ago, I don't
just get one copy of spam.  I usually get several because each of those
admin addresses eventually alias back to me.

I don't use SpamAssassin or Razor but I manage to kill 95% of spam at
the SMTP stage, before the message is accepted for delivery.  This works
better than a delivery stage mail processor like procmail because it
bounces the spam back to the server actually sending it.  It's easy to
see from the maillogs what IPs are regularly sending me this crap so
they can be blackholed permanently.  I think I've got most of CHINANET
in the bit bucket now <g>.



Re: [OT] Tom's/Marc's spam filters?

From
"Matthew D. Fuller"
Date:
On Tue, Apr 20, 2004 at 05:35:51AM -0000 I heard the voice of
Jim Wilson, and lo! it spake thus:
> Tom Lane said:
> >
> > 3. I have noticed that bouncing any machine that sends "HELO
> > sss.pgh.pa.us" gets rid of a ton of spam and viruses.  I don't know of
> > any real clean way to do this, but I have a sendmail.cf hack for it.
>
> #3 looks interesting though...

I've been blocking HELO as anything under my domain, as well as my IP
address (as well as any bare IP addresses) for a while, and it
certainly drops a fair bit.  And I maintain a long list of HELO names,
AND IP ranges, AND sending hostnames, AND senders domains, plus all
the filtering I do after accepting the mail...  Wacky.  If we just
renamed 'spam' to 'justifiable homicide'...


--
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/

"The only reason I'm burning my candle at both ends, is because I
      haven't figured out how to light the middle yet"

Re: [OT] Tom's/Marc's spam filters?

From
Joe Conway
Date:
Marc G. Fournier wrote:
> do you force learn those spam that get through the cracks?  I get about 20
> or 30 messages that slip through the cracks, which I process through with
> sa-learn nightly ...

No, I haven't been doing that, but I guess I ought to start. Thanks for
the suggestion!

Joe

Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Tue, 20 Apr 2004, Joe Conway wrote:

> Marc G. Fournier wrote:
> > do you force learn those spam that get through the cracks?  I get about 20
> > or 30 messages that slip through the cracks, which I process through with
> > sa-learn nightly ...
>
> No, I haven't been doing that, but I guess I ought to start. Thanks for
> the suggestion!

Also check to make sure that you don't have autolearn disabled ... you
would have had to do it manually, as it is enabled by default, but, for
instance, if you are a user on a system, the site-wide may be set to
disable autolearn, so you'd have to enable it yourself ...

I'm looking forward to 3.x coming out, as the Bayes stuff will be able to
run out of an SQL database instead of flat files ... so servers running
Cyrus IMAPd, where there are no physical user accounts, will be able to
start making use of Bayes as well ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
"Keith C. Perry"
Date:
Quoting "Matthew D. Fuller" <fullermd@over-yonder.net>:

> On Tue, Apr 20, 2004 at 05:35:51AM -0000 I heard the voice of
> Jim Wilson, and lo! it spake thus:
> > Tom Lane said:
> > >
> > > 3. I have noticed that bouncing any machine that sends "HELO
> > > sss.pgh.pa.us" gets rid of a ton of spam and viruses.  I don't know of
> > > any real clean way to do this, but I have a sendmail.cf hack for it.
> >
> > #3 looks interesting though...
>
> I've been blocking HELO as anything under my domain, as well as my IP
> address (as well as any bare IP addresses) for a while, and it
> certainly drops a fair bit.  And I maintain a long list of HELO names,
> AND IP ranges, AND sending hostnames, AND senders domains, plus all
> the filtering I do after accepting the mail...  Wacky.  If we just
> renamed 'spam' to 'justifiable homicide'...
>
>
> --
> Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
> Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
>
> "The only reason I'm burning my candle at both ends, is because I
>       haven't figured out how to light the middle yet"

We could only wish for "justifiable homicide".  Now there's a law I would
support!  :)

Are you guys miltering to drop the messages with those HELO patterns?  I'm
nailing 80%+ across all my clients and I may get 20 to 50 spams/day (down from
200+) which is acceptable but I was going to start using some netfilter hooks
(i.e. Linux firewall code) to inspect mail traffic and apply some more patterns.
If you guys are getting 95%+ via miltering then that's definitely the way to go.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com


Re: [OT] Tom's/Marc's spam filters?

From
jseymour@LinxNet.com (Jim Seymour)
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
[snip]
>
> 3. I have noticed that bouncing any machine that sends "HELO
> sss.pgh.pa.us" gets rid of a ton of spam and viruses.

IOW: Anything that HELOs with your mail server's own hostname.  If you
can do it: Change that to anything that HELOs with your domain name
(that's not supposed to) and you'll catch still more. Add to that
anything HELOing with your mail server's IP address and you'll catch
more yet.

>                                                        I don't know of
> any real clean way to do this, but I have a sendmail.cf hack for it.
[snip]

Postfix, which is what I use, has built-in support for HELO checks.
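
The checks described above can be sketched in a few lines of Python. This is
only an illustration of the logic, not how Postfix or a sendmail.cf rule
implements it; the function name, the example hostnames and the
`client_is_ours` flag are all assumptions made for the sketch.

```python
import ipaddress

def helo_looks_forged(helo, my_hostname, my_domain, my_ip, client_is_ours=False):
    """Return True if an outside client's HELO name should be rejected.

    Hypothetical helper: rejects our own hostname, anything under our
    domain, our own IP (bare or in brackets), and any bare IP address.
    """
    if client_is_ours:
        return False  # our own machines may legitimately use these names
    name = helo.strip().lower().rstrip(".")
    try:
        ipaddress.ip_address(name)
        return True  # bare IP address, not wrapped in [brackets]
    except ValueError:
        pass
    if name.strip("[]") == my_ip:
        return True  # claims to be our own IP address
    domain = my_domain.lower()
    if name == my_hostname.lower() or name == domain:
        return True
    return name.endswith("." + domain)

for helo in ["mail.example.org", "[192.0.2.25]", "203.0.113.7", "mx.elsewhere.net"]:
    print(helo, helo_looks_forged(helo, "mail.example.org", "example.org", "192.0.2.25"))
```

The flag matters because of the "(that's not supposed to)" caveat: hosts
inside your own network are expected to HELO with names in your domain.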

--
Jim Seymour                | Spammers sue anti-spammers:
jseymour@LinxNet.com       |     http://www.LinxNet.com/misc/spam/slapp.php
http://jimsun.LinxNet.com  | Please donate to the SpamCon Legal Fund:
                           |     http://www.spamcon.org/legalfund/

Re: [OT] Tom's/Marc's spam filters?

From
Will Trillich
Date:
On Tue, Apr 20, 2004 at 10:17:05AM -0300, Marc G. Fournier wrote:
> On Mon, 19 Apr 2004, Joe Conway wrote:
>
> > Marc G. Fournier wrote:
> > > Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> > > enabled ...
> >
> > I use exactly the same setup. But recently I've noticed that the
> > spammers are getting smarter -- I think 20% of it is slipping by the
> > filters. I'm going to need something better.
>
> do you force learn those spam that get through the cracks?  I get about 20
> or 30 messages that slip through the cracks, which I process through with
> sa-learn nightly ...

i have been doing that some -- but i still get about 200 false
negatives per day. takes too much time to run 'sa-learn' all the
time when it seems like spam #n is an awful lot like spam #n-1.

--
"Why did they hard code that value into the program?".
"My only guess would be to maximize suckage."
http://suso.suso.org/docs/apache_and_frontpage/htmldocs/part4-2.phtml

Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Tue, 20 Apr 2004, Will Trillich wrote:

> On Tue, Apr 20, 2004 at 10:17:05AM -0300, Marc G. Fournier wrote:
> > On Mon, 19 Apr 2004, Joe Conway wrote:
> >
> > > Marc G. Fournier wrote:
> > > > Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> > > > enabled ...
> > >
> > > I use exactly the same setup. But recently I've noticed that the
> > > spammers are getting smarter -- I think 20% of it is slipping by the
> > > filters. I'm going to need something better.
> >
> > do you force learn those spam that get through the cracks?  I get about 20
> > or 30 messages that slip through the cracks, which I process through with
> > sa-learn nightly ...
>
> i have been doing that some -- but i still get about 200 false
> negatives per day. takes too much time to run 'sa-learn' all the
> time when it seems like spam #n is an awful lot like spam #n-1.

I'm down to ~20 false negatives right now ... I usually spend my last half
hour in front of the tv at night sorting them out and filtering them
through bayes ...

My spam filters right now are picking up between 2000->3000 messages per
day which aren't getting into my main folders ...


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
Michael Chaney
Date:
On Mon, Apr 19, 2004 at 09:19:05PM -0700, Joe Conway wrote:
> Marc G. Fournier wrote:
> >Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> >enabled ...
>
> I use exactly the same setup. But recently I've noticed that the
> spammers are getting smarter -- I think 20% of it is slipping by the
> filters. I'm going to need something better.

No offense, but that means you're not doing it right.  I use SA with
Bayes (and everything else), and I'm getting better than 98% with no
false positives.  Yesterday I had 823 spams (you read that correctly)
with 9 that made it through.  When I woke up this morning, I had 334
spams with 2 that made it through.
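
As a quick check on those figures, assuming each total includes the messages
that got through, the catch rate can be worked out like this (the function
name is just for illustration):

```python
def catch_rate(total_spam, slipped_through):
    """Fraction of spam that the filter caught."""
    return (total_spam - slipped_through) / total_spam

# Figures quoted above: (total spams, spams that made it through).
for total, missed in [(823, 9), (334, 2)]:
    print(f"{total} spams, {missed} missed: {catch_rate(total, missed):.1%} caught")
```

That works out to roughly 98.9% and 99.4%, consistent with "better than 98%".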

I constantly train my Bayesian filter by using an email address I set
up where I forward all false-negatives.  So the few that get through
won't be doing that again.  It simply runs them through sa-learn.  If I
get some time, I'll post the code to my web site.

Spammers cannot outsmart a Bayesian filter.  It's game-over.  You don't
need to upgrade, you need to figure out how to make your current setup
work.

Make sure you have the latest SA and make sure that Bayesian filtering
is turned on and working, and make sure to train the filter.  Reply to
me offlist if you need a group of 5000 or so spams to help train it.

Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/

Re: [OT] Tom's/Marc's spam filters?

From
Michael Chaney
Date:
On Tue, Apr 20, 2004 at 01:30:59PM -0300, Marc G. Fournier wrote:
> Also check to make sure that you don't have autolearn disabled ... you
> would have had to do it manually, as it is enabled by default, but, for
> instance, if you are a user on a system, the site-wide may be set to
> disable autolearn, so you'd have to enable it yourself ...
>
> I'm looking forward to 3.x coming out, as the Bayes stuff will be able to
> run out of an SQL database instead of flat files ... so servers running
> Cyrus IMAPd, where there are no physical user accounts, will be able to
> start making use of Bayes as well ...

You should look into MailScanner, at www.mailscanner.info.  I use it as
the framework for running SA and anti-virus software, using Exim as my
mail server.  There are no physical user accounts; all virtual stuff.
MailScanner lets SA, along with the Bayesian filter, work for all email
coming through.

Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/

Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Wed, 21 Apr 2004, Michael Chaney wrote:

> On Tue, Apr 20, 2004 at 01:30:59PM -0300, Marc G. Fournier wrote:
> > Also check to make sure that you don't have autolearn disabled ... you
> > would have had to do it manually, as it is enabled by default, but, for
> > instance, if you are a user on a system, the site-wide may be set to
> > disable autolearn, so you'd have to enable it yourself ...
> >
> > I'm looking forward to 3.x coming out, as the Bayes stuff will be able to
> > run out of an SQL database instead of flat files ... so servers running
> > Cyrus IMAPd, where there are no physical user accounts, will be able to
> > start making use of Bayes as well ...
>
> You should look into MailScanner, at www.mailscanner.info.  I use it as
> the framework for running SA and anti-virus software, using Exim as my
> mail server.  There are no physical user accounts; all virtual stuff.
> MailScanner lets SA, along with the Bayesian filter, work for all email
> coming through.

Does it allow for per user preferences?  I haven't found a clean way to do
that yet, other than using the spamcheck.py lmtpproxy ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
Michael Chaney
Date:
On Wed, Apr 21, 2004 at 02:11:16PM -0300, Marc G. Fournier wrote:
> > You should look into MailScanner, at www.mailscanner.info.  I use it as
> > the framework for running SA and anti-virus software, using Exim as my
> > mail server.  There are no physical user accounts; all virtual stuff.
> > MailScanner lets SA, along with the Bayesian filter, work for all email
> > coming through.
>
> Does it allow for per user preferences?  I haven't found a clean way to do
> that yet, other than using the spamcheck.py lmtpproxy ...

Yes, MailScanner allows per-user and per-domain preferences.

Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/

Re: [OT] Tom's/Marc's spam filters?

From
Joe Conway
Date:
Michael Chaney wrote:
> Make sure you have the latest SA and make sure that Bayesian filtering
> is turned on and working, and make sure to train the filter.  Reply to
> me offlist if you need a group of 5000 or so spams to help train it.

I've got the latest SA and I'm using Bayesian filtering, autolearn,
razor2, dcc, and pyzor. I'm also using relays.ordb.org,
sbl.spamhaus.org, bl.spamcop.net, and blackholes.five-ten-sg.com
(although I just added that last one yesterday). I've verified that
autolearn is working. I have my threshold set downward, from the default
of 5.0, to 2.5.

I get a comparable amount of spam (~600 to 1000 per day) and my setup
*was* about 98% effective until a month or so ago. These days it is more
like 80%. I've noticed much of the spam getting through appears
specifically targeted at getting by SA -- no HTML, a paragraph of
nonsense (or sometimes out of some public domain book), and a one liner
trying to sell me a mortgage or something.

The one thing I had *not* been doing, but started to do as of last
night, is to use the false-negatives to explicitly train the Bayesian
filter.  It was easy enough to set up. I created an hourly cron job as
follows:

   /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox

Now I just drop all false negatives into that mailbox, and clean them
out periodically. Hopefully that will make a significant improvement.
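
If the training step ever needs to be driven from a script rather than a
bare cron line, a small wrapper along these lines would do. It is only a
sketch: the function names are made up, the sa-learn path and mailbox path
are the ones from the cron job above, and the only sa-learn switches used
are --mbox, --spam and --ham.

```python
import subprocess

def sa_learn_command(mbox_path, kind, sa_learn="/usr/bin/sa-learn"):
    """Build the sa-learn argument list for an mbox of spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return [sa_learn, "--mbox", "--" + kind, mbox_path]

def train(mbox_path, kind):
    """Run sa-learn on the mailbox; raises CalledProcessError on failure."""
    return subprocess.run(sa_learn_command(mbox_path, kind), check=True)

# Example (not executed here, since it needs sa-learn installed):
#   train("/path/to/false-neg.mbox", "spam")
```

Keeping the command construction separate from the call makes it easy to
reuse the same wrapper for a ham mailbox.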

Joe

Re: [OT] Tom's/Marc's spam filters?

From
Bruce Momjian
Date:
Joe Conway wrote:
> I get a comparable amount of spam (~600 to 1000 per day) and my setup
> *was* about 98% effective until a month or so ago. These days it is more
> like 80%. I've noticed much of the spam getting through appears
> specifically targeted at getting by SA -- no HTML, a paragraph of
> nonsense (or sometimes out of some public domain book), and a one liner
> trying to sell me a mortgage or something.
>
> The one thing I had *not* been doing, but started to do as of last
> night, is to use the false-negatives to explicitly train the Bayesian
> filter.  It was easy enough to set up. I created an hourly cron job as
> follows:
>
>    /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox
>
> Now I just drop all false negatives into that mailbox, and clean them
> out periodically. Hopefully that will make a significant improvement.

I can tell you it certainly will.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Wed, 21 Apr 2004, Joe Conway wrote:

> The one thing I had *not* been doing, but started to do as of last
> night, is to use the false-negatives to explicitly train the Bayesian
> filter.  It was easy enough to set up. I created an hourly cron job as
> follows:
>
>    /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox
>
> Now I just drop all false negatives into that mailbox, and clean them
> out periodically. Hopefully that will make a significant improvement.

This, for me, has made the big difference, since the false-negatives don't
get autolearned :(

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
Joe Conway
Date:
Marc G. Fournier wrote:
> On Wed, 21 Apr 2004, Joe Conway wrote:
>>   /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox
>>
>>Now I just drop all false negatives into that mailbox, and clean them
>>out periodically. Hopefully that will make a significant improvement.
>
> This, for me, has made the big difference, since the false-negatives don't
> get autolearned :(

Actually, even much of what does (correctly) get marked as spam, ends up
with autolearn=no, because it seems SpamAssassin is somewhat
conservative with autolearning. I just sent this off list to Michael Chaney:
---------------------------------------------------------------------

I've noticed that the threshold for autolearn seems too high, i.e. a
high proportion of email correctly marked as spam, has autolearn=no.
Here's an example:

X-Spam-Status: Yes, hits=3.7 required=2.5
       tests=BAYES_44,HTML_FONT_INVISIBLE,HTML_IMAGE_ONLY_04,
       HTML_MESSAGE,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,
       MIME_HTML_ONLY_MULTI autolearn=no version=2.63

Now in /etc/mail/spamassassin/local.cf I have this setting:

   # Enable Bayes auto-learning
   auto_learn              1
   bayes_auto_learn_threshold_spam    6

From the SA docs, I get the impression that autolearn cannot be made
more aggressive.

So in order to counteract that, I just made an additional change -- I
put in a mail filter rule that automatically forwards any mail marked as
spam, but with autolearn=no, to false-neg.mbox. This should help too, I
think.
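
The test behind such a rule ("marked as spam, but autolearn=no") could look
roughly like the Python below. It is a sketch under assumptions: the function
name is invented, and it expects the value of the X-Spam-Status header (the
part after the colon) in the layout shown in the example above.

```python
import re

def needs_manual_learning(status_value):
    """True if SpamAssassin tagged the message as spam but did not autolearn it.

    status_value is the X-Spam-Status header value, possibly folded
    across several lines.
    """
    flat = " ".join(status_value.split())  # unfold continuation lines
    is_spam = flat.lower().startswith("yes")
    match = re.search(r"\bautolearn=(\S+)", flat)
    autolearn = match.group(1).lower() if match else None
    return is_spam and autolearn == "no"

example = """Yes, hits=3.7 required=2.5
       tests=BAYES_44,HTML_FONT_INVISIBLE,HTML_IMAGE_ONLY_04,
       HTML_MESSAGE,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,
       MIME_HTML_ONLY_MULTI autolearn=no version=2.63"""
print(needs_manual_learning(example))
```

Messages for which this returns True are the ones that would be copied to
false-neg.mbox for the hourly sa-learn run.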

Joe


Re: [OT] Tom's/Marc's spam filters?

From
Mike Mascari
Date:
Bruce Momjian wrote:

> Joe Conway wrote:
>
>>The one thing I had *not* been doing, but started to do as of last
>>night, is to use the false-negatives to explicitly train the Bayesian
>>filter.  It was easy enough to set up. I created an hourly cron job as
>>follows:
>>
>>   /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox
>>
>>Now I just drop all false negatives into that mailbox, and clean them
>>out periodically. Hopefully that will make a significant improvement.
>
> I can tell you it certainly will.

Doesn't sa-learn also require you to teach it Ham as well? My
problem has been that sa-learn appears to ignore white-listed emails
and therefore can't learn from 90% of my Ham. Meanwhile, I get spam
that slips through SA that my Mozilla client *correctly* identifies
as Junk. Once a week, I take that Junk email, along with all Ham and
run sa-learn with the appropriate --spam/--ham switch. But it
doesn't seem to be improving. I still get spam which SA fails to
identify but which, 95% of the time, Mozilla correctly identifies.

Mike Mascari



Re: [OT] Tom's/Marc's spam filters?

From
jseymour@LinxNet.com (Jim Seymour)
Date:
Joe Conway <mail@joeconway.com> wrote:
>
[snip]
>
> The one thing I had *not* been doing, but started to do as of last
> night, is to use the false-negatives to explicitly train the Bayesian
> filter.
[snip]

As you've discovered, the hard way, one must constantly train Bayesian
filters.  This means that every false positive has to be fed back
through it with whatever means your version uses to tell it "No, this
was *not* spam," and every false negative, the converse.
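
A small sketch of that feedback rule in Python, assuming SpamAssassin's sa-learn is the tool in use (the --mbox, --spam and --ham switches are the ones mentioned elsewhere in this thread). The function name and the false-pos.mbox path are hypothetical.

```python
def sa_learn_command(filter_said_spam, really_spam, mbox_path):
    """Return the sa-learn argv for a misclassified mailbox, or None if the filter was right."""
    if filter_said_spam == really_spam:
        return None  # correct verdict, nothing to feed back
    # False negative -> teach as spam; false positive -> teach as ham.
    switch = "--spam" if really_spam else "--ham"
    return ["sa-learn", "--mbox", switch, mbox_path]

# False negative: the filter let it through, but it was spam.
print(sa_learn_command(False, True, "/path/to/false-neg.mbox"))
# False positive: the filter flagged it, but it was legitimate mail.
print(sa_learn_command(True, False, "/path/to/false-pos.mbox"))
```

The returned list can be handed to subprocess.run() from a cron-driven script.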

--
Jim Seymour                | Spammers sue anti-spammers:
jseymour@LinxNet.com       |     http://www.LinxNet.com/misc/spam/slapp.php
http://jimsun.LinxNet.com  | Please donate to the SpamCon Legal Fund:
                           |     http://www.spamcon.org/legalfund/

Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Wed, 21 Apr 2004, Mike Mascari wrote:

> Doesn't sa-learn also require you to teach it Ham as well? My problem
> has been that sa-learn appears to ignore white-listed emails and
> therefore can't learn from 90% of my Ham. Meanwhile, I get spam that
> slips through SA that my Mozilla client *correctly* identifies as Junk.
> Once a week, I take that Junk email, along with all Ham and run sa-learn
> with the appropriate --spam/--ham switch. But it doesn't seem to be
> improving. I still get spam which SA fails to identify but which, 95% of
> the time, Mozilla correctly identifies.

I'm finding it gets better over time ... a few always slip through the
cracks, but not nearly as many today as yesterday ... as for Ham, I have a
mailbox that I save all my 'Answered Emails' to (from friends, lists, etc)
that I periodically run through as --ham

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
Lincoln Yeoh
Date:
At 10:08 AM 4/20/2004 +0200, Karel Zak wrote:
> >
> > 4. Very long list of procmail filters on header and body patterns.
>
>  It must be pretty difficult to maintain these header and body patterns
>  and the other lists. I had the same problem and resolved it with
>  "spamassassin"; it can learn, and it's simpler than procmailrc
>  coding. Now only about 5% of all spam reaches my INBOX.

My spam:ham ratio is about 98:2 (98% spam), excluding mailing lists.

So far it's manageable though rather annoying - fortunately in my situation
I can regard as spam emails that are in html (or have HTML) and not in my
whitelist. That gets rid of about 50% of the spam; the other 40% or so gets
filtered via another simple filter.
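
A rough sketch of the "HTML and not whitelisted" rule in Python, using only the standard email module. The function name and the example.com addresses are made up for illustration; the whitelist is assumed to be a set of lowercase sender addresses.

```python
import email
from email.utils import parseaddr

def is_html_from_stranger(raw_message, whitelist):
    """True if the sender is not whitelisted and any MIME part is text/html."""
    msg = email.message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in whitelist:
        return False
    # walk() visits the message itself and every nested MIME part.
    return any(part.get_content_type() == "text/html" for part in msg.walk())

whitelist = {"aunt@example.com"}
html_mail = (
    "From: Someone <stranger@example.com>\n"
    "Content-Type: text/html\n"
    "\n"
    "<html><body>Buy now</body></html>\n"
)
print(is_html_from_stranger(html_mail, whitelist))
```

Anything this flags would go to a folder that gets skimmed once in a while rather than being deleted outright.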

My situation=I don't really have to answer messages to my personal email
account from ignorant strangers that send me html email. Your situation may
be different.

So far I haven't seen any html emails that were really worth reading, even
the one or two from relatives (who I white-list to not be rude ;) ). I go
through that folder once in a while and it works for me - so far I don't
recall having HTML emails from strangers that weren't spam.

I've had plain text messages from silly strangers (and a silly colleague)
that used lots of !!!! and stupid subject lines - actual content barely
worth replying to. e.g. Help!!!!!

Situation is different at work. But company pays for antispam software.
Ironically while we sell Sophos Puremessage (which seems to be pretty
good), it's for larger companies/orgs than us (>1000 users). ;).

The backup MX thing is not very useful in most cases. Seems similar for DNS
- doesn't appear that useful to have your names resolvable while your site
is unreachable. OK the error messages may be slightly less embarrassing?

Regards,
Link.

Re: [OT] Tom's/Marc's spam filters?

From
Joe Conway
Date:
Marc G. Fournier wrote:
> On Mon, 19 Apr 2004, Joe Conway wrote:
>>Marc G. Fournier wrote:
>>>Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
>>>enabled ...
>>
>>I use exactly the same setup. But recently I've noticed that the
>>spammers are getting smarter -- I think 20% of it is slipping by the
>>filters. I'm going to need something better.
>
> do you force learn those spam that get through the cracks?  I get about 20
> or 30 messages that slip through the cracks, which I process through with
> sa-learn nightly ...

Sorry to drag this OT thread on even longer, but it seems to be a topic
many are interested in ;-)

I wanted to report back that after just 2 days of forced (supervised)
learning, the bayesian filter is now nailing about 99% of all spam.
*Many, many, thanks* for the suggestion.

But I wonder why the autolearn feature is so conservative? At this point
I'm getting lots of stuff like this:

X-Spam-Status: Yes, hits=5.8 required=2.5 tests=BAYES_99,HTML_FONT_BIG,
    HTML_MESSAGE autolearn=no version=2.63
X-Spam-Report:
    *  0.1 HTML_MESSAGE BODY: HTML included in message
    *  0.3 HTML_FONT_BIG BODY: HTML has a big font
    *  5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
    *      [score: 1.0000]

Notice that, even though I get a hit on BAYES_99, I still get
autolearn=no. Ah well, I guess I should be asking that question of the
SpamAssassin guys. Also notice that this sucker would have gotten
through with a score of only 0.4 had it not been for the bayesian filter.
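
The arithmetic can be checked with a short Python sketch that parses the report lines above and totals them with and without the BAYES_* rule. The function names are made up. As far as I understand the SpamAssassin documentation, the auto-learn decision is made on a score that leaves out the Bayes rules themselves, which would explain autolearn=no on a message like this; treat that as an assumption to verify for 2.63.

```python
import re

REPORT = """\
    *  0.1 HTML_MESSAGE BODY: HTML included in message
    *  0.3 HTML_FONT_BIG BODY: HTML has a big font
    *  5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
    *      [score: 1.0000]
"""

# A rule line is "*", a score, then the rule name; continuation lines do not match.
RULE_LINE = re.compile(r"^\s*\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)\b")

def rule_scores(report):
    """Map rule name -> score for every rule line in an X-Spam-Report body."""
    scores = {}
    for line in report.splitlines():
        m = RULE_LINE.match(line)
        if m:
            scores[m.group(2)] = float(m.group(1))
    return scores

def totals(report):
    """Return (total score, score without BAYES_* rules), rounded to one decimal."""
    scores = rule_scores(report)
    total = sum(scores.values())
    without_bayes = sum(v for k, v in scores.items() if not k.startswith("BAYES_"))
    return round(total, 1), round(without_bayes, 1)

print(totals(REPORT))
```

The first number corresponds to hits=5.8 in the status header, the second to the 0.4 the message would have scored on its own.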

Again, thanks.

Joe


Re: [OT] Tom's/Marc's spam filters?

From
Alvar Freude
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

- -- Joe Conway <mail@joeconway.com> wrote:

> I use exactly the same setup. But recently I've noticed that the spammers
> are getting smarter -- I think 20% of it is slipping by the filters. I'm
> going to need something better.

I recently rebuilt my bayes database because it was corrupted; I fed it
about 1000 low-scoring spams, and now about two spams slip by the filter per
day while 200 to 300 are caught.


Ciao
  Alvar

- --
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
** ODEM.org-Tour: http://tour.odem.org/
** 5 Jahre Blaster: http://www.a-blast.de/ | http://www.a-blast.de/statistik/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQFAijSAOndlH63J86wRAnQCAJ0SiuIkCu9iRKBXk9XY0IKE0glgFgCdHJl0
KVN3aQfw34S+IWokGX60OFA=
=hkKo
-----END PGP SIGNATURE-----


Re: [OT] Tom's/Marc's spam filters?

From
"Marc G. Fournier"
Date:
On Fri, 23 Apr 2004, Joe Conway wrote:

> Marc G. Fournier wrote:
> > On Mon, 19 Apr 2004, Joe Conway wrote:
> >>Marc G. Fournier wrote:
> >>>Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
> >>>enabled ...
> >>
> >>I use exactly the same setup. But recently I've noticed that the
> >>spammers are getting smarter -- I think 20% of it is slipping by the
> >>filters. I'm going to need something better.
> >
> > do you force learn those spam that get through the cracks?  I get about 20
> > or 30 messages that slip through the cracks, which I process through with
> > sa-learn nightly ...
>
> Sorry to drag this OT thread on even longer, but it seems to be a topic
> many are interested in ;-)
>
> I wanted to report back that after just 2 days of forced (supervised)
> learning, the bayesian filter is now nailing about 99% of all spam.
> *Many, many, thanks* for the suggestion.
>
> But I wonder why the autolearn feature is so conservative? At this point
> I'm getting lots of stuff like this:
>
> X-Spam-Status: Yes, hits=5.8 required=2.5 tests=BAYES_99,HTML_FONT_BIG,
>     HTML_MESSAGE autolearn=no version=2.63
> X-Spam-Report:
>     *  0.1 HTML_MESSAGE BODY: HTML included in message
>     *  0.3 HTML_FONT_BIG BODY: HTML has a big font
>     *  5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>     *      [score: 1.0000]
>
> Notice that, even though I get a hit on BAYES_99, I still get
> autolearn=no. Ah well, I guess I should be asking that question of the
> SpamAssassin guys. Also notice that this sucker would have gotten
> through with a score of only 0.4 had it not been for the bayesian filter.

BAYES_99 means that it's already been found in the bayes filter, so why
would it once more autolearn it? :)


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [OT] Tom's/Marc's spam filters?

From
Gregory Wood
Date:
Marc G. Fournier wrote:
> BAYES_99 means that it's already been found in the bayes filter, so why
> would it once more autolearn it? :)

To add more spam words to its vocabulary of course. Learning works both
ways...

Greg

Re: [OT] Tom's/Marc's spam filters?

From
Martijn van Oosterhout
Date:
On Tue, Apr 20, 2004 at 01:06:18AM -0400, Tom Lane wrote:
> 3. I have noticed that bouncing any machine that sends "HELO
> sss.pgh.pa.us" gets rid of a ton of spam and viruses.  I don't know of
> any real clean way to do this, but I have a sendmail.cf hack for it.

By the way, thanks very much for this tip. Almost in one hit, this made
many of our spam and virus filters redundant. Very nice on the load.
I'd noticed that some perl mail modules appear to get this wrong but it
efficiently catches our customers sending viruses and spam through our
relay too.

I'm using Exim 3 so I can only pick this up after the mail has been
received but with Exim 4 I should be able to kill the email in SMTP
stage.
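
For what it's worth, the check itself is tiny. A sketch in Python, assuming the MTA can hand the HELO argument to a hook; the function name is made up, and the real thing would be written in sendmail.cf or Exim's own configuration rather than like this.

```python
def helo_claims_to_be_us(helo_arg, our_names):
    """True if a remote client introduced itself with one of our own host names."""
    # Compare case-insensitively and ignore a trailing dot.
    name = helo_arg.strip().rstrip(".").lower()
    return name in {n.lower() for n in our_names}

ours = ["sss.pgh.pa.us"]
print(helo_claims_to_be_us("SSS.PGH.PA.US", ours))
print(helo_claims_to_be_us("mail.example.org", ours))
```

Only connections from outside should be tested this way; the local machine itself may legitimately use its own name.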

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
