Re: Spam filtering on the mailing lists - Mailing list pgsql-www

From Greg Sabino Mullane
Subject Re: Spam filtering on the mailing lists
Date
Msg-id d7865a888f6738c980a725cbe912f8c3@biglumber.com
In response to Re: Spam filtering on the mailing lists  ("Marc G. Fournier" <scrappy@hub.org>)
Responses Re: Spam filtering on the mailing lists  ("Marc G. Fournier" <scrappy@hub.org>)
List pgsql-www
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Its sad how this is such an ongoing problem, but this is the first that I hear
> that ppl are having problems ... looking at the message headers for a random
> few, I notice that they are scoring >4, but just below 5:
...
> I can change the quarantining to be >4 if ppl want, which should greatly reduce
> the # of messages going through ...

I think that would be a good start, but there are definitely some other problems.
First, the example you gave:

> X-Spam-Status: No, hits=4.855 tagged_above=0 required=5 tests=AWL=-1.994,
> DCC_CHECK=1.37, DIGEST_MULTIPLE=0.001, HTML_MESSAGE=0.001,
> MIME_HTML_ONLY=1.672, RAZOR2_CHECK=0.5, RCVD_IN_BL_SPAMCOP_NET=2.188,
> RCVD_IN_SORBS_WEB=1.117

A score of 0.001 for HTML_MESSAGE? Might as well not have the check at all. Same
with things like DIGEST_MULTIPLE. I think we need more checks, and much higher
scores for many of them.
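
To make the arithmetic behind that header concrete, here is a rough Python sketch (mine, not anything running on postgresql.org). It parses a header in the format quoted above, sums the per-test scores, and compares the total to a threshold. The names `parse_spam_status` and `would_quarantine` are hypothetical, and the `>=` comparison is an assumption about how the quarantine cutoff is applied.

```python
import re

# Header value copied from the example quoted above.
SAMPLE = (
    "No, hits=4.855 tagged_above=0 required=5 tests=AWL=-1.994, "
    "DCC_CHECK=1.37, DIGEST_MULTIPLE=0.001, HTML_MESSAGE=0.001, "
    "MIME_HTML_ONLY=1.672, RAZOR2_CHECK=0.5, RCVD_IN_BL_SPAMCOP_NET=2.188, "
    "RCVD_IN_SORBS_WEB=1.117"
)

NUMBER = r"(-?\d+(?:\.\d+)?)"


def parse_spam_status(header: str) -> dict:
    """Split an X-Spam-Status value into hits, required, and per-test scores."""
    flat = " ".join(header.split())  # unfold any continuation lines
    hits = float(re.search("hits=" + NUMBER, flat).group(1))
    required = float(re.search("required=" + NUMBER, flat).group(1))
    tests = {}
    for item in flat.split("tests=", 1)[1].split(","):
        name, _, score = item.strip().partition("=")
        if name:
            tests[name] = float(score)
    return {"hits": hits, "required": required, "tests": tests}


def would_quarantine(status: dict, threshold: float) -> bool:
    """Assumed rule: quarantine when the total reaches the threshold."""
    return status["hits"] >= threshold


status = parse_spam_status(SAMPLE)
total = sum(status["tests"].values())
print(f"sum of tests = {total:.3f}, reported hits = {status['hits']}")
print("quarantined at 5:", would_quarantine(status, 5.0))
print("quarantined at 4:", would_quarantine(status, 4.0))
```

The per-test scores add up to the reported total, and the two 0.001 tests contribute almost nothing to it.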

I grabbed a few random messages from the bugs list last night. Most interesting
was that some had no X-Spam-Status headers at all - does this mean they slipped
through the spam filtering entirely? Here's one of them:

===
Return-Path: <owner-pgsql-bugs-postgresql.org@postgresql.org>
Delivered-To: pgsql-bugs-postgresql.org@postgresql.org
Received: from localhost (unknown [200.46.204.183])
        by postgresql.org (Postfix) with ESMTP id C3148650275
        for <pgsql-bugs-postgresql.org@postgresql.org>; Wed, 16 Jul 2008 15:40:45 -0300 (ADT)
Received: from postgresql.org ([200.46.204.86])
        by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
        with ESMTP id 48600-04-3 for <pgsql-bugs-postgresql.org@postgresql.org>;
        Wed, 16 Jul 2008 15:40:43 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from wwwmaster.postgresql.org (wwwmaster.postgresql.org [217.196.146.204])
        by postgresql.org (Postfix) with ESMTP id AB1D565026D
        for <pgsql-bugs@postgresql.org>; Wed, 16 Jul 2008 15:40:44 -0300 (ADT)
Received: from wwwmaster.postgresql.org (wwwmaster.postgresql.org [217.196.146.204])
        by wwwmaster.postgresql.org (8.13.8/8.13.8) with ESMTP id m6GIehuA007983
        for <pgsql-bugs@postgresql.org>; Wed, 16 Jul 2008 18:40:43 GMT
        (envelope-from www@wwwmaster.postgresql.org)
Received: (from www@localhost)
        by wwwmaster.postgresql.org (8.13.8/8.13.8/Submit) id m6GIehIP007982;
        Wed, 16 Jul 2008 18:40:43 GMT
        (envelope-from www)
Date: Wed, 16 Jul 2008 18:40:43 GMT
Message-Id: <200807161840.m6GIehIP007982@wwwmaster.postgresql.org>
To: pgsql-bugs@postgresql.org
Subject: BUG #4310: PkMERMInZQ
From: "make money on line" <makemoney@money2009.com>
Content-Type: text/plain; charset=utf-8
X-Virus-Scanned: Maia Mailguard 1.0.1


The following bug has been logged online:

Bug reference:      4310
Logged by:          make money on line
Email address:      makemoney@money2009.com
PostgreSQL version: IUrjkiPgQkQXNgo
Operating system:   aJzBuaSGetA
Description:        PkMERMInZQ
Details:

<a href=" http://www.divinecaroline.com/public/user/profile?user_id=83997
">work at home jobs 101waystoincome.com</a>

====

Did it get whitelisted because it came from our form? I still think we
should scan it. The "make money on line" sender name is a dead giveaway,
and when I ran a local spamassassin on it, it even reported:
2.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                           [URIs: 101waystoincome.com]
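
Finding messages that skipped scoring does not have to be done by eye. The following is a minimal Python sketch, assuming the raw messages are available as strings; the helper names `lacks_spam_status` and `unscanned_subjects` are hypothetical, and the two sample messages are heavily abbreviated versions of the ones quoted in this post. It only checks for the presence of the header, which is a hint rather than proof that a message bypassed the filter.

```python
from email import message_from_string

# Abbreviated stand-ins for the two messages quoted in this post.
BUG_FORM_MESSAGE = (
    "To: pgsql-bugs@postgresql.org\n"
    "Subject: BUG #4310: PkMERMInZQ\n"
    'From: "make money on line" <makemoney@money2009.com>\n'
    "X-Virus-Scanned: Maia Mailguard 1.0.1\n"
    "\n"
    "The following bug has been logged online:\n"
)

SWEEPSTAKES_MESSAGE = (
    "Subject: REMINDER NOTIFICATION\n"
    "X-Virus-Scanned: Maia Mailguard 1.0.1\n"
    "X-Spam-Status: No, hits=1.806 tagged_above=0 required=5 "
    "tests=SUBJ_ALL_CAPS=1.806\n"
    "\n"
    "REMINDER NOTIFICATION\n"
)


def lacks_spam_status(raw_message: str) -> bool:
    """True when the message carries no X-Spam-Status header at all."""
    return message_from_string(raw_message)["X-Spam-Status"] is None


def unscanned_subjects(raw_messages):
    """Subjects of all messages that have no X-Spam-Status header."""
    return [
        message_from_string(raw)["Subject"]
        for raw in raw_messages
        if lacks_spam_status(raw)
    ]


print(unscanned_subjects([BUG_FORM_MESSAGE, SWEEPSTAKES_MESSAGE]))
```

Run against a list archive, a check like this would show how many messages arrive without the header and whether they all come through the web form.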


Here's another one from last night that did have a spam header. I apologize
for how long this post is getting, but I'm trying to provide some hard data:
===

Return-Path: <owner-pgsql-hackers-postgresql.org@postgresql.org>
Delivered-To: pgsql-hackers-postgresql.org@postgresql.org
Received: from localhost (unknown [200.46.204.183])
        by postgresql.org (Postfix) with ESMTP id AFB3A64FD01
        for <pgsql-hackers-postgresql.org@postgresql.org>; Wed, 16 Jul 2008 23:15:20 -0300 (ADT)
Received: from postgresql.org ([200.46.204.86])
        by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
        with ESMTP id 35883-07 for <pgsql-hackers-postgresql.org@postgresql.org>;
        Wed, 16 Jul 2008 23:15:11 -0300 (ADT)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from smtp1web.tin.it (smtp1web.tin.it [212.216.176.195])
        by postgresql.org (Postfix) with ESMTP id 8ECBB64FCE4
        for <pgsql-hackers@postgresql.org>; Wed, 16 Jul 2008 23:15:17 -0300 (ADT)
Received: from pswm6.cp.tin.it (192.168.70.26) by smtp1web.tin.it (8.0.016.5)
        id 48623AD8015C5727; Thu, 17 Jul 2008 03:59:43 +0200
Message-ID: <11b2ebe81d4.clementetajana@virgilio.it>
Date: Thu, 17 Jul 2008 02:59:41 +0100 (GMT+01:00)
From: "Tajana for(Mrs. Lucy Berg)" <clementetajana@virgilio.it>
Reply-To: cpinans@users.sourceforge.net
Subject: REMINDER NOTIFICATION
Mime-Version: 1.0
Content-Type: text/plain;charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Originating-IP: 62.163.243.54
To: undisclosed-recipients:;
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=1.806 tagged_above=0 required=5 tests=SUBJ_ALL_CAPS=1.806
X-Spam-Level: *

REMINDER NOTIFICATION

This email is to notify you that your Email
Address attached to a
Ticket Number(140408) has won an Award Sum of
($500,000.00)(Five
Hundred Thousand Dollars)In an Email Sweepstakes
program held in
The Netherlands these year 2008.Please contact the
claim officer
through the below given contact information.

MR.HANSON
CHRIS.
TEL. +31-643-502-787.
FAX: +31-847-290-539.
E-mail:cpinans@aol.
nl

WINNING INFORMATIONS
Ref Number:Nl50286
lucky Numbers:
07,12,24,36,45
Batch Number:EU-175508
Ticket Number:360208

Please
forward the above stated winning information to your Claim
Agent and do
include the following,

Your Name:
Telephone Number:

Congratulations!!!

Yours Sincerely,
Mrs. Lucy Berg.
Public Relation
Officer.

===

The only spam trigger found by postgresql.org was:

X-Spam-Status: No, hits=1.806 tagged_above=0 required=5 tests=SUBJ_ALL_CAPS=1.806

There are numerous triggers in the body of the email that should
have boosted the score up. Personally, I'd also like to see
SUBJ_ALL_CAPS raised to 3 or 4.
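
As a quick what-if, here is a small Python sketch of how that message would fare if SUBJ_ALL_CAPS were rescored. It is an illustration only: `rescore` and `is_caught` are hypothetical helpers, the thresholds of 4 and 5 are the ones discussed in this thread, and the `>=` comparison is an assumption (a strict "greater than 4" cutoff would treat a total of exactly 4 differently).

```python
# The only test that fired on the sweepstakes message, with its current score.
fired = {"SUBJ_ALL_CAPS": 1.806}


def rescore(tests: dict, overrides: dict) -> float:
    """Total score after replacing the scores of tests named in overrides."""
    return sum(overrides.get(name, score) for name, score in tests.items())


def is_caught(total: float, threshold: float) -> bool:
    """Assumed rule: a message is caught when its total reaches the threshold."""
    return total >= threshold


for new_score in (1.806, 3.0, 4.0):
    total = rescore(fired, {"SUBJ_ALL_CAPS": new_score})
    for threshold in (5.0, 4.0):
        verdict = "caught" if is_caught(total, threshold) else "passes"
        print(f"SUBJ_ALL_CAPS={new_score}: total {total} vs {threshold} -> {verdict}")
```

Under these assumptions the message is only stopped when the rule is worth 4 and the threshold is also 4, which is why the body of the message needs to trigger additional tests as well.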

So, to reiterate, I'd like to request the following:

1) Spam filtering is run on all messages
2) The default rejection threshold is lowered to 4 or less
3) The scores get raised significantly for some tests
4) More SA tests get added (are we at least running sa-update from cron?)
5) If 3 and 4 are too much trouble to maintain, outsource the
filtering to someone who does have the time, or who specializes
in it (economies of scale)

I did #5 myself years ago, after getting tired of updating SA rules,
messing with DNS lookups, blacklists, etc. and now just let
maillaunder.com handle it all.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200807171149
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkh/atsACgkQvJuQZxSWSsjKKwCg4Pc0SNrYjfUZuJRQZjU6jDHR
oc0An0vTdKzfIJ3+CxQXpw7TZyWu0Tb6
=a3/E
-----END PGP SIGNATURE-----



