Thread: Spam filters on the mailing lists

Spam filters on the mailing lists

From
"Dave Page"
Date:
Hi Marc,

For the last couple of weeks the amount of spam hitting the mailing
list moderation queues has been getting higher and higher. This
morning I've had something like 50 moderation messages to wade
through, of which only a couple were real mail.

Probably 50-75% of these are entirely in Cyrillic text. Please modify
the spam filtering to reject these outright.

There were others which were just a single inline image. I don't see
any reason why we should allow messages like that on the list from
non-subscribers (maybe subscribers want to send a screenshot).

There were others with basic spam indicators - words such as VPXL and
f****d (with other sexual references), and messages almost entirely in
upper case.

In my experience even a simple spamassassin install without any
bayesian filtering should be able to knock out this kind of stuff - is
there something wrong with what's currently being run? In any case,
please take a look because this is starting to get really annoying
now.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Oracle-compatible database company


Re: Spam filters on the mailing lists

From
Devrim GÜNDÜZ
Date:
Hi,

On Tue, 2008-02-19 at 09:45 +0000, Dave Page wrote:
> This morning I've had something like 50 moderation messages to wade
> through, of which only a couple were real mail.

Yeah, I'm getting 10x spam to my @postgresql.org account, too.

Regards,
--
Devrim GÜNDÜZ , RHCE
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



- --On Tuesday, February 19, 2008 23:38:18 -0800 Devrim GÜNDÜZ
<devrim@CommandPrompt.com> wrote:

> Hi,
>
> On Tue, 2008-02-19 at 09:45 +0000, Dave Page wrote:
>> This morning I've had something like 50 moderation messages to wade
>> through, of which only a couple were real mail.
>
> Yeah, I'm getting 10x spam to my @postgresql.org account, too.

Can you check your settings?  According to what I can see here, you don't
quarantine anything on the server, you pass it all through ... and the last 12
messages through maia-4 for you have scored >8:

Feb 20 00:48:48 maia-4 amavis[8925]: (08925-02-2) Passed CLEAN,
[219.134.139.111] <websitour0220@aol.com> -> <devrim@postgresql.org>,
Message-ID: <20080220044833.75C702E0060@postgresql.org>, Hits: 21.937, 6658 ms
Feb 20 02:45:50 maia-4 amavis[38817]: (38817-02) Passed CLEAN, [116.30.182.135]
<decontaminatesbts1@health4lifeonline.com> -> <devrim@postgresql.org>,
Message-ID: <01c873cf$413cfe80$87b61e74@decontaminatesbts1>, Hits: 15.776, 3286
ms
Feb 20 04:18:28 maia-4 amavis[67040]: (67040-07) Passed CLEAN, [85.105.24.252]
<appana@vintagemotorsports.net> -> <devrim@postgresql.org>, Message-ID:
<20080220121834.4952.qmail@dsl.static8510524252.ttnet.net.tr>, Hits: 20.115,
3929 ms
Feb 20 05:03:56 maia-4 amavis[78971]: (78971-01-6) Passed CLEAN,
[203.121.23.108] <planck0@davidturner.net> -> <devrim@postgresql.org>,
Message-ID: <01c873e2$29bb4100$6c1779cb@planck0>, Hits: 8.566, 7316 ms
Feb 20 05:16:14 maia-4 amavis[81884]: (81884-03) Passed SPAM, [83.10.59.4]
<interposingv@kvsw.de> -> <br@postgresql.org>,<devrim@postgresql.org>,
Message-ID: <181160859.33113011819727@kvsw.de>, Hits: 15.869, 6694 ms
Feb 20 05:32:38 maia-4 amavis[86112]: (86112-02) Passed CLEAN, [218.174.109.18]
<groggier@corrosionsource.com> -> <devrim@postgresql.org>, Message-ID:
<6686915931.20080220092426@corrosionsource.com>, Hits: 5.783, 3091 ms
Feb 20 05:33:28 maia-4 amavis[85402]: (85402-07) Passed CLEAN, [124.236.253.86]
<webmaster@promote-biz.net> -> <devrim@postgresql.org>, Message-ID:
<20080220013329.7A388AF10DA37051@from.header.has.no.domain>, Hits: 27.431, 7074
ms
Feb 20 05:33:46 maia-4 amavis[86184]: (86184-02) Passed CLEAN, [212.34.47.210]
<fletcher@dnr.state.oh.us> -> <devrim@postgresql.org>, Message-ID:
<20080220153811.8269.qmail@potdo.sitek.net>, Hits: 20.015, 2878 ms
Feb 20 05:50:44 maia-4 amavis[90595]: (90595-02) Passed CLEAN, [81.168.178.107]
<flanne71@students.rowan.edu> -> <devrim@postgresql.org>, Message-ID:
<20080220115029.2711.qmail@xdsl-363.lodz.dialog.net.pl>, Hits: 20.524, 2450 ms
Feb 20 06:37:39 maia-4 amavis[4907]: (04907-02) Passed CLEAN, [41.211.226.242]
<antwon@raymondmississippi.com> -> <devrim@postgresql.org>, Message-ID:
<01c873b6$87c9a380$f2e2d329@antwon>, Hits: 19.11, 19743 ms
Feb 20 07:13:18 maia-4 amavis[14278]: (14278-10) Passed CLEAN, [89.37.127.2]
<undeservedll1@psadvertise.com> -> <devrim@postgresql.org>, Message-ID:
<01c873c2$57908d80$027f2559@undeservedll1>, Hits: 37.075, 5647 ms
Feb 20 07:33:44 maia-4 amavis[21107]: (21107-03) Passed CLEAN, [82.193.131.234]
<ducal8@meyer-rochow.org> -> <devrim@postgresql.org>, Message-ID:
<01c873de$53aa9000$ea83c152@ducal8>, Hits: 25.119, 11950 ms

So, I'm guessing that you are filtering this on your side, and somehow the
filtering is failing?  But it definitely looks like our end is scoring it high
...


- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHvBas4QvfyHIvDvMRAgXzAKDY2UCJSLZV5sq+2lzKgCLTFrwwUwCfaU53
SQDFVLwv7bfJomNcLQ/icZU=
=SMfj
-----END PGP SIGNATURE-----
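
A log excerpt like the one in the message above can be summarised with a short script. The following is a minimal sketch, assuming each amavis entry has first been re-joined onto a single line; the function names and the threshold of 8 (taken from the "scored >8" remark) are illustrative and not part of any Maia or amavis tooling.

```python
import re

# Three entries copied from the log excerpt above, re-joined onto single lines.
LOG = """\
Feb 20 00:48:48 maia-4 amavis[8925]: (08925-02-2) Passed CLEAN, [219.134.139.111] <websitour0220@aol.com> -> <devrim@postgresql.org>, Message-ID: <20080220044833.75C702E0060@postgresql.org>, Hits: 21.937, 6658 ms
Feb 20 05:16:14 maia-4 amavis[81884]: (81884-03) Passed SPAM, [83.10.59.4] <interposingv@kvsw.de> -> <br@postgresql.org>,<devrim@postgresql.org>, Message-ID: <181160859.33113011819727@kvsw.de>, Hits: 15.869, 6694 ms
Feb 20 05:32:38 maia-4 amavis[86112]: (86112-02) Passed CLEAN, [218.174.109.18] <groggier@corrosionsource.com> -> <devrim@postgresql.org>, Message-ID: <6686915931.20080220092426@corrosionsource.com>, Hits: 5.783, 3091 ms
"""

# Capture the verdict after "Passed" and the numeric score after "Hits:".
ENTRY = re.compile(r"Passed (?P<verdict>\w+),.*?Hits: (?P<hits>-?\d+(?:\.\d+)?)")


def parse_amavis(log_text):
    """Return a list of (verdict, score) tuples, one per matching log line."""
    results = []
    for line in log_text.splitlines():
        match = ENTRY.search(line)
        if match:
            results.append((match.group("verdict"), float(match.group("hits"))))
    return results


def passed_above(entries, threshold):
    """Entries that were passed through although they scored above threshold."""
    return [entry for entry in entries if entry[1] > threshold]


if __name__ == "__main__":
    for verdict, hits in passed_above(parse_amavis(LOG), 8):
        print(verdict, hits)
```

Note that one of the twelve entries in the excerpt (Hits: 5.783) falls below 8, so a count produced this way would be slightly lower than the number of log lines shown.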



Re: Spam filters on the mailing lists

From
Alvaro Herrera
Date:
Marc G. Fournier wrote:

> - --On Tuesday, February 19, 2008 23:38:18 -0800 Devrim GÜNDÜZ 
> <devrim@CommandPrompt.com> wrote:
> 
> > On Tue, 2008-02-19 at 09:45 +0000, Dave Page wrote:
> >> This morning I've had something like 50 moderation messages to wade
> >> through, of which only a couple were real mail.
> >
> > Yeah, I'm getting 10x spam to my @postgresql.org account, too.
> 
> Can you check your settings?  According to what I can see here, you don't 
> quarantine anything on the server, you pass it all through ... and the last 12 
> messages through maia-4 for you have scored >8:

Hmm, how does one quarantine stuff on the server?  Is this documented
somewhere?


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Spam filters on the mailing lists

From
Devrim GÜNDÜZ
Date:
Hi,

On Wed, 2008-02-20 at 08:01 -0400, Marc G. Fournier wrote:
> So, I'm guessing that you are filtering this on your side, and somehow
> the filtering is failing?  But it definitely looks like our end is
> scoring it high

I have logged into Maia web interface once in the last year, and did not
change anything actually -- if that is what you are meaning.

Regards,

--
Devrim GÜNDÜZ , RHCE
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Log in to http://webmail.postgresql.org/maia as your mailbox (i.e.
scrappy@postgresql.org) and password ... go under Settings (ignore that main
page 'settings box', not sure why it's there, it's never worked) ... under there,
you can set threshold, whether to quarantine or just pass through, etc ...

- --On Wednesday, February 20, 2008 09:28:44 -0300 Alvaro Herrera
<alvherre@CommandPrompt.com> wrote:

> Marc G. Fournier wrote:
>
>> - --On Tuesday, February 19, 2008 23:38:18 -0800 Devrim GÜNDÜZ
>> <devrim@CommandPrompt.com> wrote:
>>
>> > On Tue, 2008-02-19 at 09:45 +0000, Dave Page wrote:
>> >> This morning I've had something like 50 moderation messages to wade
>> >> through, of which only a couple were real mail.
>> >
>> > Yeah, I'm getting 10x spam to my @postgresql.org account, too.
>>
>> Can you check your settings?  According to what I can see here, you don't
>> quarantine anything on the server, you pass it all through ... and the last
>> 12  messages through maia-4 for you have scored >8:
>
> Hmm, how does one quarantine stuff on the server?  Is this documented
> somewhere?
>
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.



- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHvB8A4QvfyHIvDvMRAtZfAKCp6fP+iWJ88PABs3Sz4+2A/fFCiACgq5m2
DTAWTvO/Vlldobm30N7873k=
=vVV5
-----END PGP SIGNATURE-----



Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


According to your settings in MAIA, you have it set to something like
'Quarantine if score over 999', so even if it scores 25, it's sending it to you
... I figure you set it that way so that you could do the filtering on your end
based on the headers?

If that isn't the case, log in and check your settings ...

- --On Wednesday, February 20, 2008 04:35:10 -0800 Devrim GÜNDÜZ
<devrim@CommandPrompt.com> wrote:

> Hi,
>
> On Wed, 2008-02-20 at 08:01 -0400, Marc G. Fournier wrote:
>> So, I'm guessing that you are filtering this on your side, and somehow
>> the filtering is failing?  But it definitely looks like our end is
>> scoring it high
>
> I have logged into Maia web interface once in the last year, and did not
> change anything actually -- if that is what you are meaning.
>
> Regards,
>
> --
> Devrim GÜNDÜZ , RHCE
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> Managed Services, Shared and Dedicated Hosting
> Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/



- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHvB9b4QvfyHIvDvMRAq6HAKDh8Rk4HNyI1MowVwPpWS4E6SJlWgCgylyg
MnDx9TmT996QOQMQXZIUnDU=
=GoIi
-----END PGP SIGNATURE-----



Re: Spam filters on the mailing lists

From
Devrim GÜNDÜZ
Date:
Hi,

On Wed, 2008-02-20 at 08:38 -0400, Marc G. Fournier wrote:
> According to your settings in MAIA, you have it set to something like
> 'Quarantine if score over 999', so even if it scores 25, it's sending
> it to you ...

Well, I did not change *anything* actually (so I can't explain why the #
of spams increased) -- but ok, I'll log in and check Maia settings again.

Thanks Marc.

Regards,
--
Devrim GÜNDÜZ , RHCE
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

Re: Spam filters on the mailing lists

From
Andrew Sullivan
Date:
On Wed, Feb 20, 2008 at 08:01:40AM -0400, Marc G. Fournier wrote:
> Can you check your settings? 

[stuff about devrim's account]

None of that is relevant to the problem on the lists: moderation cost is
going through the roof for me, for instance, because of the amount of spam
that oughta be obviously rejectable.  Something with no ascii characters
cannot be listmail.

A



Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



- --On Thursday, February 21, 2008 10:01:12 -0500 Andrew Sullivan 
<ajs@crankycanuck.ca> wrote:

> On Wed, Feb 20, 2008 at 08:01:40AM -0400, Marc G. Fournier wrote:
>> Can you check your settings?
>
> [stuff about devrim's account]
>
> None of that is relevant to the problem on the lists: moderation cost is
> going through the roof for me, for instance, because of the amount of spam
> that oughta be obviously rejectable.  Something with no ascii characters
> cannot be listmail.

Are there any X-Spam headers being added to the messages?  Are those messages
scoring low?  All @postgresql.org *should* be being scored, and looking through
MAIA, I'm finding a load of stuff quarantined, and I'm bayes training those
that I'm finding that are marked as 'non-spam' ...

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHviS64QvfyHIvDvMRAjH+AJ46lQwR+GxdUpvsSEqsbDgXv/t1/wCdGfv6
Jh0uxYdFgrItqH35EC31LEw=
=lasq
-----END PGP SIGNATURE-----



Re: Spam filters on the mailing lists

From
"Dave Page"
Date:
On Fri, Feb 22, 2008 at 1:26 AM, Marc G. Fournier <scrappy@hub.org> wrote:
> Are there any X-Spam headers being added to the messages?  Are those messages
> scoring low?  All @postgresql.org *should* be being scored, and looking through
> MAIA, I'm finding a load of stuff quarantined, and I'm bayes training those
> that I'm finding that are marked as 'non-spam' ...

Here are some examples, from the actual messages attached to the
moderation messages. This one is from a Canada-based pharmacy email:

X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=4.366 tagged_above=0 required=5 tests=DCC_CHECK=1.37,RCVD_IN_XBL=2.896, RDNS_NONE=0.1
X-Spam-Level: ****

This one is from a message that is just garbage ASCII text:

X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=2.996 tagged_above=0 required=5 tests=RCVD_IN_XBL=2.896, RDNS_NONE=0.1
X-Spam-Level: **


This one is from an all-Cyrillic one:

X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=4.979 tagged_above=0 required=5 tests=DCC_CHECK=1.37,DIGEST_MULTIPLE=0.001,
        FH_HELO_EQ_D_D_D_D=0.498,HTML_MESSAGE=0.001,RAZOR2_CF_RANGE_51_100=0.5,RAZOR2_CF_RANGE_E4_51_100=1.5,
        RAZOR2_CHECK=0.5,RCVD_IN_PBL=0.509,RDNS_NONE=0.1
X-Spam-Level: ****

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Oracle-compatible database company
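
The X-Spam-Status headers quoted above are easier to compare once they are split into fields. The sketch below is an illustration only: `parse_spam_status` is a hypothetical helper, not part of SpamAssassin or Maia, and it assumes the `hits=`, `required=` and `tests=NAME=score,...` layout seen in these examples.

```python
import re

NUMBER = r"-?\d+(?:\.\d+)?"


def parse_spam_status(value):
    """Split an X-Spam-Status header into (verdict, hits, required, tests).

    Accepts either the full header line or just its value. Folded
    (multi-line) headers are flattened to a single line first.
    """
    flat = " ".join(value.split())
    flat = re.sub(r"^X-Spam-Status:\s*", "", flat, flags=re.I)
    verdict = flat.split(",", 1)[0].strip()
    hits = float(re.search(r"hits=(" + NUMBER + ")", flat).group(1))
    required = float(re.search(r"required=(" + NUMBER + ")", flat).group(1))
    # Rule names are upper-case, which keeps hits/required/tests out of the dict.
    tests = {
        name: float(score)
        for name, score in re.findall(r"([A-Z][A-Z0-9_]*)=(" + NUMBER + ")", flat)
    }
    return verdict, hits, required, tests


if __name__ == "__main__":
    header = ("X-Spam-Status: No, hits=4.366 tagged_above=0 required=5 "
              "tests=DCC_CHECK=1.37,RCVD_IN_XBL=2.896, RDNS_NONE=0.1")
    verdict, hits, required, tests = parse_spam_status(header)
    print(verdict, hits, required, sorted(tests))
```

In the pharmacy example the three rule scores add up to the reported 4.366, which is under the required 5, so the message is tagged "No" and lands in the moderation queue instead of being rejected.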


Re: Spam filters on the mailing lists

From
Alvaro Herrera
Date:
Marc G. Fournier wrote:

> Are there any X-Spam headers being added to the messages?  Are those
> messages scoring low?  All @postgresql.org *should* be being scored,
> and looking through MAIA, I'm finding a load of stuff quarantined, and
> I'm bayes training those that I'm finding that are marked as 'non-spam'
> ...

They are going through Maia, but the score is low.  And I don't see
any mention that it is graylisted.  Two different headers below:

Return-Path: owner-pgsql-hackers-postgresql.org@postgresql.org
Delivered-To: pgsql-hackers-postgresql.org@postgresql.org
Received: from localhost (unknown [200.46.204.184])
        by postgresql.org (Postfix) with ESMTP id 18B962E0046
        for <pgsql-hackers-postgresql.org@postgresql.org>; Tue, 19 Feb 2008 06:11:54 -0400 (AST)
Received: from postgresql.org ([200.46.204.71])
        by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
        with ESMTP id 36871-07 for <pgsql-hackers-postgresql.org@postgresql.org>;
        Tue, 19 Feb 2008 06:11:48 -0400 (AST)
Received: from s328.xrea.com (s328.xrea.com [210.196.169.200])
        by postgresql.org (Postfix) with SMTP id B2E882E0040
        for <pgsql-hackers@postgresql.org>; Tue, 19 Feb 2008 06:11:52 -0400 (AST)
Received: (qmail 11565 invoked by uid 10258); 19 Feb 2008 18:09:38 +0900
Date: 19 Feb 2008 18:09:38 +0900
Message-ID: <20080219090938.11564.qmail@s328.xrea.com>
X-Mailer: perl-sendmail
To: pgsql-hackers@postgresql.org
From: "office@gdi.cute.bz" <office@gdi.cute.bz>
Subject: 【重要】相互リンクのお願い
Content-Transfer-Encoding: 7bit
Content-type: text/plain; charset=iso-2022-jp
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=2.899 tagged_above=0 required=5
        tests=TVD_SPACE_RATIO=2.899
X-Spam-Level: **



Return-Path: owner-pgsql-hackers-postgresql.org@postgresql.org
Delivered-To: pgsql-hackers-postgresql.org@postgresql.org
Received: from localhost (unknown [200.46.204.184])
        by postgresql.org (Postfix) with ESMTP id 0EA092E0041
        for <pgsql-hackers-postgresql.org@postgresql.org>; Tue, 19 Feb 2008 08:28:15 -0400 (AST)
Received: from postgresql.org ([200.46.204.71])
        by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
        with ESMTP id 09866-08 for <pgsql-hackers-postgresql.org@postgresql.org>;
        Tue, 19 Feb 2008 08:28:05 -0400 (AST)
Received: from catv-5063165c.catv.broadband.hu (catv-5063165c.catv.broadband.hu [80.99.22.92])
        by postgresql.org (Postfix) with ESMTP id 27E802E0050
        for <pgsql-hackers@postgresql.org>; Tue, 19 Feb 2008 08:28:11 -0400 (AST)
Message-ID: <000601c872f2$0559d236$482b8d8e@eibkm>
From: edsel celia <frerichs@qwest.net>
To: pgsql-hackers@postgresql.org
Subject: Приглашаем Вас на обучение иностранным языкам
Date: Tue, 19 Feb 2008 10:40:48 +0000
MIME-Version: 1.0
Content-Type: multipart/alternative;
        boundary="----=_NextPart_000_0003_01C872F2.055923C9"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=4.168 tagged_above=0 required=5 tests=DCC_CHECK=1.37,
        HTML_MESSAGE=0.001,RCVD_IN_BL_SPAMCOP_NET=2.188,RCVD_IN_PBL=0.509,
        RDNS_DYNAMIC=0.1
X-Spam-Level: ****


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Spam filters on the mailing lists

From
Andrew Sullivan
Date:
On Thu, Feb 21, 2008 at 09:26:18PM -0400, Marc G. Fournier wrote:
> 
> Are there any X-Spam headers being added to the messages?  Are those
> messages scoring low?

Yes and yes.  I just got two more (casino ones, in this case) with spam
scores similar to what others have posted.

I'm wondering whether something simple and rule-based might help catch a lot
of this before we start Bayes work on them.  I mean, "Casino slots" in the
title is a pretty easy catch :)

A



Re: Spam filters on the mailing lists

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> I'm wondering whether something simple and rule-based might
> help catch a lot of this before we start Bayes work on them.
> I mean, "Casino slots" in the title is a pretty easy catch :)

+1. SpamAssassin on a very conservative setting would catch
well over 90% of the crap I have to reject-quiet in -general.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200802220942
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAke+33kACgkQvJuQZxSWSshrAQCg8FL/Go9M0I8RailTiBg4Rxt3
OQIAnAk29/PxacDjZwf1e8vQX+ao2Khq
=sETF
-----END PGP SIGNATURE-----




Re: Spam filters on the mailing lists

From
Kenneth Marshall
Date:
We currently use DSPAM to front our trouble ticket system and
have found that a statistical filter is way, way less work and
less prone to false positives than ad hoc rule systems. It also
supports our favorite backend database. It sounds good to just
say ""Casino slots" in the title is a pretty easy catch..." but
when you start having thousands of rules, it gets unmanageable
pretty quickly. My two cents on rule-based versus statistical
filtering.

Cheers,
Ken

On Fri, Feb 22, 2008 at 09:25:33AM -0500, Andrew Sullivan wrote:
> On Thu, Feb 21, 2008 at 09:26:18PM -0400, Marc G. Fournier wrote:
> > 
> > Are there any X-Spam headers being added to the messages?  Are those
> > messages scoring low?
> 
> Yes and yes.  I just got two more (casino ones, in this case) with spam
> scores similar to what others have posted.
> 
> I'm wondering whether something simple and rule-based might help catch a lot
> of this before we start Bayes work on them.  I mean, "Casino slots" in the
> title is a pretty easy catch :)
> 
> A
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 


Re: Spam filters on the mailing lists

From
Alvaro Herrera
Date:
Greg Sabino Mullane wrote:

> > I'm wondering whether something simple and rule-based might
> > help catch a lot of this before we start Bayes work on them.
> > I mean, "Casino slots" in the title is a pretty easy catch :)
> 
> +1. SpamAssassin on a very conservative setting would catch
> well over 90% of the crap I have to reject-quiet in -general.

Graylisting would reject a lot of the crap without any extra effort,
*before* it hits the spam filter.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
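
For readers unfamiliar with the technique mentioned above: greylisting temporarily refuses the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accepts a later retry, on the theory that much spam software never retries. The following is a minimal in-memory sketch of that idea, not a description of how the postgresql.org mail servers are configured; the class name, the 300-second delay and the "defer"/"accept" return values are assumptions made for illustration.

```python
class Greylist:
    """Toy greylist keyed on the (client_ip, sender, recipient) triplet."""

    def __init__(self, min_delay=300):
        # Seconds a new triplet must wait before a retry is accepted (assumed value).
        self.min_delay = min_delay
        # Maps each triplet to the time of its first delivery attempt.
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return "defer" (a real server would answer with a temporary
        4xx failure) or "accept", given the current time in seconds."""
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "defer"


if __name__ == "__main__":
    greylist = Greylist()
    triplet = ("192.0.2.10", "someone@example.org", "pgsql-hackers@postgresql.org")
    print(greylist.check(*triplet, now=0))    # first attempt is deferred
    print(greylist.check(*triplet, now=600))  # retry after the delay is accepted
```

A production implementation would also expire old entries and persist state across restarts; both are left out here to keep the sketch short.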


Re: Spam filters on the mailing lists

From
"Joshua D. Drake"
Date:
Alvaro Herrera wrote:
> Greg Sabino Mullane wrote:
> 
>>> I'm wondering whether something simple and rule-based might
>>> help catch a lot of this before we start Bayes work on them.
>>> I mean, "Casino slots" in the title is a pretty easy catch :)
>> +1. SpamAssassin on a very conservative setting would catch
>> well over 90% of the crap I have to reject-quiet in -general.
> 
> Graylisting would reject a lot of the crap without any extra effort,
> *before* it hits the spam filter.

It is my understanding that the mailing lists do greylist.

Joshua D. Drake




Re: Spam filters on the mailing lists

From
Alvaro Herrera
Date:
Joshua D. Drake wrote:
> Alvaro Herrera wrote:

>> Graylisting would reject a lot of the crap without any extra effort,
>> *before* it hits the spam filter.
>
> It is my understanding that the mailing lists do greylist.

Do they?  I have been asking this for a week.  If they did, shouldn't
the headers I showed have a X-Graylist line or something?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Odd, someone looks to have disabled it at some point ... I've just re-enabled 
it ...

- --On Friday, February 22, 2008 13:23:18 -0300 Alvaro Herrera 
<alvherre@commandprompt.com> wrote:

> Joshua D. Drake wrote:
>> Alvaro Herrera wrote:
>
>>> Graylisting would reject a lot of the crap without any extra effort,
>>> *before* it hits the spam filter.
>>
>> It is my understanding that the mailing lists do greylist.
>
> Do they?  I have been asking this for a week.  If they did, shouldn't
> the headers I showed have a X-Graylist line or something?
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
>



- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHvvhT4QvfyHIvDvMRAokHAJ9uxn1AYMUgql1lXeju19TlVJ0ovACgyI1c
yAfE/2163SEelIaoGjjZ7m0=
=zItl
-----END PGP SIGNATURE-----



Re: Spam filters on the mailing lists

From
Andrew Sullivan
Date:
On Fri, Feb 22, 2008 at 10:18:33AM -0800, Josh Berkus wrote:
> The alternative ought to let us send a moderation message straight to bayes 
> filtering.

Now _that_ would be excellent.  (The remark upthread about how piles of
rules get hard to maintain is certainly true.  I wasn't suggesting a rule
list was a good solution; just that it's at least well-known and trivial to
install.)

> Oh, and I don't think we can automatically filter non-ASCII mail; we have 
> non-English mailing lists.

But we can surely filter cyrillic email with Windows code points being sent
to -advocacy, no?  We don't have the ability to answer them on that list,
AFAIK.

A



Re: Spam filters on the mailing lists

From
Josh Berkus
Date:
> Are there any X-Spam headers being added to the messages?  Are those
> messages scoring low?  All @postgresql.org *should* be being scored,
> and looking through MAIA, I'm finding a load of stuff quarantined, and
> I'm bayes training those that I'm finding that are marked as 'non-spam'

Maia's pretty user-hostile; I long ago gave up on its quarantine 
management.  Can we -- as a WWW team -- look hard for an alternative?

The alternative ought to let us send a moderation message straight to bayes 
filtering.

Oh, and I don't think we can automatically filter non-ASCII mail; we have 
non-English mailing lists.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



- --On Friday, February 22, 2008 16:12:05 -0500 Andrew Sullivan 
<ajs@crankycanuck.ca> wrote:

> On Fri, Feb 22, 2008 at 10:18:33AM -0800, Josh Berkus wrote:
>> The alternative ought to let us send a moderation message straight to bayes
>> filtering.
>
> Now _that_ would be excellent.  (The remark upthread about how piles of
> rules get hard to maintain is certainly true.  I wasn't suggesting a rule
> list was a good solution; just that it's at least well-known and trivial to
> install.)
>
>> Oh, and I don't think we can automatically filter non-ASCII mail; we have
>> non-English mailing lists.
>
> But we can surely filter cyrillic email with Windows code points being sent
> to -advocacy, no?  We don't have the ability to answer them on that list,
> AFAIK.

If someone can come up with a perl regex for this, it's easy to add to the
access_rules to have it auto-reject before the moderation queue ... and we can
put that on all the "english lists" ...

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHv3+Z4QvfyHIvDvMRAuHCAKDP0INMLlV2cUyZOw1NOspSBlqKOACfd3/4
5ppVHr8cDuRVpLD4l7lcRmc=
=XwDa
-----END PGP SIGNATURE-----
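
As a starting point for the regex requested above, here is a hedged sketch written with Python's `re` module. The patterns themselves use only constructs that Perl regular expressions also support, but they have not been tested against Majordomo's access_rules. The charset list (windows-1251, koi8-r/koi8-u, iso-8859-5) is an assumption about which encodings the Cyrillic spam uses, and `looks_cyrillic` is a hypothetical helper name.

```python
import re

# Charsets commonly used for Cyrillic text (assumed list, extend as needed).
CYRILLIC_CHARSETS = r"(?:windows-1251|koi8-[ru]|iso-8859-5)"

# Content-Type header declaring a Cyrillic charset on the same line.
CHARSET_HEADER = re.compile(
    r'^Content-Type:.*charset\s*=\s*"?' + CYRILLIC_CHARSETS, re.I | re.M)

# Subject using a MIME encoded-word in a Cyrillic charset.
ENCODED_WORD = re.compile(
    r"^Subject:.*=\?" + CYRILLIC_CHARSETS + r"\?[bq]\?", re.I | re.M)

# Subject that already contains decoded characters from the Cyrillic block.
RAW_CYRILLIC = re.compile(r"^Subject:.*[\u0400-\u04FF]", re.M)


def looks_cyrillic(headers):
    """True if any of the three header patterns matches the header text."""
    return any(p.search(headers)
               for p in (CHARSET_HEADER, ENCODED_WORD, RAW_CYRILLIC))


if __name__ == "__main__":
    sample = "Subject: =?koi8-r?B?8NLJ18XU?=\nTo: pgsql-hackers@postgresql.org"
    print(looks_cyrillic(sample))
```

Two limitations are worth noting: a charset parameter on a folded continuation line is not caught, and a rule like this is only appropriate on lists where Cyrillic mail is never expected, as discussed in the thread.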



Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



- --On Friday, February 22, 2008 10:18:33 -0800 Josh Berkus <josh@agliodbs.com> 
wrote:

>
>> Are there any X-Spam headers being added to the messages?  Are those
>> messages scoring low?  All @postgresql.org *should* be being scored,
>> and looking through MAIA, I'm finding a load of stuff quarantined, and
>> I'm bayes training those that I'm finding that are marked as 'non-spam'
>
> Maia's pretty user-hostile; I long ago gave up on its quarantine
> management.

Opinions vary ... I log in, click on 'Suspected Non-Spam', click the checkbox
for 'Spam' for those messages that are spam, hit submit and MAIA does the rest
... If anyone with an @postgresql.org account is interested in helping out with
'point-n-click', let me know and I can get you set up easily enough ... I went
through about 30k messages last night ...

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHv4C84QvfyHIvDvMRAludAKDQnVl12VkfU/IK8uzyB7sDTbeqeQCgliYq
8aLbS1v0QeXDWpvfhlwXHe0=
=zBbh
-----END PGP SIGNATURE-----



Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


I suspect what happened is we were having some delays on the lists, and I 
removed greylisting to make sure it wasn't a choke point, and failed to 
re-enable it after getting things back to normal ...

That is re-enabled now ...

- --On Friday, February 22, 2008 07:51:43 -0800 "Joshua D. Drake" 
<jd@commandprompt.com> wrote:

> Alvaro Herrera wrote:
>> Greg Sabino Mullane wrote:
>>
>>>> I'm wondering whether something simple and rule-based might
>>>> help catch a lot of this before we start Bayes work on them.
>>>> I mean, "Casino slots" in the title is a pretty easy catch :)
>>> +1. SpamAssassin on a very conservative setting would catch
>>> well over 90% of the crap I have to reject-quiet in -general.
>>
>> Graylisting would reject a lot of the crap without any extra effort,
>> *before* it hits the spam filter.
>
> It is my understanding that the mailing lists do greylist.
>
> Joshua D. Drake
>
>
>



- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHv4LW4QvfyHIvDvMRAoixAKDbv1PfaD30Lr0l+HJDJK6XU6SAjACcCsva
5lMWwh5acl5Y940SNX84TQ0=
=WExX
-----END PGP SIGNATURE-----



Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



- --On Friday, February 22, 2008 09:25:33 -0500 Andrew Sullivan 
<ajs@crankycanuck.ca> wrote:

> On Thu, Feb 21, 2008 at 09:26:18PM -0400, Marc G. Fournier wrote:
>>
>> Are there any X-Spam headers being added to the messages?  Are those
>> messages scoring low?
>
> Yes and yes.  I just got two more (casino ones, in this case) with spam
> scores similar to what others have posted.
>
> I'm wondering whether something simple and rule-based might help catch a lot
> of this before we start Bayes work on them.  I mean, "Casino slots" in the
> title is a pretty easy catch :)

We actually have Majordomo set up with a list of 'taboo_subject':

configset DEFAULT taboo_headers <<ENDAAB
/^X-Spam-Status: Yes/ 10,spamassassin
/^X-Spam-Status: Yes, hits=([6-9]|\d\d+)/ 20,spamassassin
/^To: .+\@yahoogroups.com/ 1,to_header
/^Subject: Virus/i 1,virus_subject
/^Subject: Virenchecker Information/i 1,virus_subject
/^Subject: Content Violation/i 1,subject
/^Subject: WARNING! Blocked mail/i 1,subject
ENDAAB

so, we can definitely add rules to this, and, in fact, any list owner should be
able to do that for their list (i.e. non-English lists) ... if you look in
access_rules, you will see a rule for $taboo_subject, which is defined by the
1,subject part of the line ...

So, adding to that list is definitely easy enough ...


- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHv4SE4QvfyHIvDvMRArpsAKCNG4E362XmzPtJf/aoMYm9ORCfvQCgic4Z
5BQf4A4a6NsEL+MjIdswQOo=
=cHWu
-----END PGP SIGNATURE-----
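
To see how the spamassassin patterns in the block above behave against the headers posted earlier in the thread, they can be tried outside Majordomo. The sketch below copies four of the patterns into Python's `re` module and treats the trailing fields (such as `10,spamassassin`) as opaque labels, since their exact meaning to Majordomo is not spelled out in the thread; `matching_rules` is a hypothetical helper.

```python
import re

# Patterns copied from the taboo_headers block above; the second element is
# the text that follows each pattern there, kept here only as a label.
TABOO_HEADERS = [
    (re.compile(r"^X-Spam-Status: Yes"), "10,spamassassin"),
    (re.compile(r"^X-Spam-Status: Yes, hits=([6-9]|\d\d+)"), "20,spamassassin"),
    (re.compile(r"^Subject: Virus", re.I), "1,virus_subject"),
    (re.compile(r"^Subject: Content Violation", re.I), "1,subject"),
]


def matching_rules(header_line):
    """Return the labels of every pattern that matches one header line."""
    return [label for pattern, label in TABOO_HEADERS
            if pattern.search(header_line)]


if __name__ == "__main__":
    # A header from one of the spam samples posted earlier in the thread.
    print(matching_rules("X-Spam-Status: No, hits=4.979 tagged_above=0 required=5"))
    # A made-up header that SpamAssassin has flagged.
    print(matching_rules("X-Spam-Status: Yes, hits=12.4 tagged_above=0 required=5"))
```

Because every sample posted in the thread carries "X-Spam-Status: No", neither spamassassin pattern matches those messages; the rules only help once the scores themselves cross the required threshold.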



Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


If someone can suggest something with spamassassin I'm missing, I'm all ears
... we have RAZOR, Pyzor and DCC configured in, plus Bayes ... and I try and
upgrade to latest spamassassin shortly after each release ...

Is there something more that can be added to improve the scores?


- --On Friday, February 22, 2008 09:43:57 -0300 Alvaro Herrera
<alvherre@commandprompt.com> wrote:

> Marc G. Fournier wrote:
>
>> Are there any X-Spam headers being added to the messages?  Are those
>> messages scoring low?  All @postgresql.org *should* be being scored,
>> and looking through MAIA, I'm finding a load of stuff quarantined, and
>> I'm bayes training those that I'm finding that are marked as 'non-spam'
>> ...
>
> They are going through Maia, but the score is low.  And I don't see
> any mention that it is graylisted.  Two different headers below:
>
> Return-Path: owner-pgsql-hackers-postgresql.org@postgresql.org
> Delivered-To: pgsql-hackers-postgresql.org@postgresql.org
> Received: from localhost (unknown [200.46.204.184])
>         by postgresql.org (Postfix) with ESMTP id 18B962E0046
>         for <pgsql-hackers-postgresql.org@postgresql.org>;
>         Tue, 19 Feb 2008 06:11:54 -0400 (AST)
> Received: from postgresql.org ([200.46.204.71])
>         by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
>         with ESMTP id 36871-07
>         for <pgsql-hackers-postgresql.org@postgresql.org>;
>         Tue, 19 Feb 2008 06:11:48 -0400 (AST)
> Received: from s328.xrea.com (s328.xrea.com [210.196.169.200])
>         by postgresql.org (Postfix) with SMTP id B2E882E0040
>         for <pgsql-hackers@postgresql.org>;
>         Tue, 19 Feb 2008 06:11:52 -0400 (AST)
> Received: (qmail 11565 invoked by uid 10258); 19 Feb 2008 18:09:38 +0900
> Date: 19 Feb 2008 18:09:38 +0900
> Message-ID: <20080219090938.11564.qmail@s328.xrea.com>
> X-Mailer: perl-sendmail
> To: pgsql-hackers@postgresql.org
> From: "office@gdi.cute.bz" <office@gdi.cute.bz>
> Subject: 【重要】相互リンクのお願い
> Content-Transfer-Encoding: 7bit
> Content-type: text/plain; charset=iso-2022-jp
> X-Virus-Scanned: Maia Mailguard 1.0.1
> X-Spam-Status: No, hits=2.899 tagged_above=0 required=5
>         tests=TVD_SPACE_RATIO=2.899
> X-Spam-Level: **
>
>
>
> Return-Path: owner-pgsql-hackers-postgresql.org@postgresql.org
> Delivered-To: pgsql-hackers-postgresql.org@postgresql.org
> Received: from localhost (unknown [200.46.204.184])
>         by postgresql.org (Postfix) with ESMTP id 0EA092E0041
>         for <pgsql-hackers-postgresql.org@postgresql.org>;
>         Tue, 19 Feb 2008 08:28:15 -0400 (AST)
> Received: from postgresql.org ([200.46.204.71])
>         by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
>         with ESMTP id 09866-08
>         for <pgsql-hackers-postgresql.org@postgresql.org>;
>         Tue, 19 Feb 2008 08:28:05 -0400 (AST)
> Received: from catv-5063165c.catv.broadband.hu
>         (catv-5063165c.catv.broadband.hu [80.99.22.92])
>         by postgresql.org (Postfix) with ESMTP id 27E802E0050
>         for <pgsql-hackers@postgresql.org>;
>         Tue, 19 Feb 2008 08:28:11 -0400 (AST)
> Message-ID: <000601c872f2$0559d236$482b8d8e@eibkm>
> From: edsel celia <frerichs@qwest.net>
> To: pgsql-hackers@postgresql.org
> Subject: Приглашаем Вас на обучение иностранным языкам
> Date: Tue, 19 Feb 2008 10:40:48 +0000
> MIME-Version: 1.0
> Content-Type: multipart/alternative;
>         boundary="----=_NextPart_000_0003_01C872F2.055923C9"
> X-Priority: 3
> X-MSMail-Priority: Normal
> X-Mailer: Microsoft Outlook Express 6.00.2900.3138
> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198
> X-Virus-Scanned: Maia Mailguard 1.0.1
> X-Spam-Status: No, hits=4.168 tagged_above=0 required=5 tests=DCC_CHECK=1.37,
>         HTML_MESSAGE=0.001, RCVD_IN_BL_SPAMCOP_NET=2.188, RCVD_IN_PBL=0.509,
>         RDNS_DYNAMIC=0.1
> X-Spam-Level: ****
>
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support



- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHv4U94QvfyHIvDvMRAojAAJ4wTrRTKu5Rkdre8CpGb8dk10jbxQCeLMoF
LxkOEajcTqopC8RXCXTLg/o=
=9IYR
-----END PGP SIGNATURE-----



Re: Spam filters on the mailing lists

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 22 Feb 2008 22:30:21 -0400
"Marc G. Fournier" <scrappy@hub.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> If someone can suggest something with spamassassin I'm missing, I'm
> all ears ... we have RAZOR, Pyzor and DCC configured in, plus
> Bayes ... and I try and upgrade to latest spamassassin shortly after
> each release ...
> 
> Is there something more that can be added to improve the scores?

Actually :)
http://ist.uwaterloo.ca/~dkeenan/talks/spamassassin/config.html

Spamassassin has the ability to specify which locales are allowed to
send email to us.

I think that we could pick quite a few that are obviously not allowed in
our various realms.
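
To make the idea concrete, here is a minimal Python sketch of a charset allow-list check. It is not SpamAssassin code; SpamAssassin handles this through its own configuration (the ok_locales setting, as far as I know). The set ALLOWED_CHARSETS and the function charset_allowed are hypothetical names chosen for this example, and the sample message reuses the Content-type header from the first spam quoted earlier in the thread.

```python
from email import message_from_string

# Hypothetical allow-list: which charsets a given list accepts is a
# policy decision, not something taken from the thread.
ALLOWED_CHARSETS = {"us-ascii", "iso-8859-1", "iso-8859-15", "utf-8", "windows-1252"}


def charset_allowed(raw_message):
    """Return False if any MIME part declares a charset outside the allow-list."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        charset = part.get_content_charset()  # lowercased, or None if not declared
        if charset is not None and charset not in ALLOWED_CHARSETS:
            return False
    return True


sample = (
    "To: pgsql-hackers@postgresql.org\n"
    "Content-type: text/plain; charset=iso-2022-jp\n"
    "\n"
    "body\n"
)
print(charset_allowed(sample))
```

A check like this only looks at the declared charset, so Cyrillic spam sent as UTF-8 would still pass; it is a sketch of the principle rather than a complete filter.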

Joshua D. Drake


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHv4qEATb/zqfZUUQRAqg+AJ0Y8ohKHhQlZQZ0iTb0LCIVhbvCcwCfWUHi
ml8eaC1xpJ6mnR4aIYb39oQ=
=VBm2
-----END PGP SIGNATURE-----

Re: Spam filters on the mailing lists

From
"Dave Page"
Date:
On Sat, Feb 23, 2008 at 2:20 AM, Marc G. Fournier <scrappy@hub.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
>  Hash: SHA1
>
>
>  I suspect what happened is we were having some delays on the lists, and I
>  removed greylisting to make sure it wasn't a choke point, and failed to
>  re-enable it after getting things back to normal ...
>
>  That is re-enabled now ...

And for the last few days the spam levels in the moderation queues have
dropped massively.

Thanks :-)

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Oracle-compatible database company


Re: Spam filters on the mailing lists

From
"Marc G. Fournier"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


I've been going in nightly and adding more to the Bayes database to help 
improve the scoring ... I'm not sure how much weight, overall, Bayes adds, but 
it can't hurt :)
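
For a sense of scale, here is a small Python sketch of how the per-test scores add up. The test names and scores are taken from the second X-Spam-Status header Alvaro quoted earlier in the thread (hits=4.168, required=5). HYPOTHETICAL_BAYES_SCORE is an assumed value used only for illustration, not the score the installed Bayes rules actually assign, and the function names are likewise made up for the example.

```python
import math

REQUIRED = 5.0  # "required=5" in the quoted header

# Scores from the quoted X-Spam-Status header of the Cyrillic spam.
tests = {
    "DCC_CHECK": 1.37,
    "HTML_MESSAGE": 0.001,
    "RCVD_IN_BL_SPAMCOP_NET": 2.188,
    "RCVD_IN_PBL": 0.509,
    "RDNS_DYNAMIC": 0.1,
}

# Assumption for illustration only: the real value depends on the
# installed rule scores and on how well Bayes has been trained.
HYPOTHETICAL_BAYES_SCORE = 3.5


def total_score(test_scores):
    """Sum the individual test scores into the overall hit count."""
    return sum(test_scores.values())


def is_spam(test_scores, required=REQUIRED):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(test_scores) >= required


print(round(total_score(tests), 3))
print(is_spam(tests))

with_bayes = dict(tests, BAYES_HIT=HYPOTHETICAL_BAYES_SCORE)
print(is_spam(with_bayes))
```

The quoted message falls short of the threshold by less than one point, so any additional rule hit worth roughly a point or more would have tagged it.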


- --On Tuesday, February 26, 2008 09:15:35 +0000 Dave Page <dpage@pgadmin.org> 
wrote:

> On Sat, Feb 23, 2008 at 2:20 AM, Marc G. Fournier <scrappy@hub.org> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>>  Hash: SHA1
>>
>>
>>  I suspect what happened is we were having some delays on the lists, and I
>>  removed greylisting to make sure it wasn't a choke point, and failed to
>>  re-enable it after getting things back to normal ...
>>
>>  That is re-enabled now ...
>
> And for the last few days the spam levels in the moderation queues have
> dropped massively.
>
> Thanks :-)
>
> --
> Dave Page
> EnterpriseDB UK: http://www.enterprisedb.com
> The Oracle-compatible database company



- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD4DBQFHxMdP4QvfyHIvDvMRApzyAJ46hDgRzmQAMcNR3hXuCca0+LaJmQCXUmQa
4EnZiqz17aBrJ7zXGVHwcQ==
=slWa
-----END PGP SIGNATURE-----