Re: Stopping link spam on the lists - Mailing list pgsql-www

From Stefan Kaltenbrunner
Subject Re: Stopping link spam on the lists
Date
Msg-id 4F85D024.6020509@kaltenbrunner.cc
In response to Re: Stopping link spam on the lists  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Stopping link spam on the lists  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-www
On 04/08/2012 05:14 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> The remaining question, in my mind, is: is there a way to reliably
>> detect that link spam is just link spam and reject it altogether in
>> Spamassassin?  If that's the case, then we could do it at that level and
>> save the work downstream.  This is something that Stefan would have to
>> answer.
> 
> FWIW, all the examples I have seen recently bore all of these traits:
> 
> * empty subject line (other than the [LISTNAME] prefix attached by our
>   own forwarding code)
> * no content to speak of except the payload link
> * To: addressed to multiple unrelated addresses

Well, in principle there is no reason why we cannot give more weight to
mails matching that description in our inbound mail system. That would
probably push them, fairly selectively, over the current
hard-inbound-reject threshold (which atm is fairly conservative, given we
are still kinda fine-tuning the "new" system).
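
To make the idea concrete, here is a rough sketch of how the three traits quoted above could each add weight to a message's score. It is not our actual SpamAssassin configuration: the function name, the weights, the "three words or fewer besides the link" cutoff, and the use of distinct recipient domains as a stand-in for "unrelated addresses" are all assumptions made up for illustration.

```python
import re
from email import message_from_string
from email.utils import getaddresses

URL_RE = re.compile(r"https?://\S+")
# Matches one or more leading "[LISTNAME]" prefixes, e.g. "[GENERAL] ".
LIST_PREFIX_RE = re.compile(r"^\s*(\[[^\]]+\]\s*)+")

# Hypothetical weights, chosen only for illustration.
W_EMPTY_SUBJECT = 1.5
W_LINK_ONLY = 1.5
W_MANY_RECIPIENTS = 0.5


def extra_weight(raw_message: str) -> float:
    """Return extra score for a plain-text mail matching the quoted traits."""
    msg = message_from_string(raw_message)
    score = 0.0

    # Trait 1: subject is empty once the list prefix is stripped.
    subject = LIST_PREFIX_RE.sub("", msg.get("Subject", "") or "")
    if not subject.strip():
        score += W_EMPTY_SUBJECT

    # Trait 2: body holds a link and almost nothing else.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # multipart is ignored
    words_besides_links = URL_RE.sub("", body).split()
    if URL_RE.search(body) and len(words_besides_links) <= 3:
        score += W_LINK_ONLY

    # Trait 3: To: lists addresses in several different domains
    # (an assumed proxy for "multiple unrelated addresses").
    recipients = getaddresses(msg.get_all("To", []))
    domains = {a.rsplit("@", 1)[-1].lower() for _, a in recipients if "@" in a}
    if len(domains) >= 3:
        score += W_MANY_RECIPIENTS

    return score
```

The recipient check is deliberately given the smallest weight here, since plenty of legitimate list traffic carries multiple To: addresses as well.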

> 
> I'm not sure how much the last point helps, unfortunately, because a
> heck of a lot of what passes through our lists has multiple To:, and
> I doubt it's practical for the spam filter to test how many of the
> target addresses are people subscribed to the lists.  The empty subject
> would be easy to test for, but surely the spammers will figure out
> not to do that soon.
> 
> Anyway, what I've been seeing lately has all had X-pg-spam-score 3.5 or
> more, which is what made me suggest that moderating on that basis would
> improve matters.

Any chance you can provide us with some pointers to these kinds of mails?
I don't really have the bandwidth to follow that many lists, and I don't
think I have seen one coming by on the lists I actually read regularly...

One important point to note is that only ~2% of our rejects are actually
based on heavyweight content filtering (SA and clamav); the
remaining 98% are dealt with much earlier in the pipeline, using
much lighter-weight checks.

FWIW we actually passed approximately 10000 mails (excluding traffic we
get from hub.org back as bounces) back to the actual listserver on April
10th.
Out of that, a total of 140 mails would have exceeded an X-Pg-Spam-Score
of 3.5 (across all lists).
I have no idea whether making those "moderated by default" would
put an enormous amount of additional burden on the moderators or not,
given I have no idea what kind of mails need to get dealt with on a
typical day.
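
For scale, the share of passed mail that would land in moderation can be worked out from the two figures above. This is only a back-of-the-envelope sketch: the function name is made up, and the 10000 is an approximate count for a single day.

```python
def flagged_share(flagged: int, total: int) -> float:
    """Fraction of mails passed to the listserver that exceed the threshold."""
    return flagged / total


# Figures from April 10th: ~10000 mails passed, 140 of them scoring over 3.5.
share = flagged_share(140, 10_000)
print(f"{share:.1%}")  # prints "1.4%"
```

That is about 1.4% of one day's passed traffic; it says nothing about how much of that would be spam versus legitimate mail.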



Stefan

