Thread: Tom Lane heads up
Just dropping a quick note for Tom Lane. I sent a personal message today, but I wasn't sure if you'd get it after I remembered all of the spam filters you've got set up. Sorry for the off-topic post.
DeJuan Jackson wrote: > Just dropping a quick note for Tom Lane. I sent a personal message > today, but I wasn't sure if you'd get it after I remembered all of the > spam filters you've got set up. > > Sorry for the off-topic post. That's ok. He is only filtering me :-) Actually, you get a rejection notice if his spam filters catch you. If you didn't get one, you're ok. Shachar -- Shachar Shemesh Lingnu Open Systems Consulting http://www.lingnu.com/
On Fri, Mar 05, 2004 at 01:19:13PM +0200, Shachar Shemesh wrote: > DeJuan Jackson wrote: > >I sent a personal message today, but I wasn't sure if you'd > >get it after I remembered all of the spam filters you've got > >set up. > > Actually, you get a rejection notice if his spam filters catch > you. If you didn't get one, you're ok. is there some way of getting a look at tom's or marc's filters? i could sure use a bit of help there. lordy, we're close to drowning in the stuff! if it's in the archives already, i apparently didn't hit the right search string. a quickie pointer is all i need... thanks in advance! -- "Why did they hard code that value into the program?". "My only guess would be to maximize suckage." http://suso.suso.org/docs/apache_and_frontpage/htmldocs/part4-2.phtml
On Mon, 19 Apr 2004, Will Trillich wrote: > On Fri, Mar 05, 2004 at 01:19:13PM +0200, Shachar Shemesh wrote: > > DeJuan Jackson wrote: > > >I sent a personal message today, but I wasn't sure if you'd > > >get it after I remembered all of the spam filters you've got > > >set up. > > > > Actually, you get a rejection notice if his spam filters catch > > you. If you didn't get one, you're ok. > > is there some way of getting a look at tom's or marc's filters? i could > sure use a bit of help there. lordy, we're close to drowing in the > stuff! Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all enabled ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Marc G. Fournier wrote: > Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > enabled ... I use exactly the same setup. But recently I've noticed that the spammers are getting smarter -- I think 20% of it is slipping by the filters. I'm going to need something better. Joe
Joe Conway wrote: > Marc G. Fournier wrote: > > Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > > enabled ... > > I use exactly the same setup. But recently I've noticed that the > spammers are getting smarter -- I think 20% of it is slipping by the > filters. I'm going to need something better. Here is what I use: http://candle.pha.pa.us/main/writings/spam/ I get 98% blockage with no false positives, or at least only 1-2 a year (that folks tell me about). :-) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
When grilled further on (Mon, 19 Apr 2004 21:19:05 -0700), Joe Conway <mail@joeconway.com> confessed: > Marc G. Fournier wrote: > > Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > > enabled ... > > I use exactly the same setup. But recently I've noticed that the > spammers are getting smarter -- I think 20% of it is slipping by the > filters. I'm going to need something better. > Have you played with the "spamassassin --report" feature? It works fairly well if you can integrate it into your e-mail client and report a bunch of messages as spam. It trains the Bayes filter and reports to Razor (at the least). Sylpheed Claws has actions (you use "spamassassin --report %F" as the action), and it'll batch the report on all selected messages. I find that after 10-20 messages, it starts finding the ones that were slipping through. Since February, I have 200 missed out of 4200. Cheers, Rob -- 22:28:27 up 3 days, 2:06, 3 users, load average: 3.24, 3.08, 3.45 Linux 2.6.5-01 #5 SMP Tue Apr 6 21:32:39 MDT 2004
Will Trillich <will@serensoft.com> writes:
> is there some way of getting a look at tom's or marc's filters? i could
> sure use a bit of help there. lordy, we're close to drowning in the
> stuff!

Tell me about it :-(

I currently use four levels of filtering:

1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org (there are others out there, but these seem to have a good impedance match to my personal spam load).

2. Private blacklist of IP ranges that have sent me too much spam. sendmail has a pretty easy mechanism to support this, although it only seems to support /8 /16 or /24 ranges which is a bit coarse. (If you've gotten a "Go away spammer" bounce from me, you were caught by this filter --- let me know and I'll tighten the ranges.)

3. I have noticed that bouncing any machine that sends "HELO sss.pgh.pa.us" gets rid of a ton of spam and viruses. I don't know of any real clean way to do this, but I have a sendmail.cf hack for it.

4. Very long list of procmail filters on header and body patterns.

#2 and #4 are fairly personal, in the sense that they have a decent success/failure ratio for the junk mail I get. I wouldn't recommend that someone else try my lists, and in any case they take a heck of a lot of hand maintenance. I've been looking into more automated methods such as CRM114 but haven't made the jump yet.

regards, tom lane
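As a rough illustration of levels 1 and 2 in Tom's list, here is a minimal Python sketch. It is not Tom's sendmail setup: the function names and the blacklist entries (documentation address ranges) are hypothetical. It only shows how a DNSBL query name is conventionally formed (the client IP's octets reversed, followed by the list's zone) and how a coarse /8, /16 or /24 range blacklist can be matched.

```python
import ipaddress
from typing import Sequence


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name to look up for `ip` in DNSBL `zone`.

    DNSBLs are queried by reversing the IPv4 octets and appending the zone;
    a successful A-record lookup on that name means the address is listed.
    """
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "." + zone


def in_private_blacklist(ip: str, ranges: Sequence[str]) -> bool:
    """True if `ip` falls inside any of the blacklisted ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)


# Hypothetical entries, limited to /16 and /24 like the sendmail mechanism
# described above. These are documentation ranges, not anyone's real list.
PRIVATE_BLACKLIST = ["192.0.2.0/24", "198.51.0.0/16"]
```

No network lookup is performed here; resolving the generated name is left to whatever DNS library the mail server already uses.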
Tom Lane said: > Will Trillich <will@serensoft.com> writes: > > is there some way of getting a look at tom's or marc's filters? i could > > sure use a bit of help there. lordy, we're close to drowing in the > > stuff! > > Tell me about it :-( > > I currently use four levels of filtering: > > 1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org > (there are others out there, but these seem to have a good impedance > match to my personal spam load). > > 2. Private blacklist of IP ranges that have sent me too much spam. > sendmail has a pretty easy mechanism to support this, although it > only seems to support /8 /16 or /24 ranges which is a bit coarse. > (If you've gotten a "Go away spammer" bounce from me, you were caught > by this filter --- let me know and I'll tighten the ranges.) > > 3. I have noticed that bouncing any machine that sends "HELO > sss.pgh.pa.us" gets rid of a ton of spam and viruses. I don't know of > any real clean way to do this, but I have a sendmail.cf hack for it. > > 4. Very long list of procmail filters on header and body patterns. > > #2 and #4 are fairly personal, in the sense that they have a decent > success/failure ratio for the junk mail I get. I wouldn't recommend > that someone else try my lists, and in any case they take a heck of a > lot of hand maintenance. I've been looking into more automated methods > such as CRM114 but haven't made the jump yet. Yes they sure are. I tried my personal blacklist on a client's server one time after they complained of seeing dozens a minute slipping by. It did just about nothing, but it got them started on their own. #3 looks interesting though... Best regards, Jim Wilson
Quoting Tom Lane <tgl@sss.pgh.pa.us>: > Will Trillich <will@serensoft.com> writes: > > is there some way of getting a look at tom's or marc's filters? i could > > sure use a bit of help there. lordy, we're close to drowing in the > > stuff! > > Tell me about it :-( > > I currently use four levels of filtering: > > 1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org > (there are others out there, but these seem to have a good impedance > match to my personal spam load). > > 2. Private blacklist of IP ranges that have sent me too much spam. > sendmail has a pretty easy mechanism to support this, although it > only seems to support /8 /16 or /24 ranges which is a bit coarse. > (If you've gotten a "Go away spammer" bounce from me, you were caught > by this filter --- let me know and I'll tighten the ranges.) There is a sendmail script for this called "cidrexpand" that allows you to put in CIDR blocks- i.e. things like 216.185.96.0/19 can be put into the sendmail access file. <--stuff deleted--> -- Keith C. Perry, MS E.E. Director of Networks & Applications VCSN, Inc. http://vcsn.com ____________________________________ This email account is being host by: VCSN, Inc : http://vcsn.com
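To make the idea behind Keith's suggestion concrete, here is a small Python sketch of CIDR expansion. It is not the cidrexpand script itself, and the output format (dotted prefixes cut at an octet boundary) is an assumption about what a sendmail access file expects; the function name is hypothetical. It splits a block such as 216.185.96.0/19 into the /24-sized pieces that an octet-boundary matcher can handle.

```python
import ipaddress
from typing import List


def expand_to_octet_prefixes(cidr: str) -> List[str]:
    """Expand an IPv4 CIDR block into prefixes that end on an octet boundary.

    A /19 becomes 32 three-octet prefixes, a /8 stays a single one-octet prefix.
    """
    net = ipaddress.ip_network(cidr)
    # Pick the first octet boundary that is at least as long as the prefix.
    boundary = next(b for b in (8, 16, 24, 32) if net.prefixlen <= b)
    octets_kept = boundary // 8
    prefixes = []
    for sub in net.subnets(new_prefix=boundary):
        parts = str(sub.network_address).split(".")
        prefixes.append(".".join(parts[:octets_kept]))
    return prefixes
```

For the block Keith mentions, this yields the 32 prefixes from 216.185.96 through 216.185.127.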
On Tue, Apr 20, 2004 at 01:06:18AM -0400, Tom Lane wrote: > Will Trillich <will@serensoft.com> writes: > > is there some way of getting a look at tom's or marc's filters? i could > > sure use a bit of help there. lordy, we're close to drowning in the > > stuff! > > Tell me about it :-( > > I currently use four levels of filtering: > > 1. DNSBL lists: blackholes.five-ten-sg.com, bl.spamcop.net, relays.ordb.org > (there are others out there, but these seem to have a good impedance > match to my personal spam load). > > 2. Private blacklist of IP ranges that have sent me too much spam. > sendmail has a pretty easy mechanism to support this, although it > only seems to support /8 /16 or /24 ranges which is a bit coarse. > (If you've gotten a "Go away spammer" bounce from me, you were caught > by this filter --- let me know and I'll tighten the ranges.) > > 3. I have noticed that bouncing any machine that sends "HELO > sss.pgh.pa.us" gets rid of a ton of spam and viruses. I don't know of > any real clean way to do this, but I have a sendmail.cf hack for it. > > 4. Very long list of procmail filters on header and body patterns. It must be pretty difficult to maintain these header and body patterns and the other lists. I had the same problem and resolved it with "spamassassin": it can learn, and it's simpler than procmailrc coding. Now only about 5% of all spam reaches my INBOX. Karel -- Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/
On Mon, 19 Apr 2004, Joe Conway wrote: > Marc G. Fournier wrote: > > Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > > enabled ... > > I use exactly the same setup. But recently I've noticed that the > spammers are getting smarter -- I think 20% of it is slipping by the > filters. I'm going to need something better. do you force learn those spam that get through the cracks? I get about 20 or 30 messages that slip through the cracks, which I process through with sa-learn nightly ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Karel Zak wrote: > It must be pretty difficult to maintain these header and body patterns > and the other lists. I had the same problem and resolved it with > "spamassassin": it can learn, and it's simpler than procmailrc > coding. Now only about 5% of all spam reaches my INBOX. It's not that difficult here but I'm using Postfix, which has built-in pattern checking. Because my mail server also hosts a bunch of topical internet mailing lists (mainly motorcycle and bass player stuff) and all of their admin addresses were harvested by spammers long ago, I don't just get one copy of spam. I usually get several because each of those admin addresses eventually aliases back to me. I don't use SpamAssassin or Razor but I manage to kill 95% of spam at the SMTP stage, before the message is accepted for delivery. This works better than a delivery stage mail processor like procmail because it bounces the spam back to the server actually sending it. It's easy to see from the maillogs what IPs are regularly sending me this crap so they can be blackholed permanently. I think I've got most of CHINANET in the bit bucket now <g>.
On Tue, Apr 20, 2004 at 05:35:51AM -0000 I heard the voice of Jim Wilson, and lo! it spake thus: > Tom Lane said: > > > > 3. I have noticed that bouncing any machine that sends "HELO > > sss.pgh.pa.us" gets rid of a ton of spam and viruses. I don't know of > > any real clean way to do this, but I have a sendmail.cf hack for it. > > #3 looks interesting though... I've been blocking HELO as anything under my domain, as well as my IP address (as well as any bare IP addresses) for a while, and it certainly drops a fair bit. And I maintain a long list of HELO names, AND IP ranges, AND sending hostnames, AND senders domains, plus all the filtering I do after accepting the mail... Wacky. If we just renamed 'spam' to 'justifiable homicide'... -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ "The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet"
Marc G. Fournier wrote: > do you force learn those spam that get through the cracks? I get about 20 > or 30 messages that slip through the cracks, which I process through with > sa-learn nightly ... No, I haven't been doing that, but I guess I ought to start. Thanks for the suggestion! Joe
On Tue, 20 Apr 2004, Joe Conway wrote: > Marc G. Fournier wrote: > > do you force learn those spam that get through the cracks? I get about 20 > > or 30 messages that slip through the cracks, which I process through with > > sa-learn nightly ... > > No, I haven't been doing that, but I guess I ought to start. Thanks for > the suggestion! Also check to make sure that you don't have autolearn disabled ... you would have had to do it manually, as it is enabled by default, but, for instance, if you are a user on a system, the site-wide may be set to disable autolearn, so you'd have to enable it yourself ... I'm looking forward to 3.x coming out, as the Bayes stuff will be able to run out of an SQL database instead of flat files ... so servers running Cyrus IMAPd, where there are no physical user accounts, will be able to start making use of Bayes as well ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Quoting "Matthew D. Fuller" <fullermd@over-yonder.net>: > On Tue, Apr 20, 2004 at 05:35:51AM -0000 I heard the voice of > Jim Wilson, and lo! it spake thus: > > Tom Lane said: > > > > > > 3. I have noticed that bouncing any machine that sends "HELO > > > sss.pgh.pa.us" gets rid of a ton of spam and viruses. I don't know of > > > any real clean way to do this, but I have a sendmail.cf hack for it. > > > > #3 looks interesting though... > > I've been blocking HELO as anything under my domain, as well as my IP > address (as well as any bare IP addresses) for a while, and it > certainly drops a fair bit. And I maintain a long list of HELO names, > AND IP ranges, AND sending hostnames, AND senders domains, plus all > the filtering I do after accepting the mail... Wacky. If we just > renamed 'spam' to 'justifiable homicide'... > > > -- > Matthew Fuller (MF4839) | fullermd@over-yonder.net > Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ > > "The only reason I'm burning my candle at both ends, is because I > haven't figured out how to light the middle yet" > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > We could only wish for "justifiable homicide". Now there's a law I would support! :) Are you guys miltering to drop the messages with those HELO patterns? I'm nailing 80%+ across all my clients and I may get 20 to 50 spams/day (down from 200+) which is acceptable but I was going to start using some netfilter hooks (i.e. Linux firewall code) to inspect mail traffic and apply some more patterns. If you guys are getting 95%+ via miltering then that's definitely the way to go. -- Keith C. Perry, MS E.E. Director of Networks & Applications VCSN, Inc. http://vcsn.com ____________________________________ This email account is being host by: VCSN, Inc : http://vcsn.com
Tom Lane <tgl@sss.pgh.pa.us> wrote: > [snip] > > 3. I have noticed that bouncing any machine that sends "HELO > sss.pgh.pa.us" gets rid of a ton of spam and viruses. IOW: Anything that HELOs with your mail server's own hostname. If you can do it: Change that to anything that HELOs with your domain name (that's not supposed to) and you'll catch still more. Add to that anything HELOing with your mail server's IP address and you'll catch more yet. > I don't know of > any real clean way to do this, but I have a sendmail.cf hack for it. [snip] Postfix, which is what I use, has built-in support for HELO checks. -- Jim Seymour | Spammers sue anti-spammers: jseymour@LinxNet.com | http://www.LinxNet.com/misc/spam/slapp.php http://jimsun.LinxNet.com | Please donate to the SpamCon Legal Fund: | http://www.spamcon.org/legalfund/
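The HELO checks discussed by Tom, Matthew and Jim can be summarised in a short Python sketch. This is neither a sendmail.cf hack nor Postfix configuration; it only spells out the decision logic. The hostname, domain, IP address and the set of hosts allowed to HELO under the local domain are all hypothetical placeholders.

```python
import ipaddress
from typing import Optional

# Hypothetical local identity; replace with your own server's values.
MY_HOSTNAME = "mail.example.org"
MY_DOMAIN = "example.org"
MY_IP = "192.0.2.25"
# Hosts that legitimately HELO with a name under MY_DOMAIN.
ALLOWED_LOCAL_HOSTS = {"relay.example.org"}


def is_bare_ip(name: str) -> bool:
    """True for an unbracketed IP address used as a HELO name."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def reject_helo(helo: str) -> Optional[str]:
    """Return a rejection reason for a HELO name, or None to accept it."""
    name = helo.strip().lower().rstrip(".")
    if name == MY_HOSTNAME:
        return "own hostname"
    if name.strip("[]") == MY_IP:
        return "own IP address"
    if is_bare_ip(name):
        return "bare IP address"
    under_domain = name == MY_DOMAIN or name.endswith("." + MY_DOMAIN)
    if under_domain and name not in ALLOWED_LOCAL_HOSTS:
        return "own domain"
    return None
```

The checks run from the narrowest (Tom's own-hostname test) to the broadest (anything under the local domain), so the reason returned is the most specific one that applies.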
On Tue, Apr 20, 2004 at 10:17:05AM -0300, Marc G. Fournier wrote: > On Mon, 19 Apr 2004, Joe Conway wrote: > > > Marc G. Fournier wrote: > > > Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > > > enabled ... > > > > I use exactly the same setup. But recently I've noticed that the > > spammers are getting smarter -- I think 20% of it is slipping by the > > filters. I'm going to need something better. > > do you force learn those spam that get through the cracks? I get about 20 > or 30 messages that slip through the cracks, which I process through with > sa-learn nightly ... i have been doing that some -- but i still get about 200 false negatives per day. takes too much time to run 'sa-learn' all the time when it seems like spam #n is an awful lot like spam #n-1. -- "Why did they hard code that value into the program?". "My only guess would be to maximize suckage." http://suso.suso.org/docs/apache_and_frontpage/htmldocs/part4-2.phtml
On Tue, 20 Apr 2004, Will Trillich wrote: > On Tue, Apr 20, 2004 at 10:17:05AM -0300, Marc G. Fournier wrote: > > On Mon, 19 Apr 2004, Joe Conway wrote: > > > > > Marc G. Fournier wrote: > > > > Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > > > > enabled ... > > > > > > I use exactly the same setup. But recently I've noticed that the > > > spammers are getting smarter -- I think 20% of it is slipping by the > > > filters. I'm going to need something better. > > > > do you force learn those spam that get through the cracks? I get about 20 > > or 30 messages that slip through the cracks, which I process through with > > sa-learn nightly ... > > i have been doing that some -- but i still get about 200 false > negatives per day. takes too much time to run 'sa-learn' all the > time when it seems like spam #n is an awful lot like spam #n-1. I'm down to ~20 false negatives right now ... I usually spend my last half hour in front of the tv at night sorting them out and filtering them through bayes ... My spam filters right now are picking up between 2000->3000 messages per day which aren't getting into my main folders ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On Mon, Apr 19, 2004 at 09:19:05PM -0700, Joe Conway wrote: > Marc G. Fournier wrote: > >Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > >enabled ... > > I use exactly the same setup. But recently I've noticed that the > spammers are getting smarter -- I think 20% of it is slipping by the > filters. I'm going to need something better. No offense, but that means you're not doing it right. I use SA with Bayes (and everything else), and I'm getting better than 98% with no false positives. Yesterday I had 823 spams (you read that correctly) with 9 that made it through. When I woke up this morning, I had 334 spams with 2 that made it through. I constantly train my Bayesian filter by using an email address I set up where I forward all false-negatives. So the few that get through won't be doing that again. It simply runs them through sa-learn. If I get some time, I'll post the code to my web site. Spammers cannot outsmart a Bayesian filter. It's game-over. You don't need to upgrade, you need to figure out how to make your current setup work. Make sure you have the latest SA and make sure that Bayesian filtering is turned on and working, and make sure to train the filter. Reply to me offlist if you need a group of 5000 or so spams to help train it. Michael -- Michael Darrin Chaney mdchaney@michaelchaney.com http://www.michaelchaney.com/
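The percentages quoted in this thread are easy to check. The Python sketch below assumes that "823 spams with 9 that made it through" means 823 spams in total, of which 9 were missed; if the 9 were counted on top of the 823, the rates shift slightly. The function name is hypothetical.

```python
def catch_rate(total_spam: int, slipped_through: int) -> float:
    """Percentage of spam caught, given the total and the number missed."""
    return 100.0 * (total_spam - slipped_through) / total_spam


# Michael's two samples: both come out above the 98% he claims.
yesterday = catch_rate(823, 9)     # about 98.9
this_morning = catch_rate(334, 2)  # about 99.4

# Rob's figure from earlier in the thread: 200 missed out of 4200.
rob = catch_rate(4200, 200)        # about 95.2
```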
On Tue, Apr 20, 2004 at 01:30:59PM -0300, Marc G. Fournier wrote: > Also check to make sure that you don't have autolearn disabled ... you > would have had to do it manually, as it is enabled by default, but, for > instance, if you are a user on a system, the site-wide may be set to > disable autolearn, so you'd have to enable it yourself ... > > I'm looking forward to 3.x coming out, as the Bayes stuff will be able to > run out of an SQL database instead of flat files ... so servers running > Cyrus IMAPd, where there are no physical user accounts, will be able to > start making use of Bayes as well ... You should look into MailScanner, at www.mailscanner.info. I use it as the framework for running SA and anti-virus software, using Exim as my mail server. There are no physical user accounts; all virtual stuff. MailScanner lets SA, along with the Bayesian filter, work for all email coming through. Michael -- Michael Darrin Chaney mdchaney@michaelchaney.com http://www.michaelchaney.com/
On Wed, 21 Apr 2004, Michael Chaney wrote: > On Tue, Apr 20, 2004 at 01:30:59PM -0300, Marc G. Fournier wrote: > > Also check to make sure that you don't have autolearn disabled ... you > > would have had to do it manually, as it is enabled by default, but, for > > instance, if you are a user on a system, the site-wide may be set to > > disable autolearn, so you'd have to enable it yourself ... > > > > I'm looking forward to 3.x coming out, as the Bayes stuff will be able to > > run out of an SQL database instead of flat files ... so servers running > > Cyrus IMAPd, where there are no physical user accounts, will be able to > > start making use of Bayes as well ... > > You should look into MailScanner, at www.mailscanner.info. I use it as > the framework for running SA and anti-virus software, using Exim as my > mail server. There are no physical user accounts; all virtual stuff. > MailScanner lets SA, along with the Bayesian filter, work for all email > coming through. Does it allow for per user preferences? I haven't found a clean way to do that yet, other than using the spamcheck.py lmtpproxy ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On Wed, Apr 21, 2004 at 02:11:16PM -0300, Marc G. Fournier wrote: > > You should look into MailScanner, at www.mailscanner.info. I use it as > > the framework for running SA and anti-virus software, using Exim as my > > mail server. There are no physical user accounts; all virtual stuff. > > MailScanner lets SA, along with the Bayesian filter, work for all email > > coming through. > > Does it allow for per user preferences? I haven't found a clean way to do > that yet, other than using the spamcheck.py lmtpproxy ... Yes, MailScanner allows per-user and per-domain preferences. Michael -- Michael Darrin Chaney mdchaney@michaelchaney.com http://www.michaelchaney.com/
Michael Chaney wrote: > Make sure you have the latest SA and make sure that Bayesian filtering > is turned on and working, and make sure to train the filter. Reply to > me offlist if you need a group of 5000 or so spams to help train it. I've got the latest SA and I'm using Bayesian filtering, autolearn, razor2, dcc, and pyzor. I'm also using relays.ordb.org, sbl.spamhaus.org, bl.spamcop.net, and blackholes.five-ten-sg.com (although I just added that last one yesterday). I've verified that autolearn is working. I have my threshold set downward, from the default of 5.0, to 2.5. I get a comparable amount of spam (~600 to 1000 per day) and my setup *was* about 98% effective until a month or so ago. These days it is more like 80%. I've noticed much of the spam getting through appears specifically targeted at getting by SA -- no HTML, a paragraph of nonsense (or sometimes out of some public domain book), and a one liner trying to sell me a mortgage or something. The one thing I had *not* been doing, but started to do as of last night, is to use the false-negatives to explicitly train the Bayesian filter. It was easy enough to set up. I created an hourly cron job as follows: /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox Now I just drop all false negatives into that mailbox, and clean them out periodically. Hopefully that will make a significant improvement. Joe
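For anyone who would rather drive the training step from a script than from a bare cron line, here is a minimal Python sketch built around the same sa-learn invocation Joe shows. The helper names are hypothetical, the sa-learn path is simply the one from Joe's cron job, and the mbox path is the placeholder from his message. The `--ham` variant covers the complementary training of good mail that comes up later in the thread.

```python
import subprocess
from typing import List

# Path taken from Joe's cron job; adjust for your own system.
SA_LEARN = "/usr/bin/sa-learn"


def sa_learn_argv(mbox_path: str, kind: str) -> List[str]:
    """Build the sa-learn command line for an mbox of spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return [SA_LEARN, "--mbox", "--" + kind, mbox_path]


def train(mbox_path: str, kind: str) -> None:
    """Run sa-learn on the mailbox; raises CalledProcessError on failure."""
    subprocess.run(sa_learn_argv(mbox_path, kind), check=True)
```

Calling `train("/path/to/false-neg.mbox", "spam")` from an hourly job would be equivalent to the cron entry above; emptying the mailbox afterwards is still a manual step in this sketch.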
Joe Conway wrote: > I get a comparable amount of spam (~600 to 1000 per day) and my setup > *was* about 98% effective until a month or so ago. These days it is more > like 80%. I've noticed much of the spam getting through appears > specifically targeted at getting by SA -- no HTML, a paragraph of > nonsense (or sometimes out of some public domain book), and a one liner > trying to sell me a mortgage or something. > > The one thing I had *not* been doing, but started to do as of last > night, is to use the false-negatives to explicitly train the Bayesian > filter. It was easy enough to set up. I created an hourly cron job as > follows: > > /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox > > Now I just drop all false negatives into that mailbox, and clean them > out periodically. Hopefully that will make a significant improvement. I can tell you it certainly will. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Wed, 21 Apr 2004, Joe Conway wrote: > The one thing I had *not* been doing, but started to do as of last > night, is to use the false-negatives to explicitly train the Bayesian > filter. It was easy enough to set up. I created an hourly cron job as > follows: > > /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox > > Now I just drop all false negatives into that mailbox, and clean them > out periodically. Hopefully that will make a significant improvement. This, for me, has made the big difference, since the false-negatives don't get autolearned :( ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Marc G. Fournier wrote:
> On Wed, 21 Apr 2004, Joe Conway wrote:
>> /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox
>>
>>Now I just drop all false negatives into that mailbox, and clean them
>>out periodically. Hopefully that will make a significant improvement.
>
> This, for me, has made the big difference, since the false-negatives don't
> get autolearned :(

Actually, even much of what does (correctly) get marked as spam, ends up with autolearn=no, because it seems SpamAssassin is somewhat conservative with autolearning. I just sent this off list to Michael Chaney:

---------------------------------------------------------------------
I've noticed that the threshold for autolearn seems too high, i.e. a high proportion of email correctly marked as spam, has autolearn=no. Here's an example:

  X-Spam-Status: Yes, hits=3.7 required=2.5 tests=BAYES_44,HTML_FONT_INVISIBLE,
    HTML_IMAGE_ONLY_04, HTML_MESSAGE,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,
    MIME_HTML_ONLY_MULTI autolearn=no version=2.63

Now in /etc/mail/spamassassin/local.cf I have this setting:

  # Enable Bayes auto-learning
  auto_learn 1
  bayes_auto_learn_threshold_spam 6

From the SA docs, I get the impression that autolearn cannot be made more aggressive. So in order to counteract that, I just made an additional change -- I put in a mail filter rule that automatically forwards any mail marked as spam, but with autolearn=no, to false-neg.mbox. This should help too, I think.

Joe
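Joe does not show the mail filter rule itself, so the following Python sketch is only one way the test "marked as spam, but autolearn=no" could be expressed. It parses an X-Spam-Status value in the format of his example; the function names are hypothetical, and a real rule would live in procmail or a mail client filter rather than in Python.

```python
import re
from typing import Dict, Optional


def parse_spam_status(value: str) -> Dict[str, object]:
    """Extract verdict, hits, required and autolearn from X-Spam-Status."""
    flat = " ".join(value.split())  # undo header folding

    def field(name: str) -> Optional[str]:
        match = re.search(r"\b" + name + r"=(\S+)", flat)
        return match.group(1) if match else None

    hits = field("hits")
    required = field("required")
    return {
        "is_spam": flat.lower().startswith("yes"),
        "hits": float(hits) if hits is not None else None,
        "required": float(required) if required is not None else None,
        "autolearn": field("autolearn"),
    }


def needs_manual_learning(value: str) -> bool:
    """True for mail flagged as spam that the autolearner skipped."""
    status = parse_spam_status(value)
    return bool(status["is_spam"]) and status["autolearn"] == "no"
```

Messages for which `needs_manual_learning` is true are the ones Joe forwards to false-neg.mbox for the hourly sa-learn run.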
Bruce Momjian wrote: > Joe Conway wrote: > >>The one thing I had *not* been doing, but started to do as of last >>night, is to use the false-negatives to explicitly train the Bayesian >>filter. It was easy enough to set up. I created an hourly cron job as >>follows: >> >> /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox >> >>Now I just drop all false negatives into that mailbox, and clean them >>out periodically. Hopefully that will make a significant improvement. > > I can tell you it certainly will. Doesn't sa-learn also require you to teach it Ham as well? My problem has been that sa-learn appears to ignore white-listed emails and therefore can't learn from 90% of my Ham. Meanwhile, I get spam that slips through SA that my Mozilla client *correctly* identifies as Junk. Once a week, I take that Junk email, along with all Ham and run sa-learn with the appropriate --spam/--ham switch. But it doesn't seem to be improving. I still get spam which SA fails to identify but which, 95% of the time, Mozilla correctly identifies. Mike Mascari
Joe Conway <mail@joeconway.com> wrote: > [snip] > > The one thing I had *not* been doing, but started to do as of last > night, is to use the false-negatives to explicitly train the Bayesian > filter. [snip] As you've discovered, the hard way, one must constantly train Bayesian filters. This means that every false positive has to be fed back through it with whatever means your version uses to tell it "No, this was *not* spam," and every false negative, the converse. -- Jim Seymour | Spammers sue anti-spammers: jseymour@LinxNet.com | http://www.LinxNet.com/misc/spam/slapp.php http://jimsun.LinxNet.com | Please donate to the SpamCon Legal Fund: | http://www.spamcon.org/legalfund/
On Wed, 21 Apr 2004, Mike Mascari wrote: > Doesn't sa-learn also require you to teach it Ham as well? My problem > has been that sa-learn appears to ignore white-listed emails and > therefore can't learn from 90% of my Ham. Meanwhile, I get spam that > slips through SA that my Mozilla client *correctly* identifies as Junk. > Once a week, I take that Junk email, along with all Ham and run sa-learn > with the appropriate --spam/--ham switch. But it doesn't seem to be > improving. I still get spam which SA fails to identify but which, 95% of > the time, Mozilla correctly identifies. I'm finding it gets better over time ... a few always slip through the crack, but not near as many today as yesterday ... as for Ham, I have a mailbox that I save all my 'Answered Emails' to (from friends, lists, etc) that I periodically run through as --ham ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
At 10:08 AM 4/20/2004 +0200, Karel Zak wrote: > > > > 4. Very long list of procmail filters on header and body patterns. > > It must be pretty difficult to maintain these header and body patterns > and the other lists. I had the same problem and resolved it with > "spamassassin": it can learn, and it's simpler than procmailrc > coding. Now only about 5% of all spam reaches my INBOX. My spam:ham ratio is about 98:2 (98% spam), excluding mailing lists. So far it's manageable though rather annoying - fortunately in my situation I can regard as spam emails that are in html (or have HTML) and not in my whitelist. That gets rid of about 50% of the spam; the other 40% or so gets filtered via another simple filter. My situation: I don't really have to answer messages to my personal email account from ignorant strangers that send me html email. Your situation may be different. So far I haven't seen any html emails that were really worth reading, even the one or two from relatives (who I white-list to not be rude ;) ). I go through that folder once in a while and it works for me - so far I don't recall having HTML emails from strangers that weren't spam. I've had plain text messages from silly strangers (and a silly colleague) that used lots of !!!! and stupid subject lines - actual content barely worth replying to. e.g. Help!!!!! Situation is different at work. But the company pays for antispam software. Ironically while we sell Sophos Puremessage (which seems to be pretty good), it's for larger companies/orgs than us (>1000 users). ;). The backup MX thing is not very useful in most cases. Seems similar for DNS - doesn't appear that useful to have your names resolvable while your site is unreachable. OK the error messages may be slightly less embarrassing? Regards, Link.
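Link's first rule ("HTML and not in my whitelist means spam") is simple enough to sketch with Python's standard email package. This is an illustration under assumptions, not Link's actual filter: the whitelist entry is a made-up address, the function names are hypothetical, and "has HTML" is taken to mean that any MIME part is text/html.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Set

# Hypothetical whitelist of senders allowed to send HTML mail.
WHITELIST = {"aunt@example.org"}


def has_html(msg: Message) -> bool:
    """True if the message or any of its MIME parts is text/html."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def looks_like_spam(raw: str, whitelist: Set[str] = WHITELIST) -> bool:
    """Apply the rule: HTML mail from a non-whitelisted sender is spam."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    return has_html(msg) and sender not in whitelist
```

Plain-text mail always passes this rule, which matches Link's remark that the remaining spam is handled by a second, separate filter.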
Marc G. Fournier wrote: > On Mon, 19 Apr 2004, Joe Conway wrote: >>Marc G. Fournier wrote: >>>Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all >>>enabled ... >> >>I use exactly the same setup. But recently I've noticed that the >>spammers are getting smarter -- I think 20% of it is slipping by the >>filters. I'm going to need something better. > > do you force learn those spam that get through the cracks? I get about 20 > or 30 messages that slip through the cracks, which I process through with > sa-learn nightly ... Sorry to drag this OT thread on even longer, but it seems to be a topic many are interested in ;-) I wanted to report back that after just 2 days of forced (supervised) learning, the bayesian filter is now nailing about 99% of all spam. *Many, many, thanks* for the suggestion. But I wonder why the autolearn feature is so conservative? At this point I'm getting lots of stuff like this: X-Spam-Status: Yes, hits=5.8 required=2.5 tests=BAYES_99,HTML_FONT_BIG, HTML_MESSAGE autolearn=no version=2.63 X-Spam-Report: * 0.1 HTML_MESSAGE BODY: HTML included in message * 0.3 HTML_FONT_BIG BODY: HTML has a big font * 5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100% * [score: 1.0000] Notice that, even though I get a hit on BAYES_99, I still get autolearn=no. Ah well, I guess I should be asking that question of the SpamAssassin guys. Also notice that this sucker would have gotten through with a score of only 0.4 had it not been for the bayesian filter. Again, thanks. Joe
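The arithmetic in the report above can be checked mechanically. The sketch below parses the quoted X-Spam-Report lines and separates the Bayes contribution from the rest. As far as the SpamAssassin documentation describes it, the autolearn decision uses a score that leaves out the Bayes rules themselves and compares it against a separate, higher threshold, which would explain autolearn=no on a BAYES_99 hit. The threshold value of 12.0 used here is an assumed default, and the function names are hypothetical.

```python
import re

# The report lines quoted in the message above.
REPORT = """\
* 0.1 HTML_MESSAGE BODY: HTML included in message
* 0.3 HTML_FONT_BIG BODY: HTML has a big font
* 5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
"""

# Matches "* <score> <RULE_NAME> ..." and skips continuation lines such as "* [score: ...]".
RULE_LINE = re.compile(r"^\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)", re.MULTILINE)


def rule_scores(report: str) -> dict:
    """Map each rule name in the report to its score."""
    return {name: float(score) for score, name in RULE_LINE.findall(report)}


def split_scores(report: str) -> tuple:
    """Return (non-Bayes total, Bayes total), rounded to two decimals."""
    scores = rule_scores(report)
    bayes = sum(v for k, v in scores.items() if k.startswith("BAYES_"))
    other = sum(v for k, v in scores.items() if not k.startswith("BAYES_"))
    return round(other, 2), round(bayes, 2)


def would_autolearn_as_spam(report: str, threshold: float = 12.0) -> bool:
    """Assumption: autolearn compares only the non-Bayes total to the threshold."""
    return split_scores(report)[0] >= threshold
```

For the report above, `split_scores(REPORT)` gives a non-Bayes total of 0.4 and a Bayes total of 5.4, matching both the hits=5.8 header and the observation that the message would have scored only 0.4 without the Bayesian filter.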
Hi, -- Joe Conway <mail@joeconway.com> wrote: > I use exactly the same setup. But recently I've noticed that the spammers > are getting smarter -- I think 20% of it is slipping by the filters. I'm > going to need something better. I recently rebuilt my Bayes database because it was corrupted. I fed it about 1000 low-scoring spam messages, and now only about two spams slip by the filter per day, while 200 to 300 are caught. Ciao Alvar -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** 5 Jahre Blaster: http://www.a-blast.de/ | http://www.a-blast.de/statistik/
On Fri, 23 Apr 2004, Joe Conway wrote: > Marc G. Fournier wrote: > > On Mon, 19 Apr 2004, Joe Conway wrote: > >>Marc G. Fournier wrote: > >>>Huh? I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all > >>>enabled ... > >> > >>I use exactly the same setup. But recently I've noticed that the > >>spammers are getting smarter -- I think 20% of it is slipping by the > >>filters. I'm going to need something better. > > > > do you force learn those spam that get through the cracks? I get about 20 > > or 30 messages that slip through the cracks, which I process through with > > sa-learn nightly ... > > Sorry to drag this OT thread on even longer, but it seems to be a topic > many are interested in ;-) > > I wanted to report back that after just 2 days of forced (supervised) > learning, the bayesian filter is now nailing about 99% of all spam. > *Many, many, thanks* for the suggestion. > > But I wonder why the autolearn feature is so conservative? At this point > I'm getting lots of stuff like this: > > X-Spam-Status: Yes, hits=5.8 required=2.5 tests=BAYES_99,HTML_FONT_BIG, > HTML_MESSAGE autolearn=no version=2.63 > X-Spam-Report: > * 0.1 HTML_MESSAGE BODY: HTML included in message > * 0.3 HTML_FONT_BIG BODY: HTML has a big font > * 5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100% > * [score: 1.0000] > > Notice that, even though I get a hit on BAYES_99, I still get > autolearn=no. Ah well, I guess I should be asking that question of the > SpamAssassin guys. Also notice that this sucker would have gotten > through with a score of only 0.4 had it not been for the bayesian filter. BAYES_99 means that it's already been found in the bayes filter, so why would it once more autolearn it? :) ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Marc G. Fournier wrote: > BAYES_99 means that its already been found in the bayes filter, so why > would it once more autolearn it? :) To add more spam words to its vocabulary of course. Learning works both ways... Greg
On Tue, Apr 20, 2004 at 01:06:18AM -0400, Tom Lane wrote: > 3. I have noticed that bouncing any machine that sends "HELO > sss.pgh.pa.us" gets rid of a ton of spam and viruses. I don't know of > any real clean way to do this, but I have a sendmail.cf hack for it. By the way, thanks very much for this tip. Almost in one hit, it made many of our spam and virus filters redundant. Very nice on the load. I'd noticed that some perl mail modules appear to get this wrong, but it efficiently catches our customers sending viruses and spam through our relay too. I'm using Exim 3, so I can only pick this up after the mail has been received, but with Exim 4 I should be able to kill the email at the SMTP stage. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
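The HELO check discussed above boils down to one comparison: a remote client that introduces itself with the receiving server's own name is almost certainly forged. The sketch below shows only that logic; it is not the sendmail.cf hack (which is not shown in the thread) and not an Exim configuration. The function name and the idea of passing in a set of the server's own names are assumptions for illustration.

```python
def helo_claims_to_be_us(helo_argument: str, our_names: set) -> bool:
    """True if a remote client's HELO/EHLO argument is one of our own names.

    our_names is assumed to hold the receiving host's own hostnames
    (and optionally its IP addresses); matching is case-insensitive.
    """
    # Normalise: drop whitespace, address-literal brackets and a trailing dot.
    name = helo_argument.strip().strip("[]").rstrip(".").lower()
    return name in {n.lower() for n in our_names}
```

A receiving server would run this against the HELO argument of each incoming connection from a remote address and refuse the session on a match, for example `helo_claims_to_be_us("SSS.PGH.PA.US", {"sss.pgh.pa.us"})` for the host named in the quoted tip. Connections that legitimately originate on the server itself would need to be exempted.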