librelist archives


Spam Filter Reset


From:
Zed A. Shaw
Date:
2009-12-15 @ 17:26
Hi Everyone,

Over the weekend I had turned on spam filtering and trained the filter
with some spam, but it turned out to be much much too aggressive.  I've
since scaled it back and made the following changes:

* It now only filters spam from confirmed subscribers.  Previously it
also filtered at the confirmation step, which was catching empty emails
as spam.
* It is retrained with more extreme examples of spam.  Sadly, some of
your legitimate emails look a lot like spam :-).
* It's now keeping all messages it encounters.  Previously it had thrown
a few out before I caught it, so now it keeps all filtered messages and
I can retrain on any false positives.
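
The three changes above can be sketched roughly as follows.  This is an
illustration only, not the list server's actual code: the names
`handle_message`, `Quarantine`, and the `is_spam` callback are
hypothetical, and the classifier is passed in rather than assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Quarantine:
    """Holds flagged messages so false positives can be reviewed later."""
    held: List[str] = field(default_factory=list)


def handle_message(sender: str, body: str, confirmed: Set[str],
                   is_spam: Callable[[str], bool],
                   quarantine: Quarantine) -> str:
    # Unconfirmed senders go to the confirmation step unfiltered, since
    # confirmation replies are often empty and look like spam to a filter.
    if sender not in confirmed:
        return "confirmation"
    # Flagged mail is kept, never discarded, so it can be used to retrain.
    if is_spam(body):
        quarantine.held.append(body)
        return "quarantined"
    return "delivered"
```

Keeping the flagged messages means a false positive only delays delivery
until someone reviews the quarantine.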

I'll be watching it all day and tweaking it, so hopefully nobody's email
gets caught, but if it does just wait a bit and it'll go through.

Sorry for the trouble.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: Spam Filter Reset

From:
Zed A. Shaw
Date:
2009-12-15 @ 17:39
On Tue, Dec 15, 2009 at 09:26:06AM -0800, Zed A. Shaw wrote:
> Hi Everyone,
> 
> Over the weekend I had turned on spam filtering and trained the filter
> with some spam, but it turned out to be much much too aggressive.  I've
> since scaled it back and made the following changes:

Scratch that, spam filter is totally disabled.  Spambayes is a piece of
junk that has way too many false positives, so I'll be looking for an
alternative.

Spam currently isn't a problem, since most spam bots can't figure out
the subscribe process, but it'll be needed in the near future based on
what I'm seeing.

Anyway, thanks for your patience in this.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: Spam Filter Reset

From:
Luke S Crawford
Date:
2009-12-16 @ 05:30
"Zed A. Shaw" <zedshaw@zedshaw.com> writes:

> On Tue, Dec 15, 2009 at 09:26:06AM -0800, Zed A. Shaw wrote:
> > Hi Everyone,
> > 
> > Over the weekend I had turned on spam filtering and trained the filter
> > with some spam, but it turned out to be much much too aggressive.  I've
> > since scaled it back and made the following changes:
> 
> Scratch that, spam filter is totally disabled.  Spambayes is a piece of
> junk that has way too many false positives, so I'll be looking for an
> alternative.

The best filtering I've ever gotten was from dspam, but the problem
is that it required manual training.  If you made sure to mark every spam
message as spam, it was really awesome: almost no false positives
and well under one percent false negatives.

The problem was feeding the thing.  The false negative rate would go up
fairly noticeably if you didn't mark spam.  Also, this was four years ago
that I used it.

I also like DCC, a project I helped work on while at MAPS[1], and it's
a bit more 'fire and forget': if you reject HTML mail, DCC catches
most of the rest.  Now, the problem with DCC is that it doesn't detect
spam, it detects 'bulkness'.  The idea is that it takes a cryptographic
checksum (like md5sum) of a message and passes that around to others who
use DCC, so you can then see if anyone else has seen the same message as you.
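
As a rough illustration of the 'bulkness' idea, here is a minimal
in-memory sketch.  It is not how DCC itself is implemented: the
`BulkCounter` class, the whitespace/case normalization, the threshold
of 3, and the use of SHA-256 instead of an md5-style checksum are all
assumptions made for the example, and a single local counter stands in
for the servers that share counts between sites.

```python
import hashlib
from collections import Counter


def body_checksum(body: str) -> str:
    """Checksum of a message body after collapsing whitespace and case."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class BulkCounter:
    """Counts how many times each checksum has been reported."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen = Counter()

    def report(self, body: str) -> int:
        # Record one more sighting and return the running total.
        digest = body_checksum(body)
        self.seen[digest] += 1
        return self.seen[digest]

    def is_bulk(self, body: str) -> bool:
        # "Bulk" only means many sightings; it says nothing about content.
        return self.seen[body_checksum(body)] >= self.threshold
```

Note that the counter never looks at what a message says, which is why
legitimate mailing lists would be counted as bulk too.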

Obviously, it marks legitimate mailing lists as 'bulk'  - normally
you just whitelist what you want.   For this application, though, that 
might not be a problem.  I mean, you don't want any mailing lists mailing 
your lists, right?

If you want a 'good enough' solution right now, Mark Perkel of
junkemailfilter.com is getting some free hosting for one of his
backup boxes from me, so if you want I can get you free spam filtering
with that.  I'm using it now, and it's not as good as dspam,
but it's 'good enough' and god damn, it's easy.  You just set your
MXs to the junkemailfilter MXs and they forward it to your real server
mailhub style, so it's easy to switch out if you want.  It's probably
not the best you could do for a long-term solution, if you wanted
to put some time into it, but it sure is easy if you need to stop the
deluge of spam right now.


[1] I say I helped Vernon, and I did, but it might have been more in the 
way a kid might help his dad fix a car;  I was pretty young.  I did some 
m4 macros for the sendmail config, though.

-- 
Luke S. Crawford
http://prgmr.com/xen/         -   Hosting for the technically adept
http://nostarch.com/xen.htm   -   We don't assume you are stupid.  

Re: Spam Filter Reset

From:
Eric Wong
Date:
2009-12-15 @ 20:10
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Tue, Dec 15, 2009 at 09:26:06AM -0800, Zed A. Shaw wrote:
> > Hi Everyone,
> > 
> > Over the weekend I had turned on spam filtering and trained the filter
> > with some spam, but it turned out to be much much too aggressive.  I've
> > since scaled it back and made the following changes:
> 
> Scratch that, spam filter is totally disabled.  Spambayes is a piece of
> junk that has way too many false positives, so I'll be looking for an
> alternative.

Hi Zed, I've had good experiences with SpamAssassin (spamc + spamd).

I like there being a combination of manual rules in addition to Bayes,
so a weakness in one approach can get covered by the other and vice
versa.

In my experience, the default threshold score of 5.00 is a bit low for
new installations, so I initially set it to 9.00 and gradually decreased
it over time as the Bayes filter got trained.
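
A small sketch of that schedule, for illustration only.  In
SpamAssassin the threshold itself is the `required_score` setting; the
function below, its step size of 0.5, and the idea of stepping down
once per 200 trained messages are made-up numbers, not something
SpamAssassin does on its own.

```python
def required_score(trained_messages: int, start: float = 9.0,
                   floor: float = 5.0, step: float = 0.5,
                   per_step: int = 200) -> float:
    """Threshold to use after a given number of trained messages."""
    # Lower the threshold by one step per batch of trained messages,
    # but never go below the default of 5.0.
    steps = trained_messages // per_step
    return max(floor, start - step * steps)


def is_spam(score: float, trained_messages: int) -> bool:
    # A message is treated as spam once its total score reaches the threshold.
    return score >= required_score(trained_messages)
```

With these example numbers, a message scoring 6.0 passes on a fresh
install but is flagged once the threshold has been lowered to 5.0.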

-- 
Eric Wong

Re: Spam Filter Reset

From:
Mauricio Pasquier
Date:
2009-12-16 @ 04:12
On Tue, Dec 15, 2009 at 18:10, Eric Wong <normalperson@yhbt.net> wrote:
> "Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
>> On Tue, Dec 15, 2009 at 09:26:06AM -0800, Zed A. Shaw wrote:
>> > Hi Everyone,
>> >
>> > Over the weekend I had turned on spam filtering and trained the filter
>> > with some spam, but it turned out to be much much too aggressive.  I've
>> > since scaled it back and made the following changes:
>>
>> Scratch that, spam filter is totally disabled.  Spambayes is a piece of
>> junk that has way too many false positives, so I'll be looking for an
>> alternative.
>
> Hi Zed, I've had good experiences with SpamAssassin (spamc + spamd).

I've heard some very good things about MailAvenger[0], which can be
used in combination with SpamAssassin and Bayesian filters too.

By the way, thanks for the effort Zed!

[0]: http://www.mailavenger.org/

> I like there being a combination of manual rules in addition to Bayes,
> so a weakness in one approach can get covered by the other and vice
> versa.
>
> In my experience, the default threshold score of 5.00 is a bit low for
> new installations, so I initially set it to 9.00 and gradually decreased
> it over time as the Bayes filter got trained.
>
> --
> Eric Wong
>