librelist archives

Hello (using Lamson as a spam filter)

From:
Morten W. Petersen
Date:
2009-09-08 @ 19:35
Hi  :)

I love Python, and discovered the Lamson project a couple of days ago.
I'm interested in using Lamson as a frontend for our mail service,
filtering spam and enabling users to train the system as to what's spam
and what's ham.

I am tired of these quirky mail systems with their .. rough syntax and
hard-to-learn systems.

So, what I need Lamson to do is:

Act as a gateway in front of the real email system.

Do a remote callout to the final destination, to check that the mail
address is accepted.  Is it possible to set up Lamson so that this check
is done at RCPT stage, so that a message can be rejected if there are
unknown email addresses?

Trap messages that are believed to be spam, then deliver them later if
a user, after getting a report of spam messages, approves a message.

BTW, has Lamson been stress-tested?  Is it known to be stable?

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: morten@nidelven-it.no

Re: Hello (using Lamson as a spam filter)

From:
Morten W. Petersen
Date:
2009-09-14 @ 14:58
Hi again,

I've gotten Lamson running now, but I still feel like I'm missing some 
pieces of the puzzle.

I've modified the sample.py so that it looks like this:

import logging
from lamson.routing import route, route_like, stateless
from config.settings import relay
from lamson import view


@route(".+")
def START(message, address=None, host=None):
    return NEW_USER


@route_like(START)
def NEW_USER(message, address=None, host=None):
    return NEW_USER


@route_like(START)
def END(message, address=None, host=None):
    return NEW_USER(message, address, host)


@route_like(START)
@stateless
def FORWARD(message, address=None, host=None):
    relay.deliver(message)


I only changed the @route statement at the beginning.  I've looked at
the spam filtering docs, but they don't clearly explain how to set up
spam filtering.

What do I have to do to sample.py so that it 1) checks recipient
against final destination and 2) runs spambayes?

Thanks,

Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: morten@nidelven-it.no

Re: Hello (using Lamson as a spam filter)

From:
Zed A. Shaw
Date:
2009-09-14 @ 15:57
On Mon, Sep 14, 2009 at 04:58:31PM +0200, Morten W. Petersen wrote:
> Hi again,
> 
> I've gotten Lamson running now, but I still feel like I'm missing some 
> pieces of the puzzle.

Cool, so this sample.py is in app/handlers after you generated it using:

lamson gen -project myproject

Right?

> I only changed the @route statement at the beginning.  I've looked at
> the spam filtering docs, but they don't clearly explain how to set up
> spam filtering.
> 
> What do I have to do to sample.py so that it 1) checks recipient
> against final destination and 2) runs spambayes?

For #1, just write code that does any of the checks you need at each of
the states you've got.  You probably want to use message.route_from and
message.route_to for any comparisons of recipient or sender.  These
two variables are normalized and don't have any first+last name parts,
just an address.

For #2, just read through this:

http://lamsonproject.org/docs/filtering_spam.html

Also there's lots of other docs on specific topics here:

http://lamsonproject.org/docs/
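
The linked page covers Lamson's own SpamBayes integration, which is not reproduced here. As a rough illustration of the train-then-score idea behind the ham/spam question, here is a toy word-count naive Bayes filter written for Python 3. It is not SpamBayes and not Lamson's API; the ToyBayes class and its method names are made up for this sketch.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny word-count naive Bayes filter (illustration only, not SpamBayes)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message a user has marked as 'spam' or 'ham'."""
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def _log_score(self, tokens, label):
        vocabulary = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_words = sum(self.words[label].values())
        total_messages = sum(self.messages.values())
        # Laplace-smoothed prior, so an untrained filter never divides by zero.
        score = math.log((self.messages[label] + 1) / (total_messages + 2))
        for token in tokens:
            count = self.words[label][token]
            score += math.log((count + 1) / (total_words + vocabulary + 1))
        return score

    def spam_probability(self, text):
        """Return an estimate of P(spam | text) between 0 and 1."""
        tokens = text.lower().split()
        diff = self._log_score(tokens, "ham") - self._log_score(tokens, "spam")
        # Clamp so math.exp cannot overflow on very long messages.
        diff = max(-700.0, min(700.0, diff))
        return 1.0 / (1.0 + math.exp(diff))
```

A web frontend that lets users mark messages would only need to call train() with the user's verdict; messages scoring above some chosen threshold would go to quarantine.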

Let me know how that works.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: Hello (using Lamson as a spam filter)

From:
Morten W. Petersen
Date:
2009-09-14 @ 16:20
Zed A. Shaw wrote:
> On Mon, Sep 14, 2009 at 04:58:31PM +0200, Morten W. Petersen wrote:
>   
>> Hi again,
>>
>> I've gotten Lamson running now, but I still feel like I'm missing some 
>> pieces of the puzzle.
>>     
>
> Cool, so this sample.py is in app/handlers after you generated it using:
>
> lamson gen -project myproject
>
> Right?
>   

Right.

>> I only changed the @route statement at the beginning.  I've looked at
>> the spam filtering docs, but they don't clearly explain how to set up
>> spam filtering.
>>
>> What do I have to do to sample.py so that it 1) checks recipient
>> against final destination and 2) runs spambayes?
>>     
>
> For #1, just write code that does any of the checks you need at each of
> the states you've got.  You probably want to use message.route_from and
> message.route_to for any comparisons of recipient or sender.  These
> two variables are normalized and don't have any first+last name parts,
> just an address.
>   

OK.  So something like this should work:

@route(".+")
def START(message, address=None, host=None):
    if check_remote_recipient(message.route_to):
        return NEW_USER
    else:
        raise 'Error'

How do I signal a failure?  Can I for example raise a 550 SMTP error
when the recipient isn't accepted?
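
The check_remote_recipient helper used above is not defined anywhere in the thread. A minimal sketch of what such a callout could look like, using only the standard library's smtplib (Python 3). The host, port and probe sender defaults are placeholder assumptions, and the connect parameter is a hypothetical hook added so the function can be exercised without a live server.

```python
import smtplib


def check_remote_recipient(recipient, host="mail.example.com", port=25,
                           sender="postmaster@example.com",
                           connect=smtplib.SMTP):
    """Return True if the destination server accepts RCPT TO for recipient.

    host, port and sender are placeholders; connect is a hypothetical
    injection point so a fake server class can be used in tests.
    """
    server = connect(host, port, timeout=10)
    try:
        server.helo()
        server.mail(sender)
        code, _reply = server.rcpt(recipient)
        # 250 = accepted, 251 = accepted and will be forwarded.
        return code in (250, 251)
    finally:
        try:
            server.quit()
        except smtplib.SMTPException:
            pass  # The probe result matters more than a clean goodbye.
```

Treating only 250 and 251 as acceptance is a choice made for this sketch. A temporary 4xx reply is reported as "not accepted" here, which a real gateway would probably want to retry instead.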

> For #2, just read through this:
>
> http://lamsonproject.org/docs/filtering_spam.html
>
> Also there's lots of other docs on specific topics here:
>
> http://lamsonproject.org/docs/
>
> Let me know how that works

What do the different "commands" START, POSTING, etc. mean?  Is there
a standard way these are run when a message is received?

-Morten


-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: morten@nidelven-it.no

Re: Hello (using Lamson as a spam filter)

From:
Zed A. Shaw
Date:
2009-09-14 @ 19:21
On Mon, Sep 14, 2009 at 06:20:48PM +0200, Morten W. Petersen wrote:
> Zed A. Shaw wrote:
> OK.  So something like this should work:
> 
> @route(".+")
> def START(message, address=None, host=None):
>     if check_remote_recipient(message.route_to):
>         return NEW_USER
>     else:
>         raise 'Error'
> 
> How do I signal a failure?  Can I for example raise a 550 SMTP error
> when the recipient isn't accepted?

Two things, first, just raise a lamson.server.SMTPError(550) and it'll
return that to the client.  BUT, only if you're running with the
SMTPReceiver.

Second, this is probably a bad idea.  While the standard says you
*should* do this, it leaks out information about who's in your email
system or not.  It's better to not return any information at all, and
just drop the email into a queue for checking on later.

> What do the different "commands" START, POSTING, etc. mean?  Is there
> a standard way these are run when a message is received?

They are mostly arbitrary names for states that the sender is in with
your application.  If you don't really know what that means, then read:

http://lamsonproject.org/docs/introduction_to_finite_state_machines.html

And in fact, by you asking this question I can kind of tell you haven't
read through the docs or checked out any of the examples.  Go grab the
source from http://lamsonproject.org/releases/ and check out the
examples/ directory and check out the code in there.  It'll show you
what's going on pretty quickly.
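
To make the "states" idea concrete, here is a small stand-alone sketch of a per-sender state machine. It is not Lamson's router, only an illustration of the pattern: each state is a function that returns the next state, and the current state is remembered per sender. The Router class and the "confirm" trigger are invented for the example.

```python
def START(message):
    # First contact from a sender: move them on to NEW_USER.
    return NEW_USER


def NEW_USER(message):
    # Stay here until the sender confirms, then allow posting.
    return POSTING if message == "confirm" else NEW_USER


def POSTING(message):
    # A confirmed sender stays in POSTING.
    return POSTING


class Router:
    """Remembers which state function each sender is currently in."""

    def __init__(self, start):
        self.start = start
        self.states = {}

    def deliver(self, sender, message):
        handler = self.states.get(sender, self.start)
        next_state = handler(message)
        # In this sketch, returning None means "stay in the same state".
        self.states[sender] = next_state or handler
        return self.states[sender].__name__
```

Two senders can be in different states at the same time, which is why the names describe where a sender is in a conversation rather than commands that run in a fixed order.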

-- 
Zed A. Shaw
http://zedshaw.com/

Re: Hello (using Lamson as a spam filter)

From:
Morten W. Petersen
Date:
2009-09-15 @ 19:49
Zed A. Shaw wrote:
> On Mon, Sep 14, 2009 at 06:20:48PM +0200, Morten W. Petersen wrote:
>   
>> Zed A. Shaw wrote:
>> OK.  So something like this should work:
>>
>> @route(".+")
>> def START(message, address=None, host=None):
>>     if check_remote_recipient(message.route_to):
>>         return NEW_USER
>>     else:
>>         raise 'Error'
>>
>> How do I signal a failure?  Can I for example raise a 550 SMTP error
>> when the recipient isn't accepted?
>>     
>
> Two things, first, just raise a lamson.server.SMTPError(550) and it'll
> return that to the client.  BUT, only if you're running with the
> SMTPReceiver.
>
> Second, this is probably a bad idea.  While the standard says you
> *should* do this, it leaks out information about who's in your email
> system or not.  It's better to not return any information at all, and
> just drop the email into a queue for checking on later.
>   

OK, I get your point.  :)

>> What do the different "commands" START, POSTING, etc. mean?  Is there
>> a standard way these are run when a message is received?
>>     
>
> They are mostly arbitrary names for states that the sender is in with
> your application.  If you don't really know what that means, then read:
>
> http://lamsonproject.org/docs/introduction_to_finite_state_machines.html
>
> And in fact, by you asking this question I can kind of tell you haven't
> read through the docs or checked out any of the examples.  Go grab the
> source from http://lamsonproject.org/releases/ and check out the
> examples/ directory and check out the code in there.  It'll show you
> what's going on pretty quickly

Yes, I've been looking through the examples, and I think I'm making more
sense of it. 

In fact, I have a problem :)  Calling str(message) on a message object
renders the following error:

Traceback (most recent call last):
  File "/opt/python262/lib/python2.6/site-packages/lamson-1.0pre5-py2.6.egg/lamson/routing.py", line 373, in call_safely
    func(message, **kwargs)
  File "/opt/python262/lib/python2.6/site-packages/lamson-1.0pre5-py2.6.egg/lamson/routing.py", line 494, in routing_wrapper
    next_state = func(message, *args, **kw)
  File "/var/lamson/app/handlers/sample.py", line 34, in START
    file.write(str(message))
  File "/opt/python262/lib/python2.6/site-packages/lamson-1.0pre5-py2.6.egg/lamson/mail.py", line 110, in __str__
    return encoding.to_string(self.base)
  File "/opt/python262/lib/python2.6/site-packages/lamson-1.0pre5-py2.6.egg/lamson/encoding.py", line 288, in to_string
    return to_message(mail).as_string(envelope_header)
  File "/opt/python262/lib/python2.6/site-packages/lamson-1.0pre5-py2.6.egg/lamson/encoding.py", line 271, in to_message
    (ctype, params))
EncodingError: Content-Type malformed, not allowed: 'multipart/related'; {'type': 'multipart/alternative'}

Got any ideas what's up here?

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: morten@nidelven-it.no

Re: Hello (using Lamson as a spam filter)

From:
Zed A. Shaw
Date:
2009-09-15 @ 20:08
On Tue, Sep 15, 2009 at 09:49:55PM +0200, Morten W. Petersen wrote:
>   File "/opt/python262/lib/python2.6/site-packages/lamson-1.0pre5-py2.6.egg/lamson/encoding.py", line 271, in to_message
>     (ctype, params))
> EncodingError: Content-Type malformed, not allowed: 'multipart/related'; {'type': 'multipart/alternative'}
> 
> Got any ideas what's up here?

Do you have the message that's causing this?  If you can, put it in a
.zip and send it to me.

What's probably happening is you've got something sending that
particular multipart type, which Lamson doesn't have code to handle.
Hopefully with the sample I can put in support, or explicitly say that
it won't be handled.
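
For reference, the Content-Type in that error looks like an ordinary multipart/related header: RFC 2387 defines a type parameter for it that names the media type of the root part. The sketch below uses Python's standard email package (Python 3), not Lamson's encoding module, to show how such a header parses. The sample message is made up, since the real one is not in the thread.

```python
from email import message_from_string

# Invented sample; only the Content-Type line mirrors the reported error.
RAW = (
    "From: sender@example.com\n"
    "To: someone@example.com\n"
    "Subject: inline image\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; type="multipart/alternative";'
    ' boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ--\n"
)


def describe_content_type(raw):
    """Return (media type, parameter dict) for a raw message string."""
    msg = message_from_string(raw)
    # get_params() lists the media type first, then each parameter.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params
```

The parsed result pairs the multipart/related type with a type parameter, which is the same combination shown in the EncodingError.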

Any info you got will help.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: Hello (using Lamson as a spam filter)

From:
Morten W. Petersen
Date:
2009-09-15 @ 20:36
Zed A. Shaw wrote:
> On Tue, Sep 15, 2009 at 09:49:55PM +0200, Morten W. Petersen wrote:
>   
>>   File "/opt/python262/lib/python2.6/site-packages/lamson-1.0pre5-py2.6.egg/lamson/encoding.py", line 271, in to_message
>>     (ctype, params))
>> EncodingError: Content-Type malformed, not allowed: 'multipart/related'; {'type': 'multipart/alternative'}
>>
>> Got any ideas what's up here?
>>     
>
> Do you have the message that's causing this?  If you can, put it in a
> .zip and send it to me.
>
> What's probably happening is you've got something sending that
> particular multipart type, which Lamson doesn't have code to handle.
> Hopefully with the sample I can put in support, or explicitly say that
> it won't be handled.
>
> Any info you got will help

Yep, I've added some code that pickles the messages that fail.  But
it would be better to have the raw, original version of the message.

Is there a way to get the raw message in a handler's START function?

Also, I find it a bit scary that Lamson can reject messages like that.. if
this were a live server the system would swallow the message and the
only trace of it would be from the logs (which aren't necessarily read).

Is it such a good idea to have a wrapper around the messages that
needs to be .. 'intelligent'?

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: morten@nidelven-it.no

Re: Hello (using Lamson as a spam filter)

From:
Zed A. Shaw
Date:
2009-09-15 @ 22:01
On Tue, Sep 15, 2009 at 10:36:19PM +0200, Morten W. Petersen wrote:
> Yep, I've added some code that pickles the messages that fail.  But
> it would be better to have the raw, original version of the message.
>
> Is there a way to get the raw message in a handler's START function?

There's a setting for an undeliverable queue, but that works
for messages that route badly, not ones that abort.  I'll look at adding
a queue for "total screwups" for situations like this.
 
> Also, I find it a bit scary that Lamson can reject messages like that.. if
> this were a live server the system would swallow the message and the
> only trace of it would be from the logs (which aren't necessarily read).

"Scary"?  That's a little harsh don't ya think? :-)

It turns out that either Lamson rejects a very tiny percentage of email,
or it's a small change for some weird server/client that Lamson needs to
handle anyway.  For example, I haven't run it against any hardcore
Exchange server traffic yet, so I'm sure it will have problems.

> Is it such a good idea to have a wrapper around the messages that
> needs to be .. 'intelligent'?

Yes, you *must* have this wrapper, because every email you receive needs
to be converted into something that your language can use.  If Lamson
didn't do this in a consistent way, you'd end up writing it yourself and
probably getting it wrong.  Instead, Lamson does the conversion and the
goal is to make the conversion handle all of the email that isn't
violating the standard in wildly wrong ways.  It's already pretty close,
with the exception of a few odd parts of the standard rarely used.

The result is that you get very clean email, reject most horribly
formatted emails and spam, and end up having a better end user
experience.  The down side is that you might miss a rare email that is
sent by a poorly done client or server.

It looks like the simple solution to that disadvantage is to just save
crap email into a queue for later inspection.  I think once it does that
you'll have the best of all worlds:  nice clean email with an ability to
check on exceptional cases.
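
The "save it to a queue for later inspection" idea can be sketched without Lamson at all. The function below is a hypothetical wrapper, not an existing Lamson feature: it runs a handler and, if the handler raises, stores the untouched raw bytes in a Maildir using the standard mailbox module (Python 3). The name handle_or_quarantine and the directory layout are assumptions.

```python
import logging
import mailbox


def handle_or_quarantine(raw_message, handler, error_queue_dir):
    """Run handler(raw_message); keep the raw bytes if it fails.

    Returns None on success, or the Maildir key of the saved message.
    """
    try:
        handler(raw_message)
        return None
    except Exception:
        # Log the failure, then keep the original bytes for inspection.
        logging.exception("handler failed, message saved for inspection")
        errors = mailbox.Maildir(error_queue_dir, create=True)
        return errors.add(raw_message)
```

Because the bytes are saved before any parsing result is used, a message that the parser rejects is still available in its original form rather than only as a log entry.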

Sound reasonable?

-- 
Zed A. Shaw
http://zedshaw.com/

Re: Hello (using Lamson as a spam filter)

From:
Morten W. Petersen
Date:
2009-09-15 @ 22:56
Zed A. Shaw wrote:
> On Tue, Sep 15, 2009 at 10:36:19PM +0200, Morten W. Petersen wrote:
>   
>> Yep, I've added some code that pickles the messages that fail.  But
>> it would be better to have the raw, original version of the message.
>>
>> Is there a way to get the raw message in a handler's START function?
>>     
>
> There's a setting for an undeliverable queue, but that works
> for messages that route badly, not ones that abort.  I'll look at adding
> a queue for "total screwups" for situations like this.
>   

OK.

>> Also, I find it a bit scary that Lamson can reject messages like that.. if
>> this were a live server the system would swallow the message and the
>> only trace of it would be from the logs (which aren't necessarily read).
>>     
>
> "Scary"?  That's a little harsh don't ya think? :-)
>
> It turns out that either Lamson rejects a very tiny percentage of email,
> or it's a small change for some weird server/client that Lamson needs to
> handle anyway.  For example, I haven't run it against any hardcore
> Exchange server traffic yet, so I'm sure it will have problems.
>   

Yeah.  Well, there are myriad email 'clients' out there, and I think
it is a tall order to expect to handle all of those and their quirks.

>> Is it such a good idea to have a wrapper around the messages that
>> needs to be .. 'intelligent'?
>>     
>
> Yes, you *must* have this wrapper, because every email you receive needs
> to be converted into something that your language can use.  If Lamson
> didn't do this in a consistent way, you'd end up writing it yourself and
> probably getting it wrong.  Instead, Lamson does the conversion and the
> goal is to make the conversion handle all of the email that isn't
> violating the standard in wildly wrong ways.  It's already pretty close,
> with the exception of a few odd parts of the standard rarely used.
>
> The result is that you get very clean email, reject most horribly
> formatted emails and spam, and end up having a better end user
> experience.  The down side is that you might miss a rare email that is
> sent by a poorly done client or server.
>
> It looks like the simple solution to that disadvantage is to just save
> crap email into a queue for later inspection.  I think once it does that
> you'll have the best of all worlds:  nice clean email with an ability to
> check on exceptional cases.
>
> Sound reasonable?
>   

If I could get my hands on a raw message, that sounds perfect. 

Here's how I'm using Lamson now:  I simply have it running on
port 25 accepting emails, and then dumping all of those emails to
a given spool directory.

From there, a Python script (which I wrote today) will be
checking each message to see if the final destination accepts
it.  As you said, it is better to accept everything that comes in
and have processes running in the background that can .. in
their due time, process the messages.

So, if the final destination accepts the recipient, a message
will be moved to another queue where another script will do
ham/spam testing.

And so on..

I'm also going to develop a web frontend so that users can
train the spam filtering process and release false positives
from their quarantine.
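
One stage of the pipeline described above could look roughly like this. It is a sketch under assumptions: the three queues are plain Maildir directories handled with the standard mailbox module (Python 3), run_stage and the accepts callback are hypothetical names, and the callback stands in for whatever recipient check the real script performs.

```python
import mailbox


def run_stage(source_dir, accepted_dir, rejected_dir, accepts):
    """Move every message out of source_dir based on accepts(raw_bytes).

    Returns a dict counting how many messages went to each queue.
    """
    source = mailbox.Maildir(source_dir, create=True)
    accepted = mailbox.Maildir(accepted_dir, create=True)
    rejected = mailbox.Maildir(rejected_dir, create=True)
    moved = {"accepted": 0, "rejected": 0}
    for key in list(source.keys()):
        raw = source.get_bytes(key)
        if accepts(raw):
            accepted.add(raw)
            moved["accepted"] += 1
        else:
            rejected.add(raw)
            moved["rejected"] += 1
        # Remove from the source only after the copy has been written.
        source.remove(key)
    return moved
```

The same function could be reused for the ham/spam step by pointing it at the next pair of directories and passing a different callback.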

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: morten@nidelven-it.no

Re: Hello (using Lamson as a spam filter)

From:
Zed A. Shaw
Date:
2009-09-15 @ 23:13
On Wed, Sep 16, 2009 at 12:56:13AM +0200, Morten W. Petersen wrote:
> If I could get my hands on a raw message, that sounds perfect. 
> 
> Here's how I'm using Lamson now:  I simply have it running on
> port 25 accepting emails, and then dumping all of those emails to
> a given spool directory.

You don't want Lamson for this.  It's kind of a waste to have Lamson
accept email and then have another Python script run on that email.
Either configure a postfix server to dump to a directory and just run
Lamson on that, or have your Lamson server do all the processing right
there.
 
> From there, a Python script (which I wrote today) will be
> checking each message to see if the final destination accepts
> it.  As you said, it is better to accept everything that comes in
> and have processes running in the background that can .. in
> their due time, process the messages.
> 
> So, if the final destination accepts the recipient, a message
> will be moved to another queue where another script will do
> ham/spam testing.

Yeah, you *really* need to set up a postfix server to do this, and just
use Lamson for the processing.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: Hello (using Lamson as a spam filter)

From:
Morten W. Petersen
Date:
2009-09-15 @ 23:20
Zed A. Shaw wrote:
> On Wed, Sep 16, 2009 at 12:56:13AM +0200, Morten W. Petersen wrote:
>   
>> If I could get my hands on a raw message, that sounds perfect. 
>>
>> Here's how I'm using Lamson now:  I simply have it running on
>> port 25 accepting emails, and then dumping all of those emails to
>> a given spool directory.
>>     
>
> You don't want Lamson for this.  It's kind of a waste to have Lamson
> accept email and then have another Python script run on that email.
> Either configure a postfix server to dump to a directory and just run
> Lamson on that, or have your Lamson server do all the processing right
> there.
>   

Well, hmm.  I've tried some of the different MTAs out there, and they
are hard to use, hard to configure and not much fun at all.

There might be some greylisting etc. in the future, so I'd like to have
an easy to configure/program MTA to deal with.

Anyway, raw message content would be great.  :)

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: morten@nidelven-it.no