librelist archives

éscaping in headers in the archives

From:
Eric Wong
Date:
2009-08-30 @ 00:53
I was just looking at my automatic mbox archives for librelist and
noticed some messages sent by Favio Manríquez León had mangled "From:"
headers.  The original messages that got delivered to me via SMTP look
fine, however.

Here's how it was delivered to my inbox (looks correct to me):

  From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>


Here's how it looks in the archives[1]:

  From: =?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_=3Cfavio-AT-favrik=2Ecom=3E?=

It ends up looking like crap (as-is in escaped form) in mutt[2] for me.

I've also added some UTF-8 characters to my headers in this message
so we'll see if it affects other things, too :)

[1] http://librelist.com/archives/meta/2009/08/20/queue/new/1250793630.M797990P5031Q29.09c5769d5b9f3d575cefc2ccb51877ec
[2] mutt 1.5.18-6 from Debian 5.0

-- 
Eric Wong

Re: éscaping in headers in the archives

From:
Zed A. Shaw
Date:
2009-09-05 @ 21:59
On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> I was just looking at my automatic mbox archives for librelist and
> noticed some messages sent by Favio Manríquez León had mangled "From:"
> headers.  The original messages that got delivered to me via SMTP look
> fine, however.
> 
> Here's how it was delivered to my inbox (looks correct to me):
> 
>   From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>


Aha! I figured it out, man that was a tough one to trace.  So, it's in
this code in lamson/encoding.py:

def properly_encode_header(value, encoder):
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        if '@' in value:
            # this could have an email address, make sure we don't screw
            # it up
            name, address = parseaddr(value)
            return '"%s" <%s>' % (
                encoder.header_encode(name.encode("utf-8")), address)

        return encoder.header_encode(value.encode("utf-8"))

Notice how I look for an '@' to see if it's an email address, and
therefore needs that special encoding for an email address.  Well, that
all works until Librelist removes the '@' and replaces it with a -AT-
to help with spam bots.  Then it isn't an email address anymore, so
Lamson encodes the whole string like normal.
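
A minimal Python 3 sketch of the behaviour described above, for
illustration only.  It is not Lamson's code: q_encode is a hypothetical
stand-in for the encoder.header_encode call, and encode_header_sketch is
a renamed, simplified copy of the function quoted above.

```python
import string
from email.utils import parseaddr

# Characters left unescaped inside a "Q" encoded-word (an assumption that
# mirrors the usual conservative set: letters, digits and a few symbols).
_SAFE = set(string.ascii_letters + string.digits + "-!*+/")


def q_encode(text):
    """Hypothetical stand-in encoder: UTF-8 text -> RFC 2047 "Q" encoded-word."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch == " ":
            out.append("_")          # spaces become underscores
        elif ch in _SAFE:
            out.append(ch)           # safe ASCII passes through
        else:
            out.append("=%02X" % byte)  # everything else is hex-escaped
    return "=?utf-8?q?" + "".join(out) + "?="


def encode_header_sketch(value):
    """Same decision logic as the quoted function: only '@' marks an address."""
    try:
        value.encode("ascii")
        return value
    except UnicodeEncodeError:
        if "@" in value:
            # Looks like an address: encode only the display name.
            name, address = parseaddr(value)
            return '"%s" <%s>' % (q_encode(name), address)
        # No '@': the whole value is encoded, quotes and brackets included.
        return q_encode(value)


intact = encode_header_sketch('"Favio Manríquez León" <favio@favrik.com>')
mangled = encode_header_sketch('"Favio Manríquez León" <favio-AT-favrik.com>')
print(intact)
print(mangled)
```

With the '@' intact only the display name is encoded; once it is replaced
by -AT- the check fails and the entire value is encoded, which is the form
seen in the archive.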

This means that either I have to write a more robust "could be an email
address" check, which could get really hairy, or just have Librelist
mangle the address in a way that keeps the @.
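
One possible shape for the "more robust" check, sketched in Python 3 under
loud assumptions: the pattern and the name split_name_addr are hypothetical
and not part of Lamson.  Instead of looking for '@', it looks for the
structure "display name <something>", so a -AT- mangled address still
counts.  It will not cover every legal address form.

```python
import re

# Assumed heuristic: optional quoted display name followed by <...> at the
# end of the value.  The address part may not contain spaces or brackets.
_NAME_ADDR = re.compile(
    r'^\s*"?(?P<name>[^"<>]*?)"?\s*<(?P<address>[^<>\s]+)>\s*$'
)


def split_name_addr(value):
    """Return (name, address) if value looks like 'name <addr>', else None."""
    match = _NAME_ADDR.match(value)
    if match is None:
        return None
    return match.group("name"), match.group("address")


print(split_name_addr('"Favio Manríquez León" <favio-AT-favrik.com>'))
print(split_name_addr('"Favio Manríquez León" <favio@favrik.com>'))
print(split_name_addr("Re: éscaping in headers in the archives"))
```

A caller could encode only the name part whenever this returns a pair, and
fall back to encoding the whole value otherwise.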

Got any opinions?

-- 
Zed A. Shaw
http://zedshaw.com/

Re: éscaping in headers in the archives

From:
Eric Wong
Date:
2009-09-05 @ 23:26
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> > I was just looking at my automatic mbox archives for librelist and
> > noticed some messages sent by Favio Manríquez León had mangled "From:"
> > headers.  The original messages that got delivered to me via SMTP look
> > fine, however.
> > 
> > Here's how it was delivered to my inbox (looks correct to me):
> > 
> >   From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>
> 
> 
> Aha! I figured it out, man that was a tough one to trace.  So, it's in
> this code in lamson/encoding.py:
> 
> def properly_encode_header(value, encoder):
>     try:
>         return value.encode("ascii")
>     except UnicodeEncodeError:
>         if '@' in value:
>             # this could have an email address, make sure we don't screw
>             # it up
>             name, address = parseaddr(value)
>             return '"%s" <%s>' % (
>                 encoder.header_encode(name.encode("utf-8")), address)
> 
>         return encoder.header_encode(value.encode("utf-8"))
> 
> Notice how I look for an '@' to see if it's an email address, and
> therefore needs that special encoding for an email address.  Well, that
> all works until Librelist removes the '@' and replaces it with a -AT-
> to help with spam bots.  Then it isn't an email address anymore, so
> Lamson encodes the whole string like normal.
> 
> This means that either I have to write a more robust "could be an email
> address" check, which could get really hairy, or just have Librelist
> mangle the address in a way that keeps the @.
> 
> Got any opinions?

I don't think the s/@/-AT-/ mangling is an effective defense at all[1].
It might stop the least sophisticated spammers but can also make life
harder for everyone else.  So I'd say just leave the email address
as-is.  I'm no fan of the mangling that Gmane does, either.

I assume anybody who posts on public mailing lists (especially technical
ones) already has decent defense against spammers.  I actually don't
even notice spam anymore; my well-trained SpamAssassin setup catches
nearly all of it and I can quickly train it on the rest.

[1] - of course I'm not a spammer and don't scrape for addresses :)

-- 
Eric Wong

Re: éscaping in headers in the archives

From:
Zed A. Shaw
Date:
2009-08-30 @ 05:24
On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> I was just looking at my automatic mbox archives for librelist and
> noticed some messages sent by Favio Manríquez León had mangled "From:"
> headers.  The original messages that got delivered to me via SMTP look
> fine, however.
> 
> Here's how it was delivered to my inbox (looks correct to me):
> 
>   From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>
> 
> 
> Here's how it looks in the archives[1]:
> 
>   From: =?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_=3Cfavio-AT-favrik=2Ecom=3E?=

Odd, it was delivered to you correctly but then wrong in the archive?  It
should be the same.  I'll take a look.  At first glance it looks like
it's double escaped.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: éscaping in headers in the archives

From:
Eric Wong
Date:
2009-08-31 @ 02:04
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> > I was just looking at my automatic mbox archives for librelist and
> > noticed some messages sent by Favio Manríquez León had mangled "From:"
> > headers.  The original messages that got delivered to me via SMTP look
> > fine, however.
> > 
> > Here's how it was delivered to my inbox (looks correct to me):
> > 
> >   From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>
> > 
> > 
> > Here's how it looks in the archives[1]:
> > 
> >   From: =?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_=3Cfavio-AT-favrik=2Ecom=3E?=
> 
> Odd, it was delivered to you correctly but then wrong in the archive?  It
> should be the same.  I'll take a look.  At first glance it looks like
> it's double escaped.

Exactly.

The archive doesn't seem to be double-escaped, it's just the entire
header value is escaped, not just the parts that need to be.  The " is
being escaped to =22 as is the email address portion (which afaik,
shouldn't be escaped).
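
A quick way to check this, sketched in Python 3 with the standard library
(the variable names are only for illustration): decode the archived value
once and see whether any encoded-word markers remain.  If the header were
double-escaped, a single decoding pass would still leave "=?" in the result.

```python
from email.header import decode_header

# The "From:" value exactly as it appears in the archive.
archived = (
    "=?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_"
    "=3Cfavio-AT-favrik=2Ecom=3E?="
)

# decode_header returns (bytes, charset) pairs; here there is exactly one,
# because the whole value is a single encoded-word.
parts = decode_header(archived)
raw, charset = parts[0]
decoded = raw.decode(charset)

print(len(parts))
print(decoded)
print("=?" in decoded)
```

One pass gives back the plain `"name" <address>` text, quotes and angle
brackets included, so the value was encoded once, just over too wide a span.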

I don't think anything on my end is somehow fixing the header
before it gets to me, either:

I've just piped the output of

 curl http://librelist.com/archives/meta/2009/08/20/queue/new/1250793630.M797990P5031Q29.09c5769d5b9f3d575cefc2ccb51877ec

to each of the following commands (individually)

1) spamc -E -s 256000 --headers
2) /usr/local/libexec/dovecot/deliver -m INBOX
3) /usr/sbin/sendmail -oi ew
4) msmtp -f normalperson@yhbt.net normalperson@yhbt.net

And none of them seemed to repair the escaping portion for me...

-- 
Eric Wong