I was just looking at my automatic mbox archives for librelist and noticed some messages sent by Favio Manríquez León had mangled "From:" headers. The original messages that got delivered to me via SMTP look fine, however.

Here's how it was delivered to my inbox (looks correct to me):

From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>

Here's how it looks in the archives[1]:

From: =?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_=3Cfavio-AT-favrik=2Ecom=3E?=

It ends up looking like crap (shown as-is, in escaped form) in mutt[2] for me.

I've also added some UTF-8 characters to my headers in this message, so we'll see if it affects other things, too :)

[1] http://librelist.com/archives/meta/2009/08/20/queue/new/1250793630.M797990P5031Q29.09c5769d5b9f3d575cefc2ccb51877ec
[2] mutt 1.5.18-6 from Debian 5.0

--
Eric Wong
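To see what a mail reader is actually handed by the archived header, it can be decoded with Python 3's standard `email.header.decode_header`. This is a minimal sketch (the helper name `decode_archived` is hypothetical): the whole value comes back as a single encoded-word whose payload is the complete `"name" <address>` string, quotes and angle brackets included, so there is no plain `<address>` left in the header for the reader to parse.

```python
# Decode the archived From: value with the standard library.
from email.header import decode_header

ARCHIVED = ("=?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_"
            "=3Cfavio-AT-favrik=2Ecom=3E?=")


def decode_archived(value):
    """Return (number of decoded parts, decoded text) for a header value."""
    parts = decode_header(value)
    text = "".join(
        # decode_header returns a plain str only when nothing is encoded
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in parts
    )
    return len(parts), text


count, text = decode_archived(ARCHIVED)
print(count)  # 1
print(text)   # "Favio Manríquez León" <favio-AT-favrik.com>
```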
On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> I was just looking at my automatic mbox archives for librelist and
> noticed some messages sent by Favio Manríquez León had mangled "From:"
> headers. The original messages that got delivered to me via SMTP look
> fine, however.
>
> Here's how it was delivered to my inbox (looks correct to me):
>
> From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>

Aha! I figured it out. Man, that was a tough one to trace. So, it's in this code in lamson/encoding.py:

    def properly_encode_header(value, encoder):
        try:
            return value.encode("ascii")
        except UnicodeEncodeError:
            if '@' in value:
                # this could have an email address, make sure we don't screw
                # it up
                name, address = parseaddr(value)
                return '"%s" <%s>' % (encoder.header_encode(name.encode("utf-8")), address)

            return encoder.header_encode(value.encode("utf-8"))

Notice how I look for an '@' to see if it's an email address, and therefore needs the special email-address encoding. Well, that all works until Librelist removes the '@' and replaces it with -AT- to thwart spam bots. Then it isn't an email address anymore, so Lamson encodes the whole string like normal.

This means that either I have to write a more robust "could be an email address" check, which could get really hairy, or just have Librelist mangle the address in a way that keeps the @.

Got any opinions?

--
Zed A. Shaw
http://zedshaw.com/
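A Python 3 sketch of the heuristic quoted above reproduces both headers from this thread, assuming the encoder is a UTF-8 `email.charset.Charset` forced to Q encoding (Lamson's actual encoder setup isn't shown here). The original is Python 2 and passes UTF-8 bytes to `header_encode`; Python 3's `Charset.header_encode` takes a `str`, so the port drops the `.encode("utf-8")` calls. Next to it is one hypothetical shape for the "more robust check": `encode_by_structure` treats any value of the form `name <something>` as an address because of its angle brackets rather than its '@'. The regex is illustrative only and does not cover every address form (groups, comments, lists of addresses).

```python
# Python 3 sketch. Assumption: the encoder is a UTF-8 Charset forced to
# Q encoding, matching the "=?utf-8?q?" words quoted in this thread.
import re
from email.charset import Charset, QP
from email.utils import parseaddr

ENCODER = Charset("utf-8")
ENCODER.header_encoding = QP


def properly_encode_header(value, encoder=ENCODER):
    """Port of the '@'-based heuristic from lamson/encoding.py."""
    try:
        value.encode("ascii")
        return value  # pure ASCII needs no encoding
    except UnicodeEncodeError:
        if "@" in value:
            # Looks like an address: encode only the display name.
            name, address = parseaddr(value)
            return '"%s" <%s>' % (encoder.header_encode(name), address)
        # Anything else, including "name <x-AT-y>", is encoded whole.
        return encoder.header_encode(value)


# Hypothetical alternative: recognize `name <something>` by its angle
# brackets instead of by the '@'. Illustrative only; it does not handle
# groups, comments, or lists of addresses.
ADDRESS_FORM = re.compile(r'^\s*"?(?P<name>[^"<]*?)"?\s*<(?P<addr>[^<>]+)>\s*$')


def encode_by_structure(value, encoder=ENCODER):
    try:
        value.encode("ascii")
        return value
    except UnicodeEncodeError:
        match = ADDRESS_FORM.match(value)
        if match:
            # Encode the display name only and leave the address readable.
            return "%s <%s>" % (encoder.header_encode(match.group("name")),
                                match.group("addr"))
        return encoder.header_encode(value)


munged = '"Favio Manríquez León" <favio-AT-favrik.com>'
print(properly_encode_header(munged))
# =?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_=3Cfavio-AT-favrik=2Ecom=3E?=
print(encode_by_structure(munged))
# =?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?= <favio-AT-favrik.com>
```

With the '@' intact, the ported heuristic yields the delivered form `"=?utf-8?q?...?=" <favio@favrik.com>`. The structural variant leaves out the surrounding quotes on purpose, since RFC 2047 does not allow an encoded-word inside a quoted string.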
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> > I was just looking at my automatic mbox archives for librelist and
> > noticed some messages sent by Favio Manríquez León had mangled "From:"
> > headers. The original messages that got delivered to me via SMTP look
> > fine, however.
> >
> > Here's how it was delivered to my inbox (looks correct to me):
> >
> > From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>
>
> Aha! I figured it out. Man, that was a tough one to trace. So, it's
> in this code in lamson/encoding.py:
>
>     def properly_encode_header(value, encoder):
>         try:
>             return value.encode("ascii")
>         except UnicodeEncodeError:
>             if '@' in value:
>                 # this could have an email address, make sure we don't screw
>                 # it up
>                 name, address = parseaddr(value)
>                 return '"%s" <%s>' % (encoder.header_encode(name.encode("utf-8")), address)
>
>             return encoder.header_encode(value.encode("utf-8"))
>
> Notice how I look for an '@' to see if it's an email address, and
> therefore needs the special email-address encoding. Well, that all
> works until Librelist removes the '@' and replaces it with -AT- to
> thwart spam bots. Then it isn't an email address anymore, so Lamson
> encodes the whole string like normal.
>
> This means that either I have to write a more robust "could be an
> email address" check, which could get really hairy, or just have
> Librelist mangle the address in a way that keeps the @.
>
> Got any opinions?

I don't think the s/@/-AT-/ mangling is an effective defense at all[1]. It might stop the least sophisticated spammers, but it also makes life harder for everyone else. So I'd say just leave the email address as-is. I'm no fan of the mangling that Gmane does, either.

I assume anybody who posts on public mailing lists (especially technical ones) already has a decent defense against spammers.
I actually don't even notice spam anymore: my well-trained SpamAssassin setup catches nearly all of it, and I can quickly train it on the rest.

[1] - of course I'm not a spammer and don't scrape for addresses :)

--
Eric Wong
On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> I was just looking at my automatic mbox archives for librelist and
> noticed some messages sent by Favio Manríquez León had mangled "From:"
> headers. The original messages that got delivered to me via SMTP look
> fine, however.
>
> Here's how it was delivered to my inbox (looks correct to me):
>
> From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>
>
> Here's how it looks in the archives[1]:
>
> From: =?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_=3Cfavio-AT-favrik=2Ecom=3E?=

Odd, it was delivered to you correctly but then wrong in the archive? It should be the same. I'll take a look. At first glance it looks like it's double-escaped.

--
Zed A. Shaw
http://zedshaw.com/
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Sat, Aug 29, 2009 at 05:53:00PM -0700, Eric Wong wrote:
> > I was just looking at my automatic mbox archives for librelist and
> > noticed some messages sent by Favio Manríquez León had mangled "From:"
> > headers. The original messages that got delivered to me via SMTP look
> > fine, however.
> >
> > Here's how it was delivered to my inbox (looks correct to me):
> >
> > From: "=?utf-8?q?Favio_Manr=C3=ADquez_Le=C3=B3n?=" <favio@favrik.com>
> >
> > Here's how it looks in the archives[1]:
> >
> > From: =?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_=3Cfavio-AT-favrik=2Ecom=3E?=
>
> Odd, it was delivered to you correctly but then wrong in the archive? It
> should be the same. I'll take a look. At first glance it looks like
> it's double-escaped.

Exactly. The archive doesn't seem to be double-escaped, though; it's just that the entire header value is escaped, not only the parts that need to be. The " is being escaped to =22, as is the email address portion (which, afaik, shouldn't be escaped).

I don't think anything on my end is somehow fixing the header before it gets to me, either. I've just piped the output of

  curl http://librelist.com/archives/meta/2009/08/20/queue/new/1250793630.M797990P5031Q29.09c5769d5b9f3d575cefc2ccb51877ec

to each of the following commands (individually):

  1) spamc -E -s 256000 --headers
  2) /usr/local/libexec/dovecot/deliver -m INBOX
  3) /usr/sbin/sendmail -oi ew
  4) msmtp -f normalperson@yhbt.net normalperson@yhbt.net

And none of them seemed to repair the escaping for me...

--
Eric Wong
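The single-versus-double question can be checked with the standard library. This minimal sketch assumes a UTF-8 `email.charset.Charset` forced to Q encoding (matching the `=?utf-8?q?` in the archive). Q-encoding the full `"name" <address>` value once reproduces the archived header exactly, while a second pass would turn every `=` into `=3D`. In other words, this is whole-value escaping done once, not double escaping.

```python
# Single versus double Q-encoding of the full From: value.
from email.charset import Charset, QP

ARCHIVED = ("=?utf-8?q?=22Favio_Manr=C3=ADquez_Le=C3=B3n=22_"
            "=3Cfavio-AT-favrik=2Ecom=3E?=")

# Assumption: a UTF-8 charset forced to Q encoding, as seen in the archive.
encoder = Charset("utf-8")
encoder.header_encoding = QP

whole_value = '"Favio Manríquez León" <favio-AT-favrik.com>'
once = encoder.header_encode(whole_value)   # escape the entire value one time
twice = encoder.header_encode(once)         # what double escaping would give

print(once == ARCHIVED)  # True
print("=3D" in twice)    # True: the second pass escapes each '=' as =3D
```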