How does encoding in email subjects work? (Django/Python)

Developer https://www.devze.com 2023-02-18 00:09 Source: web
I am sending an email with an EmailMessage object to a Gmail inbox.

The subject of the email looks something like this: u"You got a letter from Daėrius ęėįęėįęįėęįę---reply3_433441"

When I receive the email and look at the message info, I can see that the Subject line looks like this:

Subject: =?utf-8?b?WW91IGdvdCBhIGxldHRlciBmcm9tIERhxJdyaXVzIMSZxJfEr8SZxJfEr8SZ?= =?utf-8?b?xK/El8SZxK/EmS0tLXJlcGx5M180MzM0NDE=?=

How do I decode this subject line?

I have successfully decoded the email body (text/plain) with this:

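from django.utils.encoding import smart_unicode  # Python 2-era Django helper

for part in msg.walk():
    if part.get_content_type() == 'text/plain':
        msg_encoding = part.get_content_charset()
        # Undo the quoted-printable transfer encoding (Python 2 str codec)
        msg_text = part.get_payload().decode('quoted-printable')
        # Convert the raw bytes to unicode using the part's declared charset
        msg_text = smart_unicode(msg_text, encoding=msg_encoding, strings_only=False, errors='strict')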


See RFC 2047 for a complete description of the format of internationalized email headers. The basic format is "=?" charset "?" encoding "?" encoded-text "?=". So in your case, you have a base-64 encoded UTF-8 string.
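
To make that format concrete, here is a minimal sketch in Python 3 that takes a single encoded-word apart by hand. The regular expression and the decode_encoded_word helper are my own illustrative names, not part of any library, and the sketch ignores edge cases such as RFC 2231 language tags.

```python
import base64
import quopri
import re

# "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=")


def decode_encoded_word(word):
    """Decode one RFC 2047 encoded-word into a str (hypothetical helper)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an RFC 2047 encoded-word: %r" % word)
    charset, encoding, text = match.groups()
    if encoding.lower() == "b":
        # "B" encoding is plain base64
        raw = base64.b64decode(text)
    else:
        # "Q" encoding is quoted-printable where "_" stands for a space
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    # The charset field says how to turn the raw bytes into text
    return raw.decode(charset)


print(decode_encoded_word("=?utf-8?b?xK/El8SZxK/EmS0tLXJlcGx5M180MzM0NDE=?="))
```

In practice you would not parse this yourself; the standard library already does it for whole header values.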

In Python 2, you can use the email.header.decode_header and str.decode functions to decode it and get a proper Unicode string (shown here for the first encoded word only):

>>> import email.header
>>> x = email.header.decode_header('=?utf-8?b?WW91IGdvdCBhIGxldHRlciBmcm9tIERhxJdyaXVzIMSZxJfEr8SZxJfEr8SZ?=')
>>> x
[('You got a letter from Da\xc4\x97rius \xc4\x99\xc4\x97\xc4\xaf\xc4\x99\xc4\x97\xc4\xaf\xc4\x99', 'utf-8')]
>>> x[0][0].decode(x[0][1])
u'You got a letter from Da\u0117rius \u0119\u0117\u012f\u0119\u0117\u012f\u0119'
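
The session above is Python 2 and handles only the first of the two encoded words in the question. As a rough sketch for Python 3, where decode_header returns bytes for encoded parts, the whole subject could be decoded like this. decode_subject is a hypothetical helper name, and falling back to ASCII when no charset is given is my assumption.

```python
from email.header import decode_header


def decode_subject(raw_subject):
    """Decode a raw Subject header value into a str (hypothetical helper)."""
    pieces = []
    for value, charset in decode_header(raw_subject):
        if isinstance(value, bytes):
            # Encoded parts come back as bytes plus their declared charset;
            # assume ASCII when the charset is missing.
            pieces.append(value.decode(charset or "ascii", errors="replace"))
        else:
            # A header with no encoded words comes back as a plain str
            pieces.append(value)
    return "".join(pieces)


raw = ("=?utf-8?b?WW91IGdvdCBhIGxldHRlciBmcm9tIERhxJdyaXVzIMSZxJfEr8SZxJfEr8SZ?= "
       "=?utf-8?b?xK/El8SZxK/EmS0tLXJlcGx5M180MzM0NDE=?=")
print(decode_subject(raw))
```

The whitespace between two adjacent encoded words is only a separator, so it does not appear in the decoded text.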


You should look at the email.header module in the Python standard library. In particular, at the end of the documentation, there's a decode_header() function you can use to do most of the hard work for you.


The subject line is UTF-8, but you're reading it as ASCII. You're safest reading it all as UTF-8, since ASCII is effectively a subset of UTF-8.
