I'm currently writing a small mail client and have run into a problem with charsets. I use Indy's TIdIMAP4 component to retrieve data from the mail server. When I retrieve mail bodies, accented letters such as ä and ü come back as =E4 and =FC respectively, apparently because the message uses the ISO-8859-1 charset:
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
How can I make the server send me the data in another charset, such as UTF-8? What would be the best solution to this problem?
Thanks in advance!
It is not the charset that is producing strings like =E4 and =FC; it is the Content-Transfer-Encoding. $E4 and $FC are the byte values of ä and ü in ISO-8859-1, but they are 8-bit values. Email is still largely a 7-bit environment. Unless clients and servers negotiate 8-bit transfers during their communications, octets above $7F have to be encoded in a 7-bit compatible manner to pass safely through email gateways, especially the legacy ones that still exist. quoted-printable is a commonly used 7-bit encoding for textual content in email. base64 is another, but it is not human-readable, so it tends to be used for binary data rather than text (though it can be used for text).
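To make the distinction between charset and transfer encoding concrete, here is a minimal sketch in Python (chosen only as a neutral illustration; it is not Delphi or Indy code). It assumes nothing beyond the standard library modules quopri and base64, and the variable names are made up for this example.

```python
import base64
import quopri

# Step 1 (charset): turn the characters into bytes using ISO-8859-1.
text = "äü"
raw = text.encode("iso-8859-1")          # two 8-bit octets: 0xE4 0xFC

# Step 2 (transfer encoding): make those octets 7-bit safe.
qp = quopri.encodestring(raw)            # quoted-printable form
b64 = base64.b64encode(raw)              # base64 form of the same octets

print(raw)   # the 8-bit bytes produced by the charset
print(qp)    # the 7-bit quoted-printable representation
print(b64)   # the 7-bit base64 representation (not human-readable)
```

The charset decides which bytes represent ä and ü; the transfer encoding only decides how those bytes are written using 7-bit characters.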
In any case, you cannot make the server deliver the email data to you in another encoding. The server merely delivers the email data as-is, exactly as the sender originally delivered it. If you want the data in UTF-8, you have to re-encode it yourself after downloading it. Indy will handle the decoding for you.
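As a rough sketch of what "decode, then re-encode yourself" means, the following Python snippet (an illustration only, not Indy's implementation) undoes the quoted-printable transfer encoding, interprets the resulting bytes as ISO-8859-1, and then produces UTF-8. The sample body and the variable names are hypothetical.

```python
import quopri

# Hypothetical body as it arrives from the server, still quoted-printable.
wire_body = b"=E4 and =FC"

# Undo the Content-Transfer-Encoding: back to the original 8-bit octets.
octets = quopri.decodestring(wire_body)

# Apply the charset from the Content-Type header to get real characters.
text = octets.decode("iso-8859-1")

# Re-encode in whatever charset you prefer, for example UTF-8.
utf8_bytes = text.encode("utf-8")

print(text)
print(utf8_bytes)
```

In a Delphi client the first two steps are the part a mail library normally performs when it parses the message, leaving you with a string you can store or display in any encoding.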