Decoding not reversing unicode encoding in Django/Python

Developer https://www.devze.com 2022-12-26 08:06 Source: Internet
OK, I have a hardcoded string that I declare like this:

name = u"Par Catégorie"

I have a # -*- coding: utf-8 -*- magic header, so I am guessing it's converted to UTF-8.

Later on it is written out as XML through:

xml_output.toprettyxml(indent='....', encoding='utf-8')

And I get:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

Most of my data is in French and is output correctly in CDATA nodes, but that one hardcoded string keeps ... I don't see why an ASCII codec is called.

What's wrong?


The coding header in your source file tells Python what encoding your source is in. It's the encoding Python uses to decode the source of the unicode string literal (u"Par Catégorie") into a unicode object. The unicode object itself has no encoding; it's raw unicode data. (Internally, Python will use one of two encodings, depending on how it was configured, but Python code shouldn't worry about that.)
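
To make the "no encoding" point concrete, here is a small sketch (the variable names are only illustrative). It encodes the same unicode string in two different ways and shows that the byte length changes while the text itself does not:

```python
# -*- coding: utf-8 -*-
# The header above only tells Python how to read this source file.
name = u"Par Catégorie"

# The unicode object is a sequence of 13 code points, not bytes.
char_count = len(name)

# An encoding only comes into play when the text is turned into bytes.
utf8_bytes = name.encode("utf-8")      # the "é" takes two bytes: 0xc3 0xa9
latin1_bytes = name.encode("latin-1")  # the "é" takes one byte: 0xe9

# Decoding with the matching codec gives back the same text either way.
round_trip_ok = (utf8_bytes.decode("utf-8") == name
                 and latin1_bytes.decode("latin-1") == name)
```

Note that the 0xc3 byte in the error message is the first byte of a UTF-8 encoded "é", which suggests that a UTF-8 bytestring is involved somewhere.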

The UnicodeDecodeError you get means that somewhere, you are mixing unicode strings and bytestrings (normal str objects in Python 2). When you mix them together (concatenating, performing string interpolation, et cetera), Python 2 will try to convert the bytestring into a unicode string by decoding it with the default encoding, ASCII. If the bytestring contains non-ASCII data, this fails with the error you see. The operation may be happening inside a library somewhere, but it still means you're mixing inputs of different types.
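
As a rough sketch of what happens under the hood: the implicit conversion in Python 2 behaves like an explicit decode("ascii") on the bytestring. The snippet below performs that step explicitly so it fails the same way, then shows the usual fix of decoding with the correct codec before mixing. The names and the "Menu: " prefix are made up for illustration.

```python
# -*- coding: utf-8 -*-
label = u"Par Catégorie"
raw = label.encode("utf-8")  # a bytestring containing 0xc3 0xa9 for the "é"

# Python 2 does the equivalent of this decode implicitly when a
# bytestring is combined with a unicode string.
try:
    raw.decode("ascii")
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first non-ASCII byte

# The fix: decode explicitly with the right codec, then combine.
fixed = u"Menu: " + raw.decode("utf-8")
```

The general rule is to decode bytes to unicode as early as possible, work with unicode objects internally, and encode only when writing output.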

Unfortunately the fact that it'll work just fine as long as the bytestrings contain just ASCII data means this type of error is all too frequent even in library code. Python 3.x solves that problem by getting rid of the implicit conversion between unicode strings (just str in 3.x) and bytestrings (the bytes type in 3.x.)
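
For comparison, a minimal sketch of the Python 3 behaviour (run it under Python 3; the names are illustrative). Mixing the two types fails immediately with a TypeError, even when the bytes are pure ASCII, so the mistake cannot hide until non-ASCII data shows up:

```python
prefix = b"Menu: "      # bytes, pure ASCII
text = "Par Catégorie"  # str (unicode text)

# Python 3 refuses to guess an encoding, regardless of the content.
try:
    prefix + text
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# An explicit decode is always required.
combined = prefix.decode("ascii") + text
```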


Wrong parameter name? From the doc, I can see the keyword argument name is supposed to be encoding and not coding.
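
For reference, a minimal sketch of the call with the encoding keyword. It assumes xml_output is an xml.dom.minidom document, which the question does not state, and the "menu" element is a made-up name. As long as every text node is given a unicode string, toprettyxml does the encoding itself:

```python
# -*- coding: utf-8 -*-
from xml.dom.minidom import Document

# Build a tiny document; "menu" is a hypothetical element name.
xml_output = Document()
root = xml_output.createElement("menu")
xml_output.appendChild(root)

# Pass unicode text, not an already-encoded bytestring.
root.appendChild(xml_output.createTextNode(u"Par Catégorie"))

# With encoding given, the result is an encoded bytestring.
output = xml_output.toprettyxml(indent="....", encoding="utf-8")
```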
