XML UTF-8 data being written differently

Developers (devze.com) https://www.devze.com 2023-01-26 09:18 Source: web
Unfortunately I'm working in an obscure platform called uniPaaS so I'm probably after some platform-agnostic advice.

I've got a Web Service request where the XML document contains those irritating smart quotes. The byte data for the character is E2 80 99 (which is the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK).
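A quick way to double-check which character those three bytes stand for is to decode them and ask for the Unicode name. This is only a sketch in Python, used here as a platform-neutral stand-in since uniPaaS itself is not shown; the variable names are made up for illustration.

```python
import unicodedata

# The three bytes quoted in the question, taken as UTF-8.
raw = bytes.fromhex("E2 80 99")

char = raw.decode("utf-8")
codepoint = "U+{:04X}".format(ord(char))
name = unicodedata.name(char)

print(codepoint, name)
```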

When I write the XML file to disk on our staging server, it writes it correctly. When I write it on our production server, it totally changes the values of those bytes and malforms the XML document:

E2 80 99 becomes 92. Has anyone ever seen this sort of behaviour before? It seems to affect only that one byte sequence (but the SOAP response is 50 MB, so I haven't had a chance to diff the entire file).
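For a file that size, one alternative to a full diff is to stream through the production copy and list the offsets of bytes that are not valid UTF-8. The following is a rough sketch, not a tested tool: `find_invalid_utf8`, its parameters, and the demo file are all hypothetical names introduced here, and it assumes the file is supposed to be UTF-8 throughout.

```python
import os
import tempfile


def find_invalid_utf8(path, chunk_size=1 << 20, limit=10):
    """Return up to `limit` (file_offset, byte_value) pairs that are not valid UTF-8."""
    bad = []
    base = 0        # file offset of data[0]
    pending = b""   # tail carried over from the previous chunk
    with open(path, "rb") as fh:
        while len(bad) < limit:
            chunk = fh.read(chunk_size)
            final = not chunk
            data = pending + chunk
            pos = 0
            while pos < len(data) and len(bad) < limit:
                try:
                    data[pos:].decode("utf-8")
                    pos = len(data)
                except UnicodeDecodeError as err:
                    start = pos + err.start
                    if not final and start > len(data) - 4:
                        # Could be a multi-byte sequence split across two
                        # chunks, so retry once more data has been read.
                        pos = start
                        break
                    bad.append((base + start, data[start]))
                    pos = start + 1
            pending = data[pos:]
            base += pos
            if final:
                break
    return bad


# Tiny demo file: one stray 0x92 byte next to a correctly encoded quote.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xml")
with open(demo_path, "wb") as fh:
    fh.write(b"It\x92s " + "\u2019".encode("utf-8") + b" ok")

demo_hits = find_invalid_utf8(demo_path, chunk_size=4)
print(demo_hits)
```

Each reported offset can then be compared against the staging copy to see which character was mangled, without loading either file whole.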


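It's encoding it as CP1251 instead of UTF-8. In that code page the single byte 92 is U+2019, which is why decoding it and re-encoding as UTF-8 gives back your original three bytes (Python 3):

>>> b'\x92'.decode('cp1251').encode('utf-8')
b'\xe2\x80\x99'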
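Going the other direction shows how the production server could end up with 92, and what an explicit fix might look like. Note that CP1252 maps U+2019 to the same byte, so 92 on its own does not tell the two code pages apart. This is a sketch in Python only; how uniPaaS picks its output encoding is not covered here, and the file name and XML content are made up for illustration.

```python
import os
import tempfile

quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

utf8_bytes = quote.encode("utf-8")      # three bytes: E2 80 99
cp1251_bytes = quote.encode("cp1251")   # one byte: 92
cp1252_bytes = quote.encode("cp1252")   # also one byte: 92

# Hypothetical file name and content, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "response.xml")
xml_text = "<note>It" + quote + "s fine</note>"

# Naming the encoding explicitly means the platform default is never used.
with open(path, "w", encoding="utf-8") as fh:
    fh.write(xml_text)

with open(path, "rb") as fh:
    written = fh.read()

print(utf8_bytes.hex(), cp1251_bytes.hex(), cp1252_bytes.hex())
```

If the two servers differ in their default code page, any write that relies on that default could produce different bytes from the same text, so pinning the encoding at the point of writing is the usual remedy.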