Django legacy database encoding

Source: https://www.devze.com (reposted from the web), 2022-12-20 01:02
I'm sure this question is not specific to Django, but since I couldn't find a solution in other questions about Python and encodings, I'm going to ask it here. I need to add new features to an existing website that is written in PHP with MySQL as the backend. I inspected the database and created models for the tables I am going to use. However, there is a problem with the existing data: half of it is in Russian, and (at least it seems to me) it is UTF-8 encoded. When I show that data in Django's admin, it doesn't appear right.

In [52]: p.name
Out[52]: u'\xd0\u02dc\xd0\xb3\xd0\xbe\xd1\u20ac\xd1\u0152 '

In [53]: repr(p.name)
Out[53]: "u'\\xd0\\u02dc\\xd0\\xb3\\xd0\\xbe\\xd1\\u20ac\\xd1\\u0152 '"

In the Django admin it is displayed like this:

Ð˜Ð³Ð¾Ñ€ÑŒ

Encodings are still something of a mystery to me, but if I understand this output correctly, these are UTF-8 bytes that ended up in a unicode object, with each byte decoded as a separate character.
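That reading can be checked with a short Python 3 sketch. It takes the value of p.name shown above, turns the characters back into the bytes they came from, and decodes those bytes as UTF-8. Using cp1252 for the first step is an assumption: it is what MySQL's latin1 corresponds to, and it fits the \u02dc, \u20ac and \u0152 characters in the output.

```python
# Python 3 sketch: the literal is the value of p.name from the session above.
garbled = "\xd0\u02dc\xd0\xb3\xd0\xbe\xd1\u20ac\xd1\u0152 "

# Assumption: the bytes were wrongly decoded as cp1252 (MySQL's "latin1").
# Encoding with the same codec recovers the original bytes.
raw_bytes = garbled.encode("cp1252")

# Those bytes are the real UTF-8 encoding of the name.
repaired = raw_bytes.decode("utf-8")

print(repaired)  # Игорь (followed by the trailing space stored in the row)
```

The round trip only works when every character maps back to a cp1252 byte, so it is a diagnostic rather than a general repair.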

The question: is it possible to fix this in Django's database layer? I'm going to update existing content in these tables, and the existing PHP front end must remain compatible with both the new data and the old.

When I add these database options, the data is displayed correctly in the admin; however, I get a UnicodeEncodeError when saving anything.

DATABASE_OPTIONS = {
    'charset': 'latin1',
    'use_unicode': False,
}

The name returned in this case is:

In [2]: p2.name
Out[2]: '\xd0\x9b\xd0\xae\xd0\xa1\xd0\xaf'

I checked against a UTF-8 character table, and those are the correct characters for the data stored in that row.
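A small Python 3 sketch of both observations: the byte string returned with these options decodes cleanly as UTF-8, while the same text cannot be encoded as latin1. The second part is only a likely explanation of the save error, assuming the driver tries to encode unicode values with the connection charset; the variable names are illustrative.

```python
# Bytes returned for p2.name with charset=latin1 and use_unicode=False.
raw = b"\xd0\x9b\xd0\xae\xd0\xa1\xd0\xaf"

# They are valid UTF-8 and decode to Cyrillic text.
text = raw.decode("utf-8")
print(text)  # ЛЮСЯ

# Assumption: on save, unicode text is encoded with the connection charset.
# Cyrillic characters have no latin1 representation, so this fails.
try:
    text.encode("latin-1")
    save_error = None
except UnicodeEncodeError as exc:
    save_error = exc

print(type(save_error).__name__)  # UnicodeEncodeError
```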


Check your MySQL connection parameters. You can also specify DATABASE_OPTIONS:

DATABASE_OPTIONS = {
    "charset": "utf8",
    "init_command": "SET storage_engine=InnoDB",
}

But first check whether the data really is UTF-8. Also note that the connection encoding and the server encoding must match.
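One way to check whether stored bytes are really UTF-8 is to try decoding them strictly. The helper below is a hypothetical sketch in Python 3, not part of Django or MySQLdb; it assumes you can fetch the raw bytes of a column value.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 without errors."""
    try:
        raw.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


# The bytes from the question decode fine.
print(is_valid_utf8(b"\xd0\x9b\xd0\xae\xd0\xa1\xd0\xaf"))  # True

# The same kind of text in a single-byte Cyrillic encoding does not.
print(is_valid_utf8("Игорь".encode("cp1251")))  # False
```

A True result is strong evidence rather than proof: pure ASCII data passes trivially, and short single-byte strings can occasionally form valid UTF-8 by accident.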


Actually, the problem was the database's previous character set and collation: it was latin1, but the data had been inserted using the utf-8 charset. It was solved by exporting the data using the latin1 charset, replacing all occurrences of latin1 with utf8 in the dump, and importing the data again. This answer shows how to do this: MySQL Convert latin1 data to UTF8
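The replacement step could be sketched in Python 3 roughly as follows. This is a narrower variant than replacing every occurrence: it only rewrites the charset declarations that a dump typically contains, so row data that happens to include the word latin1 is left alone. The function name and the list of patterns are assumptions for illustration; explicit COLLATE clauses are not handled and would need separate attention.

```python
import re

# Charset declarations commonly found in a SQL dump (assumed, not exhaustive).
_CHARSET_DECL = re.compile(rb"(CHARSET=|CHARACTER SET |SET NAMES )latin1\b")


def retag_dump_as_utf8(dump: bytes) -> bytes:
    """Relabel latin1 charset declarations as utf8 without touching data bytes."""
    return _CHARSET_DECL.sub(rb"\1utf8", dump)


sample = (
    b"/*!40101 SET NAMES latin1 */;\n"
    b"CREATE TABLE p (name varchar(50)) DEFAULT CHARSET=latin1;\n"
    b"INSERT INTO p VALUES ('\xd0\x98\xd0\xb3\xd0\xbe\xd1\x80\xd1\x8c');\n"
)
converted = retag_dump_as_utf8(sample)
print(converted.decode("utf-8"))
```

Working on bytes rather than text matters here: the dump exported as latin1 already holds the original UTF-8 byte sequences, and only the labels around them are wrong.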
