开发者

MySQL storing ââ¬Å in field

开发者 https://www.devze.com 2023-01-26 05:59 出处:网络
Hay, I\'ve made a simple application where users upload content to a MySQL database, however some entities are not being encoded. Things like this appear

Hay, I've made a simple application where users upload content to a MySQL database, however some entities are not being encoded. Things like this appear

ââ¬Å

instead of quotes and what not. I know is should had encoded these values into HTML entities when they were being inserted into the database, however there is now a few thousand rows containing data.

Do we have a PHP function to encode these values when the data is returned back to the browser?


Alex Brown update. Since encoding may be an issue here, too,开发者_如何学运维 here's what I see:

MySQL storing ââ¬Å in field


Your users' browsers are submitting data that is UTF-8 encoded, but you are connecting to a database which 1) presumes you are submitting latin1-encoded data 2) stores your data as latin1-encoded strings. Your data is therefore stored mis-encoded.

When you retrieve data from the database, it will be supplied as latin1 strings but because the latin1 strings are just mis-encoded UTF-8, and PHP naively treats all strings as binary strings, you just have your original UTF-8 string back again. When you echo this string onto an HTML page that is again declared as UTF-8 encoded, the string appears as it was submitted by the user.

However, if you view the contents of the database using software that has a sophisticated understanding of MySQL's treatment of character sets, that software will identify the column character sets as latin1 and make sure that the characters it displays are indeed the characters of the latin1 string, which as we noted was mis-encoded UTF-8. You therefore see mis-encoded data.

There are a number of steps you should take to remedy this. First, you are expecting to handle UTF-8-encoded data (as declared at the level of HTML), so you should make sure that that is how you are communicating with MySQL too. You do this by issuing SET NAMES 'utf8' whenever you form a connection to the database. (Note: your database API may provide a special function for changing a connection character set. I think the mysql API doesn't, but I'm not sure.)

Second, you should make sure that you are storing your data UTF-8 encoded. That means that your database columns' character sets should be utf8. A column character set can be changed using the ALTER TABLE ... MODIFY statement. Don't forget to also change the table default character set, too (this sets the character set for new columns added to the table for which you don't explicitly specify a character set). And change the database character set while you are at it.

However, when you change a column character set, MySQL presumes that the data already stored in rows of that table is not mis-encoded, but valid data stored in the old character set. It will therefore convert your mis-encoded UTF-8 data (which it treats as latin1-encoded data) into UTF-8-encoded data, so that you end up with data that is double-UTF-8-encoded. There is a trick to get around this: first convert the column character set to the binary character set, then from binary to utf8. That way, MySQL does not change the binary form of the data, because you are converting it via a format in which it is treated as arbitrary binary strings.

Good luck!

0

精彩评论

暂无评论...
验证码 换一张
取 消