We restored from a backup in a different format to a new MySQL structure (which is set up correctly for UTF-8 support). Weird characters now show up in the browser, but we're not sure what they're called, so we can't search for a master list of what they translate to.
I have noticed that each garbled sequence does, in fact, correlate to a specific character. For example:
â„¢ always translates to ™
â€” always translates to —
â€¢ always translates to •
I referenced this post, which got me started, but this is far from a complete list. Either I'm not searching for the correct name, or the "master list" of these bad-to-good conversions as a reference doesn't exist.
Reference: Detecting utf8 broken characters in MySQL
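The correlation in the examples above is what you would see if UTF-8 bytes were being decoded as Windows-1252 (which is what MySQL calls latin1). Under that assumption, a "master list" does not have to be looked up; it can be generated. The following is only a minimal Python sketch: the function names and the sample characters are illustrative, and the Windows-1252 guess may not match every dataset.

```python
def mojibake_for(good: str, wrong_codec: str = "cp1252") -> str:
    """Show how `good` looks when its UTF-8 bytes are decoded with `wrong_codec`.

    Assumption: the garbling comes from UTF-8 bytes read as a single-byte charset.
    Bytes that are undefined in `wrong_codec` come out as U+FFFD.
    """
    return good.encode("utf-8").decode(wrong_codec, errors="replace")


def build_table(chars, wrong_codec: str = "cp1252") -> dict:
    """Map each garbled sequence to the character it stands for."""
    return {mojibake_for(c, wrong_codec): c for c in chars}


# Sample characters only; extend the list with whatever your data may contain.
table = build_table(["\u2122", "\u2014", "\u2022", "\u00e9"])

for bad, good in table.items():
    print(f"{bad!r} -> {good!r}")
```

Passing a longer list of characters (for example, every printable character in the ranges you care about) would give a fuller bad-to-good reference.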
Also, when trying to search via a MySQL query, if I search for â, MySQL always treats it as an "a". Is there any way to tweak my MySQL queries so that they are more literal searches? We don't use internationalization much, so I can safely assume that any field containing the â character is a problematic entry, which would need to be remedied by the "fixit" script we're building.
Instead of designing a "fixit" script to go through and replace this data, I think it would be better to fix the issue directly. It seems the data was originally stored in a character set other than UTF-8, so when it was brought into the table that was set up for UTF-8, the text got garbled. If you have the opportunity, go back to your original backup to determine which character set the data was stored in. If you can't do that, you will probably need a bit of trial and error to figure it out. Once you know that, however, conversion is easy. Read the Repairing section of the following article:
http://www.istognosis.com/en/mysql/35-garbled-data-set-utf8-characters-to-mysql-
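The trial-and-error step can also be run outside the database on a few sample values. The sketch below is in Python and assumes the garbling came from UTF-8 bytes being read as a single-byte charset; `try_repair` and the candidate list are hypothetical names chosen for illustration, not part of any library.

```python
def try_repair(garbled: str, candidates=("latin-1", "cp1252", "mac_roman")) -> dict:
    """Return {codec: repaired_text} for every candidate that reverses the garbling.

    For each candidate, the garbled text is turned back into bytes with that
    codec, and the bytes are then decoded as UTF-8. Candidates that cannot
    represent the text, or that yield invalid UTF-8, are skipped.
    """
    results = {}
    for codec in candidates:
        try:
            results[codec] = garbled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return results


# The trademark example from the question, written with escapes for safety.
sample = "\u00e2\u201e\u00a2"
for codec, fixed in try_repair(sample).items():
    print(f"{codec}: {fixed!r}")
```

A candidate that repairs many different samples cleanly is a reasonable guess for the original charset. Note that Python's "latin-1" is strict ISO-8859-1, whereas MySQL's latin1 corresponds to "cp1252" here.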
Basically, you set the column to BINARY and then set it to the original charset. That should make the text appear properly (a good check that you are using the correct charset). Once that is done, set the column to UTF-8. This will convert the data properly and correct the problems you are currently experiencing.
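As a rough sketch of those three steps, the Python helper below only builds the ALTER TABLE statements as strings so they can be reviewed before being run. The helper name, the table and column names, and the TEXT/BLOB types are assumptions for illustration; substitute your real column definition and the charset you identified.

```python
def build_repair_statements(table: str, column: str, original_charset: str,
                            text_type: str = "TEXT", binary_type: str = "BLOB",
                            target_charset: str = "utf8mb4") -> list:
    """Build the three ALTER TABLE statements described above.

    1. Binary type: MySQL keeps the stored bytes and drops the charset label.
    2. Original charset: the same bytes are relabelled, with no conversion.
    3. Target charset: MySQL converts the text from the original charset.
    """
    prefix = f"ALTER TABLE `{table}` MODIFY `{column}`"
    return [
        f"{prefix} {binary_type};",
        f"{prefix} {text_type} CHARACTER SET {original_charset};",
        f"{prefix} {text_type} CHARACTER SET {target_charset};",
    ]


# Hypothetical table and column names, with latin1 as the guessed original charset.
for statement in build_repair_statements("articles", "body", "latin1"):
    print(statement)
```

MODIFY replaces the whole column definition, so attributes such as NOT NULL or DEFAULT have to be restated, and a VARCHAR(n) column would use VARBINARY(n) rather than BLOB for the binary step. Trying this on a copy of the table first is the safer route.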