Is it OK to fix a character encoding error using SQL REPLACE?

Developer https://www.devze.com 2022-12-14 07:20 Source: Internet
I have a (Wordpress) blog and some of my older posts have a character encoding problem where £ displays as Â£ (i.e. a pound sign prepended with a capital 'A' with a hat on).

The problem is at the DB level, so I was going to run the following SQL statement:

update wp_posts set post_content = replace(post_content, 'Â£', '£');

Would this be foolish?


Background info (not required to read):

How did this problem happen? I don't know. The blog has been through various updates (including from Wordpress Version 2.1.3, when the default table CHARSET changed from latin1 to utf8) and has been migrated to and from various machines. I guess at some point Wordpress must have written UTF-8 encoded characters into a database that had a CHARSET of latin1, or vice versa. I know I should have been more careful (yes, I have read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)).
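For anyone wondering why the symptom is exactly a capital A with a circumflex, here is a minimal Python sketch of the mechanism guessed at above. It assumes the bytes were written as UTF-8 and later read back as latin1; the variable names are only for illustration.

```python
# Minimal sketch: UTF-8 bytes for the pound sign, misread as latin1.
original = "£"

# In UTF-8 the pound sign is two bytes: 0xC2 0xA3.
utf8_bytes = original.encode("utf-8")

# Reading those two bytes as latin1 yields two characters:
# 0xC2 is "Â" and 0xA3 is "£".
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)
print(misread)
```

The stray character is simply the first byte of the two-byte UTF-8 sequence shown as a character of its own.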

How have I ensured that this doesn't happen again? I have made sure my encodings are consistent. All MySQL tables use CHARSET utf8 and the HEAD section of blog pages sets <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />


It should be OK. The safest way to do it is the following:

  • Make a dump of your blog db
  • Load it to another db
  • Perform the replace on the temporary db
  • Check!
  • If all goes well, perform it on the production db as well.
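The "replace on a copy, then check" steps above could be rehearsed roughly like this. It is only a sketch: it uses an in-memory SQLite database as a stand-in for the temporary MySQL copy, the sample rows are made up, and `fix_pound_signs` is a hypothetical helper name.

```python
import sqlite3

def fix_pound_signs(conn):
    """Run the replace, then return how many rows still contain the bad sequence."""
    conn.execute(
        "UPDATE wp_posts SET post_content = replace(post_content, 'Â£', '£')"
    )
    cursor = conn.execute(
        "SELECT COUNT(*) FROM wp_posts WHERE instr(post_content, 'Â£') > 0"
    )
    return cursor.fetchone()[0]

# Stand-in for the temporary copy of the blog database (made-up sample data).
scratch = sqlite3.connect(":memory:")
scratch.execute(
    "CREATE TABLE wp_posts (id INTEGER PRIMARY KEY, post_content TEXT)"
)
scratch.executemany(
    "INSERT INTO wp_posts (post_content) VALUES (?)",
    [("Costs Â£5",), ("Costs £10",), ("No currency here",)],
)

remaining = fix_pound_signs(scratch)
rows = [r[0] for r in scratch.execute(
    "SELECT post_content FROM wp_posts ORDER BY id"
)]
print(remaining, rows)
```

Rows that were already correct are left alone, so the "Check!" step comes down to confirming that nothing matching the bad sequence remains and spot-reading a few posts.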


Well, I would say that it would probably be the best "solution" to the problem.

As the data has been stored using the wrong encoding somewhere along the line, the original data is lost and there is no real solution. You just have to try to salvage what you can from the corrupt data that you have.

If it's isolated to a single character, you are lucky. Some byte values may not have translated into any available character. Wherever that happened there is no recognizable character combination to search for, only a character replaced by another one or a missing character. Those cases can only be spotted manually.
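If more than one character turns out to be affected, a more general salvage attempt is to reverse the wrong decoding instead of replacing one sequence at a time. The sketch below assumes the damage came from UTF-8 bytes being read as cp1252 (the Windows superset of latin1 that MySQL's latin1 is based on); `repair` is a hypothetical helper, not part of any library.

```python
def repair(text):
    """Try to undo 'UTF-8 read as cp1252'; return the text unchanged if that fails."""
    try:
        # Turn the characters back into the bytes they were decoded from,
        # then decode those bytes as UTF-8, which is what they really were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not damaged this way, or the damage is not reversible:
        # leave it alone for manual inspection.
        return text

print(repair("Costs Â£5"))
print(repair("Costs £5"))
print(repair("Itâ€™s"))
```

Text that does not round-trip is returned untouched, so the unrecoverable cases described above still have to be found by eye.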


Sure, you have data in one encoding and the table in another. You can fix this within MySQL. Check here


Don't do that! Use a trigger on update/insert if you really need to.

EDIT: hmm, after reading your situation, I would suggest making a backup copy of the DB and trying what you said. I think it would work, as long as you're not planning to ever do it again (which seems to be the case).
