My web app breaks when I try to edit a certain content type, and I'm pretty sure it is because of some weird characters in my database. So when I do:
SELECT body FROM message WHERE id = 666
it returns:
<p>⢠<span></span></p><p><br /></p><p><em><strong>NOTE:</strong> Please remember to use your to participate in the discussion.</em></p>
However, when I try to count how many documents contain those characters, Postgres complains:
foo_450_prod=# SELECT COUNT(*) FROM message WHERE body LIKE'%â¢%';
ERROR: invalid byte sequence for encoding "UTF8": 0xe2a225
HINT: This error can also happen if the byte sequence does not match the encodi
Does anybody know what the issue is and how I can query for those funny characters?
Thanks in advance!
It appears that your SELECT statement is being interpreted as ISO-8859-1 or windows-1252. In those encodings, 'â' == 0xE2, '¢' == 0xA2, and '%' == 0x25, which explains the 0xe2a225 byte sequence mentioned in the error message.
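The byte arithmetic above can be checked with a short sketch. It assumes the client sent the pattern text as windows-1252 (`cp1252` in Python); the variable names are illustrative only and nothing here talks to a database.

```python
# The three characters from the LIKE pattern, as they appear in the query.
pattern_fragment = "â¢%"

# Assumption: the client encodes the text as windows-1252 before sending it.
raw = pattern_fragment.encode("cp1252")
print(raw.hex())  # e2a225

# A server that expects UTF-8 cannot decode those bytes: 0xE2 starts a
# three-byte sequence, but 0x25 ('%') is not a valid continuation byte.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)  # False
```

Encoding with `iso-8859-1` instead of `cp1252` gives the same three bytes, which is why the error alone cannot tell the two apart.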
What's hard to figure out is why your first SELECT is returning an â¢ to begin with. It's an unlikely character combination to use on purpose, but it's also not a typical case of UTF-8/windows-1252 mojibake because E2 A2 isn't valid UTF-8. It could be the first 2 bytes of a character, but that character would be a Braille dot pattern (U+2880 to U+28BF), which doesn't make sense there either.
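As a rough check of the claim about E2 A2, the following sketch decodes the two bytes on their own and then with every possible continuation byte appended. The names are illustrative; it only exercises Python's built-in UTF-8 codec.

```python
# The two bytes corresponding to 'â' and '¢' in ISO-8859-1 / windows-1252.
lead = bytes([0xE2, 0xA2])

# On their own they form an incomplete UTF-8 sequence.
try:
    lead.decode("utf-8")
    complete = True
except UnicodeDecodeError:
    complete = False
print(complete)  # False

# Append each possible continuation byte (0x80..0xBF) and collect the
# resulting code points.
code_points = [
    ord((lead + bytes([b])).decode("utf-8")) for b in range(0x80, 0xC0)
]
print(hex(min(code_points)), hex(max(code_points)))  # 0x2880 0x28bf
```

All 64 results fall in the Braille Patterns block, matching the range given above.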
There is a long path between your database and the data printed on your web page. Your DB encoding may be fine, but you're probably displaying text that is actually UTF-8 as if it were ISO-8859-1 (rather than storing genuinely "funny" characters). Do you have something like:
<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
in the <head> tag of your HTML page?
Also, are you running SET NAMES 'utf8' when connecting to your DB?
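To illustrate the kind of mismatch described here, this minimal sketch shows what happens when UTF-8 bytes are displayed as ISO-8859-1. The sample string is hypothetical and is not taken from the database in the question.

```python
# Hypothetical sample text; any non-ASCII character would do.
original = "café"

# The text is stored or transmitted as UTF-8 bytes...
utf8_bytes = original.encode("utf-8")

# ...but something along the way (page charset, client encoding) reads
# those bytes as ISO-8859-1.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # cafÃ©

# If nothing else has altered the bytes, reversing the two steps
# recovers the original text.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired == original)  # True
```

If the page charset and the connection encoding both agree with the encoding of the stored data, this substitution should not occur.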