I did the following things:
- I have a spreadsheet with data. One of the rows has a ü character in it.
- I save this as a CSV file in OpenOffice.org. When it asks me for a character encoding, I choose UTF-8.
- I use Navicat to create an InnoDB MySQL table with the UTF-8 character set (utf8_general collation) and import the CSV into it.
- I try to use the PHP function htmlspecialchars($string, ENT_COMPAT, 'UTF-8'), where $string is the string containing the special ü character.
It gives me an error: Invalid multibyte sequence in argument. When I replace 'UTF-8' with 'ISO8859-1', no error is thrown, but the wrong character is shown (the "unknown character" symbol, which looks like <?>).
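One plausible reading of this error, assuming the value reaching PHP is ISO-8859-1 rather than UTF-8: in Latin-1, ü is the single byte 0xFC, which is not a valid UTF-8 sequence, so htmlspecialchars() rejects it when told the input is UTF-8. Older PHP versions emitted the "Invalid multibyte sequence" warning; since PHP 5.4 the function returns an empty string instead, unless ENT_SUBSTITUTE or ENT_IGNORE is set. The same assumption would explain the <?> symbol: with 'ISO8859-1' the byte 0xFC is passed through unchanged, and a browser reading the page as UTF-8 cannot decode it. The sketch below dumps the raw bytes so you can check which encoding actually arrived.

```php
<?php
$utf8   = "\xC3\xBC"; // ü encoded as UTF-8 (two bytes)
$latin1 = "\xFC";     // ü encoded as ISO-8859-1 (one byte)

// Hex dump of the raw bytes: "c3bc" means UTF-8, "fc" means Latin-1.
echo bin2hex($utf8), "\n";   // c3bc
echo bin2hex($latin1), "\n"; // fc

// preg_match() with the /u modifier returns false for invalid UTF-8,
// so it works as a validity check without the mbstring extension.
var_dump(preg_match('//u', $utf8) === 1);   // bool(true)
var_dump(preg_match('//u', $latin1) === 1); // bool(false)

// On PHP >= 5.4, invalid input yields an empty string here
// (older versions emitted an "Invalid multibyte sequence" warning instead).
var_dump(htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8') === ''); // bool(true)
var_dump(htmlspecialchars($utf8, ENT_COMPAT, 'UTF-8') === $utf8); // bool(true)
```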
If I use an HTML form to update the string in the database, the error disappears and the character is displayed correctly. However, when I then look at the record in Navicat, it looks like two characters:
[1/4][A with something on top of it]
It seems to be a multibyte sequence that isn't seen as one character.
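The two characters are most likely Ã and ¼. A sketch of how that can happen, assuming the UTF-8 bytes of ü were treated as Latin-1 text somewhere (for example by a latin1 database connection) and then stored in the UTF-8 column: each of the two bytes becomes a character of its own. The function latin1ToUtf8() is a hypothetical name; it does what PHP's utf8_encode() does, written out by hand because that function is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: reinterpret every byte as an ISO-8859-1 character
// and encode that character as UTF-8.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b); // ASCII bytes are unchanged
        } else {
            // Code points U+0080..U+00FF need two UTF-8 bytes
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$u       = "\xC3\xBC";         // ü, correctly encoded as UTF-8
$doubled = latin1ToUtf8($u);   // encoded a second time

echo bin2hex($doubled), "\n";  // c383c2bc
echo $doubled, "\n";           // Ã¼ (on a UTF-8 terminal)
```

Under the same assumption, reading the value back through that latin1 connection undoes the extra step, which would explain why your page shows ü correctly while Navicat, presumably using a UTF-8 connection, shows two characters.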
What is going on, where are things going wrong, and what can I do about it?
Although I don't understand where the "invalid multibyte" error comes from, I'm pretty sure htmlspecialchars() is not your culprit:
For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets.
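To illustrate the quoted point: the characters htmlspecialchars() replaces (&, ", < and >, plus ' when ENT_QUOTES is set) are all ASCII, so on valid input the bytes of ü pass through untouched, whichever of those charsets is named. The sample string below is made up for the demonstration.

```php
<?php
$input = "<b>M\xC3\xBCller & Co</b>"; // "Müller & Co" in bold, UTF-8 encoded

// Only the ASCII specials are replaced; the two bytes of ü are copied as-is.
$asUtf8   = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
$asLatin1 = htmlspecialchars($input, ENT_COMPAT, 'ISO-8859-1');

echo $asUtf8, "\n";              // &lt;b&gt;Müller &amp; Co&lt;/b&gt;
var_dump($asUtf8 === $asLatin1); // bool(true)
```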
As I understand it, htmlspecialchars() should work fine on a UTF-8 string without specifying a character set. My bet is that either the HTML page containing the form or the database connection you use is not UTF-8 encoded. For the latter, try sending this query to MySQL before doing the insert:
SET NAMES utf8;
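If the code talks to MySQL through PDO, a sketch of the same idea is to request the connection character set in the DSN (supported since PHP 5.3.6), which has the effect of SET NAMES for that connection; with mysqli, mysqli::set_charset('utf8') is the equivalent. The helpers utf8Dsn() and connectUtf8(), the host, database, credentials and the items table are all hypothetical placeholders.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that requests a UTF-8 connection.
function utf8Dsn(string $host, string $db): string
{
    return "mysql:host={$host};dbname={$db};charset=utf8";
}

// Hypothetical helper: connect with that DSN. Needs the pdo_mysql extension
// and a reachable server; the credentials passed in are placeholders.
function connectUtf8(string $host, string $db, string $user, string $pass): PDO
{
    return new PDO(utf8Dsn($host, $db), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo utf8Dsn('localhost', 'mydb'), "\n"; // mysql:host=localhost;dbname=mydb;charset=utf8

// Usage against a real server (table and column names are hypothetical):
// $pdo  = connectUtf8('localhost', 'mydb', 'user', 'secret');
// $stmt = $pdo->prepare('INSERT INTO items (name) VALUES (?)');
// $stmt->execute(["M\xC3\xBCller"]);
```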