开发者

MySQL data does not appear to be in the character encoding it should be in

开发者 https://www.devze.com 2023-04-06 17:45 出处:网络
First off, thanks to whomever is reading this. I have a very strange problem with character encoding in a MySQL database that I am using PHP\'s PDO to interface with. The tables are all encoded using

First off, thanks to whomever is reading this. I have a very strange problem with character encoding in a MySQL database that I am using PHP's PDO to interface with. The tables are all encoded using UTF8, the webapp uses utf-8, but it seems that the data stored in the database is not actually utf-8 but latin-1.

Things had been working fine for quite some time, but this is causing issues with when importing utf-8 encoded data files or conducting fulltext searches that contain special characters such as "é" or "ë".

EDIT:

some replies have suggested that this is an issue with my terminal. It is not:

foreach($dbh->query("SELECT c FROM t") as $row){
    echo $row['c'] ."\n";
    echo urlencode($row['c'])."\n";
}
$dbh->exec("SET NAMES 'latin1'");
foreach($dbh->query("SELECT c FROM t") as $row){
    echo $row['c'] ."\n";
    echo urlencode($row['c'])."\n";
}
$dbh->exec("SET NAMES 'utf8'");
foreach($dbh->query("SELECT c FROM t") as $row){
    echo $row['c'] ."\n";
    echo urlencode($row['c'])."\n";
}

Outputs the following:

é
%C3%A9f
é
%C3%A9f
é
%C3%83%C2%A9f

Thanks to everyone so far.

END EDIT

So, first I check that the tables are working as they should:

USE information_s开发者_开发技巧chema;

mysql> SELECT table_collation FROM tables WHERE table_schema="mydb" and table_name="mytable";
+-----------------+
| table_collation |
+-----------------+
| utf8_general_ci |
+-----------------+
1 row in set (0.00 sec)

mysql> SELECT character_set_name,collation_name FROM information_schema.columns WHERE table_schema="mydb" and table_name="t" and column_name="c";
+--------------------+-----------------+
| character_set_name | collation_name  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

However, the data does not appear to be utf-8, but latin-1:

mysql> use mydb;
Database changed

mysql> SET NAMES 'latin1';
Query OK, 0 rows affected (0.00 sec)

mysql> select c from t;
+---+
| c |
+---+
| é |
+---+
1 row in set (0.00 sec)

mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> select c from t;
+----+
| c  |
+----+
| é |
+----+
1 row in set (0.00 sec)

So I have two questions:

1) Most importantly, what can I do about the data already in the DB?

2) Is there a way to set up the DB so that it actually uses utf-8 when I connect or do I have to execute the SET NAMES query every time?

Many thanks for your time and help,

Matt


it's your terminal appeared to be in latin1, not data :)

Is there a way to set up the DB so that it actually uses utf-8

you have set it up already.
only thing you need is to set up client encoding, which is done by SET NAMES 'utf8'

in fact, by using SET NAMES, you can make your data appear in whatever encoding you set. that's the only purpose of the SET NAMES magic word.

in case you have some encoding issues not covered in this question,

what can I do about the data already in the DB?

whatever you wish, as long as your db returns no ? marks.
to recover your data you have to set names to the data encoding set for the table. this will prevent mysql from recoding the data. so, you can fetch or dump it and then load it again with proper settings.

EDIT

after some consideration, i am going to say that your data is in utf8 while table encoding is set to latin1 somehow.

%C3%A9 is a perfectly valid utf-8 representation of the é character. (dunno where did you get trailing f though)

while %C3%83%C2%A9 is utf-f encoded version of %C3%A9. So, it seems your database thinks that your data is in latin1 and encode it in utf8.

so, when you set names to latin1 it don't mind and don't recode.

Conclusion:

  1. doublecheck your table (and fields) encoding. it should be latin1

  2. yes, to save your data you have to do something like

***God I HATE this autoformatting issue which prevents me from posting the code right after list item

mysqldump --default_charset=latin1 mydb > mydb.sql

then check this dump and change every appearance of latin1 to utf8.
then load it back.

do not forget to backup your data first!


In case you received data from a HTML page and saved this to your DB, don't forget to set the correct text encoding in the HTML's head section:

  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>

If you don't set the text encoding, browsers may return text differently encoded.

Regarding 1): Have a look at PHP's

string utf8_decode ( string $data )

function described here. Retrieve records from your DB, transcode to the desired encoding and write it back.


use

SET character_set_client = "UTF-8";

when connecting with your client. in PHP you will achieve this by using function:

set_charset($encoding)

before you do any actual data insert / update


  1. Connect to your Database

  2. Set connection to UTF-8

    SET NAMES 'utf8';

In your HTML files:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
0

精彩评论

暂无评论...
验证码 换一张
取 消