Would anyone know of a reliable method (with mySQL or otherwise) to select rows in a database that contain Japanese characters? I have a lot of rows in my开发者_Go百科 database, some of which only have alphanumeric characters, some of which have Japanese characters.
Rules when you have problem with character sets:
While creating database use utf8 encoding:
CREATE DATABASE _test DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
Make sure all text fields (varchar and text) are using UTF-8:
CREATE TABLE _test.test ( id INT NOT NULL AUTO_INCREMENT, name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, PRIMARY KEY (`id`) ) ENGINE = MyISAM;
When you make a connection do this before you query/update the database:
SET NAMES utf8;
With phpMyAdmin - Choose UTF-8 when you login.
set web page encoding to utf-8 to make sure all post/get data will be in UTF-8 (or you'll have to since converting is painful..). PHP code (first line in the php file or at least before any output):
header('Content-Type: text/html; charset=UTF-8');
Make sure all your queries are written in UTF8 encoding. If using PHP:
6.1. If PHP supports code in UTF-8 - just write your files in UTF-8.
6.2. If php is compiled without UTF-8 support - convert your strings to UTF-8 like this:
$str = mb_convert_encoding($str, 'UTF-8', '<put your file encoding here');
$query = 'SELECT * FROM test WHERE name = "' . $str . '"';
That should make it work.
Following on to the helpful answer NickSoft, i had to set the encoding on the db connection to get it to work.
&characterEncoding=UTF8
Then the SET NAMES utf8; seemed to be redundant
As teneff stated, just use SELECT
.
When installing MySQL, use UTF-8 as charset. Then, choosing utf8_general_ci
as collation should do the work.
As Frosty stated, just use SELECT.
Look up the lowest and highest valued Japanese characters in the Unicode charts at http://www.unicode.org/roadmaps/bmp/ and use REGEXP. It may use several different regions of characters to get the whole Japanese character set. As long as you use the UTF-8 charset and utf8_general_ci collation, you should be able to use a REGEXP '[a-gk-nt-z]' where a-g represents one range of Unicode characters from the charts, k-n represents another range, etc.
There is limited number of japanese characters. You can search for these using
SELECT ... LIKE '%カ%'
Alternatively you can try their hexadecimal denomination -
SELECT ...LIKE CONCAT('%',CHAR(0x30ab),'%')
You may find useful this UTF-8 Japanese subset http://www.utf8-chartable.de/unicode-utf8-table.pl?start=12448
Supposing you're using UTF-8 character set for fields, queries, results...
精彩评论