I have a MySQL database, set to use UTF-8.
In my database.yml, the database is set to utf8.
I am doing some HTML scraping and inserting into the MySQL database.
If I retrieve the HTML from the database in PHP, all characters come out correctly encoded and the output is fine:
// code
$result = mysql_query("SELECT raw_html FROM pages WHERE id = 1");
echo mysql_result($result,0);
// output
Hawaiʻi.
And the output looks great. However, in Rails, I get strange characters:
// code in the controller
@page = Page.find(params[:id])
// code in the view
<%= @page.raw_html %>
// output
Hawaiʻi
Is there somewhere else I need to force UTF-8? I've tried using the iconv library to no avail (unless I'm using it wrong).
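For what it's worth, the garbage is consistent with UTF-8 bytes being read back as Latin-1. The okina ʻ (U+02BB) is two bytes in UTF-8, CA BB. Read as Windows-1252, those two bytes are Ê and ». Windows-1252 is what MySQL's "latin1" character set actually is. Below is a minimal sketch that reproduces this. It assumes Ruby 2.0 or later for String#encode and UTF-8 source files, and `mojibake` is a hypothetical helper name:

```ruby
# Hypothetical helper: simulate UTF-8 bytes being read back as Windows-1252,
# which is what MySQL's "latin1" character set actually is.
def mojibake(utf8_text)
  raw = utf8_text.dup.force_encoding(Encoding::Windows_1252) # reinterpret the raw bytes
  raw.encode(Encoding::UTF_8) # transcode each byte as a cp1252 character
end

original = "Hawai\u02BBi"
puts original.bytes.map { |b| format("%02X", b) }.join(" ") # prints 48 61 77 61 69 CA BB 69
puts mojibake(original)                                    # prints HawaiÊ»i
```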
UPDATE: I've reproduced the same problem when using the console. So:
Page.find(2).raw_html[91..94]
"Ê»"
The problem also occurs in the console (script/console), if that sheds any more light on the issue.
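It helps to look at the raw bytes in the console. That tells you whether the database really holds garbled text, or whether Rails is only displaying it wrongly. Suppose the slice prints as Ê» and its bytes are C3 8A C2 BB. Then the stored text is itself double-encoded, because correct UTF-8 for ʻ would be just CA BB. The helpers below are hypothetical and assume Ruby 2.0 or later. The double-encoding check is only a heuristic:

```ruby
# Hypothetical console helpers for checking what the database handed back.

# Show the raw bytes of a string as hex pairs.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

# Heuristic: the text is probably double-encoded if converting it back to
# Windows-1252 gives bytes that form valid UTF-8 and differ from the input.
def looks_double_encoded?(str)
  candidate = str.encode(Encoding::Windows_1252).force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? && candidate != str
rescue Encoding::UndefinedConversionError
  false # has characters cp1252 cannot hold, so it is not simple cp1252 mojibake
end

slice = "\u00CA\u00BB" # stands in for Page.find(2).raw_html[91..94]
puts slice.encoding               # prints UTF-8
puts hex_bytes(slice)             # prints C3 8A C2 BB
puts looks_double_encoded?(slice) # prints true
```

Correctly stored text such as "Hawai\u02BBi" makes the check return false: ʻ has no Windows-1252 code point, so the conversion raises.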
UPDATE 2: Okay, on further investigation I realized I was doing something dumb, but fixing it didn't solve the problem.
While the table was set to UTF-8, the column was not. I've changed the column to use the utf8 character set (collation 'utf8_general_ci'). However (and this makes me think I'm screwing up something basic), this actually produces the correct result:
@raw_html = Iconv.conv('LATIN1','UTF-8',@page.raw_html[0..10000])
That comes out lovely. Unfortunately, if I run the whole page through, I get:
Iconv::IllegalSequence in PagesController#show
"€²18″N<"...
So there's some other funky stuff going on in there. Could it be that the data is still Latin-1 encoded, even though I've explicitly set both the table and the column to UTF-8 (and repopulated the HTML)? I'm also using the mysql2 gem now, per Jeffrey's suggestion.
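There is one plausible reason the first 10,000 characters convert but the whole page does not. The error text starts at €, and € has no ISO-8859-1 code point, although it does exist in Windows-1252 (byte 0x80). The surrounding characters look like mangled coordinates. For example, ″ (double prime, UTF-8 E2 80 B3) read back as cp1252 becomes â, € and ³. Assume the text was double-encoded through MySQL's cp1252-flavoured latin1. Then converting to Windows-1252 instead of LATIN1 should undo it. This sketch uses String#encode (Ruby 2.0+) rather than Iconv, and `repair_double_encoding` is a hypothetical name:

```ruby
# Hypothetical helper: undo one round of UTF-8 -> cp1252 -> UTF-8 mangling.
# Windows-1252 is used instead of ISO-8859-1 because MySQL's "latin1" is cp1252,
# which has characters such as the euro sign (byte 0x80) that ISO-8859-1 lacks.
def repair_double_encoding(str)
  str.encode(Encoding::Windows_1252).force_encoding(Encoding::UTF_8)
end

garbled = "18\u00E2\u20AC\u00B3N" # how "18\u2033N" (18 double-prime N) looks after mangling

begin
  garbled.encode(Encoding::ISO_8859_1) # roughly what Iconv.conv('LATIN1', 'UTF-8', ...) attempts
rescue Encoding::UndefinedConversionError => e
  puts "ISO-8859-1 cannot hold it: #{e.message}"
end

fixed = repair_double_encoding(garbled)
puts fixed.valid_encoding? # prints true
puts fixed == "18\u2033N"  # prints true
```

Windows-1252 leaves five bytes undefined: 0x81, 0x8D, 0x8F, 0x90 and 0x9D. Text whose UTF-8 form includes one of them may still raise Encoding::UndefinedConversionError with this approach.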
UPDATE 3: To clarify, I'm getting console errors as well. This is the command:
Page.find(2).raw_html[91..94]
And this is the response:
"Ê»"
In your database.yml, add encoding: utf8 to each of your environment setups.
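For example, a database.yml with the encoding set per environment might look like the YAML embedded below. The database names and username are placeholders. The small Ruby check is a hypothetical helper that lists any environment still missing encoding: utf8. YAML.safe_load needs Ruby 2.1+, and the <<~ heredoc needs Ruby 2.3+:

```ruby
require "yaml"

# Example database.yml contents (database names and username are placeholders).
DATABASE_YML = <<~YAML
  development:
    adapter: mysql2
    encoding: utf8
    database: scraper_development
    username: root
  test:
    adapter: mysql2
    encoding: utf8
    database: scraper_test
    username: root
  production:
    adapter: mysql2
    database: scraper_production
    username: root
YAML

# Hypothetical check: list environments that do not ask for a utf8 connection.
def environments_missing_utf8(yml_text)
  config = YAML.safe_load(yml_text)
  config.reject { |_env, settings| settings["encoding"] == "utf8" }.keys
end

p environments_missing_utf8(DATABASE_YML) # prints ["production"]
```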
You might want to switch to mysql2 :)
Set it in both your Gemfile and database.yml:
adapter: mysql2
gem "mysql2"
That should save you a lot of trouble :)
Check that you have set the character encoding for the HTML page in your layout.
If you are using HTML5, try adding this at the top of the head section of your layout:
<meta charset="UTF-8">
For HTML 4, try adding this to the head section of the page
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
For XHTML pages, try
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
if you are serving with the text/html MIME type, and this
<?xml version="1.0" encoding="UTF-8"?>
as the very first line of the served file if it's XHTML served as XML.
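A quick check on the rendered HTML source can confirm that the declaration from the layout actually ends up in the page. The HTML spec requires a <meta charset> declaration to sit within the first 1024 bytes of the document, so this hypothetical helper only looks there. It recognises the HTML5 and HTML 4 meta forms above but not the XML declaration. It assumes Ruby 2.4+ for String#match?:

```ruby
# Matches <meta charset="UTF-8"> and
# <meta http-equiv="Content-type" content="text/html;charset=UTF-8">, case-insensitively.
CHARSET_DECLARATION =
  /<meta\s+(?:charset=["']?utf-8|http-equiv=["']?content-type["']?\s+content=["'][^"']*charset=utf-8)/i

# Hypothetical helper: does the rendered HTML declare UTF-8 early enough?
def declares_utf8_early?(html)
  # Only the first 1024 bytes count; scrub replaces a character cut in half.
  prefix = html.byteslice(0, 1024).scrub
  prefix.match?(CHARSET_DECLARATION)
end

puts declares_utf8_early?(%(<!DOCTYPE html>\n<html><head><meta charset="UTF-8">)) # prints true
```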