
UTF-8 encoding in a Rails model

Developer https://www.devze.com 2023-01-31 06:53 Source: Internet

I have a MySQL database, set to use UTF-8.

In my database.yml, the database is set to utf8.

I am doing some HTML scraping and inserting into the MySQL database.

If I retrieve the HTML from the database in PHP, it correctly encodes all characters and produces fine output:

// code
$result = mysql_query("SELECT raw_html FROM pages WHERE id = 1");
echo mysql_result($result,0);

// output
Hawaiʻi.

And the output looks great. However, in Rails, I get strange characters:

// code in the controller
@page = Page.find(params[:id])

// code in the view
<%= @page.raw_html %>

// output
Hawaiʻi

Is there somewhere else I need to force UTF-8? I've tried using the iconv library to no avail (unless I'm using it wrong).

UPDATE: I've reproduced the same problem when using the console. So:

Page.find(2).raw_html[91..94]

"Ê»"

The problem also occurs under the console (script/console) if that sheds any more light on the issue.
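
One way to read the "Ê»" above: the ʻokina (U+02BB) is two bytes in UTF-8, 0xCA 0xBB, and those same bytes decoded as Latin-1 are "Ê" and "»". A minimal Ruby sketch of that round trip follows, assuming Ruby 1.9 or later (where String#encode and String#force_encoding exist). The helper names to_mojibake and repair_mojibake are made up for illustration.

```ruby
# Simulates the mistake: relabel UTF-8 bytes as Latin-1 (ISO-8859-1),
# then transcode every byte to UTF-8 as if it were its own character.
def to_mojibake(utf8_text)
  utf8_text.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# Reverses the mistake: turn each character back into a single byte,
# then relabel the resulting byte sequence as UTF-8.
def repair_mojibake(garbled_text)
  garbled_text.encode("ISO-8859-1").force_encoding("UTF-8")
end

okina = "\u02BB"
puts okina.bytes.map { |b| format("%02X", b) }.join(" ")  # prints "CA BB"

garbled = to_mojibake("Hawai#{okina}i")
puts garbled                   # prints "HawaiÊ»i"
puts repair_mojibake(garbled)  # prints "Hawaiʻi"
```

If the repaired text looks right, the stored bytes were probably valid UTF-8 that was decoded as Latin-1 somewhere between MySQL and Ruby, or UTF-8 that was encoded a second time on the way in.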

UPDATE 2: Okay, on further investigation I've realized I was doing something dumb. But it didn't fix it.

While the table was set to UTF8, the column was not. I've changed the column to be 'utf8_general_ci'. However (and this makes me think I'm screwing something basic up), this actually produces the correct result:

@raw_html = Iconv.conv('LATIN1','UTF-8',@page.raw_html[0..10000])

That comes out lovely. Unfortunately, if I run the whole page through, I get:

Iconv::IllegalSequence in PagesController#show 
"€²18″N<"...

So there's some other funky stuff going on in there. Could it be that I still have it 'latin' encoded, even though I've explicitly set both the table and the column to UTF-8 (and repopulated the HTML)? I'm currently using the mysql2 gem as well, per Jeffrey's suggestion.
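
A possible explanation for the Iconv::IllegalSequence, offered as a guess rather than a diagnosis: Iconv.conv takes the target encoding first, so the call above converts from UTF-8 to Latin-1, which undoes one layer of double encoding. MySQL's latin1 is really Windows-1252, though, and Windows-1252 contains characters such as "€" that ISO-8859-1 lacks, so the conversion fails as soon as one of them appears. The sketch below reproduces that with String#encode (Ruby 1.9 or later assumed). repair_with is a hypothetical helper, not part of any library.

```ruby
# Hypothetical helper: undo "UTF-8 bytes decoded as <wrong_encoding>" mojibake
# by mapping each character back to one byte and relabeling the bytes as UTF-8.
def repair_with(garbled_text, wrong_encoding)
  garbled_text.encode(wrong_encoding).force_encoding("UTF-8")
end

prime = "\u2032"  # PRIME, as used in coordinates such as 18′N

# UTF-8 bytes E2 80 B2 read as Windows-1252 become "â", "€", "²".
garbled = prime.dup.force_encoding("Windows-1252").encode("UTF-8")

begin
  repair_with(garbled, "ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  # "€" (U+20AC) has no ISO-8859-1 equivalent, so this branch runs.
  puts "ISO-8859-1 cannot represent it: #{e.message}"
end

puts repair_with(garbled, "Windows-1252") == prime  # prints "true"
```

If this is what is happening, converting to Windows-1252 (CP1252) instead of LATIN1 may get further, but the cleaner fix is to make sure the connection itself uses UTF-8 so the data is never decoded as latin1 in the first place.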

UPDATE 3: To clarify, I'm getting console errors as well. This is the command:

Page.find(2).raw_html[91..94]

And this is the response:

"Ê»"


In your database.yml add encoding: utf8 to each of your environment setups.
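
To check this mechanically, a small Ruby sketch can parse the file and list the environments that lack the setting. The sample YAML, its database names, and the helper environments_missing_utf8 are all hypothetical; a real check would read config/database.yml instead of an inline string.

```ruby
require "yaml"

# Hypothetical database.yml contents; the database names are invented.
SAMPLE_DATABASE_YML = <<~YAML
  development:
    adapter: mysql2
    database: scraper_development
    encoding: utf8
  test:
    adapter: mysql2
    database: scraper_test
    encoding: utf8
  production:
    adapter: mysql2
    database: scraper_production
YAML

# Returns the names of environments whose settings lack `encoding: utf8`.
def environments_missing_utf8(yaml_text)
  config = YAML.load(yaml_text)
  config.reject { |_env, settings| settings["encoding"] == "utf8" }.keys
end

missing = environments_missing_utf8(SAMPLE_DATABASE_YML)
puts "Missing encoding: utf8 in: #{missing.join(', ')}"  # prints "Missing encoding: utf8 in: production"
```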


You might want to switch to mysql2 :)

Set it in both your database.yml and your Gemfile:

adapter: mysql2

gem "mysql2"

That should save you a lot of trouble :)


Check that you have set the character encoding for the HTML page in your layout.

If you are using HTML5, try adding this as early as possible in the head section of your page

<meta charset="UTF-8">

For HTML 4, try adding this to the head section of the page

<meta http-equiv="Content-type" content="text/html;charset=UTF-8">

For XHTML pages, try

<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />

if you are serving with the text/html MIME type, and this

<?xml version="1.0" encoding="UTF-8"?>

as the very first line of the served file if it's XHTML served as XML.
