I have some text in UTF-8. I put it into a MySQL database (collation utf8_general_ci), and I've been auto-posting it to Twitter via Net::Twitter.
But when I post it, even though Twitter itself seems to be expecting UTF-8, going by the content-type in their input pages, I'm getting those artefacts you get when UTF-8 text is misinterpreted: é comes out as Ã©, for instance.
So ... at what point is this going wrong? How can I ensure it makes the trip undamaged?
- Set my script to treat all text as UTF-8 somehow?
- Make sure I extract it from the database in UTF-8?
- Tell Net::Twitter that it's posting in UTF-8?
You probably need to enable the mysql_enable_utf8 attribute when opening your db connection:
use DBI;
my $dbh = DBI->connect("DBI:mysql:database=test;host=localhost",
                       "user", "password",
                       { mysql_enable_utf8 => 1 });
This will tell Perl that strings retrieved from the database are UTF-8 encoded.
My guess would be the encoding of the database connection, which is often iso-8859-1 by default. That would explain the Ã©: it's a two-byte UTF-8 character displayed as two single-byte iso-8859-1 characters.
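To see where the Ã© comes from, here is a minimal sketch using only the core Encode module. It does not touch the database or Twitter; it just encodes é as UTF-8 and then deliberately misreads the resulting bytes as iso-8859-1. The variable names are purely illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The character é (U+00E9) as a Perl character string.
my $char = "\x{E9}";

# Encoding it as UTF-8 yields two bytes: 0xC3 0xA9.
my $utf8_bytes = encode('UTF-8', $char);

# Misreading those two bytes as ISO-8859-1 yields two characters:
# U+00C3 (Ã) and U+00A9 (©).
my $misread = decode('ISO-8859-1', $utf8_bytes);

binmode(STDOUT, ':encoding(UTF-8)');
printf "UTF-8 bytes: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
print "Misread as ISO-8859-1: $misread\n";
```

If some layer between the database and Twitter does the equivalent of that misread, every non-ASCII character will come out as two or more junk characters.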
Does sending a query with SET NAMES utf8; after connecting help? With DBI that would be $dbh->do('SET NAMES utf8'), or whatever more specific option Perl's MySQL client library has for setting the connection character set.
I found the answer here.
Instead of
$r = $nt->update({ 'status' => $message });
Try
use Encode;
# Decode the UTF-8 bytes into a Perl character string before posting.
$r = $nt->update({ 'status' => decode('utf-8', $message) });
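One plausible reading of why this helps (an assumption, the thread does not confirm it) is that $message arrives as raw UTF-8 bytes and gets encoded as UTF-8 a second time on its way out. The sketch below reproduces that double encoding with the core Encode module alone. The sample string is hypothetical, and nothing here calls Net::Twitter.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical message as raw UTF-8 bytes, e.g. as read from a database
# handle that does not decode: "caf" followed by the two bytes of é.
my $message = "caf\xC3\xA9";

# Without decoding, Perl sees five one-byte characters; encoding them
# again turns each of the two high bytes into two bytes (double encoding).
my $double = encode('UTF-8', $message);

# Decoding first gives four characters; encoding those reproduces the
# original five bytes.
my $chars = decode('UTF-8', $message);
my $round = encode('UTF-8', $chars);

printf "undecoded length:     %d\n", length $message;
printf "decoded length:       %d\n", length $chars;
printf "double-encoded bytes: %d\n", length $double;
printf "round trip intact:    %s\n", $round eq $message ? 'yes' : 'no';
```

If mysql_enable_utf8 is set on the connection, the strings should already be decoded when they come back from the database, so decoding them again would be a mistake. Do one or the other, not both.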