开发者

Automatic character encoding handling in Perl / DBI / DBD::ODBC

开发者 https://www.devze.com 2023-03-03 23:17 出处:网络
I\'m us开发者_如何学JAVAing Perl with DBI / DBD::ODBC to retrieve data from an SQL Server database, and have some issues with character encoding.

I'm us开发者_如何学JAVAing Perl with DBI / DBD::ODBC to retrieve data from an SQL Server database, and have some issues with character encoding.

The database has a default collation of SQL_Latin1_General_CP1_CI_AS, so data in varchar columns is encoded in Microsoft's version of Latin-1, AKA windows-1252.

There doesn't seem to be a way to handle this transparently in DBI/DBD::ODBC. I get data back still encoded as windows-1252, for instance, € “ ” are encoded as bytes 0x80, 0x93 and 0x94. When I write those to an UTF-8 encoded XML file without decoding them first, they are written as Unicode characters 0x80, 0x93 and 0x94 instead of 0x20AC, 0x201C, 0x201D, which is obviously not correct.

My current workaround is to call $val = Encode::decode('windows-1252', $val) on every column after every fetch. This works, but hardly seems like the proper way to do this.

Isn't there a way to tell DBI or DBD::ODBC to do this conversion for me?

I'm using ActivePerl (5.12.2 Build 1202), with DBI (1.616) and DBD::ODBC (1.29) provided by ActivePerl and updated with ppm; running on the same server that hosts the database (SQL Server 2008 R2).

My connection string is:

dbi:ODBC:Driver={SQL Server Native Client 10.0};Server=localhost;Database=$DB_NAME;Trusted_Connection=yes;

Thanks in advance.


DBD::ODBC (and ODBC API) does not know the character set of the underlying column so DBD::ODBC cannot do anything with 8 bit data returned, it can only return it as it is and you need to know what it is and decode it. If you bind the columns as SQL_WCHAR/SQL_WVARCHAR the driver/sql_server should translate the characters to UCS2 and DBD::ODBC should see the columns as SQL_WCHAR/SQL_WVARCHAR. When DBD::ODBC is built in unicode mode SQL_WCHAR columns are treat as UCS2 and decoded and re-encoded in UTF-8 and Perl should see them as unicode characters.

You need to set SQL_WCHAR as the bind type after bind_columns as bind types are not sticky like parameter types.

If you want to continue reading your varchar data which windows 1252 as bytes then currently you have no choice but to decode them. I'm not in a rush to add something to DBD::ODBC to do this for you since this is the first time anyone has mentioned this to me. You might want to look at DBI callbacks as decoding the returned data might be more easily done in those (say the fetch method).

You might also want to investigate the "Perform Translation for character data" setting in newer SQL Server ODBC Drivers although I have little experience with it myself.

0

精彩评论

暂无评论...
验证码 换一张
取 消