Unfortunately, I'm using an MSSQL-based data source and attempting to integrate it into a custom Drupal module written in PHP. My issue is that no matter what sort of wrapper function I use, I CANNOT get apostrophes to appear correctly on the page. They all turn into question marks. Em dashes do the same thing.
I know this is an encoding issue. The page is encoded in UTF-8, but the database is encoded in SQL_Latin1_General_CP1_CI_AS. I have no control over the database structure and it cannot be modified. I do not have the option to change all the values in the database.
How can I access this data in uncorrupted form or at least get PHP to spit it out properly?
I have tried, without success: utf8_encode, utf8_decode, htmlentities, iconv, and several custom-coded str_replace functions. MSSQL doesn't have a SET NAMES function.
Help!
Have you tried explicitly casting the output? For example:
select col1 COLLATE Latin1_General_100_CI_AS from table1
According to the Collation and Unicode Support page on MSDN, Unicode 5.0 is supported, though you may need to force the use of the newer *_100 collations to take advantage of the new features. Another page claims that SQL Server doesn't support UTF-8, but UTF-16 IS supported.
You can peruse the entire list of supported collations with a built-in TVF:
select * from fn_helpcollations()
Since you're using an older version of SQL Server that doesn't support the new collations, have you tried casting the data to NVARCHAR?
For example:
SELECT CONVERT(NVARCHAR(MAX), col1) FROM table1
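If the driver still mangles the NVARCHAR value on its way to PHP, one workaround that is sometimes used is to have SQL Server hand back the raw UTF-16 bytes (for example by wrapping the conversion in a further CAST to VARBINARY(MAX)) and decode them in PHP. The sketch below assumes that approach works with your driver; utf16le_to_utf8 is a hypothetical helper name, and the sample byte string only stands in for what a fetched column might contain.

```php
<?php
// Hypothetical helper: decode raw UTF-16LE bytes (the layout SQL Server
// uses for NVARCHAR data) into the UTF-8 the page expects.
function utf16le_to_utf8(string $bytes): string
{
    $utf8 = iconv('UTF-16LE', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Input is not valid UTF-16LE');
    }
    return $utf8;
}

// Stand-in for a column selected as
//   CAST(CONVERT(NVARCHAR(MAX), col1) AS VARBINARY(MAX))
// The bytes spell "It's" with a curly apostrophe (U+2019 is 19 20 in
// little-endian order).
$raw = "I\x00t\x00\x19\x20s\x00";

echo utf16le_to_utf8($raw), PHP_EOL;
```

Whether the binary value comes through intact depends on the PHP extension and its limits on binary column sizes, so treat this as something to test rather than a guaranteed fix.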
There is an MSDN page on Managing Data Conversion Between Client/Server Code Pages which provides some generic information. In general, the recommendation seems to center around modifying either the specifics of the connection or the database structure (which you said is not possible given current limitations). Specifically,
The best choice for a code page-specific server is to communicate only with the clients using the same code page. The second-best choice is to use another code page that has almost the same character set. [...] If you must communicate with clients using different code pages, the supported solution is to store your data in Unicode columns. If any one of these options is not feasible, the other alternative is to store the data in binary columns using the binary, varbinary, or varbinary(max) data types. However, binary data can only be sorted and compared in binary order. This makes it less flexible than character data.
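If none of those options is open to you and the driver hands PHP the original single-byte data unchanged, the conversion can be attempted on the PHP side. SQL_Latin1_General_CP1_CI_AS uses code page 1252, in which the curly apostrophe is byte 0x92 and the em dash is byte 0x97. utf8_encode assumes ISO-8859-1, where those two bytes are control characters, which may be why it did not help. A minimal sketch, assuming the bytes arrive unconverted (cp1252_to_utf8 is a hypothetical helper name, and the sample string stands in for a fetched column):

```php
<?php
// Hypothetical helper: treat the incoming bytes as Windows-1252 (the code
// page behind SQL_Latin1_General_CP1_CI_AS) and re-encode them as UTF-8.
function cp1252_to_utf8(string $bytes): string
{
    $utf8 = iconv('Windows-1252', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Input contains bytes undefined in Windows-1252');
    }
    return $utf8;
}

// 0x92 is the curly apostrophe and 0x97 the em dash in Windows-1252.
$raw = "It\x92s here \x97 finally";

echo cp1252_to_utf8($raw), PHP_EOL;
```

If the fetched value already contains a literal question mark (byte 0x3F) where the apostrophe should be, the character was lost before PHP saw it. That usually points to a conversion in the driver or connection settings, and no PHP-side function can bring it back.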