Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collati开发者_运维知识库on?
Example. If I create a test database with collation SQL_Ukrainian_CP1251_CS_AS
and then do CHAR(255)
it returns я
.
If I try the following on a database with SQL_Latin1_General_CP1_CS_AS
collation however
SELECT CHAR(255) COLLATE SQL_Ukrainian_CP1251_CS_AS
It returns y
SELECT CHAR(255)
Returns ÿ
so it is obviously going first via the database's default collation then trying to find the closest equivalent to that in the explicit collation. Can this be avoided?
Actually I have found an answer to my question now. A bit clunky but does the job unless there's a better way out there?
SET NOCOUNT ON;
CREATE TABLE #Collations
(
code TINYINT PRIMARY KEY
);
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1), --2
E02(N) AS (SELECT 1 FROM E00 a, E00 b), --4
E04(N) AS (SELECT 1 FROM E02 a, E02 b), --16
E08(N) AS (SELECT 1 FROM E04 a, E04 b) --256
INSERT INTO #Collations
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1
FROM E08
DECLARE @AlterScript NVARCHAR(MAX) = ''
SELECT @AlterScript = @AlterScript + '
RAISERROR(''Processing' + name + ''',0,1) WITH NOWAIT;
ALTER TABLE #Collations ADD ' + name + ' CHAR(1) COLLATE ' + name + ';
EXEC(''UPDATE #Collations SET ' + name + '=CAST(code AS BINARY(1))'');
EXEC(''UPDATE #Collations SET ' + name + '=NULL WHERE ASCII(' + name + ') <> code'');
'
FROM sys.fn_helpcollations()
WHERE name LIKE '%CS_AS'
AND name NOT IN /*Unicode Only Collations*/
( 'Assamese_100_CS_AS', 'Bengali_100_CS_AS',
'Divehi_90_CS_AS', 'Divehi_100_CS_AS' ,
'Indic_General_90_CS_AS', 'Indic_General_100_CS_AS',
'Khmer_100_CS_AS', 'Lao_100_CS_AS',
'Maltese_100_CS_AS', 'Maori_100_CS_AS',
'Nepali_100_CS_AS', 'Pashto_100_CS_AS',
'Syriac_90_CS_AS', 'Syriac_100_CS_AS',
'Tibetan_100_CS_AS' )
EXEC (@AlterScript)
SELECT * FROM #Collations
DROP TABLE #Collations
While MS SQL supports both code pages and Unicode unhelpfully it doesn't provide any functions to convert between the two so figuring out what character is represented by a value in a different code page is a pig.
There are two potential methods I've seen to handle conversions, one is detailed here http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571 and involves bolting a custom conversion program onto the database and using that for conversions.
The other is to construct a db table consisting of
[CodePage], [ANSI Value], [UnicodeValue]
with the unicode value stored as either the int representing the unicode character to be converted using nchar()
or the nchar itself
Your using the collation SQL_Ukrainian_CP1251_CS_AS
which is code page 1251 (CP1251 from the centre of the string). You can grab its translation table here http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
Its a TSV so after trimming the top off the raw data should import fairly cleanly.
Personally I'd lean more towards the latter than the former especially for a production server as the former may introduce instability.
精彩评论