开发者

MS SQL: How to convert two-bytes nvarchar to string?

开发者 https://www.devze.com 2023-03-11 09:50 出处:网络
I have nvarchar(4000) field containing data like this: D0B6D181D0B5D0B4D0BA35D0BC (cyrillic string) E59EA0E开发者_C百科78999E79B98E99499 (chinese string)

I have nvarchar(4000) field containing data like this:

D0B6D181D0B5D0B4D0BA35D0BC (cyrillic string)
E59EA0E开发者_C百科78999E79B98E99499 (chinese string)
...

Each character is presented by two bytes in data sequence. How to convert this data to string using T-SQL?


Your first example is UTF8-encoded Cyrillic text that's been converted to a hexadecimal string and stored in an nvarchar(4000) field in SQL Server. That's quite an odd combination. Strangely, SQL Server has no native support for converting UTF8 to nvarchar in TSQL. You can roll your own DecodeUTF8 function, or you can use mine below.

Your examples:

select
  Cyrillic = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'D0B6D181D0B5D0B4D0BA35D0BC', 1))
  , Chinese = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'E59EA0E78999E79B98E99499', 1))

Output:

Cyrillic  Chinese 
жседк5м   垠牙盘错

My UTF8 decoder for TSQL:

create function [dbo].[DecodeUTF8](@utf8 varchar(max)) returns nvarchar(max)
as
begin
    declare @xml xml;

    with e2(n) as (select top(16) 0 from (values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n))
    , e3(n) as (select top(256) 0 from e2, e2 e)
    , e4(n) as (select top(65536) 0 from e3, e3 e)
    , e5(n) as (select top(power(2.,31)-1) row_number() over (order by(select 0)) from e4, e4 e)
    , numbers(i) as (select top(datalength(@utf8)) row_number() over (order by(select 0)) from e5)
    , x as (
        select *
        from numbers
        cross apply (select byte = convert(tinyint, convert(binary(1), substring(@utf8, i, 1)))) c
        cross apply (select n = floor(log(~(byte) * 2 + 1, 2)) - 1) d
        cross apply (select bytes = case when n in (5,4,3) then 7 - n else 1 end) e
        cross apply (select data = byte % power(2, n)) f
    )
    select @xml =
    (
        select nchar(case x.bytes
            when 1 then x.data
            when 2 then power(2, 6) * x.data + x2.data 
            when 3 then power(2, 6*2) * x.data + power(2, 6) * x2.data + x3.data
            when 4 then power(2, 6*3) * x.data + power(2, 6*2) * x2.data + power(2, 6) * x3.data + x4.data
          end)
        from x
        left join x x2 on x2.i = x.i + 1 and x.bytes > 1
        left join x x3 on x3.i = x.i + 2 and x.bytes > 2
        left join x x4 on x4.i = x.i + 3 and x.bytes > 3
        where x.n <> 6
        order by x.i
        for xml path('')
    );

    return @xml.value('.', 'nvarchar(max)');
end


Take a look at this

http://devio.wordpress.com/2009/07/11/convert-unicode-hex-codepoint-to-unicode-character-in-sql-server/

It looks like they are doing single byte codepoints, so you may need to modify it a bit

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号