Convert Between Latin1-encoded Data.ByteString and Data.Text

Developer https://www.devze.com 2023-04-07 02:29 Source: Internet
Since the Latin-1 (aka ISO-8859-1) character set is embedded in the Unicode character set as its lowest 256 code points, I'd expect the conversion to be trivial, but I didn't see any Latin-1 conversion functions in Data.Text.Encoding, which contains only conversion functions for the common UTF encodings.

What's the recommended and/or most efficient way to convert between Data.ByteString values holding Latin-1 encoded data and Data.Text values?


The answer is right at the top of the page you linked:

To gain access to a much larger family of encodings, use the text-icu package: http://hackage.haskell.org/package/text-icu

A quick GHCi example:

λ> import Data.Text.ICU.Convert
λ> conv <- open "ISO-8859-1" Nothing
λ> Data.Text.IO.putStrLn $ toUnicode conv $ Data.ByteString.pack [198, 216, 197]
ÆØÅ
λ> Data.ByteString.unpack $ fromUnicode conv $ Data.Text.pack "ÆØÅ"
[198,216,197]

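However, as you pointed out, in the specific case of Latin-1 the byte values coincide with the Unicode code points, so you can use pack/unpack from Data.ByteString.Char8 to map trivially between Latin-1 bytes and String, and then convert between String and Text with the corresponding pack/unpack from Data.Text. Note that Data.ByteString.Char8.pack truncates every Char to 8 bits, so any character above U+00FF is silently mangled when going back to Latin-1.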
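The pack/unpack route can be sketched as a small module. This is only a minimal sketch: the names latin1ToText and textToLatin1 are made up for illustration and are not part of any library, and the Maybe result is one possible way to avoid the silent truncation, not the only one. Going through String is simple but not the fastest option; newer versions of the text package also appear to provide Data.Text.Encoding.decodeLatin1 for the decoding direction, which is worth checking against the version you have installed.

```haskell
import           Data.Char             (ord)
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text             as T

-- | Decode Latin-1 bytes (hypothetical helper name).
-- Each byte becomes the Unicode code point with the same numeric value.
latin1ToText :: B.ByteString -> T.Text
latin1ToText = T.pack . BC.unpack

-- | Encode Text as Latin-1 (hypothetical helper name).
-- Returns Nothing if any character lies above U+00FF, because
-- Data.ByteString.Char8.pack would otherwise truncate it to 8 bits.
textToLatin1 :: T.Text -> Maybe B.ByteString
textToLatin1 t
  | T.all ((<= 0xFF) . ord) t = Just (BC.pack (T.unpack t))
  | otherwise                 = Nothing
```

With the module loaded in GHCi:

λ> B.unpack <$> textToLatin1 (T.pack "ÆØÅ")
Just [198,216,197]
λ> textToLatin1 (T.pack "€")
Nothing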
