
Haskell IO with non-English characters

Developer https://www.devze.com 2023-01-14 02:18 Source: Internet

Look at this. I am trying:

appendFile "out" $ show 'д'

'д' is a character from the Russian alphabet. After that, the "out" file contains:

'\1076'

As I understand it, this is the Unicode numeric code of the character 'д'. Why does this happen? And how can I get the normal representation of my character?

For additional information, this works fine:

appendFile "out"  "д"

Thanks.


show escapes all characters outside the ASCII range (and some inside the ASCII range), so don't use show.

Since "д" works fine, just use that. If you can't because the д is actually inside a variable, you can use [c] (where c is the variable containing the character). If you need to surround it with single quotes (like show does), you can use ['\'', c, '\''].
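A minimal sketch of the difference, assuming the character is held in a variable. quoteChar is a hypothetical helper name, not a library function, and writing д with appendFile assumes the locale encoding can represent it (for example UTF-8):

```haskell
-- Hypothetical helper (not a library function): wrap a character in
-- single quotes, as show does, but without escaping it.
quoteChar :: Char -> String
quoteChar c = ['\'', c, '\'']

main :: IO ()
main = do
  let c = 'д'
  -- show escapes the non-ASCII character: appends '\1076'
  appendFile "out" (show c ++ "\n")
  -- a one-element String keeps the character as is: appends д
  appendFile "out" ([c] ++ "\n")
  -- the same, with the quotes added by hand: appends 'д'
  appendFile "out" (quoteChar c ++ "\n")
```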


After reading your reply to my comment, I think your situation is that you have some data structure, maybe with type [(String,String)], and you'd like to output it for debugging purposes. Using show would be convenient, but it escapes non-ASCII characters.

The problem here isn't with Unicode; you need a function that will properly format your data for display. I don't think show is the right choice, in part because of the problems with escaping some characters. What you need is a type class like Show, but one that displays data for reading instead of escaping characters. That is, you need a pretty-printer, which is a library that provides functions to format data for display. There are several pretty-printers available on Hackage; I'd look at uulib or wl-pprint to start. I think either would be suitable without too much work.

Here's an example with the uulib tools. The Pretty type class is used instead of Show, and the library comes with many useful instances.

import UU.PPrint

-- | Write each item to StdOut
logger :: Pretty a => a -> IO ()
logger x = putDoc $ pretty x <+> line

Running this in ghci:

Prelude UU.PPrint> logger 'Д'
Д 
Prelude UU.PPrint> logger ('Д', "other text", 54)
(Д,other text,54) 
Prelude UU.PPrint> 

If you want to output to a file instead of the console, you can use the hPutDoc function to output to a handle. You could also call renderSimple to produce a SimpleDoc, then pattern match on the constructors to process output, but that's probably more trouble. Whatever you do, avoid show:

Prelude UU.PPrint> show $ pretty 'Д'
"\1044"

You could also write your own type class similar to Show but formatted as you like it. The Text.Printf module can be helpful if you go this route.
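A minimal sketch of such a class, assuming plain unescaped text is what you want. The names Display, display and displayList are made up for illustration and do not come from any library. It borrows the showList trick from Show so that a String is printed as text rather than as a list of characters:

```haskell
import Data.List (intercalate)

-- Hypothetical class for illustration: like Show, but without escaping.
class Display a where
  display :: a -> String
  -- How to display a list of this type; Char overrides it so that
  -- a String comes out as plain text.
  displayList :: [a] -> String
  displayList xs = "[" ++ intercalate "," (map display xs) ++ "]"

instance Display Char where
  display c = [c]
  displayList s = s

instance Display Int where
  display = show

instance Display a => Display [a] where
  display = displayList

instance (Display a, Display b) => Display (a, b) where
  display (a, b) = "(" ++ display a ++ "," ++ display b ++ ")"

main :: IO ()
main = do
  let pairs = [("ключ", "значение"), ("д", "Д")] :: [(String, String)]
  -- prints [(ключ,значение),(д,Д)]
  putStrLn (display pairs)
```

As with putStrLn in general, the console output assumes a locale encoding that can represent these characters.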


Use Data.Text. It provides IO with locale-awareness and encoding support.


A quick web search for "UTF Haskell" should give you good links. Probably the most recommended package is the text package.

import qualified Data.Text.IO as UTF
import qualified Data.Text as T

main :: IO ()
main = UTF.appendFile "out" (T.pack "д")
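If the output must be UTF-8 regardless of the locale, one option is to encode explicitly and write bytes. This is only a sketch, assuming the text and bytestring packages are available; appendUtf8 is a hypothetical helper name:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: append text to a file as UTF-8 bytes,
-- independent of the locale encoding.
appendUtf8 :: FilePath -> T.Text -> IO ()
appendUtf8 path = B.appendFile path . TE.encodeUtf8

main :: IO ()
main = do
  -- 'д' (U+0434) is encoded as the two bytes 0xD0 0xB4
  print (B.unpack (TE.encodeUtf8 (T.pack "д")))  -- prints [208,180]
  appendUtf8 "out" (T.pack "д")
```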


To make show display national characters unescaped, put this in your code:

{-# LANGUAGE FlexibleInstances #-}

instance {-# OVERLAPPING #-} Show String where
    show = id

You can then try:

*Main> show "ł"
ł
*Main> show "ą"
ą
*Main> show "ę"
ę
*Main> show ['ę']
ę
*Main> show ["chleb", "masło"]
[chleb,masło]
*Main> data T = T String deriving (Show)
*Main> t = T "Chleb z masłem"
*Main> t
T Chleb z masłem
*Main> show t
T Chleb z masłem


There were no quotes in my previous solution. In addition, I have now put the code in a module, which must be imported into your program.

{-# LANGUAGE FlexibleInstances #-}

module M where

instance {-# OVERLAPPING #-} Show String where
    show x = ['"'] ++ x ++ ['"']

Information for beginners: remember that show does not display anything. It converts data to a string with additional formatting characters.

We can try it in WinGHCi, with show applied automatically by WinGHCi:

*M> "ł"
"ł"
*M> "ą"
"ą"
*M> "ę"
"ę"
*M> ['ę']
"ę"
*M> ["chleb", "masło"]
["chleb","masło"]
*M> data T = T String deriving (Show)
*M> t = T "Chleb z masłem"

or manually:

*M> (putStrLn . show) "ł"
"ł"
*M> (putStrLn . show) "ą"
"ą"
*M> (putStrLn . show) "ę"
"ę"
*M> (putStrLn . show) ['ę']
"ę"
*M> (putStrLn . show) ["chleb", "masło"]
["chleb","masło"]
*M> data T = T String deriving (Show)
*M> t = T "Chleb z masłem"
*M> (putStrLn . show) t
T "Chleb z masłem"

In code, to display:

import M

data T = T String deriving (Show)

main :: IO ()
main = do
    putStrLn "ł"
    putStrLn "ą"
    putStrLn "ę"
    putStrLn "masło"
    (putStrLn . show) ['ę']
    (putStrLn . show) ["chleb", "masło"]
    let t = T "Chleb z masłem"
    (putStrLn . show) t

I'm adding the tag "polskie znaki haskell" for Google.
