Haskell: Parsing escape characters in single quotes

Posted on 开发者 (https://www.devze.com), 2022-12-20 09:21. Source: the web.

I'm currently making a scanner for a basic compiler I'm writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I've defined this part of the scanner function, which works for most cases:

scanner ('\'':cs)
    | length cs == 0            = error "Illegal character!"
    | head cs == '\\'           = mkEscape (head (drop 1 cs)) : scanner (drop 3 cs)
    | head (drop 1 cs) == '\''  = T_Char (head cs) : scanner (drop 2 cs)
    where
        mkEscape      :: Char -> Token
        mkEscape 'n'  = T_Char '\n'
        mkEscape 'r'  = T_Char '\r'
        mkEscape 't'  = T_Char '\t'
        mkEscape '\\' = T_Char '\\'
        mkEscape '\'' = T_Char '\''

However, this comes up when I run it in GHCi:

Main> scanner "abc '\\' def"
[T_Id "abc", T_Char '\'', T_Id "def"]

It can recognise everything else but gets escaped backslashes confused with escaped single quotes. Is this something to do with character encodings?


I don't think the scanner itself is at fault here, and it has nothing to do with character encodings. Haskell reads the string literal you typed as

abc '\' def

because Haskell string literals have their own escape sequences, so \\ in the source denotes a single backslash. When scanner reaches the first single quote, cs therefore holds the character sequence \' def. head cs is a backslash, so the second guard fires and mkEscape is called.

Its argument is head (drop 1 cs), which is ', so mkEscape returns T_Char '\'', which is exactly what you saw.
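
To check this for yourself, here is a small sketch you can load into GHCi. The names input and afterQuote are made up for illustration and are not part of your scanner; afterQuote only approximates what cs holds at that point.

```haskell
-- The literal exactly as it was typed at the GHCi prompt in the question.
input :: String
input = "abc '\\' def"

-- Roughly what cs holds once the opening single quote has been consumed:
-- everything after the first '\'' in the input (illustrative helper).
afterQuote :: String
afterQuote = drop 1 (dropWhile (/= '\'') input)
```

In GHCi, putStrLn input prints abc '\' def and length input is 11, so the string holds one backslash, not two. afterQuote starts with a backslash followed by a single quote, which is the pair that mkEscape ends up interpreting.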


Perhaps you should call

scanner "abc '\\\\' def"

The first level of backslash escaping is consumed by Haskell's own string-literal syntax, and the second level is what scanner actually sees.
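
For completeness, here is a minimal, self-contained sketch of such a scanner that you can use to try the doubled escaping. It is not your code: the Token type only has the two constructors visible in your output, the identifier and whitespace handling is guessed, and unescape is a hypothetical stand-in for your mkEscape. It also pattern matches on the closing quote instead of using head and drop.

```haskell
import Data.Char (isAlpha, isSpace)

-- Assumed token type: only the two constructors seen in the question.
data Token = T_Id String | T_Char Char
  deriving (Show, Eq)

scanner :: String -> [Token]
scanner [] = []
-- Escaped literal: quote, backslash, escape code, closing quote.
scanner ('\'' : '\\' : c : '\'' : rest) = T_Char (unescape c) : scanner rest
-- Plain literal: quote, one ordinary character, closing quote.
scanner ('\'' : c : '\'' : rest)
  | c /= '\\' && c /= '\'' = T_Char c : scanner rest
-- Any other text after an opening quote is rejected.
scanner ('\'' : _) = error "Illegal character literal"
scanner s@(c : cs)
  | isSpace c = scanner cs
  | isAlpha c = let (name, rest) = span isAlpha s in T_Id name : scanner rest
  | otherwise = error ("Unexpected character: " ++ [c])

-- Hypothetical counterpart of mkEscape that returns the bare character.
unescape :: Char -> Char
unescape 'n'  = '\n'
unescape 'r'  = '\r'
unescape 't'  = '\t'
unescape '\\' = '\\'
unescape '\'' = '\''
unescape c    = error ("Unknown escape: \\" ++ [c])
```

With this sketch, scanner "abc '\\\\' def" evaluates to [T_Id "abc",T_Char '\\',T_Id "def"]. Because the closing quote is part of the pattern, the single-escaped input from your question is rejected with an error rather than being read as an escaped single quote.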
