I'm currently making a scanner for a basic compiler I'm writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I've defined this part of the scanner function which works okay for most cases:
scanner ('\'':cs)
  | (length cs) == 0 = error "Illegal character!"
  | head cs == '\\' = mkEscape (head (drop 1 cs)) : scanner (drop 3 cs)
  | head (drop 1 cs) == '\'' = T_Char (head cs) : scanner (drop 2 cs)
  where
    mkEscape :: Char -> Token
    mkEscape 'n' = T_Char '\n'
    mkEscape 'r' = T_Char '\r'
    mkEscape 't' = T_Char '\t'
    mkEscape '\\' = T_Char '\\'
    mkEscape '\'' = T_Char '\''
However, this comes up when I run it in GHCi:
Main> scanner "abc '\\' def"
[T_Id "abc", T_Char '\'', T_Id "def"]
It can recognise everything else but gets escaped backslashes confused with escaped single quotes. Is this something to do with character encodings?
I don't think there's anything wrong with the parser regarding your problem. To Haskell, the string will be read as
abc '\' def
because Haskell also has string escapes. So when it reaches the first quotation mark, cs contains the char sequence \' def. Obviously head cs is a backslash, so it will run mkEscape. The argument given is head (drop 1 cs), which is ', thus mkEscape will return T_Char '\'', which is what you saw.
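To see the decoding step for yourself, here is a small sketch that prints the characters the scanner actually receives. The names oneLevel, twoLevels and afterFirstQuote are made up for this illustration and are not part of the original scanner.

```haskell
-- The literal as typed at the GHCi prompt in the question.
oneLevel :: String
oneLevel = "abc '\\' def"

-- The same input with the backslashes doubled once more.
twoLevels :: String
twoLevels = "abc '\\\\' def"

-- Hypothetical helper: everything after the first single quote,
-- which is what `cs` is bound to in the scanner clause.
afterFirstQuote :: String -> String
afterFirstQuote = drop 1 . dropWhile (/= '\'')

main :: IO ()
main = do
  putStrLn oneLevel                    -- prints: abc '\' def
  putStrLn twoLevels                   -- prints: abc '\\' def
  putStrLn (afterFirstQuote oneLevel)  -- prints: \' def
```

putStrLn shows the raw characters, whereas print (or evaluating the string at the prompt) would re-escape them and hide the difference.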
Perhaps you should call
scanner "abc '\\\\' def"
The 1st level of \ is for the Haskell interpreter, and the 2nd level is for scanner.
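As a side note, the original clause never checks for the closing quote after an escape, which is why the malformed input '\' def was silently accepted as an escaped quote. Below is a rough sketch of a stricter variant that uses pattern matching instead of head and drop. The Token type, the identifier and whitespace handling, and the unescape helper are assumptions made so the example is self-contained; your real scanner will differ.

```haskell
import Data.Char (isAlpha, isSpace)

-- Assumed minimal token type; the real one presumably has more constructors.
data Token = T_Id String | T_Char Char
  deriving (Show, Eq)

scanner :: String -> [Token]
scanner [] = []
-- Escaped literal: quote, backslash, escape character, closing quote.
scanner ('\'' : '\\' : e : '\'' : rest) = T_Char (unescape e) : scanner rest
-- Plain literal: quote, one ordinary character, closing quote.
scanner ('\'' : c : '\'' : rest)
  | c /= '\\' && c /= '\'' = T_Char c : scanner rest
-- Any other use of a single quote is rejected.
scanner ('\'' : _) = error "Illegal character!"
scanner s@(c : cs)
  | isSpace c = scanner cs
  | isAlpha c = let (name, rest) = span isAlpha s in T_Id name : scanner rest
  | otherwise = error ("Unexpected character: " ++ [c])

-- Hypothetical helper mirroring mkEscape, but returning the Char itself.
unescape :: Char -> Char
unescape 'n'  = '\n'
unescape 'r'  = '\r'
unescape 't'  = '\t'
unescape '\\' = '\\'
unescape '\'' = '\''
unescape c    = error ("Unknown escape: \\" ++ [c])

main :: IO ()
main = print (scanner "abc '\\\\' def")
-- prints: [T_Id "abc",T_Char '\\',T_Id "def"]
```

With this version, scanner "abc '\\' def" raises "Illegal character!" instead of quietly producing T_Char '\'', which makes the missing level of escaping easier to spot.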