Haskell lexer problems

Developer https://www.devze.com 2022-12-09 12:08 Source: Internet

I'm writing a lexer in Haskell. Here's the code:

lexer :: String -> [Token]
lexer s
    | s =~ whitespace :: Bool =
        let token = s =~ whitespace :: String in
            lex (drop (length token) s)
    | s =~ number :: Bool =
        let token = s =~ number :: String in
            Val (read token) : lex (drop (length token) s)
    | s =~ operator :: Bool =
        let token = s =~ operator :: String in
            Oper token : lex (drop (length token) s)
    | otherwise = error "unrecognized character"
    where
        whitespace = "^[ \t\n]"
        number = "^[0-9]*(\.[0-9]+)?"
        operator = "^[+-*/()]"

data Token = Val Int | Oper String

There are two problems I'm having. One, the number regex "^[0-9]*(\.[0-9]+)?" throws this error:

lexical error in string/character literal at character '['

And when I comment out the line containing it and the part of the function that uses it, I get this error:

Couldn't match expected type `Token'
           against inferred type `(String, String)'
      Expected type: [Token]
      Inferred type: [(String, String)]
    In the expression: lex (drop (length token) s)
    In the expression:
        let token = s =~ whitespace :: String
        in lex (drop (length token) s)

I have no idea why I'm getting either of these errors. Can someone help me?


Backslashes are escape characters in string literals, as in "\n" for a string containing a newline. If you want a literal backslash, you need to escape it as "\\". That's the problem in the regex "^[0-9]*(\.[0-9]+)?": the Haskell lexer tries to read "\." as a string escape and rejects it, because there is no such escape sequence. If you write the regex as "^[0-9]*(\\.[0-9]+)?", the error goes away.
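
To make the escaping rule concrete, here is a minimal sketch that uses only the Prelude. The names numberPattern and escapedDot are made up for illustration and do not appear in the question.

```haskell
-- In Haskell source, "\\" inside a string literal denotes ONE backslash
-- character in the resulting String.
numberPattern :: String
numberPattern = "^[0-9]*(\\.[0-9]+)?"

-- The two-character regex fragment \. is therefore written "\\." in source.
escapedDot :: String
escapedDot = "\\."
```

Evaluating putStrLn numberPattern prints ^[0-9]*(\.[0-9]+)?, which is the text the regex engine actually receives, and length escapedDot is 2.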

The type error occurs because lex (drop (length token) s) calls lex from the standard Prelude, which has type String -> [(String, String)]. You probably meant to make a recursive call to your own function, lexer, instead.
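
As a quick illustration of why the types do not line up, this small sketch shows what the Prelude's lex returns. The name firstLexeme is hypothetical and only there for the example.

```haskell
-- Prelude's lex reads a single lexeme from the front of the input and
-- returns it paired with the remaining input, wrapped in a list.
firstLexeme :: [(String, String)]
firstLexeme = lex "12 + 3"
-- firstLexeme == [("12", " + 3")]
```

A list of (String, String) pairs cannot be used where [Token] is expected, which is what the compiler message in the question is reporting.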


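Also, note that "^[0-9]*(\\.[0-9]+)?" matches the empty string, as well as numbers like .12 (instead of 0.12), which you probably don't want. This is a serious problem: because the pattern can match the empty string, the number guard succeeds on input that does not start with a digit, the matched token has length 0, and the function calls itself on the same string forever. To fix that, change * to +, so that at least one digit is required.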
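
The "every match must consume something" point can be sketched without regexes. The version below is not the question's code: it swaps the regex guards for span and elem from base so that it runs without the regex package. It also makes several assumptions: the empty-string case and deriving (Show, Eq) are additions, and it only lexes integers, because Val holds an Int.

```haskell
import Data.Char (isDigit)

data Token = Val Int | Oper String
    deriving (Show, Eq)  -- added for this sketch so results can be printed and compared

-- Every branch below consumes at least one character, so each recursive
-- call sees a shorter string and the recursion ends at the empty case.
lexer :: String -> [Token]
lexer [] = []                                  -- assumed base case: end of input
lexer s@(c:rest)
    | c `elem` " \t\n"  = lexer rest           -- skip one whitespace character
    | isDigit c         =
        let (digits, rest') = span isDigit s   -- digits is non-empty in this branch
        in Val (read digits) : lexer rest'
    | c `elem` "+-*/()" = Oper [c] : lexer rest
    | otherwise         = error ("unrecognized character: " ++ [c])
```

For example, lexer "12 + 3*(4)" evaluates to [Val 12,Oper "+",Val 3,Oper "*",Oper "(",Val 4,Oper ")"]. Requiring a digit before calling span plays the same role as changing * to + in the regex: the number branch can no longer succeed on an empty match.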