I'm writing a lexer in Haskell. Here's the code:
    lexer :: String -> [Token]
    lexer s
      | s =~ whitespace :: Bool =
          let token = s =~ whitespace :: String in
          lex (drop (length token) s)
      | s =~ number :: Bool =
          let token = s =~ number :: String in
          Val (read token) : lex (drop (length token) s)
      | s =~ operator :: Bool =
          let token = s =~ operator :: String in
          Oper token : lex (drop (length token) s)
      | otherwise = error "unrecognized character"
      where
        whitespace = "^[ \t\n]"
        number = "^[0-9]*(\.[0-9]+)?"
        operator = "^[+-*/()]"

    data Token = Val Int | Oper String
I'm having two problems. First, the number regex "^[0-9]*(\.[0-9]+)?" causes this error:
lexical error in string/character literal at character '['
And when I comment out the line containing it and the part of the function that uses it, I get this error:
Couldn't match expected type `Token'
       against inferred type `(String, String)'
  Expected type: [Token]
  Inferred type: [(String, String)]
In the expression: lex (drop (length token) s)
In the expression:
    let token = s =~ whitespace :: String in lex (drop (length token) s)
I have no idea why I'm getting either of these errors. Can someone help me?
Backslashes are used as escape characters in string literals, like in "\n" for a string containing a newline. If you want a literal backslash, you need to escape it as "\\".
That's the problem in the regex "^[0-9]*(\.[0-9]+)?": Haskell tries to interpret "\." as a normal string escape and chokes on it, because there is no such escape. If you write the regex as "^[0-9]*(\\.[0-9]+)?", the error goes away.
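To check what the regex engine actually receives, you can print the literal. This is only a small sketch using the Prelude; the names numberRegex and showRegex are made up for illustration and are not part of the original code.

```haskell
-- numberRegex and showRegex are illustrative names, not from the question.
numberRegex :: String
numberRegex = "^[0-9]*(\\.[0-9]+)?"

-- putStrLn prints the raw characters of the string, so the doubled
-- backslash in the source shows up as a single backslash.
showRegex :: IO ()
showRegex = putStrLn numberRegex
```

Running showRegex prints ^[0-9]*(\.[0-9]+)? with a single backslash, which is the pattern you intended.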
The reason for the type problem is that lex (drop (length token) s) calls lex from the standard Prelude, and that function has type String -> [(String, String)]. You probably wanted a recursive call to your own function lexer instead.
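To see why the types clash, it may help to look at what the Prelude's lex returns. The sketch below only uses the Prelude; preludeLexDemo is a name invented for this example.

```haskell
-- preludeLexDemo is an illustrative name, not from the question.
-- Prelude's lex reads a single lexeme and pairs it with the remaining
-- input, so its result is a list of (String, String) pairs, not Tokens.
preludeLexDemo :: [(String, String)]
preludeLexDemo = lex "12 + 3"
```

Each element is a (lexeme, rest) pair, which is exactly the [(String, String)] the compiler complains about where it expected [Token].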
Also, note that "^[0-9]*(\\.[0-9]+)?" matches the empty string as well as numbers like .12 (instead of 0.12), which you probably don't want. The empty match is a serious problem: the number guard would succeed without consuming any input, so your function would call itself forever. To fix that, change * to +.
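Putting these points together, here is a rough sketch of a lexer that recurses into lexer itself and always consumes at least one character per step. To keep it runnable with only base, it swaps the regexes for Data.Char predicates, which is not what the question does. It also assumes integer-only numbers (because Val holds an Int) and adds a base case for empty input; both are my assumptions, not part of the original code.

```haskell
import Data.Char (isDigit, isSpace)

data Token = Val Int | Oper String
  deriving (Show, Eq)

-- Sketch only: character predicates stand in for the regexes.
lexer :: String -> [Token]
lexer [] = []                          -- assumed base case: input exhausted
lexer s@(c:rest)
  | isSpace c = lexer rest             -- skip one whitespace character
  | isDigit c =                        -- one or more digits, like [0-9]+
      let (digits, remaining) = span isDigit s
      in Val (read digits) : lexer remaining
  | c `elem` "+-*/()" = Oper [c] : lexer rest
  | otherwise = error ("unrecognized character: " ++ [c])
```

Every branch drops at least one character before recursing, so the function cannot loop on the same input the way an empty regex match would.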