开发者

Parsing in Haskell for a simple interpreter

开发者 https://www.devze.com 2023-01-23 13:48 出处:网络
I\'m relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. 开发者_StackOverflow中文版

I'm relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. 开发者_StackOverflow中文版So far I have the interpreter at a state which I am reasonably happy with, but am struggling slightly with the parser.

Here is the piece of code which I am having problems with

data IntExp
 = IVar Var
 | ICon Int
 | Add IntExp IntExp
 deriving (Read, Show)

whitespace = many1 (char ' ')

parseICon :: Parser IntExp
parseICon =
  do x <- many (digit)
     return (ICon (read x :: Int))

parseIVar :: Parser IntExp
parseIVar = 
  do x <- many (letter)
     prime <- string "'" <|> string ""
     return (IVar (x ++ prime))

parseIntExp :: Parser IntExp
parseIntExp =
  do x <- try(parseICon)<|>try(parseIVar)<|>parseAdd
     return x

parseAdd :: Parser IntExp
parseAdd =
  do x <- parseIntExp   
     whitespace
     string "+"
     whitespace
     y <- parseIntExp
     return (Add x y)

runP :: Show a => Parser a -> String -> IO ()
runP p input
  = case parse p "" input of
      Left err ->
        do putStr "parse error at "
           print err
      Right x -> print x

The language is slightly more complex, but this is enough to show my problem.

So in the type IntExp ICon is a constant and IVar is a variable, but now onto the problem. This for example runs successfully

runP parseAdd "5 + 5"

which gives (Add (ICon 5) (ICon 5)), which is the expected result. The problem arises when using IVars rather than ICons eg

runP parseAdd "n + m"

This causes the program to error out saying there was an unexpected "n" where a digit was expected. This leads me to believe that parseIntExp isn't working as I intended. My intention was that it will try to parse an ICon, if that fails then try to parse an IVar and so on.

So I either think the problem exists in parseIntExp, or that I am missing something in parseIVar and parseICon.

I hope I've given enough info about my problem and I was clear enough.

Thanks for any help you can give me!


Your problem is actually in parseICon:

parseICon =
  do x <- many (digit)
     return (ICon (read x :: Int))

The many combinator matches zero or more occurrences, so it's succeeding on "m" by matching zero digits, then probably dying when read fails.


And while I'm at it, since you're new to Haskell, here's some unsolicited advice:

  • Don't use spurious parentheses. many (digit) should just be many digit. Parentheses here just group things, they're not necessary for function application.

  • You don't need to do ICon (read x :: Int). The data constructor ICon can only take an Int, so the compiler can figure out what you meant on its own.

  • You don't need try around the first two options in parseIntExp as it stands--there's no input that would result in either one consuming some input before failing. They'll either fail immediately (which doesn't need try) or they'll succeed after matching a single character.

  • It's usually a better idea to tokenize first before parsing. Dealing with whitespace at the same time as syntax is a headache.

  • It's common in Haskell to use the ($) operator to avoid parentheses. It's just function application, but with very low precedence, so that something like many1 (char ' ') can be written many1 $ char ' '.

Also, doing this sort of thing is redundant and unnecessary:

parseICon :: Parser IntExp
parseICon =
  do x <- many digit
     return (ICon (read x))

When all you're doing is applying a regular function to the result of a parser, you can just use fmap:

parseICon :: Parser IntExp
parseICon = fmap (ICon . read) (many digit)

They're the exact same thing. You can make things look even nicer if you import the Control.Applicative module, which gives you an operator version of fmap, called (<$>), as well as another operator (<*>) that lets you do the same thing with functions of multiple arguments. There's also operators (<*) and (*>) that discard the right or left values, respectively, which in this case lets you parse something while discarding the result, e.g., whitespace and such.

Here's a lightly modified version of your code with some of the above suggestions applied and some other minor stylistic tweaks:

whitespace = many1 $ char ' '

parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit

parseIVar :: Parser IntExp
parseIVar = IVar <$> parseVarName

parseVarName :: Parser String
parseVarName = (++) <$> many1 letter <*> parsePrime

parsePrime :: Parser String
parsePrime = option "" $ string "'"

parseIntExp :: Parser IntExp
parseIntExp = parseICon <|> parseIVar <|> parseAdd

parsePlusWithSpaces :: Parser ()
parsePlusWithSpaces = whitespace *> string "+" *> whitespace *> pure ()

parseAdd :: Parser IntExp
parseAdd = Add <$> parseIntExp <* parsePlusWithSpaces <*> parseIntExp


I'm also new to Haskell, just wondering:

will parseIntExp ever make it to parseAdd?

It seems like ICon or IVar will always get parsed before reaching 'parseAdd'.

e.g. runP parseIntExp "3 + m"

would try parseICon, and succeed, giving

(ICon 3) instead of (Add (ICon 3) (IVar m))

Sorry if I'm being stupid here, I'm just unsure.

0

精彩评论

暂无评论...
验证码 换一张
取 消