IndentParser example

Developer https://www.devze.com 2023-02-09 06:22 Source: web
Could someone please post a small example of IndentParser usage? I am looking to parse YAML-like input like the following:

fruits:
    apples: yummy
    watermelons: not so yummy

vegetables:
    carrots: are orange
    celery raw: good for the jaw

I know there is a YAML package. I would like to learn the usage of IndentParser.


I've sketched out a parser below. For your problem you probably only need the block parser from IndentParser. Note that I haven't tried to run it, so it may contain elementary errors.
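Before looking at the parser, it may help to write out by hand the value it should produce for the sample input. The sketch below does that as a list of DefinitionTree values. The type is the same one used in the code further down, with Eq added here only so results can be compared. The render function is a hypothetical helper and not part of IndentParser. It prints a tree back out with four extra spaces per nesting level, which shows how indentation maps onto the Nested constructor.

```haskell
-- Same shape as the answer's type; Eq is added only so results can be compared.
data DefinitionTree = Nested String [DefinitionTree]
                    | Def String String
  deriving (Show, Eq)

-- The sample input, written out by hand as the trees it should parse to.
sample :: [DefinitionTree]
sample =
  [ Nested "fruits"
      [ Def "apples" "yummy"
      , Def "watermelons" "not so yummy"
      ]
  , Nested "vegetables"
      [ Def "carrots" "are orange"
      , Def "celery raw" "good for the jaw"
      ]
  ]

-- Hypothetical helper: print a tree back out, indenting each nesting
-- level by four more spaces than its parent.
render :: Int -> DefinitionTree -> [String]
render depth (Def k v) = [indent depth ++ k ++ ": " ++ v]
render depth (Nested k ts) =
  (indent depth ++ k ++ ":") : concatMap (render (depth + 1)) ts

-- Four spaces per level of nesting.
indent :: Int -> String
indent depth = replicate (4 * depth) ' '
```

Evaluating concatMap (render 0) sample gives back the six non-blank lines of the question's input. A parser for this format is therefore the inverse of render.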

The biggest problem for your parser is not really the indentation, but that your only tokens are strings and colons. The code below may take quite a bit of debugging, because it has to be very careful not to consume too much input. I have tried to left-factor it wherever two alternatives share a prefix. Because you only have two kinds of token, there isn't much you can gain from Parsec's Token module.
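The left-factoring point can be made concrete with a small sketch of a single input line in plain Parsec 3 (Text.Parsec). This is an illustration, not IndentParser code. Both line shapes start with "key:". The naive alternative therefore needs try so that Parsec can back up over the key. The left-factored version parses the shared prefix once and decides afterwards. The names key, lineWithTry and lineFactored are hypothetical, and they handle one line only, not the indentation.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A key is letters and spaces ("celery raw"); it stops in front of ':'.
key :: Parser String
key = many1 (letter <|> char ' ')

-- Not left-factored: both alternatives begin with a key, so the first
-- one must be wrapped in 'try' to let Parsec back up over the key
-- when the line turns out to be a bare "fruits:" header.
lineWithTry :: Parser (Either String (String, String))
lineWithTry = try definitionLine <|> headerLine
  where
    definitionLine = do
        k <- key
        _ <- string ": "
        v <- many1 (noneOf "\n")
        return (Right (k, v))
    headerLine = do
        k <- key
        _ <- char ':'
        return (Left k)

-- Left-factored: the shared prefix "key:" is parsed exactly once, then
-- the rest of the line decides between a header and a definition.
lineFactored :: Parser (Either String (String, String))
lineFactored = do
    k <- key
    _ <- char ':'
    _ <- many (oneOf " \t")      -- trailing blanks, but never '\n'
    v <- many (noneOf "\n")
    return (if null v then Left k else Right (k, v))
```

For instance, parse lineFactored "" "fruits:" evaluates to Right (Left "fruits"). Dropping the try from lineWithTry makes the same input fail, because definitionLine has already consumed "fruits" by the time it gives up.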

Note that there is a strange truth to parsing: simple-looking formats are often not simple to parse. For learning, writing a parser for simple expressions will teach you much more than a more-or-less arbitrary text format (which might only cause you frustration).

data DefinitionTree = Nested String [DefinitionTree]
                    | Def String String
  deriving (Show)


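-- Note - this might need some testing.
--
-- This is a tricky one: the parser has to consume trailing
-- spaces and tabs, but not a newline.
--
category :: IndentCharParser st String
category = do 
    { a <- body 
    ; rest 
    ; return a
    } 
  where
    -- Parsec has no manyTill1, so parse a non-empty key and then the ':'.
    -- Use char ' ' rather than space: space would also accept a newline
    -- (and a key such as "celery raw" does contain a space).
    body = do { s <- many1 (letter <|> char ' '); _ <- char ':'; return s }
    rest = many (oneOf [' ', '\t'])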

-- Because the DefinitionTree data type has two quite
-- different constructors that share the same prefix,
-- 'category', this combinator is a bit more complicated
-- than usual and has to use an Either type to discriminate
-- between the options.
-- 
definition :: IndentCharParser st DefinitionTree
definition = do 
    { a <- category
    ; b <- (textL <|> definitionsR)
    ; case b of
        Left ss -> return (Def a ss)
        Right ds -> return (Nested a ds)
    }

-- Note this should parse a string *provided* it is on 
-- the same line as the category.
--
-- However you might find this assumption needs verifying...
--
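-- many1 (noneOf "\n") fails without consuming input when nothing follows
-- the colon, which is what lets textL <|> definitionsR fall through.
textL :: IndentCharParser st (Either String a)
textL = do 
    { ss <- many1 (noneOf "\n")
    ; optional newline
    ; return (Left ss)
    }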

-- Finally this one uses an indent parser.
--
definitionsR :: IndentCharParser st (Either a [DefinitionTree]) 
definitionsR = block body 
  where 
    body = do { a <- many1 definition; return (Right a) }
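For comparison, here is a sketch of the same grammar in plain Parsec 3 (Text.Parsec) that checks source columns by hand. This swaps in a different technique and is not IndentParser's API. The hypothetical helpers atColumn and indentedBlock play roughly the role the answer gives to IndentParser's block. The sketch assumes indentation uses spaces, because Parsec treats a tab as a jump to the next 8-column tab stop. The Def/Nested choice is left-factored in the same way as definition above: the "key:" prefix is read once, and the rest of the line decides.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data DefinitionTree = Nested String [DefinitionTree]
                    | Def String String
  deriving (Show, Eq)

-- Skip whitespace (including blank lines), then require that the next
-- token starts in exactly column 'col'. Parsec columns start at 1.
atColumn :: Int -> Parser ()
atColumn col = do
    spaces
    c <- sourceColumn <$> getPosition
    if c == col then return () else unexpected "indentation"

-- "key:" plus any trailing blanks on the same line (never the newline).
key :: Parser String
key = many1 (letter <|> char ' ') <* char ':' <* many (oneOf " \t")

-- One entry whose key starts in column 'col'. Text after the colon
-- means Def; an empty rest of line means an indented block follows.
entryAt :: Int -> Parser DefinitionTree
entryAt col = do
    k <- try (atColumn col *> key)
    value k <|> children k
  where
    value k = Def k <$> many1 (noneOf "\n")
    children k = Nested k <$> indentedBlock col

-- Children of a header in column 'parent': peek at the first child's
-- column without consuming input, require it to be deeper, then read
-- every following entry that starts in that same column.
indentedBlock :: Int -> Parser [DefinitionTree]
indentedBlock parent = do
    c <- lookAhead (spaces *> (sourceColumn <$> getPosition))
    if c > parent then many1 (entryAt c) else unexpected "missing indented block"

-- Top-level entries start in column 1; nothing but whitespace may follow.
document :: Parser [DefinitionTree]
document = many (entryAt 1) <* spaces <* eof
```

On the question's sample input, parse document "" input should return Right with two Nested trees, "fruits" and "vegetables". A header with no indented children is rejected, and so is a line indented deeper than its siblings.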