Unexpected end of input in parsec

Developer (https://www.devze.com), 2023-03-11 12:25. Source: Internet
I want to parse a file like this:

66:3 3:4
329:2 
101:3 
495:4 
55:5 
268:5 
267:2 
242:4 
262:1 
861:1 

My code is like the following:

getTestData :: String -> IO [[(Int, Int)]]
getTestData name = do
    --res <- parseFromFile testData (name ++ ".test")
    fc <- readFile (name ++ ".test")
    let res = parse testData "test data" fc
    case res of
        Left e -> error $ show e -- "test data parse error."
        Right ts -> return ts

eol = char '\n'
testData = endBy line eol
--testData = many line
testTuple = do
    i <- natural
    colon
    r <- natural
    return (fromIntegral i:: Int, fromIntegral r:: Int)

line = sepBy testTuple whiteSpace

But when I run it, it throws an exception:

ts <- getTestData "data" 
*** Exception: "test data" (line 11, column 1):
unexpected end of input
expecting natural or "\n"

I don't understand why it reports line 11 when my data.test file only has 10 lines. I have tried several times and failed to fix the problem.


My best guess is that whiteSpace in line is consuming the newlines, so your whole file is being parsed by a single line parser and the eol parser never gets a chance to see a "\n". Try replacing whiteSpace with many (char ' ') and see if that helps.
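
To illustrate the effect, here is a minimal sketch. It assumes that natural and whiteSpace in the question come from a Text.Parsec.Token token parser, where whiteSpace (and the whitespace skipped after every lexeme parser such as natural) includes newlines. The sketch uses Parsec's spaces as a stand-in for that behaviour; pair, greedyLine and strictLine are illustrative names, not part of the original code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One "index:value" pair, built from primitive parsers so that nothing is
-- skipped implicitly after a number.
pair :: Parser (Int, Int)
pair = do
    i <- many1 digit
    _ <- char ':'
    r <- many1 digit
    return (read i, read r)

-- `spaces` skips every whitespace character, newlines included, so this
-- "line" keeps going across line breaks.
greedyLine :: Parser [(Int, Int)]
greedyLine = sepBy pair spaces

-- Separating with blanks only (one or more ' ') stops at the newline.
strictLine :: Parser [(Int, Int)]
strictLine = sepBy pair (skipMany1 (char ' '))
```

With these definitions, parse greedyLine "" "1:2\n3:4" returns both pairs as if they were on one line, while parse strictLine "" "1:2 3:4\n5:6" returns only the pairs of the first line. Note that strictLine still rejects a trailing blank before the newline, since a separator must be followed by another pair.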


This is a working implementation that uses primitive char parsers rather than token parsers. Note that it is more robust not to use whitespace as a separator, but to drop it if it exists. The bits where I've used one-line do-notation are a lot neater if you use (<*) from Applicative.

{-# OPTIONS_GHC -Wall #-}

module ParsecWhite where

import Text.ParserCombinators.Parsec

import Data.Char

-- Parse sample.test and print the result.
main :: IO ()
main = getTestData "sample" >>= print

getTestData :: String -> IO [[(Int, Int)]]
getTestData name = do
    --res <- parseFromFile testData (name ++ ".test")
    fc <- readFile (name ++ ".test")
    let res = parse testData "test data" fc
    case res of
        Left e -> error $ show e -- "test data parse error."
        Right ts -> return ts

testData :: Parser [[(Int,Int)]]
testData = input


input :: Parser [[(Int,Int)]]
input = many (do { a <- line; _ <- newline; return a })
     <?> "input"

line :: Parser [(Int,Int)]
line = many (do { a <- testTuple; softWhite; return a})  <?> "line"

testTuple :: Parser (Int,Int)
testTuple = do
    i <- natural
    colon
    r <- natural
    return (i,r)
  <?> "testTuple"

softWhite :: Parser ()
softWhite = many (oneOf " \t") >> return ()

colon :: Parser () 
colon = char ':' >> return ()

natural :: Parser Int
natural = fmap (foldl step 0) $ many1 digit
  where
    -- Accumulate the decimal digits from left to right.
    step ac x = (ac * 10) + digitToInt x
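
As a rough sketch of the (<*) remark above, the same grammar can be written in applicative style. This is a condensed variant, not the code above verbatim: natural uses read for brevity, and the colon helper is inlined as char ':'.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Optional horizontal whitespace; never consumes a newline.
softWhite :: Parser ()
softWhite = skipMany (oneOf " \t")

natural :: Parser Int
natural = read <$> many1 digit

-- `p <* q` runs p, then q, and keeps only p's result.
testTuple :: Parser (Int, Int)
testTuple = (,) <$> natural <* char ':' <*> natural

line :: Parser [(Int, Int)]
line = many (testTuple <* softWhite)

input :: Parser [[(Int, Int)]]
input = many (line <* newline)
```

For example, parse input "" "66:3 3:4\n329:2 \n" yields two lines, the second with its trailing blank dropped.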


I bet you are missing a newline at the end of the last line. To parse as a complete line it should be "861:1\n", but it probably is "861:1EOF". So I think your parser correctly identifies your input as incorrect.
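
If the final newline may or may not be present, one option is to treat the newline as an optional terminator with sepEndBy and then require eof. The following is only a sketch using primitive parsers; line1 is a hypothetical name, and it assumes every line holds at least one pair (blank lines are rejected).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

testTuple :: Parser (Int, Int)
testTuple = do
    i <- many1 digit
    _ <- char ':'
    r <- many1 digit
    return (read i, read r)

-- A non-empty line: one or more pairs, each followed by optional blanks.
line1 :: Parser [(Int, Int)]
line1 = many1 (testTuple <* skipMany (oneOf " \t"))

-- Lines separated by newlines; a newline after the last line is optional.
testData :: Parser [[(Int, Int)]]
testData = sepEndBy line1 newline <* eof
```

Both "1:2 3:4\n5:6\n" and "1:2 3:4\n5:6" then parse to the same two lines.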


Actually, I found you can still use whiteSpace (to easily ignore multi-line block comments, for example) while remaining line-oriented. Just include this parser wherever you want newlines.

col (== 1) "only matches beginning of line"

col pred errStr = do
  c <- sourceColumn <$> getPosition
  if pred c then return ()
            else unexpected errStr
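
A possible way to use col for the data format from the question is sketched below. The lexeme helper is a hypothetical stand-in for a token parser's lexeme (it skips all trailing whitespace, newlines included), and number, pair, lineEnd, line and testData are illustrative names; the eof alternative is an added assumption so that a last line without a newline is still accepted.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Succeeds without consuming input when the current column satisfies `p`.
col :: (Int -> Bool) -> String -> Parser ()
col p errStr = do
    c <- sourceColumn <$> getPosition
    if p c then return () else unexpected errStr

-- Hypothetical stand-in for a token parser's lexeme: skip all trailing
-- whitespace, newlines included.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

number :: Parser Int
number = lexeme (read <$> many1 digit)

pair :: Parser (Int, Int)
pair = (,) <$> number <* lexeme (char ':') <*> number

-- A line ends once the whitespace skipped after a pair has brought the
-- parser back to column 1, or at end of input.
lineEnd :: Parser ()
lineEnd = col (== 1) "only matches beginning of line" <|> eof

line :: Parser [(Int, Int)]
line = (:) <$> pair <*> manyTill pair lineEnd

testData :: Parser [[(Int, Int)]]
testData = many line <* eof
```

Because the check relies on the column only, blank lines between records are skipped silently, and input that starts with whitespace is not handled by this sketch.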