I want to parse a file like this:
66:3 3:4 329:2 101:3 495:4 55:5 268:5 267:2 242:4 262:1 861:1
My code looks like this:
getTestData :: String -> IO [[(Int, Int)]]
getTestData name = do
    --res <- parseFromFile testData (name ++ ".test")
    fc <- readFile (name ++ ".test")
    let res = parse testData "test data" fc
    case res of
        Left e -> error $ show e -- "test data parse error."
        Right ts -> return ts
eol = char '\n'
testData = endBy line eol
--testData = many line
testTuple = do
    i <- natural
    colon
    r <- natural
    return (fromIntegral i :: Int, fromIntegral r :: Int)
line = sepBy testTuple whiteSpace
But when I run it, it throws an exception:
ts <- getTestData "data"
*** Exception: "test data" (line 11, column 1):
unexpected end of input
expecting natural or "\n"
I don't understand why it says line 11 when my data.test file has only 10 lines, and I haven't managed to fix the problem after several tries.
My best guess is that whiteSpace in line is consuming the newlines. So your whole file is being parsed by a single call to the line parser, and the eol parser never gets a chance to get its hands on a "\n". Try replacing whiteSpace with many (char ' ') and see if that helps.
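A quick way to check this diagnosis is to ask Parsec what input a parser leaves behind. This sketch assumes (the question doesn't show it) that natural, colon and whiteSpace come from a Text.Parsec.Token lexer; the lexer below is a hypothetical one built from emptyDef, and leftover is a helper made up for the demonstration. Note that besides whiteSpace, the token parser's natural and colon are lexeme parsers, so they skip trailing whitespace, newlines included, on their own. If that is the setup, replacing whiteSpace in line may not be enough by itself. It would also explain the "line 11, column 1": the lexeme skip after the final 861:1 eats the last newline of the 10-line file, so eol then meets end of input at the start of line 11.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Hypothetical lexer: the question does not show how its own was built.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Run p, then return whatever input it left unconsumed.
leftover :: Parser a -> String -> Either ParseError String
leftover p = parse (p *> getInput) "demo"

-- whiteSpace skips newlines as well as spaces:
--   leftover (Tok.whiteSpace lexer) " \n 7"   gives Right "7"
-- natural is a lexeme parser, so it swallows the newline after "5":
--   leftover (Tok.natural lexer) "5\n7"       gives Right "7"
-- A bare digit parser stops at the newline:
--   leftover (many1 digit) "5\n7"             gives Right "\n7"
```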
This is a working implementation using primitive character parsers rather than token parsers. Note: it's more robust not to use whitespace as a separator, but to skip it wherever it occurs. The bits where I've used one-line do-notation are a lot neater if you use (<*) from Control.Applicative.
{-# OPTIONS_GHC -Wall #-}
module ParsecWhite where
import Text.ParserCombinators.Parsec
import Data.Char
main :: IO ()
main = getTestData "sample" >>= print
getTestData :: String -> IO [[(Int, Int)]]
getTestData name = do
    --res <- parseFromFile testData (name ++ ".test")
    fc <- readFile (name ++ ".test")
    let res = parse testData "test data" fc
    case res of
        Left e -> error $ show e -- "test data parse error."
        Right ts -> return ts
testData :: Parser [[(Int,Int)]]
testData = input
input :: Parser [[(Int,Int)]]
input = many (do { a <- line; _ <- newline; return a })
    <?> "input"
line :: Parser [(Int,Int)]
line = many (do { a <- testTuple; softWhite; return a}) <?> "line"
testTuple :: Parser (Int,Int)
testTuple = do
    i <- natural
    colon
    r <- natural
    return (i,r)
  <?> "testTuple"
softWhite :: Parser ()
softWhite = many (oneOf " \t") >> return ()
colon :: Parser ()
colon = char ':' >> return ()
natural :: Parser Int
natural = fmap (foldl step 0) (many1 digit)
  where
    -- accumulate decimal digits left to right: "495" -> 4, 49, 495
    step ac x = ac * 10 + digitToInt x
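As mentioned above, the one-line do blocks read more neatly with (<*). Here is a sketch of the same grammar in applicative style, written against Text.Parsec (Parsec 3) rather than the older Text.ParserCombinators.Parsec compatibility module. The eof at the end of testData is an addition not present in the code above, so that trailing junk is reported instead of silently ignored.

```haskell
import Data.Char (digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Zero or more spaces/tabs; never consumes a newline.
softWhite :: Parser ()
softWhite = skipMany (oneOf " \t")

-- A decimal natural number built from its digits.
natural :: Parser Int
natural = foldl (\acc d -> acc * 10 + digitToInt d) 0 <$> many1 digit

-- One "i:r" pair.
testTuple :: Parser (Int, Int)
testTuple = (,) <$> natural <* char ':' <*> natural <?> "testTuple"

-- A line: tuples, each followed by optional horizontal whitespace.
line :: Parser [(Int, Int)]
line = many (testTuple <* softWhite) <?> "line"

-- The whole file: newline-terminated lines, then end of input.
testData :: Parser [[(Int, Int)]]
testData = many (line <* newline) <* eof
```

Like the version above, this expects every line, including the last, to end in a newline; spaces or tabs just before a newline are skipped.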
I bet you are missing a newline at the end of the last line. To parse as a complete line it should be "861:1\n", but it's probably "861:1" followed directly by end of file. So I think your parser is correctly identifying your input as incorrect.
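If you'd rather accept the file whether or not the last line ends in a newline, one option is to treat newlines as separators that may also trail, using sepEndBy. This is a sketch that reuses the character-level approach from the previous answer; it assumes every line holds at least one tuple, so a blank line in the middle of the file is rejected.

```haskell
import Data.Char (digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

natural :: Parser Int
natural = foldl (\acc d -> acc * 10 + digitToInt d) 0 <$> many1 digit

testTuple :: Parser (Int, Int)
testTuple = (,) <$> natural <* char ':' <*> natural

-- At least one tuple per line; spaces and tabs after each tuple are skipped.
line1 :: Parser [(Int, Int)]
line1 = many1 (testTuple <* skipMany (oneOf " \t"))

-- Lines separated by newlines, with an optional newline after the last one.
testData :: Parser [[(Int, Int)]]
testData = sepEndBy line1 newline <* eof
```

Using many1 rather than many in line1 matters here: with a line parser that can succeed on empty input, the trailing newline would produce an extra empty line at the end of the result.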
Actually, I found you can use whiteSpace (to easily ignore multi-line block comments, for example) while still being line-oriented. Just include this parser wherever you want a newline:
col (== 1) "only matches beginning of line"
col :: (Int -> Bool) -> String -> Parser ()
col p errStr = do
    c <- sourceColumn <$> getPosition
    if p c then return ()
           else unexpected errStr
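Here is one way the col idea might fit together, as a sketch. It assumes Haskell-style comments (haskellStyle from Text.Parsec.Language) and that every line starts in column 1; an indented line would be read as a continuation of the previous one. The tuples come from the token parser, so whitespace and comments, newlines included, are skipped after each one, and col alone decides where a line ends.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskellStyle)
import qualified Text.Parsec.Token as Tok

-- Assumption: the data file may contain {- -} and -- comments.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser haskellStyle

-- The answer's helper: succeed (consuming nothing) if the current
-- column satisfies p, otherwise fail with an "unexpected" message.
col :: (Int -> Bool) -> String -> Parser ()
col p errStr = do
  c <- sourceColumn <$> getPosition
  if p c then return () else unexpected errStr

-- "i:r"; natural and colon are lexemes, so they skip trailing
-- whitespace and comments, newlines included.
tuple :: Parser (Int, Int)
tuple = do
  i <- Tok.natural lexer
  _ <- Tok.colon lexer
  r <- Tok.natural lexer
  return (fromIntegral i, fromIntegral r)

-- A line starts in column 1; later tuples belong to it only until the
-- skipped whitespace lands us at the start of a new line.
line :: Parser [(Int, Int)]
line = do
  col (== 1) "tuple not at beginning of line"
  t <- tuple
  ts <- many (col (/= 1) "beginning of line" *> tuple)
  return (t : ts)

testData :: Parser [[(Int, Int)]]
testData = Tok.whiteSpace lexer *> many line <* eof
```

With this, an input such as "66:3 3:4\n329:2 {- note -} 101:3\n" parses as two lines, the block comment being skipped by the lexeme after 329:2.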