Reading in an arbitrary number of binary messages

Developer https://www.devze.com 2023-03-07 02:32 Source: Internet
I am parsing binary data out of files using Data.Binary.Get and have something like the following:

data FileMessageHeaders = FileMessageHeaders [FileMessageHeader]

data FileMessageHeader = FileMessageHeader ...

instance Binary FileMessageHeaders where
  put = undefined
  get = do
    messages <- untilM get isEmpty
    return (FileMessageHeaders messages)

instance Binary FileMessageHeader where
  put = undefined
  get = ..

The problem I am having is that untilM from monad-loops on Hackage uses sequence, and I believe this is what causes the massive delay before the head of the FileMessageHeader list is returned, since the whole file must be read first (is this correct?). I am having trouble coming up with a way to rewrite this that avoids sequencing all of the FileMessageHeaders in the file. Any suggestions?

Thanks!


As FUZxxl notes, the problem is untilM; the Get monad is strict and requires that the entire untilM action completes before it returns. IO has nothing to do with it.

The easiest thing to do is probably switch to attoparsec and use that for parsing instead of binary. Attoparsec supports streaming parses and would likely be much easier to use for this case.
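
For illustration, here is a minimal sketch of what a lazy attoparsec-based decoder might look like. It is not from the original answer: Header is a hypothetical stand-in for FileMessageHeader (a raw 4-byte record), headerParser and parseAll are made-up names, and stopping silently on a parse failure is an assumption made only to keep the example short.

```haskell
import qualified Data.Attoparsec.ByteString as A
import qualified Data.Attoparsec.ByteString.Lazy as AL
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for FileMessageHeader: a raw 4-byte record.
newtype Header = Header BS.ByteString deriving (Show, Eq)

-- Hypothetical parser: consume exactly 4 bytes per header.
headerParser :: A.Parser Header
headerParser = Header <$> A.take 4

-- Parse one header per list cell; the rest of the input is only
-- parsed when the tail of the list is demanded.
parseAll :: BL.ByteString -> [Header]
parseAll bs
  | BL.null bs = []
  | otherwise  = case AL.parse headerParser bs of
      AL.Done rest hdr -> hdr : parseAll rest
      AL.Fail _ _ _    -> []  -- assumption: stop quietly on bad or truncated input

main :: IO ()
main = do
  let input = BL.pack [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  -- Only the first two records need to be parsed to satisfy take 2.
  print (take 2 (parseAll input))
```

With a lazy ByteString obtained from readFile, the head of the list should become available as soon as the first record has been read, rather than after the whole file.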

If you can't switch to attoparsec, you'll need to use some of the lower-level functions of binary rather than just using the Binary instance. Something like the following (completely untested):

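import Data.Binary (get)
import Data.Binary.Get (runGetState)
import qualified Data.ByteString.Lazy as B

-- Decode one header per list cell; the tail is only decoded on demand.
getHeaders :: B.ByteString -> [FileMessageHeader]
getHeaders b = go b 0
  where
    go bs n
      | B.null bs = []
      | otherwise = let (header, bs', n') = runGetState get bs n
                    in header : go bs' n'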

Unfortunately this means you won't be able to use the Binary instance for FileMessageHeaders or its get function; you'll have to call getHeaders directly. It will stream, though.
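
Since runGetState is marked deprecated in newer releases of binary, the same idea can also be sketched with runGetOrFail. The following is only an illustration under assumptions: Header is a hypothetical stand-in for FileMessageHeader (a single big-endian 32-bit word), getHeader and decodeAll are made-up names, and ending the list silently on a decode failure is a simplification.

```haskell
import Data.Binary.Get (Get, getWord32be, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Hypothetical stand-in for FileMessageHeader: one big-endian 32-bit word.
newtype Header = Header Word32 deriving (Show, Eq)

-- Hypothetical decoder for a single header.
getHeader :: Get Header
getHeader = Header <$> getWord32be

-- Each list cell comes from one runGetOrFail call, so the remaining
-- input is only decoded when the tail of the list is demanded.
decodeAll :: BL.ByteString -> [Header]
decodeAll bs
  | BL.null bs = []
  | otherwise  = case runGetOrFail getHeader bs of
      Left _               -> []  -- assumption: stop quietly on bad or truncated input
      Right (rest, _, hdr) -> hdr : decodeAll rest

main :: IO ()
main = do
  let input = BL.pack [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3]
  -- Only the first two records are decoded to satisfy take 2.
  print (take 2 (decodeAll input))
```

A real version would probably want to surface the error from the Left case instead of dropping it, for example by returning a list of Either values.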


The problem here is that an IO action has to finish before control flow can continue. Thus, the program has to read in all the messages before any of them can be evaluated. You could try to define your own combinator, sequenceI, that uses the function unsafeInterleaveIO from System.IO.Unsafe. This function lets you interleave actions: the wrapped action is deferred until its result is demanded. It is used, for instance, by getContents. I would define sequenceI like this:

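import System.IO.Unsafe (unsafeInterleaveIO)
sequenceI :: [IO a] -> IO [a]
sequenceI []     = return []  -- base case, missing from the original sketch
sequenceI (x:xs) = do v <- x
                      vs <- unsafeInterleaveIO $ sequenceI xs
                      return (v:vs)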

On top of this combinator, you can define your own untilM that streams. Doing this is left as an exercise to the reader.

Edit (corrected for compilation)

This is a proof-of-concept, untested implementation of untilM:

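-- Requires: import System.IO.Unsafe (unsafeInterleaveIO)
-- Runs f, then checks p; the remaining iterations are deferred until demanded.
untilMI :: IO a -> IO Bool -> IO [a]
untilMI f p = do
  f' <- f
  p' <- p
  if p'
    then return [f']
    else do g' <- unsafeInterleaveIO $ untilMI f p
            return (f' : g')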
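
To see the streaming behaviour, here is a small self-contained sketch that runs untilMI against a counter. The counter, the next and finished actions, and the limit of 1000000 are all made up for illustration. Note that this works in IO only; it does not apply directly to the Get monad.

```haskell
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Same definition as above, repeated so the example is self-contained.
untilMI :: IO a -> IO Bool -> IO [a]
untilMI f p = do
  x <- f
  done <- p
  if done
    then return [x]
    else do rest <- unsafeInterleaveIO (untilMI f p)
            return (x : rest)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  let -- Hypothetical "read one message" action: bump and return the counter.
      next = atomicModifyIORef' counter (\n -> (n + 1, n + 1))
      -- Hypothetical "end of input" check.
      finished = (>= 1000000) <$> readIORef counter
  xs <- untilMI next finished
  -- Forcing three elements runs only the actions needed to produce them.
  print (take 3 xs)
  -- The counter shows how many times next has actually run so far.
  ran <- readIORef counter
  print ran
```

Because the deferred actions run whenever the list is forced, their ordering relative to other IO is no longer obvious, which is the usual caveat with unsafeInterleaveIO.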