Lazy IO - string not garbage collected?

Developer https://www.devze.com 2023-03-21 12:48 Source: Internet
I'm currently trying to read the contents of an XML file into a Map Int (Map Int String) and it works quite well (using HaXml). However, I'm not satisfied with the memory consumption of my program, and the problem seems to be the garbage collection.

Here's the code I'm using to read the XML file:

type TextFile = Map Int (Map Int String)

buildTextFile :: String -> IO TextFile
buildTextFile filename = do content <- readFile filename
                            let doc = xmlParse filename content
                                con = docContent (posInNewCxt filename Nothing) doc
                            return $ buildTF con

My guess is that content is held in memory even after the return, although it doesn't need to be (of course it could also be doc or con). I come to this conclusion because the memory consumption rises quickly with very large XML files, although the resulting TextFile is only a singleton map of a singleton map (using a special testing file, generally it's different, of course). So in the end, I have a Map of a Map Int String, with only one string in it, but the memory consumption is up to 19 MB.

Using strict application ($!) or using Data.Text instead of String in TextFile doesn't change anything.

So my question is: Is there some way to tell the compiler that the string content (or doc or con) isn't needed anymore and that it can be garbage collected?

And more generally: How can I find out where the problem really comes from without all the guessing?

Edit: As FUZxxl suggested I tried using deepseq and changed the second line of buildTextFile like so:

let doc = content `deepseq` xmlParse filename content

Unfortunately that didn't change anything really (or am I using it wrong?)...


Don't Guess What Is Consuming Memory, Find Out For Sure

The first step is to determine what types are consuming the most memory. You can see lots of examples of heap profiling here on SO or read the GHC manual.
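As a rough sketch of what that looks like in practice: assuming the program lives in a hypothetical Main.hs, compiling with `ghc -prof -fprof-auto -rtsopts Main.hs` and running it with `+RTS -hy` writes a heap profile broken down by type, which `hp2ps` can render as a graph. Running with `+RTS -hc` instead breaks the heap down by cost centre, and manual SCC annotations make the individual stages show up under their own names. The two functions below are hypothetical stand-ins for the parsing and map-building steps, not the HaXml code from the question.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for the XML parsing step: split the input into lines.
-- The SCC pragma names a cost centre, so a -hc heap profile can attribute
-- allocations to this stage.
parseStage :: String -> [String]
parseStage content = {-# SCC "parseStage" #-} lines content

-- Hypothetical stand-in for buildTF: index the lines by their position.
buildStage :: [String] -> Map.Map Int String
buildStage ls = {-# SCC "buildStage" #-} Map.fromList (zip [0 ..] ls)
```

If most of the heap turns out to be list cells and Char values (which is what a String is made of), that points at the file contents or a parse tree built from them being retained.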

Forcing Computation

If the problem is lazy evaluation (you're building an on-heap thunk that can compute the XML document type and leaving the string in heap too) then use rnf and seq:

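buildTextFile :: String -> IO TextFile
buildTextFile filename = do content <- readFile filename
                            let doc = xmlParse filename content
                                con = docContent (posInNewCxt filename Nothing) doc
                                res = buildTF con
                            -- rnf is from Control.DeepSeq. Force res completely before
                            -- returning; return $ (rnf res `seq` res) would only hand back a thunk.
                            rnf res `seq` return res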

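Or just use a bang pattern (let !res = buildTF con, with the BangPatterns extension). Note that a bang pattern only evaluates res to weak head normal form, whereas rnf evaluates it completely. Either way, that should force the thunks and allow the GC to collect the String.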
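Another way to spell the same idea is `evaluate` from Control.Exception together with `force` from Control.DeepSeq, which fully evaluates the result as part of the IO action itself. The following is a minimal sketch under stated assumptions: `buildTF` here is a hypothetical stand-in that works directly on the file contents, since the real one depends on HaXml.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import qualified Data.Map as Map

type TextFile = Map.Map Int (Map.Map Int String)

-- Hypothetical stand-in for the HaXml pipeline (xmlParse, docContent, buildTF):
-- keep only the first line of the input in a singleton map of a singleton map.
buildTF :: String -> TextFile
buildTF content =
  Map.singleton 0 (Map.singleton 0 (takeWhile (/= '\n') content))

-- Read the file, then fully evaluate the result inside IO before returning,
-- so the returned value holds no thunk that still refers to the file contents.
buildTextFile :: FilePath -> IO TextFile
buildTextFile filename = do
  content <- readFile filename
  evaluate (force (buildTF content))
```

Whether this actually lowers the peak memory use depends on what the profile shows. If the parser itself needs the whole document in memory at once, forcing the result only shortens how long it stays there.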
