Python running out of memory parsing XML using cElementTree.iterparse

devze.com (https://www.devze.com), 2023-04-11 20:49. Source: web
A simplified version of my XML parsing function is here:

import xml.etree.cElementTree as ET

def analyze(xml):
    it = ET.iterparse(xml)  # iterparse accepts a filename; file() exists only in Python 2
    count = 0

    for (ev, el) in it:
        count += 1

    print('count: {0}'.format(count))

This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this:

[Image: graph of memory and CPU usage while the script runs]

See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me more random errors, like an IndexError) and a stack trace instead of a segfault. But why is it crashing?


Code example:

import xml.etree.cElementTree as etree

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context) # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear() # preserve memory
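
Applied to the counting loop from the question, the same idea might look like the sketch below. It relies on the fact that iterparse still builds the whole element tree in the background, so memory grows with the file unless finished elements are cleared. The function name count_elements and the sample document are hypothetical, chosen only for illustration, and the import uses xml.etree.ElementTree because cElementTree was removed in Python 3.9.

```python
import io
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9


def count_elements(source, tag):
    """Count closed `tag` elements while keeping the in-memory tree small.

    `source` may be a filename or a binary file object (hypothetical helper,
    not part of the original answer).
    """
    context = iter(ET.iterparse(source, events=('start', 'end')))
    _, root = next(context)  # first event is the 'start' of the root element
    count = 0
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            count += 1
            root.clear()  # drop the children parsed so far
    return count


# Hypothetical sample document, only for illustration.
sample = io.BytesIO(
    b'<root><item>a</item><item>b</item><other/><item>c</item></root>'
)
print('count: {0}'.format(count_elements(sample, 'item')))
```

Clearing the root rather than only the current element also discards siblings that do not match the tag, which would otherwise stay attached to the root for the whole parse.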