开发者

How can this function be rewritten to implement OrderedDict? [duplicate]

开发者 https://www.devze.com 2023-01-24 14:23 出处:网络
This question already has answers here: How to implement an ordered, default dict? (9 answers) 开发者_如何学Python
This question already has answers here: How to implement an ordered, default dict? (9 answers) 开发者_如何学Python Closed 1 year ago.

I have the following function which does a crude job of parsing an XML file into a dictionary.

Unfortunately, since Python dictionaries are not ordered, I am unable to cycle through the nodes as I would like.

How do I change this so it outputs an ordered dictionary which reflects the original order of the nodes when looped with for.

def simplexml_load_file(file):
    import collections
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = None
        if el.text:
            item = el.text
        child_dicts = collections.defaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return dict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')

print x

for y in x['root']:
    print y

Outputs:

{'root': {
    'a': ['1'],
    'aa': [{'b': [{'c': ['2']}, '2']}],
    'aaaa': [{'bb': ['4']}],
    'aaa': ['3'],
    'aaaaa': ['5']
}}

a
aa
aaaa
aaa
aaaaa

How can I implement collections.OrderedDict so that I can be sure of getting the correct order of the nodes?

XML file for reference:

<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>


You could use the new OrderedDictdict subclass which was added to the standard library's collections module in version 2.7. Actually what you need is an Ordered+defaultdict combination which doesn't exist — but it's possible to create one by subclassing OrderedDict as illustrated below:

If your version of Python doesn't have OrderedDict, you should be able use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as the base class instead.

import collections

class OrderedDefaultdict(collections.OrderedDict):
    """ A defaultdict with OrderedDict as its base class. """

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key,)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):  # Optional, for pickle support.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())

    def __repr__(self):  # Optional.
        return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory, self.items())

def simplexml_load_file(file):
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = el.text or None
        child_dicts = OrderedDefaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return collections.OrderedDict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')
print(x)

for y in x['root']:
    print(y)

The output produced from your test XML file looks like this:

{'root':
    OrderedDict(
        [('a', ['1']),
         ('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]),
         ('aaa', ['3']),
         ('aaaa', [OrderedDict([('bb', ['4'])])]),
         ('aaaaa', ['5'])
        ]
    )
}

a
aa
aaa
aaaa
aaaaa

Which I think is close to what you want.

Minor update:

Added a __reduce__() method which will allow the instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but came up in a similar one.


The recipe from martineau works for me, but it has problems with the method copy() inherited from DefaultDict. The following approach fix this drawback:

class OrderedDefaultDict(OrderedDict):
    #Implementation as suggested by martineau

    def copy(self):
         return type(self)(self.default_factory, self)

Please consider, that this implementation does no deepcopy, which seems especially for default dictionaries rather the right thing to do in most circumstances


There are many possible implementation of OrderedDict listed in the answer here: How do you retrieve items from a dictionary in the order that they're inserted?

You can create your own OrderedDict module for use in your own code by copying one of the implementations. I assume you do not have access to the OrderedDict because of the version of Python you are running.

One interesting aspect of your question is the possible need for defaultdict functionality. If you need this, you can implement the __missing__ method to get the desired effect.

0

精彩评论

暂无评论...
验证码 换一张
取 消