I have the following function which does a crude job of parsing an XML file into a dictionary.
Unfortunately, since Python dictionaries are not ordered, I am unable to cycle through the nodes as I would like.
How do I change this so that it outputs an ordered dictionary that reflects the original order of the nodes when looped over with for?
def simplexml_load_file(file):
    import collections
    from lxml import etree
    tree = etree.parse(file)
    root = tree.getroot()
    def xml_to_item(el):
        item = None
        if el.text:
            item = el.text
        child_dicts = collections.defaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return dict(child_dicts) or item
    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}
    return xml_to_dict(root)
x = simplexml_load_file('routines/test.xml')
print x
for y in x['root']:
    print y
Outputs:
{'root': {
'a': ['1'],
'aa': [{'b': [{'c': ['2']}, '2']}],
'aaaa': [{'bb': ['4']}],
'aaa': ['3'],
'aaaaa': ['5']
}}
a
aa
aaaa
aaa
aaaaa
How can I implement collections.OrderedDict so that I can be sure of getting the correct order of the nodes?
XML file for reference:
<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>
You could use the OrderedDict dict subclass, which was added to the standard library's collections module in version 2.7✶. Actually, what you need is an Ordered + defaultdict combination, which doesn't exist, but it's possible to create one by subclassing OrderedDict, as illustrated below:
✶ If your version of Python doesn't have OrderedDict, you should be able to use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as the base class instead.
import collections

class OrderedDefaultdict(collections.OrderedDict):
    """ A defaultdict with OrderedDict as its base class. """

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key,)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):  # Optional, for pickle support.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())

    def __repr__(self):  # Optional.
        return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory, self.items())
def simplexml_load_file(file):
    from lxml import etree
    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = el.text or None
        child_dicts = OrderedDefaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return collections.OrderedDict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')
print(x)
for y in x['root']:
    print(y)
The output produced from your test XML file looks like this:
{'root':
OrderedDict(
[('a', ['1']),
('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]),
('aaa', ['3']),
('aaaa', [OrderedDict([('bb', ['4'])])]),
('aaaaa', ['5'])
]
)
}
a
aa
aaa
aaaa
aaaaa
Which I think is close to what you want.
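For readers who don't have lxml installed, the same idea can be sketched with the standard library's xml.etree.ElementTree, using OrderedDict.setdefault() in place of the subclass. This is an illustrative variant rather than the code above: the function name xml_string_to_dict and the inline XML string are assumptions made for the example.

```python
import collections
import xml.etree.ElementTree as ET

# Inline copy of the question's test data (assumed here so the sketch runs
# without a file on disk).
XML = """<root>
<a>1</a>
<aa><b><c>2</c></b><b>2</b></aa>
<aaa>3</aaa>
<aaaa><bb>4</bb></aaaa>
<aaaaa>5</aaaaa>
</root>"""


def xml_to_item(el):
    # Group child items by tag; OrderedDict keeps first-seen tag order.
    children = collections.OrderedDict()
    for child in el:
        children.setdefault(child.tag, []).append(xml_to_item(child))
    # Same fallback as the original: children if any, else the text.
    return children or (el.text or None)


def xml_string_to_dict(xml_text):
    # Hypothetical helper: parses a string instead of a file path.
    root = ET.fromstring(xml_text)
    return {root.tag: xml_to_item(root)}


result = xml_string_to_dict(XML)
for tag in result['root']:
    print(tag)
```

setdefault() inserts the empty list only the first time a tag is seen, so the key order matches document order without needing __missing__.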
Minor update:
Added a __reduce__() method which will allow the instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but came up in a similar one.
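A small round-trip check of that pickle support might look like the following. It repeats the class (without the optional __repr__) so it runs on its own, and assumes it is executed as a script or module so that pickle can locate the class by name; the sample keys are made up for the example.

```python
import collections
import pickle


class OrderedDefaultdict(collections.OrderedDict):
    """A defaultdict with OrderedDict as its base class."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # Rebuild as cls(default_factory), then re-insert items in order.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())


original = OrderedDefaultdict(list)
original['b'].append(1)   # inserted first
original['a'].append(2)   # inserted second

restored = pickle.loads(pickle.dumps(original))
print(list(restored.items()))
print(restored.default_factory is list)
```

The restored object should keep both the insertion order and the default factory, since __reduce__ passes the factory as a constructor argument and the items as a separate iterator.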
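The recipe from martineau works for me, but it has a problem with the copy() method inherited from OrderedDict, which calls the class with the dictionary itself as the only argument, so the dictionary ends up in the default_factory position. The following approach fixes this drawback: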
class OrderedDefaultDict(OrderedDict):
    # Implementation as suggested by martineau
    def copy(self):
        return type(self)(self.default_factory, self)
Note that this copy() is shallow: it does not make a deep copy, which, especially for default dictionaries, would arguably be the right thing to do in most circumstances.
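To make the shallow-copy behaviour concrete, here is a self-contained sketch that fills in the elided implementation with martineau's __init__ and __missing__ and adds the copy() override. The keys and values are made up for the example.

```python
import collections


class OrderedDefaultDict(collections.OrderedDict):
    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultDict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def copy(self):
        # Pass default_factory first, then the items to copy.
        return type(self)(self.default_factory, self)


d = OrderedDefaultDict(list)
d['x'].append(1)

shallow = d.copy()
shallow['y'].append(2)   # new key: exists only in the copy
shallow['x'].append(3)   # same list object: also visible through d

print(list(d.items()))
print(list(shallow.items()))
```

The copy gets its own set of keys, but the list stored under 'x' is shared between both dictionaries, which is exactly the shallow-copy caveat.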
There are many possible implementations of OrderedDict listed in the answer here: How do you retrieve items from a dictionary in the order that they're inserted?
You can create your own OrderedDict module for use in your own code by copying one of the implementations. I assume you do not have access to the OrderedDict because of the version of Python you are running.
One interesting aspect of your question is the possible need for defaultdict functionality. If you need this, you can implement the __missing__ method to get the desired effect.
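As a rough sketch of that approach, assuming the base class is a dict subclass such as the standard collections.OrderedDict (the class name OrderedListDict and the sample data are hypothetical):

```python
import collections


class OrderedListDict(collections.OrderedDict):
    """OrderedDict that creates an empty list for any missing key."""

    def __missing__(self, key):
        # Invoked by dict's item lookup when the key is absent.
        value = self[key] = []
        return value


tags = OrderedListDict()
for tag, text in [('a', '1'), ('aa', '2'), ('a', '3')]:
    tags[tag].append(text)

print(list(tags.items()))  # [('a', ['1', '3']), ('aa', ['2'])]
```

Here the default value is hard-coded to a list; taking a factory argument in __init__ would make it behave more like defaultdict.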