How to get all the info in an XML file into a dictionary with Python

开发者 https://www.devze.com 2023-01-07 02:48 Source: web
Let's say I have an XML file as follows.

<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>

I need to read this file into a dictionary something like this.

dict["A.B1.C1"] = "blah"
dict["A.B1.C2"] = "blah"
dict["A.B2.C1"] = "blah"
dict["A.B2.C2"] = "blah"

But the format of the dict doesn't matter; I just want to read all the info into Python variables.

The thing is that I don't know the structure of the XML in advance. I just want to read all of its contents into a dictionary.

Is there any way to do this with Python?


You can use the untangle library. untangle.parse() takes an XML file as input and returns a Python object that represents the document.

Let's take the following XML file as an example and name it test_xml.xml:

<A>
 <B>
  <C>"blah1"</C>
  <C>"blah2"</C>
 </B>
 <B>
  <C>"blah3"</C>
  <C>"blah4"</C>
 </B>
</A>  

Now let's convert the XML file above into a Python object and access its elements:

>>> import untangle

>>> input_file = "/home/tests/test_xml.xml"  # Full path to your XML file
>>> obj = untangle.parse(input_file)

>>> obj.A.B[0].C[0].cdata
'"blah1"'
>>> obj.A.B[0].C[1].cdata
'"blah2"'
>>> obj.A.B[1].C[0].cdata
'"blah3"'
>>> obj.A.B[1].C[1].cdata
'"blah4"'


I usually use the lxml.objectify library for quick XML parsing.

With your XML string, you can do:

from lxml import objectify
root = objectify.fromstring(xml_string)

fromstring() returns the root element (<A> here), so you address its children directly. You can get individual elements using a dictionary-style interface:

value = root["B"][0]["C"][0]

Or, if you prefer:

value = root.B[0].C[0]


I usually parse XML using the ElementTree module in the standard library. It does not give you a dictionary; instead you get a much more useful DOM-like tree structure, which lets you iterate over the children of each element.

from xml.etree import ElementTree as ET

xml = ET.parse("<path-to-xml-file>")
root_element = xml.getroot()

for child in root_element:
   ...
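
To make the loop above a bit more concrete, here is a minimal sketch that walks the whole tree with iter() and collects the text of every element that has no children. The XML_TEXT constant and the leaf_texts helper are names made up for this illustration, and the document is parsed from a string rather than a file so the snippet runs on its own.

```python
from xml.etree import ElementTree as ET

# Sample document from the question, parsed from a string instead of a file.
XML_TEXT = """<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>"""


def leaf_texts(root):
    """Return (tag, text) pairs for every element that has no children."""
    # iter() visits the root and all descendants in document order;
    # len(element) is the number of direct children.
    return [(el.tag, el.text) for el in root.iter() if len(el) == 0]


root = ET.fromstring(XML_TEXT)
leaves = leaf_texts(root)
```

Here leaves holds one pair per <C> element. Attributes are ignored in this sketch; they are available as el.attrib if you need them.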

If there is a specific need to parse it into a dictionary, instead of getting the information you need from the tree, a recursive function that builds one from the root node would look something like this:

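def xml_dict(node, path="", dic=None):
    """Flatten an ElementTree node into a dict keyed by dotted, numbered paths."""
    if dic is None:
        dic = {}
    name_prefix = path + ("." if path else "") + node.tag
    # Collect the indexes already used by earlier siblings with the same tag.
    numbers = set()
    for similar_name in dic:
        if similar_name.startswith(name_prefix):
            suffix = similar_name[len(name_prefix):].split(".")[0]
            if suffix.isdigit():  # skip longer tags that merely share the prefix
                numbers.add(int(suffix))
    index = max(numbers, default=0) + 1
    name = name_prefix + str(index)
    # The node's own text, then the children's tail text joined by "<...>".
    # text and tail are None when empty, so fall back to "".
    dic[name] = (node.text or "") + "<...>".join(
        childnode.tail or "" for childnode in node)
    for childnode in node:
        xml_dict(childnode, name, dic)
    return dic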

For the XML in your question, this yields the following dictionary:

{'A1': '\n \n <...>\n',
 'A1.B1': '\n  \n  <...>\n ',
 'A1.B1.C1': '"blah"',
 'A1.B1.C2': '"blah"',
 'A1.B2': '\n  \n  <...>\n ',
 'A1.B2.C1': '"blah"',
 'A1.B2.C2': '"blah"'}

(I find the DOM form more useful)
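
If you want keys shaped exactly like the ones in the question ("A.B1.C1" and so on) and only care about the leaf values, a shorter variant is possible. This is just a sketch under a few assumptions: the root gets no number, siblings are numbered per tag starting at 1, and attributes and text mixed in between child elements are ignored. The flatten_leaves name is made up for this example.

```python
from collections import Counter
from xml.etree import ElementTree as ET


def flatten_leaves(node, path=None, out=None):
    """Map dotted paths such as 'A.B1.C2' to the text of each leaf element."""
    if out is None:
        out = {}
    if path is None:
        path = node.tag  # assumption: the root element is not numbered
    if len(node) == 0:
        # Leaf element: store its text (None becomes an empty string).
        out[path] = node.text or ""
        return out
    seen = Counter()  # how many children with each tag we have met so far
    for child in node:
        seen[child.tag] += 1
        flatten_leaves(child, f"{path}.{child.tag}{seen[child.tag]}", out)
    return out


sample = ET.fromstring(
    '<A><B><C>"blah1"</C><C>"blah2"</C></B>'
    '<B><C>"blah3"</C><C>"blah4"</C></B></A>'
)
result = flatten_leaves(sample)
```

Counting tags per parent avoids scanning the keys already in the dictionary, so tags that share a prefix (say B and BC) do not interfere with each other.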


Check out the answers to "Really simple way to deal with XML in Python?". You will probably find that one of them directly suits your needs.
