Python XML Parsing Confusion

Developer https://www.devze.com 2022-12-17 08:15 Source: Internet

I'm using xml.dom.minidom in Python and have retrieved the Book node in the XML tree below. I want to get a list of all its child nodes. In this case, I would expect there to be only one.

<Book>
    <Title>Why is this so hard</Title>
</Book>

When I call:

nodeList = bookNode.childNodes
print("nodeList has " + str(nodeList.length) + " elements")
for node in nodeList:
    print("Found a " + node.nodeName + " node")

I get the following output:

nodeList has 3 elements
Found a #text node
Found a Title node
Found a #text node

What are these random #text nodes? How do I get the tagName and value for each of the legitimate nodes? I want to get a list of key->value pairs for each of the nodes under Book. I don't want to use getElementsByTagName because I will not know all of the tagNames ahead of time.

Title -> "Why is this so hard"

Thanks- Jonathan

The first text node is the whitespace between <Book> and <Title>. The second is the whitespace between </Title> and </Book>.

What are these random #text nodes?

They're hardly random: they are text nodes representing the whitespace you put between the tags. The DOM has to keep this whitespace, or the document would be all run together in one unreadable line when it's reserialised.
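To see what those text nodes actually hold, a minimal sketch like the following can help. It assumes Python 3 and parses the question's document from a string; the names doc and book_node are only illustrative. Printing repr() of each text node's data makes the whitespace visible.

```python
from xml.dom import minidom

# Same document as in the question, parsed from a string for illustration.
doc = minidom.parseString(
    "<Book>\n    <Title>Why is this so hard</Title>\n</Book>"
)
book_node = doc.documentElement

for node in book_node.childNodes:
    if node.nodeType == node.TEXT_NODE:
        # repr() shows the newline and indentation stored in the text node.
        print(node.nodeName, repr(node.data))
    else:
        print(node.nodeName)

# Collected for inspection: child node names, and whether each text node
# consists of whitespace only.
node_names = [node.nodeName for node in book_node.childNodes]
whitespace_only = [
    node.data.strip() == ""
    for node in book_node.childNodes
    if node.nodeType == node.TEXT_NODE
]
```

The two #text entries contain nothing but the newline and indentation surrounding the Title element.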

How do I get the tagName and value for each of the legitimate nodes?

Loop over the child nodes, ignoring those that aren't elements.

I want to get a list of key->value pairs for each of the nodes under Book.

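book = {}
for child in bookNode.childNodes:
    # Skip whitespace text nodes (and comments); keep only element children.
    if child.nodeType == child.ELEMENT_NODE:
        # An empty element has no firstChild, so map it to an empty string.
        book[child.tagName] = '' if child.firstChild is None else child.firstChild.data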

This assumes that every element contains only a single text node.
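If that assumption might not hold, one possible variation is to join all the direct text children of each element instead of reading only firstChild. The sketch below assumes Python 3. The helper names element_text and children_to_dict are hypothetical, and the sample document (with a comment inside Title and an empty Year element) is made up for illustration.

```python
from xml.dom import minidom


def element_text(element):
    """Concatenate the data of all direct text and CDATA children."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def children_to_dict(parent):
    """Map each child element's tagName to its direct text content."""
    return {
        child.tagName: element_text(child)
        for child in parent.childNodes
        if child.nodeType == child.ELEMENT_NODE
    }


# Hypothetical sample: the comment splits the Title text into two text nodes,
# and Year is an empty element.
doc = minidom.parseString(
    "<Book>\n"
    "    <Title>Why is this<!-- note --> so hard</Title>\n"
    "    <Year></Year>\n"
    "</Book>"
)
book = children_to_dict(doc.documentElement)
print(book)
```

Note that a dict keeps only one value per key, so if Book had several children with the same tagName, later ones would overwrite earlier ones. Text inside nested elements is also ignored here; only direct text children are joined.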