I am going to handle XML files for a project. I had earlier decided to use lxml, but after reading the requirements, I think ElementTree would be better for my purpose.
The XML files that have to be processed are:
- small in size, typically < 10 KB;
- free of namespaces;
- simple in structure.
Given the small XML size, memory is not an issue. My only concern is fast parsing.
What should I go with? Mostly I have seen people recommend lxml, but given my parsing requirements, do I really stand to benefit from it, or would ElementTree serve my purpose better?
As others have pointed out, lxml implements the ElementTree API, so you're safe starting out with ElementTree and migrating to lxml if you need better performance or more advanced features.
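A minimal sketch of that migration path: import `lxml.etree` if it is installed and fall back to the standard library otherwise, so the parsing code itself does not change. The sample document and the `item_values` helper are hypothetical; the sketch only relies on calls (`fromstring`, `findall`, `get`, `.text`) that both libraries provide. Bytes are passed to `fromstring` because lxml rejects `str` input that carries an XML encoding declaration, while both parsers accept bytes.

```python
# Prefer lxml when installed; otherwise fall back to the standard library.
# Both expose the ElementTree-style calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document (bytes, so either parser accepts it).
XML = b"<config><item name='a'>1</item><item name='b'>2</item></config>"


def item_values(xml_bytes):
    """Return {name: text} for every <item> child of the root element."""
    root = etree.fromstring(xml_bytes)
    return {item.get("name"): item.text for item in root.findall("item")}


print(item_values(XML))  # {'a': '1', 'b': '2'}
```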
The big advantage of using ElementTree, if it meets your needs, is that as of Python 2.5 it is part of the Python standard library, which cuts down on external dependencies and the (possible) headache of dealing with compiling/installing C modules.
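For files like the ones described in the question (small, no namespaces, simple structure), the standard library alone covers the common operations. A minimal sketch, using a made-up orders document:

```python
import xml.etree.ElementTree as ET

# A small, namespace-free document (hypothetical example data).
DOC = """<orders>
  <order id="1"><customer>alice</customer><total>9.50</total></order>
  <order id="2"><customer>bob</customer><total>3.25</total></order>
</orders>"""

root = ET.fromstring(DOC)

for order in root.iter("order"):
    # Attributes are read with .get(); a child's text with findtext().
    print(order.get("id"), order.findtext("customer"), float(order.findtext("total")))

# iter() walks all descendants, so nested <total> elements are found too.
grand_total = sum(float(t.text) for t in root.iter("total"))
print(grand_total)  # 12.75
```

No third-party package is needed to run this, which is the dependency advantage the answer describes.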
lxml is basically a superset of ElementTree, so you could start with ElementTree and switch to lxml later if you run into performance or functionality issues.
Performance can only really be judged by benchmarking with your own data.
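A rough `timeit` harness for comparing the two parsers. The `SAMPLE` bytes below are synthetic (200 small rows, a few KB, in line with the sizes mentioned in the question); swap in the contents of one of your real files. Absolute numbers depend on the machine and library versions, so only the relative comparison on your own files is meaningful.

```python
import timeit
import xml.etree.ElementTree as ET

try:
    from lxml import etree as lxml_etree
except ImportError:
    lxml_etree = None  # lxml not installed; only the stdlib parser is timed

# Synthetic test data; replace with e.g. open("sample.xml", "rb").read()
# ("sample.xml" is a hypothetical path).
SAMPLE = b"<root>" + b"".join(
    b"<row id='%d'><name>n%d</name></row>" % (i, i) for i in range(200)
) + b"</root>"


def bench(parse, number=2000):
    """Return the average number of seconds per parse of SAMPLE."""
    return timeit.timeit(lambda: parse(SAMPLE), number=number) / number


results = {"xml.etree.ElementTree": bench(ET.fromstring)}
if lxml_etree is not None:
    results["lxml.etree"] = bench(lxml_etree.fromstring)

for name, secs in results.items():
    print(f"{name}: {secs * 1e6:.1f} microseconds per parse")
```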
I recommend my own recipe, "XML to Python data structure" (Python recipes, ActiveState Code).
It does not speed up parsing, but it does provide truly native, object-style access.
>>> SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
... <address_book>
... <person gender='m'>
... <name>fred</name>
... <phone type='home'>54321</phone>
... <phone type='cell'>12345</phone>
... <note>"A<!-- comment --><![CDATA[ <note>]]>"</note>
... </person>
... </address_book>
... """
>>> address_book = xml2obj(SAMPLE_XML)
>>> person = address_book.person
person.gender -> 'm' # an attribute
person['gender'] -> 'm' # alternative dictionary syntax
person.name -> 'fred' # shortcut to a text node
person.phone[0].type -> 'home' # multiple elements become a list
person.phone[0].data -> '54321' # use .data to get the text value
str(person.phone[0]) -> '54321' # alternative syntax for the text value
person[0] -> person # if there is only one <person>, it can still
# be used as if it were a list of 1 element.
'address' in person -> False # test for existence of an attr or child
person.address -> None # a nonexistent element returns None
bool(person.address) -> False # has any 'address' data (attr, child or text)
person.note -> '"A <note>"'
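For comparison, here is roughly what the same lookups look like with plain `xml.etree.ElementTree`, without the recipe's wrapper (`xml2obj` is not used or reimplemented here). It shows the explicit `get`/`findtext`/`findall` calls that the recipe hides behind attribute access. The `<note>` element is left out of this sketch, and the missing-child check only covers child elements, not attributes.

```python
import xml.etree.ElementTree as ET

# Same address book as in the recipe example, minus the <note> element.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<address_book>
  <person gender='m'>
    <name>fred</name>
    <phone type='home'>54321</phone>
    <phone type='cell'>12345</phone>
  </person>
</address_book>
"""

address_book = ET.fromstring(SAMPLE_XML)
person = address_book.find("person")

gender = person.get("gender")            # attribute value
name = person.findtext("name")           # text of the first <name> child
phones = person.findall("phone")         # always a list, even for one match
first_type = phones[0].get("type")
first_number = phones[0].text
has_address = person.find("address") is not None  # find() returns None if missing

print(gender, name, first_type, first_number, has_address)
```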