
lxml.etree._ElementTree.find() can't be called on objectify.parse result

Developer https://www.devze.com 2023-04-03 09:11 Source: web
>>> from lxml import objectify
>>> from StringIO import StringIO
>>> f = StringIO("<root>data</root>")
>>> tree = objectify.parse(f)
>>> type(tree)
<type 'lxml.etree._ElementTree'>
>>> tree.find('root')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1944, in lxml.etree._ElementTree.find (src/lxml/lxml.etree.c:45105)
TypeError: find() takes exactly one argument (2 given)
>>> tree.find()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1926, in lxml.etree._ElementTree.find (src/lxml/lxml.etree.c:44970)
TypeError: find() takes at least 1 positional argument (0 given)
>>> print tree.find.__doc__
find(self, path, namespaces=None)

        Finds the first toplevel element with given tag.  Same as
        ``tree.getroot().find(path)``.

        The optional ``namespaces`` argument accepts a
        prefix-to-namespace mapping that allows the usage of XPath
        prefixes in the path expression.

Note that tree.getroot().find works and find works on _ElementTree instances created by etree.parse.
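For reference, the two working forms look roughly like this. It is a minimal sketch written for Python 3 (the session above is Python 2), and it assumes a made-up document with a child element, because find searches the root's children rather than matching the root itself.

```python
from io import BytesIO

from lxml import etree, objectify

# Hypothetical sample document: <child> is added for illustration only.
XML = b"<root><child>data</child></root>"

# Form 1: parse with objectify, then go through the root element explicitly.
obj_tree = objectify.parse(BytesIO(XML))
via_root = obj_tree.getroot().find("child")

# Form 2: parse with plain lxml.etree and call find() on the tree directly.
plain_tree = etree.parse(BytesIO(XML))
via_etree = plain_tree.find("child")

print(via_root.text)
print(via_etree.text)
```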

The main question: how can the same method raise these two mutually exclusive exceptions? Also, while I can use tree.getroot().find, I would prefer the shorter form if it worked as documented, so I'm curious: is this an lxml bug?


We solve this mystery by looking at the corresponding source (long live OSS):

def find(self, path, namespaces=None):
    u"""find(self, path, namespaces=None)

    Finds the first toplevel element with given tag.  Same as
    ``tree.getroot().find(path)``.

    The optional ``namespaces`` argument accepts a
    prefix-to-namespace mapping that allows the usage of XPath
    prefixes in the path expression.
    """
    self._assertHasRoot()
    root = self.getroot()
    if _isString(path):
        start = path[:1]
        if start == u"/":
            path = u"." + path
        elif start == b"/":
            path = b"." + path
    return root.find(path, namespaces)

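In the tracebacks, "lxml.etree.pyx", line 1926 is the first line of this snippet and "lxml.etree.pyx", line 1944 is the last, so two different find methods are in fact involved. The call with no arguments fails at the def line of _ElementTree.find itself, while tree.find('root') fails at the final root.find(path, namespaces) call, that is, inside the find method of the root element. Apparently objectify constructs a different kind of element whose find does not accept the namespaces parameter (and thus this is a bug in lxml). If you use lxml.etree.parse to parse your StringIO object, the API works just fine.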
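The mechanism can be reproduced without lxml. The sketch below uses two hypothetical classes, Tree and NarrowRoot, which are stand-ins invented for illustration and are not lxml's actual types. The outer find accepts namespaces and forwards it to an inner find that does not. It is written for Python 3, where the exact wording of the error messages differs from the Python 2 session above.

```python
class NarrowRoot:
    """Hypothetical root element whose find() has no `namespaces` parameter."""

    def find(self, path):
        return "found %s" % path


class Tree:
    """Hypothetical tree whose find() delegates to the root, like the source above."""

    def __init__(self, root):
        self._root = root

    def find(self, path, namespaces=None):
        # The inner call passes one argument more than NarrowRoot.find accepts.
        return self._root.find(path, namespaces)


def error_of(call):
    """Return the TypeError message raised by call(), or None if it succeeds."""
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None


tree = Tree(NarrowRoot())

# Too few arguments: rejected by Tree.find itself.
too_few = error_of(lambda: tree.find())
# One argument: accepted by Tree.find, rejected by the inner NarrowRoot.find.
too_many = error_of(lambda: tree.find("root"))

print(too_few)
print(too_many)
```

Both errors surface from a call to tree.find, yet they are raised by two different methods, which is why they only look mutually exclusive.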
