Here's the code I have:
from cStringIO import StringIO
from lxml import etree
xml = StringIO('''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ENTITY test "This is a test">
]>
<root>
<sub>&test;</sub>
</root>''')
d1 = etree.parse(xml)
print '%r' % d1.find('sub').text
parser = etree.XMLParser(resolve_entities=False)
xml.seek(0)  # rewind the buffer; the first parse consumed it
d2 = etree.parse(xml, parser=parser)
print '%r' % d2.find('sub').text
Here's the output:
'This is a test'
None
How do I get lxml to give me '&test;', i.e., the raw entity reference?
The unresolved entity is kept as a child node of the sub element rather than as text, which is why .text is None:
>>> print d2.find('sub')[0]
&test;
>>> list(d2.find('sub'))
[&test;]
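A minimal Python 3 sketch of the same idea, assuming lxml is installed. It feeds the document as bytes through io.BytesIO (the document declares its own encoding), parses with resolve_entities=False, and picks out entity children by checking child.tag is etree.Entity. The helper name raw_entity_refs is hypothetical, not part of lxml.

```python
from io import BytesIO
from lxml import etree

XML = b'''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ENTITY test "This is a test">
]>
<root>
<sub>&test;</sub>
</root>'''


def raw_entity_refs(elem):
    """Hypothetical helper: return the '&name;' text of each entity child of elem."""
    # Iterating an lxml element also yields entity nodes; their .tag is the Entity factory.
    return [child.text for child in elem if child.tag is etree.Entity]


# Keep entity references as nodes instead of expanding them.
parser = etree.XMLParser(resolve_entities=False)
tree = etree.parse(BytesIO(XML), parser)
sub = tree.getroot().find('sub')

print(repr(sub.text))          # no text node: the reference is a child instead
print(raw_entity_refs(sub))    # the raw reference text
print(sub[0].name)             # the entity name without '&' and ';'
print(etree.tostring(sub, with_tail=False))  # serialization keeps the reference
```

If you need the whole element back with the reference intact, serializing it with etree.tostring is often simpler than walking the children yourself.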