I have an XML file, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<root>
First line. <br/> Second line.
</root>
As output, I want to get: '\nFirst line. <br/> Second line.\n'
I'd just like to note that if the root element contains other nested elements, they should be returned as is.
The first thing I came up with:
from xml.etree.ElementTree import fromstring, tostring
source = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
First line.<br/>Second line.
</root>
'''
xml = fromstring(source)
# encoding='unicode' makes tostring() return str instead of bytes (Python 3)
result = tostring(xml, encoding='unicode').lstrip('<%s>' % xml.tag).rstrip('</%s>' % xml.tag)
print(result)
# output:
#
# First line.<br />Second line.
#
But it's not a truly general-purpose approach, since it fails if the opening root element (<root>) contains any attribute.
UPDATE: This approach has another issue. Since lstrip and rstrip strip any combination of the given characters (not an exact prefix or suffix), you can run into this problem:
# input:
<?xml version="1.0" encoding="UTF-8"?><root><p>First line</p></root>
# result:
p>First line</p
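The root cause is that str.lstrip and str.rstrip take a set of characters, not a literal prefix or suffix. A quick Python 3 illustration follows; str.removeprefix and str.removesuffix (Python 3.9+) remove an exact substring instead, but they still do not fix the attribute problem mentioned above, so treat this only as an explanation of the failure.

```python
# str.lstrip/rstrip treat their argument as a *set* of characters, not a prefix.
serialized = '<root><p>First line</p></root>'

stripped = serialized.lstrip('<root>').rstrip('</root>')
print(stripped)  # p>First line</p

# Python 3.9+ only: removeprefix/removesuffix remove an exact substring once.
exact = serialized.removeprefix('<root>').removesuffix('</root>')
print(exact)  # <p>First line</p>
```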
If you really need only the literal string between the opening and closing tags (as you mentioned in the comment), you can use this:
from xml.etree.ElementTree import fromstring, tostring
source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
First line.<br/>Second line.
</root>
'''
# the following two lines are needed just to cut off the
# declaration, doctypes, etc.
xml = fromstring(source)
xml_str = tostring(xml, encoding='unicode')
start = xml_str.index('>')   # '>' that closes the opening root tag
end = xml_str.rindex('<')    # '<' that opens the closing root tag
result = xml_str[start + 1:end]
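Not the most elegant approach, but unlike the previous one it works correctly when the opening tag has attributes, and with any valid XML document. Keep in mind that the result comes from ElementTree's re-serialization rather than the raw source, so markup may be normalized (for example, <br/> comes back as <br />).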
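The slicing approach can be wrapped in a small reusable helper. This is a sketch for Python 3; the name inner_xml_by_slicing is hypothetical. It relies on ElementTree escaping '>' inside attribute values as &gt; when serializing, so the first '>' in the output is always the end of the opening root tag.

```python
from xml.etree.ElementTree import fromstring, tostring


def inner_xml_by_slicing(source):
    """Return the serialized content between the root's opening and closing tags.

    Hypothetical helper wrapping the index/rindex slicing approach.
    """
    xml_str = tostring(fromstring(source), encoding='unicode')
    start = xml_str.index('>')   # '>' that closes the opening root tag
    end = xml_str.rindex('<')    # '<' that opens the closing root tag
    if end <= start:
        # Self-closing root such as <root attr="v" />: there is no content.
        return ''
    return xml_str[start + 1:end]


source = '<?xml version="1.0" encoding="UTF-8"?><root attr1="val1"><p>First line</p></root>'
print(repr(inner_xml_by_slicing(source)))  # '<p>First line</p>'
```

Unlike the lstrip/rstrip version, this keeps the <p> tags intact, and an attribute value containing '>' does not confuse it.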
Parse from file:
from xml.etree.ElementTree import parse
tree = parse('yourxmlfile.xml')
print(tree.getroot().text)
Parse from string:
from xml.etree.ElementTree import fromstring
print(fromstring(yourxmlstr).text)
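Note that .text holds only the text before the element's first child, so for the question's example it returns '\nFirst line. ' and loses both <br/> and "Second line.". A minimal Python 3 sketch of getting the full inner content with ElementTree alone (the helper name inner_xml is hypothetical): take the element's text, then append each child serialized with tostring(), which includes the child's tail text.

```python
from xml.etree.ElementTree import fromstring, tostring


def inner_xml(element):
    """Return the element's text plus the serialized markup of its children.

    tostring() serializes each child together with its tail text, so joining
    the children reproduces everything between the element's tags.
    """
    parts = [element.text or '']
    parts.extend(tostring(child, encoding='unicode') for child in element)
    return ''.join(parts)


source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
First line. <br/> Second line.
</root>
'''
root = fromstring(source)
print(repr(root.text))        # '\nFirst line. '
print(repr(inner_xml(root)))  # '\nFirst line. <br /> Second line.\n'
```

As with the slicing approach, children come back re-serialized, so <br/> is written as <br />.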