How to fetch content of XML root element in Python?

Developer https://www.devze.com 2023-03-19 14:11 Source: Internet
I have an XML file, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    First line. <br/> Second line.
</root>

As output I want to get: '\nFirst line. <br/> Second line.\n'. Note that if the root element contains other nested elements, they should be returned as is.


The first thing I came up with:

from xml.etree.ElementTree import fromstring, tostring

source = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
    First line.<br/>Second line.
</root>
'''

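xml = fromstring(source)
# encoding='unicode' makes tostring() return str rather than bytes (Python 3)
result = tostring(xml, encoding='unicode').lstrip('<%s>' % xml.tag).rstrip('</%s>' % xml.tag)

print(result)

# output (ElementTree serializes the empty element as <br />):
#
#     First line.<br />Second line.
#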

But it's not a truly general-purpose approach, since it fails if the opening root tag (<root>) contains any attributes.

UPDATE: This approach has another issue. lstrip and rstrip treat their argument as a set of characters and strip any combination of them, not a literal prefix or suffix, so you can run into this problem:

# input:
<?xml version="1.0" encoding="UTF-8"?><root><p>First line</p></root>

# result:
p>First line</p
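
To make the character-set behaviour concrete, here is a minimal sketch that reproduces the result above on the serialized string directly. The helper `strip_exact` is a hypothetical name introduced only for illustration; it is not part of the original answer or of the standard library.

```python
# str.lstrip/str.rstrip take a set of characters, not a prefix or suffix.
serialized = '<root><p>First line</p></root>'

# '<' and '>' belong to the set, so the adjacent <p> and </p> get eaten too.
stripped = serialized.lstrip('<root>').rstrip('</root>')
print(stripped)

# Hypothetical helper (an assumption, not from the original post):
# remove one exact prefix and one exact suffix, nothing more.
def strip_exact(text, prefix, suffix):
    if text.startswith(prefix):
        text = text[len(prefix):]
    if text.endswith(suffix):
        text = text[:len(text) - len(suffix)]
    return text

exact = strip_exact(serialized, '<root>', '</root>')
print(exact)
```

An exact-prefix helper like this avoids the over-stripping, but it still breaks as soon as the opening tag carries attributes.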

If you really need only the literal string between the opening and closing tags (as you mentioned in the comment), you can use this:

from xml.etree.ElementTree import fromstring, tostring

source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
    First line.<br/>Second line.
</root>
'''

# Parsing and re-serializing is needed just to cut
# the declaration, doctypes, etc.
# encoding='unicode' makes tostring() return str instead of bytes.
xml = fromstring(source)
xml_str = tostring(xml, encoding='unicode')

# End of the opening root tag and start of the closing root tag.
start = xml_str.index('>')
end = xml_str.rindex('<')

result = xml_str[start + 1:end]

Not the most elegant approach, but unlike the previous one it works correctly with attributes in the opening tag, as well as with any valid XML document.
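
An alternative that avoids string searching altogether is to rebuild the inner content from the parsed tree: the root's leading text, followed by each child serialized in turn. This is only a sketch of that idea, not the method from the answer above; `inner_xml` is a hypothetical helper name. It relies on `tostring()` including each child's tail text, and it assumes a document without namespaces (with namespaces, each serialized child would carry its own `xmlns` declarations).

```python
from xml.etree.ElementTree import fromstring, tostring

def inner_xml(root):
    """Hypothetical helper: return everything between the root's tags."""
    # Text that appears before the first child (may be None).
    parts = [root.text or '']
    # Each child is serialized together with the tail text that follows it.
    for child in root:
        parts.append(tostring(child, encoding='unicode'))
    return ''.join(parts)

source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
    First line.<br/>Second line.
</root>
'''

content = inner_xml(fromstring(source))
print(repr(content))
```

As with any approach based on `tostring()`, the result is a re-serialization, so `<br/>` comes back as `<br />` rather than byte-for-byte as written in the file.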


Parse from file:

from xml.etree.ElementTree import parse
tree = parse('yourxmlfile.xml')
print(tree.getroot().text)

Parse from string:

from xml.etree.ElementTree import fromstring
print(fromstring(yourxmlstr).text)
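
Keep in mind that `.text` holds only the text before the first child element, so for the question's input it does not include `<br/>` or anything after it. The short sketch below, using a compact inline document chosen for illustration, shows where ElementTree stores each piece.

```python
from xml.etree.ElementTree import fromstring

root = fromstring('<root>First line.<br/>Second line.</root>')

# Text before the first child element.
leading = root.text
# Text after the <br/> element is stored as that child's tail.
trailing = root[0].tail
# itertext() walks all text nodes, but the markup itself is dropped.
all_text = ''.join(root.itertext())

print(repr(leading), repr(trailing), repr(all_text))
```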