Python and XML Processing

Developer (https://www.devze.com), 2023-03-27 07:18, source: web

I have used urllib to get the following data:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<videos xmlns:xs="http://www.w3.org/2001/XMLSchema" 
        xmlns:www="http://www.www.com"">
  <video type="cl">
    <cd>
      <src lang="music">http://www.google.com/ </src>
    </cd>
  </video>
</videos>

I want to extract http://www.google.com/ from it. Here is my code:

import xml.etree.ElementTree as etree
data='<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com""><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'
tree = etree.fromstring(data)
geturl=tree.findtext('/video/cd/src').strip()
print geturl

I get this error:

AttributeError: 'NoneType' object has no attribute 'strip'

Evidently findtext found nothing. I also tried findtext('src'), but that doesn't work either.

What's wrong?


Remove the leading forward slash from the path, so that it reads video/cd/src:

import xml.etree.ElementTree as etree

data = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com"><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'''
tree = etree.fromstring(data)  # fromstring returns the root <videos> Element
# The path is evaluated relative to that root element, so it has no leading '/'
geturl = tree.findtext('video/cd/src').strip()
print(geturl)

yields

http://www.google.com/

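A leading forward slash makes the path absolute, which is not allowed when searching from an element: etree.fromstring() returns the root <videos> element, and paths are evaluated relative to it.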
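A short Python 3 sketch of how ElementTree's path expressions behave on the root element returned by fromstring(), which also shows why findtext('src') came back empty. It assumes the question's data with the stray double quote removed; the variable name root is only for illustration. findtext() returns its default (None unless one is given) when no element matches, which is where the AttributeError on .strip() came from. In Python 3 a leading '/' is rejected outright with a SyntaxError rather than silently matching nothing.

```python
import xml.etree.ElementTree as etree

# The question's data with the stray double quote removed, so it parses.
data = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<videos xmlns:xs="http://www.w3.org/2001/XMLSchema" '
        'xmlns:www="http://www.www.com">'
        '<video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video>'
        '</videos>')

root = etree.fromstring(data)  # the <videos> element; paths are relative to it
print(root.tag)  # videos

# Child <video>, then <cd>, then <src>: the fix from the answer.
print(root.findtext('video/cd/src').strip())  # http://www.google.com/

# './/src' matches <src> at any depth below <videos>.
print(root.findtext('.//src').strip())  # http://www.google.com/

# A bare 'src' only matches direct children of <videos>, so nothing is found
# and findtext returns its default, None (hence the AttributeError on .strip()).
print(root.findtext('src'))  # None

# Supplying a default avoids calling .strip() on None.
print(repr(root.findtext('src', default='').strip()))  # ''

# A leading '/' is an absolute path; Python 3's ElementPath rejects it on an Element.
try:
    root.findtext('/video/cd/src')
except SyntaxError as exc:
    print('SyntaxError:', exc)  # SyntaxError: cannot use absolute path on element
```

'.//src' is the more forgiving choice when the nesting may vary; the explicit 'video/cd/src' path is stricter about the document's structure.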

PS: The data you posted is also not well-formed XML: xmlns:www="http://www.www.com"" ends with two double quotes.
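A minimal Python 3 sketch to check that point: it parses the data exactly as posted, then again with the extra quote removed. The helper is_well_formed is hypothetical, written only for this illustration; it relies on etree.fromstring() raising ParseError for input that is not well-formed, so the string as posted would fail before findtext() is ever reached.

```python
import xml.etree.ElementTree as etree

# The data exactly as posted in the question, including the extra double quote
# after xmlns:www="http://www.www.com".
posted = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
          '<videos xmlns:xs="http://www.w3.org/2001/XMLSchema" '
          'xmlns:www="http://www.www.com""><video type="cl"><cd>'
          '<src lang="music">http://www.google.com/ </src></cd></video></videos>')

def is_well_formed(text):
    """Return True if ElementTree can parse text, False if it raises ParseError."""
    try:
        etree.fromstring(text)
    except etree.ParseError as exc:
        # exc.position is a (line, column) pair locating the problem.
        print('ParseError at line %d, column %d' % exc.position)
        return False
    return True

print(is_well_formed(posted))                     # prints the ParseError line, then False
print(is_well_formed(posted.replace('""', '"')))  # True
```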
