BeautifulSoup: print multiple tags / attributes

Developer https://www.devze.com 2023-03-06 02:00 Source: Internet

First of all, this is my first attempt at Python. So far it looks pretty easy to use, but I have run into a problem.

I am trying to convert an XML file into an RSS XML file. The original XML source looks like this:

<news title="Random Title" date="Date and Time" subtitle="The article txt"></news>

It should eventually look like this:

<item>
<pubDate>Date and Time</pubDate>
<title>Random Title</title>
<content:encoded>The article txt</content:encoded>
</item>

I am trying to do this using Python and BeautifulSoup, with the following script:

from BeautifulSoup import BeautifulSoup
import re

doc = [
'<news post_title="Random Title" post_date="Date and Time" post_content="The article txt">''</news></p>'
    ]
soup = BeautifulSoup(''.join(doc))

print soup.prettify()

posttitle = soup.news['post_title']
postdate = soup.news['post_date']
postcontent = soup.news['post_content']

print "<item>"
print "<pubDate>"
print postdate
print "</pubDate>"
print "<title>"
print posttitle
print "</title>"
print "<content:encoded>"
print postcontent
print "</content:encoded>"
print "</item>"

The problem is that it only retrieves the topmost <news> element in the XML and not the others. Can anybody give me some directions for fixing this?

Cheers :)


Stealing the code and correcting it:

for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
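
Why the loop helps: in BeautifulSoup, attribute-style access such as soup.news returns only the first matching tag, while findAll returns every match. The same first-versus-all distinction can be sketched in Python 3 with the standard library's xml.etree.ElementTree. This is a swapped-in parser, not the BeautifulSoup used above, and the sample data and the <feed> root element are assumptions made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: two <news> elements under an assumed <feed> root.
root = ET.fromstring(
    '<feed>'
    '<news post_title="First"></news>'
    '<news post_title="Second"></news>'
    '</feed>'
)

# find() stops at the first match, much like soup.news in the question.
first_only = root.find("news").get("post_title")

# findall() returns every matching direct child, so a loop sees each one.
all_titles = [news.get("post_title") for news in root.findall("news")]

print(first_only)
print(all_titles)
```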


Your example doc variable only holds one <news> element.

In general, though, you need to loop over all the news elements, something like:

for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
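
For readers on Python 3, where the print statements above no longer run, the same loop can be sketched with only the standard library. This is a minimal sketch under stated assumptions rather than the script from the question: it uses xml.etree.ElementTree instead of BeautifulSoup, the function name news_to_items and the sample data are hypothetical, and the input is assumed to wrap its <news> elements in a single root element (called <feed> here), which ElementTree requires.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample input: two <news> elements inside an assumed <feed> root.
SOURCE = """<feed>
<news post_title="First Title" post_date="2011-01-01" post_content="First article"></news>
<news post_title="Second Title" post_date="2011-01-02" post_content="Second &amp; last"></news>
</feed>"""


def news_to_items(xml_text):
    """Return one RSS <item> string per <news> element found in xml_text."""
    root = ET.fromstring(xml_text)
    items = []
    for news in root.iter("news"):
        # get() falls back to "" when an attribute is missing;
        # escape() re-encodes &, < and > so the output stays valid XML.
        title = escape(news.get("post_title", ""))
        date = escape(news.get("post_date", ""))
        content = escape(news.get("post_content", ""))
        items.append(
            "<item>\n"
            f"<pubDate>{date}</pubDate>\n"
            f"<title>{title}</title>\n"
            f"<content:encoded>{content}</content:encoded>\n"
            "</item>"
        )
    return items


if __name__ == "__main__":
    print("\n".join(news_to_items(SOURCE)))
```

Building each item as a single string, instead of printing line by line, keeps the element text on one line and makes the result easy to write to a file later.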