First of all, this is my first try at Python. So far it looks pretty easy to use, though I have run into a problem.
I am trying to convert an XML file to an RSS XML feed. The original XML source looks like this:
<news title="Random Title" date="Date and Time" subtitle="The article txt"></news>
It should eventually look like this:
<item>
<pubDate>Date and Time</pubDate>
<title>Random Title</title>
<content:encoded>The article txt</content:encoded>
</item>
I am trying to do this using Python and BeautifulSoup, with the following script:
from BeautifulSoup import BeautifulSoup
import re
doc = [
'<news post_title="Random Title" post_date="Date and Time" post_content="The article txt"></news>'
]
soup = BeautifulSoup(''.join(doc))
print soup.prettify()
posttitle = soup.news['post_title']
postdate = soup.news['post_date']
postcontent = soup.news['post_content']
print "<item>"
print "<pubDate>"
print postdate
print "</pubDate>"
print "<title>"
print posttitle
print "</title>"
print "<content:encoded>"
print postcontent
print "</content:encoded>"
print "</item>"
The problem is that it only retrieves the topmost <news> element in the XML, and not the others. Can anybody give me some directions for fixing this?
Cheers :)
Stealing the code and correcting it:
for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
Your example doc variable only holds one <news> element, but in general you would need to loop through the news elements, something like:
for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
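For anyone on Python 3, the same loop can be sketched with the standard library alone. This is not the BeautifulSoup approach used above: it swaps in xml.etree.ElementTree, and it assumes the input is well-formed XML with all <news> elements under a single wrapper element. The <root> wrapper, the sample data, and the news_to_items helper are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample input: two <news> elements under an assumed <root> wrapper.
SAMPLE = (
    '<root>'
    '<news post_title="First Title" post_date="Date 1" post_content="First text"></news>'
    '<news post_title="Second Title" post_date="Date 2" post_content="Second text"></news>'
    '</root>'
)


def news_to_items(xml_text):
    """Return one RSS <item> string per <news> element found in xml_text."""
    root = ET.fromstring(xml_text)
    items = []
    # iter() walks the whole tree, so every <news> element is visited,
    # not just the first one.
    for news in root.iter('news'):
        # Missing attributes fall back to an empty string; escape() re-encodes
        # &, < and > so the output stays valid XML.
        title = escape(news.get('post_title', ''))
        date = escape(news.get('post_date', ''))
        content = escape(news.get('post_content', ''))
        items.append(
            "<item>\n"
            "<pubDate>%s</pubDate>\n"
            "<title>%s</title>\n"
            "<content:encoded>%s</content:encoded>\n"
            "</item>" % (date, title, content)
        )
    return items


if __name__ == "__main__":
    for item in news_to_items(SAMPLE):
        print(item)
```

Building the items as strings and printing them at the end keeps the conversion separate from the output, so the result could just as well be written to a file.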