I am using Python's BeautifulStoneSoup to extract data from this web page. I am using this code segment to get a <li> object:
import urllib2
from BeautifulSoup import BeautifulStoneSoup

# Fetch the page with a browser-like User-Agent header
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link = response.read()
response.close()

# Parse the page and walk over every <li> in the target list
soup = BeautifulStoneSoup(link, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
p = soup.find('ul', {"class": "vod_ordering"})
j = 0
while j < len(p('li')):
    li = p('li')[j]
    j = j + 1
Now I want to break the <li> object down into its parts. I don't have a problem (that I know of) getting the icon, link and title, but I can't get the description, which sits between </strong> and </img> and does not belong to any tag apart from <li>.
I tried to use contents, but when I run this:
print ''.join(li.contents)
I get an error:
Error Contents: sequence item 1: expected string or Unicode, Tag found
How can I get that string?
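I would try converting each item to a string first, since li.contents mixes Tag objects with plain strings and join() only accepts strings: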
print ''.join(map(str, li.contents))
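Note that the line above prints the whole markup of the <li>, child tags included. If only the description is wanted, the idea is to keep the text that sits directly inside <li> and skip anything inside its child tags. In BeautifulSoup 3 I believe li.findAll(text=True, recursive=False) does this, but I have not run it against this page. Below is a minimal sketch of the same idea using only the Python 3 standard library (html.parser) rather than BeautifulSoup, so it is a different technique from the one in the question. The sample markup, the DirectLiText class and the VOID_TAGS set are all made up for illustration; the real page layout may differ.

```python
from html.parser import HTMLParser

# Tags that carry no content of their own; they must not change the nesting depth.
# (Hypothetical helper set, not part of any library.)
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class DirectLiText(HTMLParser):
    """Collect text that sits directly inside <li>, not inside its child tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 = outside any <li>, 1 = directly inside a <li>
        self.items = []   # one list of text fragments per top-level <li>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            if tag == "li":
                self.depth = 1
                self.items.append([])
        else:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text whose parent is the <li> itself.
        if self.depth == 1 and data.strip():
            self.items[-1].append(data.strip())


# Assumed sample markup, loosely modelled on the structure described in the question.
html = (
    '<ul class="vod_ordering">'
    '<li><a href="/video/1">link</a><strong>Title one</strong>'
    ' First description <img src="a.png"></img></li>'
    '<li><a href="/video/2">link</a><strong>Title two</strong>'
    ' Second description <img src="b.png"></img></li>'
    '</ul>'
)

parser = DirectLiText()
parser.feed(html)
descriptions = [" ".join(parts) for parts in parser.items]
print(descriptions)
```

Each entry in descriptions corresponds to one <li>, so it can be matched up with the icon, link and title extracted in the same order.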