How can I load specific content from a website with Python? For example, I want to load some posts from a blog and display them on my own site. How can I do this?
An answer:
import urllib2
from BeautifulSoup import BeautifulSoup

def fetchtags(req, name, attrs, num):
    # Python 2 with BeautifulSoup 3
    try:
        website = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print 'A problem occurred. Please try again.'
        return
    # Parse the page, converting HTML entities to characters
    soup = BeautifulSoup(website,
                         convertEntities=BeautifulSoup.HTML_ENTITIES)
    # Return at most `num` tags matching the name and attributes
    tags = soup.findAll(name=name,
                        attrs=attrs,
                        limit=num)
    return tags
Then you can use it like this:
fetchtags('http://www.website.com', 'div', {'class':'c'}, 10)
This returns up to 10 divs of class c from the specified URL.
See Beautiful Soup for more details on the returned object.
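The code above targets Python 2 and BeautifulSoup 3. As a rough sketch of the same idea on Python 3 using only the standard library (urllib.request and html.parser), something like the following might work. The names TagCollector, extract_text and fetch_tags are made up for this illustration, and unlike findAll it returns the text of each match rather than tag objects. It is meant for container elements such as div, not for void tags like img.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.request import urlopen


class TagCollector(HTMLParser):
    """Collect the text of up to `limit` elements matching a tag name and attributes."""

    def __init__(self, name, attrs, limit):
        super().__init__(convert_charrefs=True)
        self.name = name
        self.attrs = attrs
        self.limit = limit
        self.results = []   # text of each completed match
        self._depth = 0     # nesting depth inside the current match (0 = outside)
        self._buffer = []   # text pieces of the match being read

    def _matches(self, tag, attrs):
        if tag != self.name:
            return False
        found = dict(attrs)
        for key, wanted in self.attrs.items():
            value = found.get(key) or ""
            if key == "class":
                # class holds a space-separated list of names
                if wanted not in value.split():
                    return False
            elif value != wanted:
                return False
        return True

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already inside a match: track nested tags of the same name
            if tag == self.name:
                self._depth += 1
        elif len(self.results) < self.limit and self._matches(tag, attrs):
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.name:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_text(html, name, attrs, limit):
    """Parse an HTML string and return the text of the matching elements."""
    parser = TagCollector(name, attrs, limit)
    parser.feed(html)
    parser.close()
    return parser.results


def fetch_tags(url, name, attrs, limit):
    """Download a page and return the text of the matching elements."""
    try:
        with urlopen(url, timeout=10) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
    except URLError as exc:  # HTTPError is a subclass of URLError
        print("Could not fetch", url, "-", exc)
        return []
    return extract_text(html, name, attrs, limit)
```

Calling fetch_tags('http://www.website.com', 'div', {'class': 'c'}, 10) would mirror the call shown above. If installing a third-party parser is an option, a real HTML parsing library will cope with messy markup far better than this sketch.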
urllib and urllib2 will let you load the raw HTML. HTML parsers such as BeautifulSoup and lxml will let you parse the raw HTML so you can get at the sections you care about. Template engines such as Mako, Cheetah, etc. will let you generate HTML so that you can have web pages to display.
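To illustrate the last step (generating HTML from the extracted content), here is a minimal Python 3 sketch. It uses string.Template from the standard library as a stand-in for a full template engine such as Mako or Cheetah, so it is not their API. The template strings and the render_posts name are assumptions made for this example.

```python
from html import escape
from string import Template

# Hypothetical templates; a real site would keep these in template files.
POST_TEMPLATE = Template('<li class="post">$body</li>')
PAGE_TEMPLATE = Template("<ul>\n$items\n</ul>")


def render_posts(posts):
    """Render a list of plain-text post snippets as an HTML list."""
    items = "\n".join(
        # Escape the text so scraped content cannot inject markup
        POST_TEMPLATE.substitute(body=escape(text)) for text in posts
    )
    return PAGE_TEMPLATE.substitute(items=items)
```

Escaping matters here because the text comes from someone else's site. A full template engine typically offers its own escaping and layout features on top of this.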