Load a website's content through Python

开发者 https://www.devze.com 2023-02-20 07:47 Source: the web
How can I load specific content from a website through Python? For example, I want to load some posts from a blog and display them on my own site. How can I do this?


An answer:

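import urllib2
from BeautifulSoup import BeautifulSoup

def fetchtags(req, name, attrs, num):
    # Return up to `num` tags called `name` whose attributes match `attrs`.
    try:
        website = urllib2.urlopen(req)
    except urllib2.URLError as e:
        # URLError also covers HTTPError, which is its subclass.
        print 'A problem occurred (%s). Please try again.' % e
        return []
    # Convert HTML entities such as &amp; to characters while parsing.
    soup = BeautifulSoup(website,
                         convertEntities=BeautifulSoup.HTML_ENTITIES)
    tags = soup.findAll(name=name,
                        attrs=attrs,
                        limit=num)
    return tags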

Then you can use it like:

fetchtags('http://www.website.com', 'div', {'class':'c'}, 10)

This returns up to 10 div elements of class c from the specified URL.

See the Beautiful Soup documentation for more details on the returned object.
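The answer above targets Python 2 and Beautiful Soup 3. As a rough sketch of the same idea on Python 3, assuming the third-party beautifulsoup4 package (imported as bs4) is installed, something like the following should work. The names select_tags and fetch_tags and the sample markup are made up for illustration and are not part of the original answer. Beautiful Soup 4 converts HTML entities on its own, so the convertEntities argument is dropped.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def select_tags(html, name, attrs, num):
    """Return up to `num` tags named `name` whose attributes match `attrs`."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all(name, attrs=attrs, limit=num)


def fetch_tags(url, name, attrs, num):
    """Download `url` and select tags from it; return [] on a network error."""
    try:
        with urlopen(url, timeout=10) as response:
            html = response.read()
    except OSError as err:  # covers URLError, HTTPError and socket timeouts
        print(f"A problem occurred ({err}). Please try again.")
        return []
    return select_tags(html, name, attrs, num)


# Hypothetical markup standing in for a downloaded blog page.
sample = """
<div class="c"><a href="/post/1">First post</a></div>
<div class="c"><a href="/post/2">Second post</a></div>
<div class="other">Sidebar</div>
"""

posts = select_tags(sample, "div", {"class": "c"}, 10)
for tag in posts:
    # Each result is a Tag: get_text() gives its text, [] reads an attribute.
    print(tag.get_text(strip=True), tag.a["href"])
```

Splitting the download from the selection keeps the parsing step usable on any HTML string, which also makes it easy to try out without touching the network. For a live page, fetch_tags('http://www.website.com', 'div', {'class': 'c'}, 10) mirrors the call shown earlier.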


urllib and urllib2 (urllib.request in Python 3) will let you load the raw HTML. HTML parsers such as BeautifulSoup and lxml will let you parse that HTML so you can get at the sections you care about. Template engines such as Mako or Cheetah will let you generate HTML, so you can build the web pages that display the extracted content.
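To make the parse and generate steps concrete, here is a minimal sketch that uses only the Python 3 standard library. It swaps in html.parser for BeautifulSoup or lxml and string.Template for a real template engine such as Mako, so it illustrates the flow rather than those libraries' APIs. The TitleCollector class, the render function, and the sample HTML are hypothetical.

```python
from html import escape
from html.parser import HTMLParser
from string import Template


class TitleCollector(HTMLParser):
    """Collect the text of every <h2> element in a page."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.titles.append("")  # start a new title

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.titles[-1] += data


def render(titles):
    """Generate an HTML list from the extracted titles."""
    page = Template("<ul>\n$items\n</ul>")
    # Escape each title so characters like & stay valid in the output HTML.
    items = "\n".join(f"  <li>{escape(t.strip())}</li>" for t in titles)
    return page.substitute(items=items)


# Hypothetical raw HTML; in practice it would be the decoded page download.
raw_html = "<h2>First post</h2><p>Text</p><h2>Tips &amp; tricks</h2>"

collector = TitleCollector()
collector.feed(raw_html)
collector.close()
output = render(collector.titles)
print(output)
```

A dedicated parser and template engine handle messy real-world markup and larger pages far better than this, but the three stages stay the same: load, extract, render.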
