BeautifulSoup Cannot Extract Metadata

Developer https://www.devze.com 2023-03-07 19:20 Source: Internet
I am trying to create a function that extracts the meta keywords from the page at a given URL and returns them. However, no matter which URL I pass to it, it always fails.

def GetKeywords(url):
  soup = BeautifulSoup(url)
  keywords = soup.findAll('meta', attrs={'name':re.compile("^keywords$", re.I)}) #Find all meta keywords on that page
  if len(keywords) == 0: #Check to see if that page has any meta keywords to begin with
    print "No meta keywords for: " + str(url)
    return -1
  else:  #If so then return them
    return keywords


Where does the BeautifulSoup documentation state that the constructor accepts a URL and fetches it for you?

soup = BeautifulSoup(url)

Sorry, but read the BeautifulSoup documentation yourself first instead of guessing at API methods.

http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing a Document

BeautifulSoup parses markup that you hand to it; it does not download anything. What you likely want is to fetch the page yourself with Python's urllib2 module and then feed the resulting HTML into BeautifulSoup, or to look at something like Scrapy.
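Fetching first and parsing second could look roughly like the sketch below. It assumes Python 3 and BeautifulSoup 4 (the `bs4` package), so `urllib.request` stands in for `urllib2` and `find_all` for `findAll`. The function names `extract_keywords` and `get_keywords` and the sample HTML are made up for illustration and are not from the original post.

```python
import re
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_keywords(html):
    """Return the content strings of all <meta name="keywords"> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the name attribute case-insensitively, as the question's code does.
    tags = soup.find_all("meta", attrs={"name": re.compile(r"^keywords$", re.I)})
    return [tag.get("content", "") for tag in tags]


def get_keywords(url):
    """Download the page at url, then parse the HTML (not the URL string)."""
    with urlopen(url) as response:
        html = response.read()
    return extract_keywords(html)


# Parsing works on markup, so it can be tried without any network access.
sample = '<html><head><meta name="Keywords" content="python, scraping"></head></html>'
print(extract_keywords(sample))
```

Calling `get_keywords` with a real page address does the download and then the same parsing. Returning an empty list when nothing is found, rather than -1, is a design choice of this sketch: the caller always gets the same type back.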
