Is it possible for BeautifulSoup to work in a case-insensitive manner?

Developer (https://www.devze.com), 2022-12-26 08:33, source: web

I am trying to extract the meta description from fetched web pages, but I am running into BeautifulSoup's case sensitivity.

Some pages have <meta name="Description"> and others have <meta name="description">.

My problem is very similar to an existing question on Stack Overflow.

The only difference is that I can't use lxml; I have to stick with BeautifulSoup.
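The following is a minimal sketch that reproduces the problem. It assumes bs4 with Python's built-in html.parser, because lxml is ruled out, and the two sample pages are invented for illustration. A plain string filter is compared exactly, so it matches only one of the two casings:

```python
from bs4 import BeautifulSoup

# Two made-up pages that differ only in the case of the name attribute's value
pages = [
    '<html><head><meta name="Description" content="First page"></head></html>',
    '<html><head><meta name="description" content="Second page"></head></html>',
]

results = []
for html in pages:
    soup = BeautifulSoup(html, "html.parser")
    # A plain string in attrs is compared exactly, so letter case matters
    tag = soup.find("meta", attrs={"name": "description"})
    results.append(tag["content"] if tag else None)

print(results)  # [None, 'Second page']
```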

You can give BeautifulSoup a regular expression to match attributes against. Something like

soup.findAll('meta', name=re.compile("^description$", re.I))

might do the trick. Cribbed from the BeautifulSoup docs.

A regular expression? Now we have another problem.

Instead, you can pass in a lambda:

# the attribute filter goes in attrs, because name is findAll's first parameter
soup.findAll(lambda tag: tag.name.lower() == 'meta',
             attrs={'name': lambda x: x and x.lower() == 'description'})

(The "x and" guard avoids an AttributeError when the tag has no name attribute. In that case x is None, and None.lower() would fail.)
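Here is a self-contained version of the lambda approach. It assumes bs4 and html.parser, and the sample markup is invented. The attribute function receives None for the tag that has no name attribute, which is why the guard matters:

```python
from bs4 import BeautifulSoup

html = (
    '<meta name="Description" content="A">'
    '<meta content="no name attribute">'
    '<meta name="keywords" content="B">'
)
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all(
    # html.parser already lowercases tag names, so .lower() is only a safeguard
    lambda tag: tag.name.lower() == "meta",
    # x is the name attribute's value, or None when the attribute is absent
    attrs={"name": lambda x: x and x.lower() == "description"},
)
contents = [t["content"] for t in matches]
print(contents)  # ['A']
```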

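With a minor change it works. The keyword name is already findAll's first parameter (the tag name), so the attribute filter has to go in the attrs dictionary:

import re
soup.findAll('meta', attrs={'name': re.compile("^description$", re.I)})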
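Here is a runnable sketch of the attrs= regex approach. It assumes bs4 and html.parser, and the sample markup is made up. The ^...$ anchors keep longer values such as "description-extra" from matching, and the meta tag without a name attribute is skipped:

```python
import re

from bs4 import BeautifulSoup

html = """
<html><head>
  <meta name="Description" content="Mixed case">
  <meta name="DESCRIPTION" content="Upper case">
  <meta name="description-extra" content="Should not match">
  <meta charset="utf-8">
</head></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Anchored and case-insensitive: any casing of exactly "description"
pattern = re.compile(r"^description$", re.I)
tags = soup.find_all("meta", attrs={"name": pattern})
contents = [t["content"] for t in tags]
print(contents)  # ['Mixed case', 'Upper case']
```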

With bs4 use the following:

soup.find('meta', attrs={'name': lambda x: x and x.lower()=='description'})
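The next sketch wraps this bs4 call in a small helper that returns the description text itself. The helper name get_meta_description is hypothetical, and the sketch assumes html.parser. Note that find() returns only the first matching tag:

```python
from bs4 import BeautifulSoup


def get_meta_description(html):
    """Hypothetical helper: return the content of the first meta tag whose
    name attribute equals "description" in any letter case, else None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": lambda x: x and x.lower() == "description"})
    if tag is None:
        return None
    # .get() returns None instead of raising KeyError if content is missing
    return tag.get("content")


print(get_meta_description('<meta name="DeScRiPtIoN" content="hello">'))  # hello
print(get_meta_description('<meta name="keywords" content="x">'))  # None
```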

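Better still, use a CSS attribute selector with the i flag, which makes the value comparison case-insensitive. This requires a bs4 version whose select() is backed by Soup Sieve (4.7.0 or later):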

soup.select('meta[name="description" i]')
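Here is a small sketch of the selector approach. It assumes bs4 4.7.0 or later with html.parser, and the sample markup is invented. select() returns every match in document order, and select_one() returns only the first:

```python
from bs4 import BeautifulSoup

html = """
<head>
  <meta name="Description" content="One">
  <meta name="DESCRIPTION" content="Two">
  <meta name="description" content="Three">
  <meta name="author" content="Four">
</head>
"""
soup = BeautifulSoup(html, "html.parser")

# The trailing "i" makes the attribute-value comparison case-insensitive
tags = soup.select('meta[name="description" i]')
contents = [t["content"] for t in tags]
print(contents)  # ['One', 'Two', 'Three']

# select_one() returns the first match, or None if nothing matches
first = soup.select_one('meta[name="description" i]')
print(first["content"])  # One
```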

Alternatively, change the case of the HTML page source before parsing, using str.lower() or str.upper(). Note that this also changes the case of the description text you extract.
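Here is a minimal sketch of the lowercasing approach on a made-up page, assuming bs4 and html.parser. The exact-match lookup succeeds, and the printed content shows the casing side effect:

```python
from bs4 import BeautifulSoup

html = '<meta name="Description" content="Buy Fresh Apples">'

# Lowercase the entire source before parsing, then use an exact match
soup = BeautifulSoup(html.lower(), "html.parser")
tag = soup.find("meta", attrs={"name": "description"})
description = tag["content"]
print(description)  # buy fresh apples  (the original casing is lost)
```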
