Beautiful Soup adding quotations of HTML attributes

Developer https://www.devze.com 2023-03-31 10:32 Source: web
Thanks in advance,

I'm currently using Beautiful Soup to parse comment tags out of a set block of HTML. The issue I'm having is that the scraped HTML has no quotation marks around the attribute values of its tags. However, BeautifulSoup seems to add them in, which may be desirable in some cases but unfortunately not in mine.

What would be the best route to leave the actual HTML intact, without BeautifulSoup adding the quotes? Or can these be added back in?


You have a tag where some attribute values are quoted and some are unquoted. What do you mean by 'add quoting back'? You could either edit each attribute value to kludge the quotes in (probably a terrible idea), or add the quoting when the tag is rendered. Which is better depends on what other processing you're doing to the tag. Here's code that adds quotes when it prints:

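from bs4 import BeautifulSoup  # Beautiful Soup 4 on Python 3 (the original snippet targeted BS3 on Python 2)

html = "<html><sometag attr1=dont_quote_me attr2='but this one is quoted'>Text</sometag></html>"

soup = BeautifulSoup(html, "html.parser")

tag = soup.find('sometag')
# In bs4, tag.attrs is a dict of name -> value (BS3 returned a list of pairs)
for attr, aval in tag.attrs.items():
    # Print every attribute with single quotes around its value
    print("%s='%s'" % (attr, aval), end=" ")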

gives attr1='dont_quote_me' attr2='but this one is quoted'

Which way you go is up to you. I assume the unquoted values are all single words, i.e. they match the regex \w+
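If the goal is the opposite direction (rendering a tag without quotes wherever the value is a single word), the same \w+ assumption can drive the decision. The following is only a rough sketch: render_open_tag is a hypothetical helper, not part of Beautiful Soup, and it assumes the attributes arrive as a name-to-value mapping like the one bs4 exposes as tag.attrs. It also assumes values never contain a single quote.

```python
import re

# Hypothetical helper, not a Beautiful Soup API.
# A value made only of word characters can be written without quotes.
_BARE_VALUE = re.compile(r"\w+")

def render_open_tag(name, attrs):
    """Render an opening tag, quoting only the values that need it.

    Assumption: values contain no single quotes, so wrapping in '...' is safe.
    """
    parts = [name]
    for attr, value in attrs.items():
        # bs4 returns a list for multi-valued attributes such as class
        if isinstance(value, list):
            value = " ".join(value)
        if _BARE_VALUE.fullmatch(value):
            parts.append("%s=%s" % (attr, value))      # single word: leave bare
        else:
            parts.append("%s='%s'" % (attr, value))    # anything else: quote it
    return "<%s>" % " ".join(parts)

print(render_open_tag("sometag", {"attr1": "dont_quote_me",
                                  "attr2": "but this one is quoted"}))
```

With a parsed tag you would call it as render_open_tag(tag.name, tag.attrs). Note that this only rebuilds the opening tag; the tag's children and closing tag would still need to be emitted separately.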
