Python HTML parsing

Developer https://www.devze.com 2023-03-09 13:10 Source: Internet


I need to do some HTML parsing in Python. Suppose I have an HTML file like the one below:

<body>
   <div class="mydiv">
      <p>i want got it</p>
      <div>
           <p> good </p>
           <a> boy  </a>
      </div>
   </div>
</body>

How can I get the content of <div class="mydiv">? That is, I want to get:

      <p>i want got it</p>
      <div>
           <p> good </p>
           <a> boy </a>
      </div>

I have tried HTMLParser, but I found it can't do this. Is there any other way? Thanks!
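For reference, the standard-library HTMLParser can handle this if the handler tracks nesting depth itself. The following is a rough sketch under Python 3, not a robust implementation: the class name `InnerHTML` is made up for illustration, the match requires the class attribute to be exactly "mydiv", and HTML comments inside the div are dropped.

```python
from html.parser import HTMLParser

html = """
<body>
  <div class="mydiv">
    <p>i want got it</p>
    <div>
      <p> good </p>
      <a> boy  </a>
    </div>
  </div>
</body>
"""


class InnerHTML(HTMLParser):
    """Collect the markup inside the first <div class="mydiv"> (hypothetical helper)."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.depth = 0      # 0 means we are outside the target div
        self.done = False   # set once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1  # nested div: its </div> must not end the capture
        elif not self.done and tag == "div" and ("class", "mydiv") in attrs:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, with no end tag.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:  # this </div> closes the target itself
                self.done = True
                return
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append("&#%s;" % name)


parser = InnerHTML()
parser.feed(html)
parser.close()
inner = "".join(parser.parts)
print(inner)
```

The depth counter is what a naive handler usually lacks: without it, the first `</div>` encountered (the inner one) would end the capture too early.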

With BeautifulSoup it is as simple as:

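# BeautifulSoup 4 under Python 3; version 3 used `from BeautifulSoup import BeautifulSoup`
from bs4 import BeautifulSoup

html = """
<body>
  <div class="mydiv">
    <p>i want got it</p>
    <div>
      <p> good </p>
      <a> boy  </a>
    </div>
  </div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
# find_all returns every matching tag; take the first one
result = soup.find_all('div', {'class': 'mydiv'})
tag = result[0]
print(tag.contents)
# Prints (the whitespace-only strings mirror the indentation of the input):
# ['\n    ', <p>i want got it</p>, '\n    ', <div>
#       <p> good </p>
#       <a> boy  </a>
#     </div>, '\n  ']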
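Note that `tag.contents` is a list of parsed nodes rather than markup. If the goal is the inner HTML as one string, a minimal sketch, assuming BeautifulSoup 4 (the `bs4` package) under Python 3, could use `decode_contents()`:

```python
from bs4 import BeautifulSoup

html = """
<body>
  <div class="mydiv">
    <p>i want got it</p>
    <div>
      <p> good </p>
      <a> boy  </a>
    </div>
  </div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, or None when nothing matches.
tag = soup.find("div", class_="mydiv")

inner_html = ""
if tag is not None:
    # decode_contents() serializes the children only, without the
    # enclosing <div class="mydiv"> tag itself.
    inner_html = tag.decode_contents()

print(inner_html)
```

`str(tag)` would instead include the outer `<div class="mydiv">` wrapper, which is not what the question asks for.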

Use lxml. Or BeautifulSoup.

I would prefer lxml.html.

import lxml.html as H

# `html` is the markup string from the question
doc = H.fromstring(html)
# xpath() returns a list of matching elements
nodes = doc.xpath("//div[@class='mydiv']")
