How to retrieve onclick text?

Developer, https://www.devze.com, 2023-01-28 02:02, source: Internet
In a webpage I have these elements:

<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172" class="pagelink" >Page 1</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895" class="pagelink" >Page 2</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c85786787666456fgg3" class="pagelink" >Page 3</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850734234324756767" class="pagelink" >Page 4</a>
...

I need to retrieve the argument passed to window.open in every A tag with class "pagelink":

/link.php?webpage=45980a6f91ac0c850745e0500d612172
/link.php?webpage=45980a6f91ac0c850745e05676787895
/link.php?webpage=45980a6f91ac0c85786787666456fgg3
/link.php?webpage=45980a6f91ac0c850734234324756767

How can I do this with Python?


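from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        # HTMLParser lowercases attribute names, so onClick is read as "onclick"
        if tag == "a" and attr.get("class") == "pagelink":
            add_to_result(attr.get("onclick"))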

Replace add_to_result with your own aggregation (for example, appending to a list), then strip the leading window.open(' from each collected value.
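A self-contained version of this HTMLParser approach might look like the sketch below. The class name PageLinkParser, the helper extract_pagelinks, and the regular expression used to strip the window.open(' prefix are illustrative assumptions rather than part of the answer; the pattern deliberately does not require a closing quote, because the sample markup in the question omits it.

```python
import re
from html.parser import HTMLParser

# Hypothetical helper pattern: captures the first single-quoted argument of
# window.open(...). No closing quote is required, since the sample markup omits it.
WINDOW_OPEN_RE = re.compile(r"window\.open\(\s*'([^']*)")


class PageLinkParser(HTMLParser):
    """Collects the window.open() target of every <a> whose class list contains "pagelink"."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attrs is a list of (name, value) pairs
        classes = (attr.get("class") or "").split()
        onclick = attr.get("onclick")  # HTMLParser lowercases attribute names
        if tag == "a" and "pagelink" in classes and onclick:
            match = WINDOW_OPEN_RE.search(onclick)
            if match:
                self.results.append(match.group(1))


def extract_pagelinks(html):
    """Hypothetical convenience wrapper: feed the HTML and return the collected URLs."""
    parser = PageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    sample = """
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172" class="pagelink" >Page 1</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895" class="pagelink" >Page 2</a>
"""
    for url in extract_pagelinks(sample):
        print(url)
```

Splitting the class attribute on whitespace means a tag such as class="big pagelink" is matched too, which a plain equality check would miss.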


This question has already been answered here. You need to parse the HTML to extract the data you want from it, and Beautiful Soup is a good library for doing the parsing.

Of course, someone might post the code as is, but that's no fun, right?

So, again, you'll have to read up on the documentation :)
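For reference, a minimal sketch of what the Beautiful Soup route could look like follows. It assumes the third-party beautifulsoup4 package is installed (pip install beautifulsoup4) and uses Python's built-in "html.parser" backend; the helper name extract_pagelinks and the regular expression that strips the window.open(' prefix are illustrative assumptions, not code from the answer.

```python
import re

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical helper pattern: captures the first single-quoted argument of
# window.open(...); a closing quote is optional because the sample omits it.
WINDOW_OPEN_RE = re.compile(r"window\.open\(\s*'([^']*)")


def extract_pagelinks(html):
    """Return the window.open() target of every <a> tag with class "pagelink"."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # class_ matches any tag whose class list contains "pagelink"
    for a in soup.find_all("a", class_="pagelink"):
        onclick = a.get("onclick")  # html.parser lowercases attribute names
        if onclick:
            match = WINDOW_OPEN_RE.search(onclick)
            if match:
                results.append(match.group(1))
    return results
```

Because class is a multi-valued attribute in Beautiful Soup, find_all("a", class_="pagelink") also picks up tags that carry additional classes alongside pagelink.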
