How to retrieve onclick text?

Developer https://www.devze.com 2023-01-28 02:02 Source: Internet

In a webpage I have these elements:

<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172" class="pagelink" >Page 1</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895" class="pagelink" >Page 2</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c85786787666456fgg3" class="pagelink" >Page 3</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850734234324756767" class="pagelink" >Page 4</a>
...

and I need to retrieve the URL passed to window.open in every A tag with class "pagelink":

/link.php?webpage=45980a6f91ac0c850745e0500d612172
/link.php?webpage=45980a6f91ac0c850745e05676787895
/link.php?webpage=45980a6f91ac0c85786787666456fgg3
/link.php?webpage=45980a6f91ac0c850734234324756767

How can I do this with Python?

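from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.results = []  # collected onclick values

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attrs is a list of (name, value) pairs
        # HTMLParser lowercases attribute names, so onClick arrives as onclick
        if tag == "a" and attr.get("class") == "pagelink":
            self.results.append(attr.get("onclick"))

parser = MyHTMLParser()
parser.feed(page_html)  # page_html is a string holding the page source

Here the list self.results is the aggregation object; afterwards, just strip the leading window.open(' from each result.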
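
A minimal sketch of that last step, assuming each onclick value starts with window.open(' followed by the URL. A regular expression captures everything up to the next single quote, so it works whether or not the call is closed with ') (the markup in the question omits it). The name extract_url is just for illustration.

```python
import re

# window.open( followed by a single-quoted argument; the closing quote is
# optional because the question's markup leaves it out.
WINDOW_OPEN_RE = re.compile(r"window\.open\(\s*'([^']*)")


def extract_url(onclick):
    """Return the first argument of window.open(...), or None if absent."""
    if not onclick:
        return None
    match = WINDOW_OPEN_RE.search(onclick)
    return match.group(1) if match else None


values = [
    "window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172",
    "window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895')",
]
print([extract_url(v) for v in values])
```

Using a regex rather than slicing off a fixed prefix means extra whitespace or trailing arguments after the URL do not break the extraction.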

This question has already been answered here. You need to parse the HTML to extract whatever data you need from it, and Beautiful Soup is a good library for the parsing.
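
For reference, a short Beautiful Soup sketch. It assumes the third-party beautifulsoup4 package is installed; html_source is a hypothetical stand-in for the real page, and the regex assumes the URL is the first single-quoted argument of window.open.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample input, copied from the markup in the question.
html_source = """
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172" class="pagelink" >Page 1</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895" class="pagelink" >Page 2</a>
"""

# First single-quoted argument of window.open(...); closing quote optional.
URL_RE = re.compile(r"window\.open\(\s*'([^']*)")


def pagelink_urls(html):
    """Collect the window.open URLs from every <a class="pagelink">."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for a in soup.find_all("a", class_="pagelink"):
        # html.parser lowercases attribute names, so onClick becomes onclick
        match = URL_RE.search(a.get("onclick", ""))
        if match:
            urls.append(match.group(1))
    return urls


for url in pagelink_urls(html_source):
    print(url)
```

find_all with class_ matches tags whose class list contains "pagelink", so links carrying additional classes are still picked up.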

Of course, someone might just post the code, but that's no fun, right?

So again, you'll have to read up on the documentation :)