In a webpage I have these elements:
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172" class="pagelink" >Page 1</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895" class="pagelink" >Page 2</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c85786787666456fgg3" class="pagelink" >Page 3</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850734234324756767" class="pagelink" >Page 4</a>
...
.
and I need to retrieve the URL passed to window.open in every <a> tag of class "pagelink":
/link.php?webpage=45980a6f91ac0c850745e0500d612172
/link.php?webpage=45980a6f91ac0c850745e05676787895
/link.php?webpage=45980a6f91ac0c85786787666456fgg3
/link.php?webpage=45980a6f91ac0c850734234324756767
How can I do this with Python?
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "pagelink":
            add_to_result(attrs.get("onclick"))
Replace add_to_result with your own aggregation code (e.g. appending to a list), then strip the leading window.open(' from each result.
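To make the steps above concrete, here is a minimal sketch that collects the results in a list and pulls the URL out of each onclick value with a regular expression. The names PageLinkParser, extract_pagelinks, WINDOW_OPEN_RE and the sample string are made up for illustration, and the pattern assumes the URL is the first quoted argument of window.open; the closing quote is treated as optional because the markup in the question omits it.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: captures the first quoted argument of window.open(...).
# The closing quote is not required, since the question's markup leaves it out.
WINDOW_OPEN_RE = re.compile(r"window\.open\(\s*['\"]([^'\"]*)")


class PageLinkParser(HTMLParser):
    """Collects window.open targets from <a> tags of class "pagelink"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so onClick becomes onclick.
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag != "a" or "pagelink" not in classes:
            return
        match = WINDOW_OPEN_RE.search(attrs.get("onclick") or "")
        if match:
            self.links.append(match.group(1))


def extract_pagelinks(html):
    """Return the window.open URLs of all pagelink anchors in html."""
    parser = PageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Illustrative input only: one well-formed link, one shaped like the question's
# markup (no closing quote), and one anchor that should be ignored.
sample = """
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e0500d612172')" class="pagelink">Page 1</a>
<a href="#" onClick="window.open('/link.php?webpage=45980a6f91ac0c850745e05676787895" class="pagelink">Page 2</a>
<a href="#" onClick="window.open('/other.php')" class="other">Skip</a>
"""

for link in extract_pagelinks(sample):
    print(link)
```

Splitting the class attribute on whitespace means an anchor with several classes (say class="pagelink active") is still matched, which a plain equality check would miss.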
This question has already been answered here. You need to parse the HTML to get any data you require from it, and that parsing can be done with Beautiful Soup.
Of course someone might post the code as it is, but that is no fun, right?
So again, you will have to read the documentation :)