I have a bunch of links structured like so:
<h4 class="classname"><a href="http://some-website.com" onclick="someVaryingJS();" title="Some Title">Some Title</a></h4>
I want to be able to extract just the href and title attributes, keeping in mind that the onclick attribute changes per tag and that I only want to do it for anchor tags that are within h4's of that class.
You could load the HTML fragment into DOMDocument and process it from there.
It's obviously going to be more flexible, but a lot heavier than a straight-up regex.
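To make the DOMDocument suggestion concrete, here is a minimal sketch, assuming PHP with the DOM extension enabled. The function name extractH4Links is hypothetical, and the class name "classname" is taken from the sample markup in the question. It uses DOMXPath to select only anchors inside h4 elements of that class, so the varying onclick attribute never matters.

```php
<?php
// Hypothetical helper: collects href/title pairs from <a> tags that sit
// inside <h4> elements carrying the given class.
// Assumption: $class is a trusted, simple class name (it is placed into
// the XPath expression as-is).
function extractH4Links(string $html, string $class = 'classname'): array
{
    $doc = new DOMDocument();

    // Scraped HTML is rarely valid; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the class as a whole token, so "classname" does not match "classname2".
    $query = sprintf(
        '//h4[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a[@href]',
        $class
    );

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href'  => $a->getAttribute('href'),
            'title' => $a->getAttribute('title'), // empty string if the attribute is missing
        ];
    }
    return $links;
}

// Usage with the markup from the question.
$html = '<h4 class="classname"><a href="http://some-website.com" '
      . 'onclick="someVaryingJS();" title="Some Title">Some Title</a></h4>';
print_r(extractH4Links($html));
```

For the sample markup this should give a single entry with href "http://some-website.com" and title "Some Title". Anchors outside a matching h4 are skipped, which is the part that tends to get awkward with a regex.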