Href URL matching [duplicate]

Developer https://www.devze.com 2023-04-02 05:35 Source: Internet
This question already has answers here: Closed 11 years ago.

Possible Duplicate:

Grabbing the href attribute of an A element

I'm trying to match the following in the page source:

 <a href="/download/blahbal.html">

I looked at one other question on this site and used this regex:

   '/<a href=["\']?(\/download\/[^"\'\s>]+)["\'\s>]?/i'

which returns all the href links on the page, but it drops the .html from some of them.

Any help would be greatly appreciated.

Thank you


First retrieve all the hrefs with an HTML parser, then use a regex or strpos to filter out those that don't start with /download/.
Why you should use a parser instead of a regex for this is discussed in many other posts on Stack Overflow. Once you have parsed the document and collected the hrefs, filtering them takes only simple string functions.

A little code:

$dom = new DOMDocument;
// $html contains your HTML string
$dom->loadHTML($html);
// at the end of the loop this will hold the filtered hrefs
$hrefs = array();
foreach( $dom->getElementsByTagName('a') as $node ) {
    // skip anchors that have no href attribute
    if( $node->hasAttribute( 'href' ) ) {
        $href = $node->getAttribute( 'href' );
        // keep only hrefs which start with /download/
        if( strpos( $href, "/download/" ) === 0 ) {
            $hrefs[] = $href; // store href
        }
    }
}
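
If you prefer, the filtering can also be pushed into the query itself with DOMXPath and the XPath starts-with() function. This is only a sketch of that alternative: the sample $html string is made up for illustration, and the libxml_use_internal_errors() call is my own addition to keep malformed markup from flooding the output with warnings. Note that the parser returns the whole attribute value whatever characters it contains, whereas the character class in your regex stops at the first whitespace, quote or >. That may or may not be what truncates your links.

```php
<?php
// Hypothetical sample markup standing in for the real page source.
$html = '<p><a href="/download/blahbal.html">One</a>'
      . '<a href="/about.html">About</a>'
      . '<a href="/download/some file.html">Two</a></p>';

// Collect parser warnings internally instead of printing them
// (an addition for this sketch, not part of the code above).
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$hrefs = array();
// Select the href attribute of every <a> whose href begins with /download/
foreach ($xpath->query('//a[starts-with(@href, "/download/")]/@href') as $attr) {
    $hrefs[] = $attr->nodeValue; // the full attribute value
}

print_r($hrefs);
```

With the sample markup, /about.html is skipped and both /download/ links are kept in full, including the one containing a space.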