Extract link attributes from string of HTML

devze.com (https://www.devze.com), 2022-12-16 23:21. Source: the web
What's the best way to extract the link URL (the href attribute) from the HTML in $var?

example of $var

$var = '<a href="http://stackoverflow.com/">Stack Overflow</a>';

I want:

$var2 = "http://stackoverflow.com/";

For example, with preg_match().

What other options are there?


Instead of crafting one long, complicated regex, do it in steps:

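$str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
// Step 1: drop everything up to and including the opening href="
$str = preg_replace("/.*<a\s+href=\"/","",$str);
// Step 2: drop the closing "> and everything after it
print preg_replace("/\">.*/","",$str);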

A non-regex alternative uses explode():

$str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
$s = explode('href="',$str);
$t = explode('">',$s[1]);
print $t[0];
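
The explode() version above assumes the string really contains href=" and triggers a warning on $s[1] when it does not. A minimal sketch of the same non-regex idea with guards, using strpos() and substr(); the helper name firstHref is made up for illustration, and it only handles double-quoted attribute values:

```php
<?php
// Hypothetical helper (not from the original answer): return the first
// double-quoted href value in $html, or null when none is found.
function firstHref(string $html): ?string
{
    $marker = 'href="';
    $start = strpos($html, $marker);
    if ($start === false) {
        return null; // no href=" in the string at all
    }
    $start += strlen($marker);          // jump past the marker
    $end = strpos($html, '"', $start);  // find the closing quote
    if ($end === false) {
        return null; // attribute value is never closed
    }
    return substr($html, $start, $end - $start);
}

echo firstHref('<a href="http://stackoverflow.com/"> Stack Overflow</a>'), PHP_EOL;
// prints http://stackoverflow.com/
```

Returning null instead of indexing blindly lets the caller tell "no link" apart from an empty href.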


If you have a valid HTML string, the DOMDocument class's loadHTML() method will parse it, and you can then navigate the structure easily. This is a good approach if you have a lot of HTML to work with.

$doc = new DOMDocument();
$doc->loadHTML('<a href="http://stackoverflow.com/">Stack Overflow</a>');
$anchors = $doc->getElementsByTagName('a');
foreach($anchors as $node) {
    echo $node->textContent;
    if ($node->hasAttributes()) {
        foreach($node->attributes as $a) {
            echo ' | '.$a->name.': '.$a->value;
        }
    }
}

This produces the following:

Stack Overflow | href: http://stackoverflow.com/ 
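
If only the href is needed, as in the question, the attribute loop can be skipped. A minimal sketch, assuming the same DOMDocument approach; the helper name extractHrefs is made up here, and the libxml_use_internal_errors() calls are only there to keep parse warnings about imperfect markup quiet:

```php
<?php
// Hypothetical helper (not from the original answer): collect the href of
// every <a> element in an HTML fragment.
function extractHrefs(string $html): array
{
    $doc = new DOMDocument();
    // Buffer libxml parse errors instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip anchors that have no href attribute at all
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

$var = '<a href="http://stackoverflow.com/">Stack Overflow</a>';
$var2 = extractHrefs($var)[0] ?? null;
echo $var2, PHP_EOL; // prints http://stackoverflow.com/
```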


strip_tags() removes HTML and PHP tags from a string. The second parameter is useful if you would like to make exceptions and leave certain tags in, such as the paragraph tag.

$text = '<p>Paragraph.</p> <!-- boo --> <a href="#">Other text</a>';
echo strip_tags($text); // Paragraph. Other text
echo strip_tags($text, '<p><a>'); // <p>Paragraph.</p> <a href="#">Other text</a>

phpQuery

If you want to stay away from regular expressions, you could use phpQuery to load the HTML, and then use jQuery-style selectors and methods to get your value:

// Bring in phpQuery
require("phpQuery-onefile.php");
// Load up our HTML
phpQuery::newDocumentHTML("<a href='http://sampsonresume.com/'>Homepage</a>");
// Print the HREF attribute of the first Anchor
print pq("a:first")->attr("href"); // http://sampsonresume.com/

Regex

You can use the following to find the URL:

$var = "<a href='http://sampsonresume.com/'>Homepage</a>";
preg_match("(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)",$var,$match);
print $match[0]; // http://sampsonresume.com/


Use the following regular expression:

\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))
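
The pattern above is given without any PHP around it. One possible way to use it with preg_match() is sketched below; the ~ delimiters, the i and u modifiers, the nowdoc quoting, and the helper name firstUrl are all assumptions made for this example, not part of the answer:

```php
<?php
// Sketch only: the pattern from the answer, wrapped for PCRE. A nowdoc is
// used so the quotes and backslashes inside the pattern need no escaping.
$pattern = <<<'REGEX'
~\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))~iu
REGEX;

// Hypothetical helper: return the first URL-like substring, or null.
function firstUrl(string $text, string $pattern): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

$var = '<a href="http://stackoverflow.com/">Stack Overflow</a>';
echo firstUrl($var, $pattern), PHP_EOL;
```

Note that this matches anything URL-shaped in the text, not only href values, so it also picks up URLs that appear in link text or other attributes.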


<?php
// $html is assumed to hold the markup to scan.
preg_match_all("#<a\s[^>]*href\s*=\s*[\'\"]??\s*?(?'path'[^\'\"\s]+?)[\'\"\s]{1}[^>]*>(?'name'[^>]*)<#simU", $html, $hrefs, PREG_SET_ORDER);

foreach ($hrefs as $urls) {
    print $urls['path']."<br>";
}
?>


Try this to get the value of the href attribute:

$link = 'test <a href="www.something.com">Click here</a> test2 <a href="www.test.com">Click here</a>';
preg_match_all('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $link, $result);

// preg_match_all() always fills $result, so check the named group itself.
if (!empty($result['href'])) {
    // Found at least one link: print every href, separated by line breaks.
    echo implode("<br/>", $result['href']);
}

Output:

www.something.com
www.test.com