Meta keyword and description regex issue

Developer https://www.devze.com 2023-03-15 08:31 Source: Internet
I'm writing a program to pull the meta information from websites.

I need to write a regex that pulls out the text inside the content attribute.

$find = "<meta\s+name=['\"]??keywords['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";

This works OK for meta keywords written like this:

<meta name="keywords" content="keyword, keyword, keyword" /> or like so
<meta name="keywords" content="keyword, keyword, keyword">

But I would like to flip it around so that it also finds the text inside the content attribute when the tag is written in this format:

<meta content="keyword, keyword, keyword" name="keywords" /> or like so
<meta content="keyword, keyword, keyword" name="keywords" >

Can anyone help? Cheers.


For this purpose you could also use get_meta_tags(), a built-in PHP function that extracts the content attributes of <meta> tags from a URL (or from an already downloaded file):

$tags = get_meta_tags('http://www.example.com/');
print_r($tags);
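
As a rough sketch of how the returned array might be used: get_meta_tags() returns an associative array keyed by the lowercased name attribute, so the keywords should be available under the 'keywords' key whether content comes before or after name. The sample markup and the temporary file below are made up for illustration; pass a URL instead to read a live page.

```php
<?php
// Hypothetical sample markup; the second tag puts content before name.
$html = <<<HTML
<html><head>
<meta name="description" content="An example page">
<meta content="keyword, keyword, keyword" name="keywords" />
</head><body></body></html>
HTML;

// get_meta_tags() reads from a file path or URL, so write the sample to a temp file.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

$tags = get_meta_tags($path);
unlink($path);

// Keys are the lowercased name attributes; fall back to '' if the tag is absent.
$keywords = isset($tags['keywords']) ? $tags['keywords'] : '';
echo $keywords, "\n";
```

Because the function does its own parsing, no regex is needed for either attribute order.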


Try this:

<meta[^>]*content="(?<keyword>[^"]*)"[^>]*/?>
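
Note that the pattern above matches any <meta> tag with a content attribute, not only the keywords one. One possible way to restrict it to name="keywords" in either attribute order is a lookahead, sketched below. The function name extract_meta_keywords is hypothetical, and the sketch assumes the content value is quoted and that no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper: returns the content of the first keywords meta tag, or null.
function extract_meta_keywords($html)
{
    $pattern = '/<meta\b'
        . '(?=[^>]*\bname\s*=\s*["\']?keywords\b)' // lookahead: name=keywords appears somewhere in this tag
        . '[^>]*?\bcontent\s*=\s*(["\'])(.*?)\1'   // capture the quoted content value (group 2)
        . '[^>]*>/is';                             // rest of the tag; i = case-insensitive, s = dot matches newlines

    return preg_match($pattern, $html, $m) ? $m[2] : null;
}

// Works with content before name as well as name before content.
echo extract_meta_keywords('<meta content="keyword, keyword, keyword" name="keywords" />'), "\n";
```

The lookahead checks the name attribute without consuming any input, so the main part of the pattern is free to find content wherever it sits in the tag.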



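You can also use PHP's DOM extension:

$doc = new DOMDocument();
$doc->loadHTML($htmlcontent);
$xpath = new DOMXPath($doc);
// Double quotes on the outside so the single quotes inside the XPath expression are valid PHP.
$nodelist = $xpath->query("//meta[@name='keywords']/@content");
foreach ($nodelist as $node) {
    echo $node->nodeValue;
}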

A regex works most of the time, but it cannot safely handle arbitrary HTML content.
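
For messy real-world pages, the DOM approach could be wrapped roughly as follows. This is only a sketch: the function name dom_meta_keywords is hypothetical, it assumes the DOM extension is installed, and it assumes $html is a non-empty string. It suppresses libxml warnings for invalid markup and compares the name attribute case-insensitively.

```php
<?php
// Hypothetical helper: collects the content of every <meta name="keywords"> tag.
function dom_meta_keywords($html)
{
    // Real-world HTML is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // translate() lowercases the name value so "Keywords" and "KEYWORDS" also match.
    $query = "//meta[translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
           . "'abcdefghijklmnopqrstuvwxyz') = 'keywords']/@content";

    $values = array();
    foreach ($xpath->query($query) as $attr) {
        $values[] = $attr->nodeValue;
    }
    return $values;
}

// Attribute order does not matter to the parser.
$sample = '<html><head><meta content="keyword, keyword, keyword" name="Keywords" /></head><body></body></html>';
print_r(dom_meta_keywords($sample));
```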
