I'm writing a program to pull the meta information from websites.
I need to write a regex to pull the text out of the content attribute.
$find = "<meta\s+name=['\"]??keywords['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";
This works ok for meta keywords written like so:
<meta name="keywords" content="keyword, keyword, keyword" /> or like so
<meta name="keywords" content="keyword, keyword, keyword">
But I would like to flip it around so it can also find the text in the content attribute when the tag is written in this format:
<meta content="keyword, keyword, keyword" name="keywords" /> or like so
<meta content="keyword, keyword, keyword" name="keywords" >
Can anyone help? Cheers.
For this purpose you could also use get_meta_tags(), a built-in PHP function which extracts <meta> tag attributes from websites (or already downloaded files):
$tags = get_meta_tags('http://www.example.com/');
print_r($tags);
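The returned array is keyed by each tag's name attribute, so the keywords can be read directly. Here is a minimal sketch, assuming a hypothetical sample page written to a temporary file so it runs without network access. As far as I know, the parser does not depend on the order of name and content, which is why the sample uses the flipped form from the question:

```php
<?php
// Hypothetical sample page; the content attribute deliberately comes first.
$html = '<html><head>'
      . '<meta content="keyword, keyword, keyword" name="keywords">'
      . '</head><body></body></html>';

// get_meta_tags() reads from a file or URL, so save the sample to a temp file.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

$tags = get_meta_tags($file);

// Keys are the lowercased name attributes; fall back to '' if the tag is absent.
$keywords = $tags['keywords'] ?? '';

unlink($file);
echo $keywords, PHP_EOL;
```

Note that get_meta_tags() only reports tags that have a name attribute and stops parsing at the closing </head> tag.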
Try this:
<meta[^>]*content="(?<keyword>[^"]*)"[^>]*/?>
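Note that a pattern like this matches any <meta> tag with a content attribute, not only the keywords one. If that matters, one possible variation is to add a lookahead that requires name="keywords" somewhere in the same tag, which keeps the match independent of attribute order. This is a rough sketch, not a robust HTML parser; extract_keywords is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the keywords string, or null when no tag matches.
function extract_keywords(string $html): ?string
{
    // (?=...) asserts that name="keywords" appears before the tag closes,
    // so content may come before or after it. (["\']) captures the opening
    // quote and \1 requires the same quote to close the value.
    $pattern = '/<meta\b(?=[^>]*\bname\s*=\s*["\']keywords["\'])'
             . '[^>]*\bcontent\s*=\s*(["\'])(?<keyword>.*?)\1[^>]*>/is';

    return preg_match($pattern, $html, $m) === 1 ? $m['keyword'] : null;
}

// Both attribute orders from the question are handled the same way.
echo extract_keywords('<meta content="keyword, keyword, keyword" name="keywords" />'), PHP_EOL;
echo extract_keywords('<meta name="keywords" content="keyword, keyword, keyword">'), PHP_EOL;
```

This still assumes quoted attribute values and no ">" inside them, so it can fail on unusual markup.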
You can also use PHP's DOM extension:
$doc = new DOMDocument();
$doc->loadHTML($htmlcontent);
$xpath = new DOMXPath($doc);
// Double-quote the PHP string so the XPath literal can use single quotes
$nodelist = $xpath->query("//meta[@name='keywords']/@content");
foreach ($nodelist as $node) {
    echo $node->nodeValue;
}
A regex works most of the time, but it cannot safely handle arbitrary HTML content.