Please help me get the link and text from this tag. The <h3 class="post-title entry-title">
has to be included, because I only want the links from that specific tag.
<h3 class="post-title entry-title">
<a href="http://mymplogk.blogspot.com/2011/03/h_25.html">Text</a>
</h3>
My work so far is:
<?php
$string = file_get_contents('http://www.domain.com');
$regex_pattern = "";
unset($matches);
preg_match_all($regex_pattern, $string, $matches);
foreach ($matches[0] as $paragraph) {
echo $paragraph;
echo "<br>";
}
?>
Thank you in advance.
Don't use regex to parse HTML. It's a bad idea. Use an HTML/XML parser. Since you are using PHP, you can try using PHP Tidy or DOMDocument. It will make your life much easier.
Following your example, this regex will find "http://mymplogk.blogspot.com/2011/03/h_25.html" and "Text":
$regex_pattern = '/<h3[^>]+class\s*=\s*[\'"]post-title entry-title[\'"][^>]*>.*?<a[^>]+href\s*=\s*"([^"]+)"[^>]*>([^<]*)</s';
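This accepts single or double quotes around the class value of the h3 tag, allows additional attributes in the h3 tag, and tolerates optional whitespace around the = sign. The href value itself must be in double quotes; the URL ends up in capture group 1 and the link text in capture group 2. It also matches multiple times in $string, e.g.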
$string = '<h3 class="post-title entry-title">
<a href="http://mymplogk.blogspot.com/2011/03/h_25.html">Text</a>
</h3>
<p>doot</p>
<h3 class=\'post-title entry-title\'>
<a href="http://www.google.com/">More Text</a>
</h3>';
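To read the two capture groups instead of the whole match (the loop in the question prints $matches[0], which is the full matched text), a minimal sketch could look like the following. It reuses the pattern and sample string from this answer; the $results array and the output format are only illustrative assumptions.

```php
<?php
// Sample input copied from the answer above.
$string = '<h3 class="post-title entry-title">
<a href="http://mymplogk.blogspot.com/2011/03/h_25.html">Text</a>
</h3>
<p>doot</p>
<h3 class=\'post-title entry-title\'>
<a href="http://www.google.com/">More Text</a>
</h3>';

// Pattern copied from the answer above.
$regex_pattern = '/<h3[^>]+class\s*=\s*[\'"]post-title entry-title[\'"][^>]*>.*?<a[^>]+href\s*=\s*"([^"]+)"[^>]*>([^<]*)</s';

// PREG_SET_ORDER returns one array per match:
// [0] = full match, [1] = href value, [2] = link text.
preg_match_all($regex_pattern, $string, $matches, PREG_SET_ORDER);

// $results is an illustrative name, not something from the question.
$results = [];
foreach ($matches as $match) {
    $results[] = ['href' => $match[1], 'text' => $match[2]];
    echo $match[1], ' => ', $match[2], "<br>\n";
}
```

PREG_SET_ORDER keeps each URL next to its link text, which is usually easier to loop over than the default layout where all URLs sit in $matches[1] and all texts in $matches[2].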
I would recommend using DOMDocument and XPath to extract the URL from the page instead of a regexp.
This tutorial is a good starting point for using XPath and DOM: http://www.merchantos.com/blog/makebeta/php/scraping-links-with-php#php_dom
Edit: If you use the Firebug add-on in Firefox, you can inspect the element on the page and copy its XPath.
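As a rough sketch of the DOMDocument and XPath approach, assuming the class attribute is exactly "post-title entry-title" as in the question: the function name extractPostLinks and the sample markup are made up for illustration, and in practice the HTML would come from file_get_contents() on the real page.

```php
<?php
// Sample markup modelled on the question; stands in for the downloaded page.
$html = '<h3 class="post-title entry-title">
<a href="http://mymplogk.blogspot.com/2011/03/h_25.html">Text</a>
</h3>
<p>doot</p>
<h3 class=\'post-title entry-title\'>
<a href="http://www.google.com/">More Text</a>
</h3>';

// Hypothetical helper, not part of any library.
function extractPostLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Only <a> elements with an href that sit inside the specific <h3>.
    $nodes = $xpath->query('//h3[@class="post-title entry-title"]//a[@href]');

    $links = [];
    foreach ($nodes as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

foreach (extractPostLinks($html) as $link) {
    echo $link['href'], ' => ', $link['text'], "<br>\n";
}
```

The XPath test @class="..." is an exact string comparison, so an h3 with extra or reordered class names would need a different expression, for example one built on contains().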
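The regex:
(?<=href=").+(?=")
should match whatever sits between the double quotes of an href attribute. Note that it matches any href on the page, not only those inside the h3, and that .+ is greedy: if another double quote follows on the same line, the match runs up to that last quote. Using [^"]+ in place of .+ avoids this.
You can test this in RegexStorm.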
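A small sketch of how this lookaround pattern behaves in PHP when the tag has a second quoted attribute after href. The input line is a made-up example, and the [^"]+ variant is a suggested alternative rather than part of the original answer.

```php
<?php
// Hypothetical input: a second quoted attribute follows href.
$line = '<a href="http://www.google.com/" title="Search">More Text</a>';

// Pattern from the answer: greedy .+ extends to the last double quote on the line.
preg_match('/(?<=href=").+(?=")/', $line, $greedy);

// Variant: [^"]+ cannot cross a double quote, so it stops at the end of the href value.
preg_match('/(?<=href=")[^"]+(?=")/', $line, $bounded);

echo $greedy[0], "\n";
echo $bounded[0], "\n";
```

With this input the first pattern captures the URL plus the start of the title attribute, while the second captures only the URL.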