I'm trying to use two preg_match calls in order to get two specific values from an HTML source code.
<?php
$url = "http://www.example.com";
$userAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
$ch = curl_init();
curl_setopt($ch,CURLOPT_USERAGENT,$userAgent);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_AUTOREFERER,true);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch,CURLOPT_TIMEOUT,10000000);
$html = curl_exec($ch);
preg_match('~<span class="first">(.*)<\/span>~msU',$html,$matching_data);
preg_match('~<span class="second">(.*)<\/span>~msU',$html,$matching_data2);
print_r($matching_data);
print_r($matching_data2);
?>
Suppose the $html variable contains the following:
<title>foobar title</title>
<body>
<div class="second">Not this one</span>
<div>
<span class="first">First</span>
<span class="second">this one<span>
</div>
</body>
If I run my PHP code, the first print_r returns the right value: <span class="first">First</span>. But the second print_r, instead of returning <span class="second">this one<span>, returns <div class="second">Not this one</span>.
So I guess that preg_match starts searching from the beginning of the string instead of from where the previous preg_match call stopped.
How can I make the second (third, fourth, etc.) call to preg_match continue from where the last call left off?
Thank you,
Regards.
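To make sequential calls to preg_match, each continuing the search where the last one left off, use the PREG_OFFSET_CAPTURE flag to record where each match starts, then pass the position just past that match as the $offset argument of the next call:
http://php.net/manual/en/function.preg-match.php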
As for the larger problem, though: regular expressions are generally not suitable for parsing HTML. You should use a DOM parser to do this work for you, and that's if you even need to do the work on the server side. This sort of thing can be done simply (and naturally) on the client side using JavaScript; you would just have to pass the relevant values back to the server.
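As a rough illustration of the DOM approach, here is a minimal sketch using PHP's bundled DOMDocument and DOMXPath classes. It assumes the dom extension is enabled and uses a well-formed variant of the question's markup (an assumption, since the original snippet has mismatched tags). The XPath expressions are just one way of saying "the span inside the div".

```php
<?php
// Sample markup: a well-formed variant of the HTML from the question (assumed).
$html = <<<HTML
<title>foobar title</title>
<body>
<span class="second">Not this one</span>
<div>
<span class="first">First</span>
<span class="second">this one</span>
</div>
</body>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Only spans that are direct children of a <div> are selected, so the
// top-level "Not this one" span is skipped without any offset bookkeeping.
$first  = $xpath->query('//div/span[@class="first"]')->item(0);
$second = $xpath->query('//div/span[@class="second"]')->item(0);

echo $first->textContent, "\n";
echo $second->textContent, "\n";
```

With this sample input the two echo statements should print "First" and "this one". Real pages may need a null check on item(0) in case the element is missing.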
You can use the PREG_OFFSET_CAPTURE flag together with the $offset argument of preg_match:
int preg_match ( string $pattern, string $subject [, array &$matches [, int $flags [, int $offset]]] )
Try this:
<?php
...
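// With PREG_OFFSET_CAPTURE, each entry of $matching_data is a [text, offset] pair.
preg_match('~<span class="first">(.*)<\/span>~msU',$html,$matching_data,PREG_OFFSET_CAPTURE);
// Resume the search just past the end of the first match.
$offset = $matching_data[0][1] + strlen($matching_data[0][0]);
preg_match('~<span class="second">(.*)<\/span>~msU',$html,$matching_data2,PREG_OFFSET_CAPTURE,$offset);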
print_r($matching_data);
print_r($matching_data2);
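The same offset technique extends to a third, fourth, etc. pattern by carrying the offset forward in a loop. The following is only a sketch: match_in_sequence() is a hypothetical helper, not a PHP built-in, and the sample string assumes well-formed markup.

```php
<?php
/**
 * Hypothetical helper: run each pattern in turn, starting every search
 * where the previous match ended. Returns the text of capture group 1
 * for each pattern, or null as soon as one pattern fails to match.
 */
function match_in_sequence(array $patterns, string $subject): ?array
{
    $offset = 0;
    $found = [];
    foreach ($patterns as $pattern) {
        $result = preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset);
        if ($result !== 1) {
            return null; // 0 means no match, false means a regex error
        }
        // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
        $found[] = $m[1][0];
        // Move just past the end of the whole match for the next search.
        $offset = $m[0][1] + strlen($m[0][0]);
    }
    return $found;
}

// Assumed sample input: a compact, well-formed version of the question's HTML.
$html = '<span class="second">Not this one</span>'
      . '<div><span class="first">First</span>'
      . '<span class="second">this one</span></div>';

$values = match_in_sequence([
    '~<span class="first">(.*)</span>~sU',
    '~<span class="second">(.*)</span>~sU',
], $html);

print_r($values);
```

Because the second search starts after the "first" span, the earlier "Not this one" span is never considered. Returning null on a failed match also avoids reading an undefined $m[0] when the first pattern finds nothing.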
Is that HTML the code you need to work with? It's not valid HTML. You can use preg_match_all as @igorw suggested:
preg_match_all('~<(span|div) class="(first|second)">(.*)<\/?span>~msU', $html,$matching_data);
echo '<xmp>'; print_r($matching_data[0]);
But if the HTML were valid, as in the sample below, the simpler pattern that follows it would work:
<title>foobar title</title>
<body>
<span class="second">Not this one</span>
<div>
<span class="first">First</span>
<span class="second">this one</span>
</div>
</body>
preg_match_all('~<span class="(first|second)">(.*)<\/span>~msU', $html, $matching_data);
echo '<xmp>'; print_r($matching_data[0]);