
Two successive preg_match calls

Developer (https://www.devze.com), 2023-02-09 01:12, source: web
I'm trying to use two preg_match calls to get two specific values from an HTML page's source code.

<?php

    $url = "http://www.example.com";
    $userAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_USERAGENT,$userAgent);
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_AUTOREFERER,true);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch,CURLOPT_TIMEOUT,10000000);  
    $html  = curl_exec($ch);
    preg_match('~<span class="first">(.*)<\/span>~msU',$html,$matching_data);
    preg_match('~<span class="second">(.*)<\/span>~msU',$html,$matching_data2);
    print_r($matching_data);
    print_r($matching_data2);   
?>

Assume the $html variable contains the following markup:

<title>foobar title</title>
<body>
<div class="second">Not this one</span>
<div>
<span class="first">First</span>
<span class="second">this one<span>
</div>
</body>

When I run my PHP code, the first print_r shows the right value: <span class="first">First</span>. But the second print_r, instead of showing <span class="second">this one<span>, shows <div class="second">Not this one</span>.

So I guess that each preg_match call starts searching from the beginning of the string instead of from where the previous preg_match call stopped.
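That guess is right: preg_match keeps no state between calls, and when no $offset argument is passed it always scans the subject from byte 0. A minimal illustration, using a made-up subject string that contains two spans with the same class:

```php
<?php
// Made-up subject with two elements that share the class "second".
$subject = '<span class="second">A</span><span class="second">B</span>';
$pattern = '~<span class="second">(.*)</span>~msU';

// Two identical calls without an $offset argument: both start scanning at
// the beginning of $subject, so both return the first occurrence.
preg_match($pattern, $subject, $m1);
preg_match($pattern, $subject, $m2);

echo $m1[1], "\n"; // A
echo $m2[1], "\n"; // A
```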

How can I make the second (third, fourth, etc.) preg_match call start where the previous call left off?

Thank you,

Regards.


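To make sequential preg_match calls that continue the search where the last one left off, pass the PREG_OFFSET_CAPTURE flag so each match reports its position in the string, then pass the end of that match as the $offset argument of the next call: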

http://php.net/manual/en/function.preg-match.php
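Combining the PREG_OFFSET_CAPTURE flag with preg_match's fifth parameter, $offset, gives a loop that resumes each search where the previous match ended. The following is a minimal sketch; the helper name matchAllSequentially and the sample markup are hypothetical, and for a single pattern preg_match_all does the same job in one call.

```php
<?php
// Hypothetical helper: returns capture group 1 of every match of $pattern,
// found by calling preg_match repeatedly from where the last match ended.
function matchAllSequentially(string $pattern, string $subject): array
{
    $results = [];
    $offset = 0;
    // PREG_OFFSET_CAPTURE makes each entry of $m a [text, byte offset] pair.
    while ($offset <= strlen($subject)
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $results[] = $m[1][0];
        $end = $m[0][1] + strlen($m[0][0]);
        // Resume right after the full match, but always advance by at least
        // one byte so an empty match cannot make the loop spin forever.
        $offset = max($end, $offset + 1);
    }
    return $results;
}

// Hypothetical sample markup.
$html = '<span class="first">First</span>'
      . '<span class="second">Not this one</span>'
      . '<span class="second">this one</span>';

print_r(matchAllSequentially('~<span class="second">(.*)</span>~msU', $html));
```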

As for the larger problem, though: regular expressions are generally not suitable for parsing HTML. You should use a DOM parser to do this work for you, and that's assuming you even need to do the work on the server side. This sort of thing can be done very simply (and naturally) on the client side with JavaScript; you would just have to send the relevant values back to the server.
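The answer doesn't name a particular parser; one option that ships with PHP's dom extension is DOMDocument plus DOMXPath. A minimal sketch, assuming well-formed markup (the HTML string below is an illustrative, corrected version of the question's sample, not the asker's real page). The XPath following axis expresses "the first class="second" span after the class="first" span" directly, with no offsets to track:

```php
<?php
// Illustrative markup: the decoy "second" span comes before the "first" one.
$html = <<<'HTML'
<html><body>
<span class="second">Not this one</span>
<div>
<span class="first">First</span>
<span class="second">this one</span>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The first span whose class attribute is exactly "first".
$first = $xpath->query('//span[@class="first"]')->item(0);
// Spans with class "second" that come after $first in document order;
// item(0) is the nearest one.
$second = $xpath->query('following::span[@class="second"]', $first)->item(0);

echo $first->textContent, "\n";  // First
echo $second->textContent, "\n"; // this one
```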


You can use the PREG_OFFSET_CAPTURE flag together with the $offset argument of preg_match (see the PHP manual entry for preg_match):

int preg_match ( string $pattern, string $subject [, array &$matches [, int $flags [, int $offset ]]] )

Try this:

<?php

...

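// With PREG_OFFSET_CAPTURE, $matching_data[0] is [full match text, byte offset]
preg_match('~<span class="first">(.*)<\/span>~msU', $html, $matching_data, PREG_OFFSET_CAPTURE);
// Start the second search right after the end of the first full match
$offset = $matching_data[0][1] + strlen($matching_data[0][0]);
preg_match('~<span class="second">(.*)<\/span>~msU', $html, $matching_data2, PREG_OFFSET_CAPTURE, $offset);
print_r($matching_data);
print_r($matching_data2);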
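A self-contained version of this two-call approach, run against a hypothetical, corrected copy of the question's markup (every span properly closed, decoy "second" span placed first). It also checks each preg_match return value, since the match array holds no [0] entry when a pattern fails to match. The variable names are illustrative.

```php
<?php
// Hypothetical corrected markup: the decoy "second" span comes first.
$html = '<span class="second">Not this one</span>'
      . '<div><span class="first">First</span>'
      . '<span class="second">this one</span></div>';

$firstText = null;
$secondText = null;

if (preg_match('~<span class="first">(.*)</span>~msU', $html, $first, PREG_OFFSET_CAPTURE) === 1) {
    $firstText = $first[1][0];
    // $first[0] holds [full match, byte offset]; resume right after it.
    $offset = $first[0][1] + strlen($first[0][0]);
    if (preg_match('~<span class="second">(.*)</span>~msU', $html, $second, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $secondText = $second[1][0];
    }
}

var_dump($firstText, $secondText);
```

Without the $offset argument, the second call would return "Not this one" instead.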


Is that really the HTML you need to work with? It isn't valid HTML: <div class="second"> is closed with </span>, and the last <span> is never closed. You can use preg_match_all, as @igorw suggested:

preg_match_all('~<(span|div) class="(first|second)">(.*)<\/?span>~msU', $html,$matching_data);
echo '<xmp>'; print_r($matching_data[0]);

But if the HTML were valid:

<title>foobar title</title>
<body>
<span class="second">Not this one</span>
<div>
<span class="first">First</span>
<span class="second">this one</span>
</div>
</body>

preg_match_all('~<span class="(first|second)">(.*)<\/span>~msU', $html, $matching_data);
echo '<xmp>'; print_r($matching_data[0]);
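preg_match_all returns every match, including the "second" span that precedes "first", so getting the value the question asks for still takes a small selection step. A sketch, assuming the valid markup above; it uses the PREG_SET_ORDER flag so each match's captures (full match, class, text) stay together, and the selection rule (first "second" span after the "first" span) is an assumption based on the question.

```php
<?php
$html = <<<'HTML'
<title>foobar title</title>
<body>
<span class="second">Not this one</span>
<div>
<span class="first">First</span>
<span class="second">this one</span>
</div>
</body>
HTML;

// PREG_SET_ORDER: $sets[i] = [full match, class name, inner text].
preg_match_all('~<span class="(first|second)">(.*)</span>~msU', $html, $sets, PREG_SET_ORDER);

$afterFirst = false;
$wanted = [];
foreach ($sets as [$full, $class, $text]) {
    if ($class === 'first') {
        $afterFirst = true;
        $wanted['first'] = $text;
    } elseif ($class === 'second' && $afterFirst && !isset($wanted['second'])) {
        // The first "second" span that appears after the "first" span.
        $wanted['second'] = $text;
    }
}

print_r($wanted);
```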