
Regex issue with multiple results

Developer https://www.devze.com 2023-03-13 21:27 Source: web

I am doing some HTML parsing in PHP, and this is the code I have right now:

function get_tag($htmlelement,$attr, $value, $xml ,$arr) {
    $attr = preg_quote($attr);
    $value = preg_quote($value);
    if($attr!='' && $value!='')
    {
    $tag_regex = '/<'.$htmlelement.'[^>]*'.$attr.'="'.$value.'">(.*?)<\\/'.$htmlelement.'>/si';
    preg_match($tag_regex,$xml,$matches);
    }
    else
    {
    $tag_regex = '/'.$htmlelement.'[^>]*"(.*?)\/'.$htmlelement.'/i';
    preg_match_all($tag_regex,$xml,$matches);
    }
    if($arr)
        return $matches;
    else 
        return $matches[1];
}
$htmlcontent = file_get_contents("doc.html");
$extract = get_tag('tbody','id', 'open', $htmlcontent,false);

$trows = get_tag('tr','', '', $htmlcontent,false);

The rows that have to be parsed (the content of $extract) can be viewed here: http://pastebin.com/ydiAdiuC.

Basically, I am reading the HTML content and extracting the tbody tag from it. Now I want to take each tr and td value in the tbody and use it in my page. Any idea how to do this? I think I am not using preg_match_all the right way.


Use PHP's DOM parser (the DOMDocument class) for this, not regular expressions.

A quick approach:

  • Load in the HTML.
  • Get the tbody tag.
  • Get the tr tags within.
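
The three steps above could be sketched roughly as follows, using DOMDocument and DOMXPath from PHP's bundled DOM extension. This is only a minimal sketch, not the answerer's code: the helper name get_table_rows and the sample markup are made up for illustration, and it assumes the id passed in contains no quote characters. With the file from the question, the input would be file_get_contents("doc.html") instead of the sample string.

```php
<?php
// Hypothetical helper (not from the original post): returns the trimmed text
// of every <td> in every <tr> directly under the <tbody> with the given id.
function get_table_rows(string $html, string $tbodyId): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);                      // step 1: load in the HTML
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Steps 2 and 3: find the tbody by id, then the tr elements within it.
    // Assumption: $tbodyId contains no double quotes.
    foreach ($xpath->query('//tbody[@id="' . $tbodyId . '"]/tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Sample markup invented for illustration; the real input would be doc.html.
$sample = '<table><tbody id="open">'
        . '<tr><td>a1</td><td>a2</td></tr>'
        . '<tr><td>b1</td><td>b2</td></tr>'
        . '</tbody></table>';

$rows = get_table_rows($sample, 'open');
print_r($rows);
```

Each entry of $rows is one table row, given as an array of its cell texts, so the values can be looped over and used in the page without any regular expressions.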