I am doing some PHP HTML parsing, and this is the code I have right now:
function get_tag($htmlelement, $attr, $value, $xml, $arr) {
    $attr = preg_quote($attr);
    $value = preg_quote($value);
    if ($attr != '' && $value != '')
    {
        $tag_regex = '/<'.$htmlelement.'[^>]*'.$attr.'="'.$value.'">(.*?)<\\/'.$htmlelement.'>/si';
        preg_match($tag_regex, $xml, $matches);
    }
    else
    {
        $tag_regex = '/'.$htmlelement.'[^>]*"(.*?)\/'.$htmlelement.'/i';
        preg_match_all($tag_regex, $xml, $matches);
    }
    if ($arr)
        return $matches;
    else
        return $matches[1];
}
$htmlcontent = file_get_contents("doc.html");
$extract = get_tag('tbody', 'id', 'open', $htmlcontent, false);
$trows = get_tag('tr', '', '', $htmlcontent, false);
The rows that have to be parsed (the content of $extract) can be viewed here: http://pastebin.com/ydiAdiuC.
Basically, I am reading the HTML content and extracting the tbody tag from it. Now I want to take each tr and its td values inside that tbody and use them in my page. Any idea how to do this? I think I am not using preg_match_all correctly.
Use one of PHP's DOM parsers for this, not regular expressions.
A quick approach:
- Load in the HTML.
- Get the tbody tag.
- Get the tr tags within.
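The steps above could be sketched roughly as follows, using DOMDocument and DOMXPath from PHP's bundled DOM extension. This is only a minimal sketch: the helper name get_table_rows and the sample markup in $sample are made up for illustration (the real rows are in the pastebin and are not reproduced here), and it assumes the id value contains no double quotes.

```php
<?php
// Hypothetical helper (not from the original post): returns the trimmed text
// of every <td> in every <tr> directly inside the <tbody> with the given id.
// Assumes $tbodyId contains no double quotes, since it is placed in the XPath as-is.
function get_table_rows(string $html, string $tbodyId): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Step 2 and 3: find the tbody by id, then the tr elements within it.
    foreach ($xpath->query('//tbody[@id="' . $tbodyId . '"]/tr') as $tr) {
        $cells = [];
        // Collect the td values of this row, relative to the current tr.
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Made-up sample markup standing in for the contents of doc.html.
$sample = '<table><tbody id="open">'
        . '<tr><td>Alice</td><td>30</td></tr>'
        . '<tr><td>Bob</td><td>25</td></tr>'
        . '</tbody></table>';

// Step 1: load the HTML (here from a string; file_get_contents("doc.html") would work too).
$rows = get_table_rows($sample, 'open');
foreach ($rows as $cells) {
    echo implode(' | ', $cells), PHP_EOL;
}
```

With the real page, passing file_get_contents("doc.html") and 'open' should give one array of cell values per row, which can then be looped over in the page template.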