Possible Duplicate:
Best methods to parse HTML with PHP
For example, I have HTML code like this:
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="rowData">
<tr align="center" class="fnt-vrdana-mavi" >
<td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>
</tr>
<tr class="header" align="center">
<td height="18" colspan="3">Text text text</td>
</tr>
<tr align="center" class="fnt-vrdana" bgcolor="#eff3f4" height="18">
<td width="32%" height="17"><b>1</b></td>
<td width="34%"><b>0</b></td>
<td width="34%"><b>2</b></td>
</tr>
<tr align="center" class="fnt-vrdana-mavi">
<td height="17">2.90</td>
<td>3.20</td>
<td>1.85</td>
</tr>
</table>
What is the best regular expression to match all the data inside the <td> tags?
When you need to describe what you are looking for in an HTML document, I normally suggest using an XPath expression. XPath can give you the actual value of a node, whereas a regex cannot parse the HTML/XML any further, and XPath expressions are much more fine-grained. For example, the output below contains the text value of each cell, without any of the tags nested inside it:
array(8) {
[0]=>
string(16) "Text text text:3"
[1]=>
string(14) "Text text text"
[2]=>
string(1) "1"
[3]=>
string(1) "0"
[4]=>
string(1) "2"
[5]=>
string(4) "2.90"
[6]=>
string(4) "3.20"
[7]=>
string(4) "1.85"
}
Code:
$html = <<<EOD
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="rowData">
<tr align="center" class="fnt-vrdana-mavi" >
<td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>
</tr>
<tr class="header" align="center">
<td height="18" colspan="3">Text text text</td>
</tr>
<tr align="center" class="fnt-vrdana" bgcolor="#eff3f4" height="18">
<td width="32%" height="17"><b>1</b></td>
<td width="34%"><b>0</b></td>
<td width="34%"><b>2</b></td>
</tr>
<tr align="center" class="fnt-vrdana-mavi">
<td height="17">2.90</td>
<td>3.20</td>
<td>1.85</td>
</tr>
</table>
EOD;
// create a DOMDocument to run XPath on
$doc = new DOMDocument;
$doc->loadHTML($html);
// create the DOMXPath
$xpath = new DOMXPath($doc);
// perform the XPath query
$nodes = $xpath->query('//td');
// process nodes to return their actual value
$values = array();
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}
var_dump($values);
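To illustrate what "more fine-grained" can mean, here is a minimal sketch that narrows the query to the cells of one kind of row. The table is a shortened copy of the one in the question, and the choice of the class fnt-vrdana-mavi as the filter is only an assumption about which row is wanted:

```php
<?php
// Shortened copy of the table from the question (trimmed for brevity).
$html = <<<EOD
<table class="rowData">
<tr class="header"><td colspan="3">Text text text</td></tr>
<tr class="fnt-vrdana"><td><b>1</b></td><td><b>0</b></td><td><b>2</b></td></tr>
<tr class="fnt-vrdana-mavi"><td>2.90</td><td>3.20</td><td>1.85</td></tr>
</table>
EOD;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select only the cells of rows whose class attribute is exactly "fnt-vrdana-mavi".
$odds = array();
foreach ($xpath->query('//tr[@class="fnt-vrdana-mavi"]/td') as $node) {
    $odds[] = $node->nodeValue;
}
var_dump($odds);
```

The same idea extends to other predicates, such as position (for example tr[last()]) or other attributes, without changing the loop.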
/<td.*?>(.*?)<\/td>/ would get all the data between the <td> and </td>.
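A small sketch of how that pattern might be used with preg_match_all. The sample row is cut down from the question; the s modifier and the strip_tags() step are additions here (not part of the pattern above), on the assumption that cells may span lines and that plain text is wanted:

```php
<?php
// Two cells taken from the question's table.
$html = '<tr><td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>'
      . '<td height="17">2.90</td></tr>';

// The s modifier lets "." match newlines, in case a cell spans several lines.
preg_match_all('/<td.*?>(.*?)<\/td>/s', $html, $matches);

// $matches[1] holds the captured cell contents, nested markup included.
$raw = $matches[1];

// strip_tags() removes the nested tags to leave plain text.
$text = array_map('strip_tags', $raw);

var_dump($raw, $text);
```

Note that the regex captures the inner markup as-is (the <b> element is still there in $raw), which is why the extra cleanup step is needed.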
To get the data from inside the <td> tag itself (its attributes), use /<td([^>]*)>/ or /<td(.*?)>/
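As a rough sketch, assuming the goal is the attribute string of each opening <td> tag, the first pattern could be applied like this (the sample input and the trim() step are illustrative additions):

```php
<?php
// One cell with attributes and one without, modelled on the question's table.
$html = '<td width="32%" height="17"><b>1</b></td><td>3.20</td>';

// Capture everything between "<td" and the next ">".
preg_match_all('/<td([^>]*)>/', $html, $matches);

// Trim the leading space left between "td" and the first attribute.
$attrs = array_map('trim', $matches[1]);

var_dump($attrs);
```

A cell without attributes yields an empty string, so the result has one entry per <td> in the input.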