PHP XPath table parsing question

开发者 (Developer) https://www.devze.com 2023-02-02 22:08 Source: Internet
I have several tables nested within a table that I am parsing using PHP XPath.

I'm using a series of XPath queries because I'm breaking up the code into conceptual units across several method calls, and this structure has been working perfectly in other scenarios without nested tables.

Here's the code:

// create a host DOM document
$dom = new DOMDocument();

// load the html string into the dom
$dom->loadHTML($html_string);

// make an xpath object out of the dom
$xpath = new DOMXPath($dom);

// run query to extract the rows from the master table
$context_nodes = $xpath->query('//table[@id="id1"]/tr[position()>1]');

// parse data from the individual tables nested in each master table row
$interesting_nodes = [];
foreach ($context_nodes as $context_node) {
    $interesting_nodes[] = $xpath->query('table[2]/tr[td[2]]', $context_node);
}

The resulting $interesting_nodes array contains empty DOMNodeLists.

The $context_nodes DOMNodeList contains valid data. The html content of each $context_node looks like this:

<td>
    <table></table>
    <table>
        <tr>
            <td></td>
        </tr>
        <tr>
            <td></td>
            <td></td>
        </tr>
    </table>
</td>

I tried the following simplified $interesting_nodes query to match any table:

$interesting_nodes[] = $xpath->query('table', $context_node);

But that still produces the same empty DOMNodeLists.

And now the interesting part

When I try an $interesting_nodes query like so:

$interesting_nodes[] = $xpath->query('*[2]/*[*[2]]', $context_node);

Then everything works perfectly; but if I replace any "*" with the corresponding "table", "tr", or "td" tags, then the query breaks once again.

Does anyone else have experience with this behavior and relative xpath queries in php?

I would very much like to be able to use a more exact query, and would prefer to be able to keep the query relative like it is rather than making it absolute.


I figured it out. :)

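The relative query is evaluated against the context node, and a step such as table[2] only matches direct children of that node. My context nodes were tr elements, but the nested tables sit inside a td, not directly inside the tr.

That outer td level between the tr and the nested tables was what made the named queries come back empty.

I modified the $context_nodes query to select the td instead: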

$context_nodes = $xpath->query('//table[@id="id1"]/tr[position()>1]/td');

And we're good.
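
For anyone who wants to try this out, here is a minimal sketch of the fixed version. The sample markup and the function name count_matching_rows are made up for illustration, modeled on the structure described in the question. It assumes that loadHTML does not insert tbody elements, which is what the original //table[@id="id1"]/tr query already relies on.

```php
<?php
// Hypothetical sample markup: a master table whose second row holds one td
// that contains two nested tables.
$html_string = <<<HTML
<table id="id1">
  <tr><td>header row</td></tr>
  <tr>
    <td>
      <table><tr><td>first nested table</td></tr></table>
      <table>
        <tr><td>a</td></tr>
        <tr><td>b</td><td>c</td></tr>
      </table>
    </td>
  </tr>
</table>
HTML;

// Hypothetical helper: returns, for each context td, how many rows of the
// second nested table have at least two cells.
function count_matching_rows(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Context nodes are the td cells, so "table" is a direct child of each one.
    $context_nodes = $xpath->query('//table[@id="id1"]/tr[position()>1]/td');

    $counts = [];
    foreach ($context_nodes as $context_node) {
        // Relative query, evaluated against the td context node.
        $rows = $xpath->query('table[2]/tr[td[2]]', $context_node);
        $counts[] = $rows->length;
    }
    return $counts;
}

print_r(count_matching_rows($html_string));
```

If the context query stops at tr[position()>1] instead, the same relative query should come back empty for this markup, because the tr has no table children.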


I think you may need to make the subsequent query explicitly relative by prefixing it with a dot (.), see http://php.net/manual/en/domxpath.query.php#99760
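
As a rough illustration of where the leading dot matters: in XPath 1.0, table and ./table mean the same thing, so the dot changes nothing for the queries in the question. It does matter when the expression starts with //, which searches from the document root even when a context node is passed. The sketch below uses made-up sample markup and a hypothetical function name, count_tables.

```php
<?php
// Hypothetical sample markup: one master table with two tables nested in a td.
$sample_html = <<<HTML
<table id="id1">
  <tr><td>header row</td></tr>
  <tr>
    <td>
      <table><tr><td>first nested table</td></tr></table>
      <table><tr><td>second nested table</td></tr></table>
    </td>
  </tr>
</table>
HTML;

// Hypothetical helper: runs four queries against the same td context node
// and returns how many table elements each one finds.
function count_tables(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Take the td in the second row of the master table as the context node.
    $context_node = $xpath->query('//table[@id="id1"]/tr[position()>1]/td')->item(0);

    return [
        // Child tables of the context node; these two forms are equivalent.
        'plain'      => $xpath->query('table', $context_node)->length,
        'dot_slash'  => $xpath->query('./table', $context_node)->length,
        // Descendant tables of the context node only.
        'dot_double' => $xpath->query('.//table', $context_node)->length,
        // Starts at the document root, so the context node has no effect.
        'double'     => $xpath->query('//table', $context_node)->length,
    ];
}

print_r(count_tables($sample_html));
```

With this markup, the last query also counts the master table, while the first three only see the tables nested in the td.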
