PHP DOM Get Tag Before N-th Table

Developer https://www.devze.com 2022-12-21 14:10 Source: Internet
Let's say the HTML contains 15 table tags, and before each table there is a div tag with some text inside. I need to get the text from the div tag that is directly before the 10th table tag in the HTML markup. How would I do that?

The only way I can think of is to use explode('<table', $html) to split the HTML into parts and then get the last div tag from the 9th value of the exploded array with a regular expression. Is there a better way?

I'm reading through the PHP DOM documentation but I cannot see any method that would help me with this task there.


You load your HTML into a DOMDocument and query it with this XPath expression:

//table[10]/preceding-sibling::div[1]

This would work when the tables are siblings under the same parent, as in the following layout:

<div>Some text.</div>
<table><!-- #1 --></table>
  <!-- ...eight more div/table pairs... -->
<div>Some other text.</div> <!-- this would be selected -->
<table><!-- #10 --></table>
  <!-- ...five more div/table pairs... -->

XPath is capable of doing really complex node lookups with ease. If the above expression does not yet work for you, probably very little is required to make it do what you want.
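As a rough illustration of the approach above, here is a minimal PHP sketch. The sample markup and variable names are made up for the example. Note one deliberate variation: `//table[10]` matches a table that is the tenth table child of its own parent, while `(//table)[10]` picks the tenth table in document order, which may be closer to what the question asks if the tables are not all siblings. The sketch uses the latter.

```php
<?php
// Hypothetical sample data: 15 <div>/<table> pairs.
$html = '<html><body>';
for ($i = 1; $i <= 15; $i++) {
    $html .= "<div>Text before table $i</div><table><tr><td>$i</td></tr></table>";
}
$html .= '</body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them,
// which is useful for imperfect real-world HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// (//table)[10] is the 10th table in document order;
// preceding-sibling::div[1] is the nearest div sibling before it.
$nodes = $xpath->query('(//table)[10]/preceding-sibling::div[1]');

$text = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $text, PHP_EOL;
```

If the div is not a direct sibling of the table (for example, both are wrapped in separate containers), the `preceding-sibling` axis will not find it and the expression would need adjusting, for instance with the `preceding` axis.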

HTML is structured data that happens to be represented as a string, which is substantially different from being just a string. Don't give in to the temptation of handling it with string functions like explode(), or even regex.


If you don't feel like learning XPath, you can use the same old-school DOM walking techniques you would use with JavaScript in the browser.

document.getElementsByTagName('table')[9]

Then crawl your way up the .previousSibling values until you find one that isn't a TextNode and is a div.
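In PHP the same walk could look roughly like the sketch below. The function name `divTextBeforeTable` and the sample markup are hypothetical; `getElementsByTagName()`, `item()`, `previousSibling`, `nodeType` and `textContent` are the standard PHP DOM members.

```php
<?php
// Hypothetical helper: returns the text of the nearest <div> sibling
// that precedes the N-th <table> (1-based), or null if there is none.
function divTextBeforeTable(DOMDocument $doc, int $n): ?string
{
    // item() is zero-based, so the 10th table is item(9).
    $table = $doc->getElementsByTagName('table')->item($n - 1);
    if ($table === null) {
        return null;
    }

    // Walk backwards through the siblings, skipping text and comment nodes.
    for ($node = $table->previousSibling; $node !== null; $node = $node->previousSibling) {
        if ($node->nodeType === XML_ELEMENT_NODE && strtolower($node->nodeName) === 'div') {
            return trim($node->textContent);
        }
    }
    return null;
}

// Small made-up fragment with whitespace between the elements.
$html = "<div>first</div><table></table>\n<div>second</div>\n<table></table>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$result = divTextBeforeTable($doc, 2);
echo $result, PHP_EOL;
```

For the original question the call would be `divTextBeforeTable($doc, 10)`.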

I've found that PHP's DOMDocument works pretty well with non-perfect HTML. Once you have the DOM, I think you can even pass it into a SimpleXML object and work with it XML-style, even though the original HTML/XHTML structure wasn't perfect.
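One way to try the SimpleXML idea is `simplexml_import_dom()`, which wraps an already parsed DOM tree. The following is only a sketch with made-up markup, combining it with the XPath lookup discussed earlier in the thread:

```php
<?php
// Made-up fragment without <html>/<body>; loadHTML() adds them implicitly.
$html = '<div>first</div><table><tr><td>1</td></tr></table>'
      . '<div>second</div><table><tr><td>2</td></tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Wrap the parsed tree as a SimpleXMLElement (the root <html> element).
$root = simplexml_import_dom($doc);

// SimpleXMLElement::xpath() returns an array of matching elements.
$divs = $root->xpath('(//table)[2]/preceding-sibling::div[1]');

$text = $divs ? trim((string) $divs[0]) : null;
echo $text, PHP_EOL;
```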
