I'm trying to use the simplehtmldom script to extract some text. The HTML structure is as follows:
<div id="posts">
<div align="center">
<SEVERAL LEVELS OF HTML>
<strong>XXX</strong>
</SEVERAL LEVELS OF HTML>
</div>
<div align="center">
<SEVERAL LEVELS OF HTML>
<strong>IGNORE</strong>
</SEVERAL LEVELS OF HTML>
</div>
<div align="center">
<SEVERAL LEVELS OF HTML>
<strong>IGNORE</strong>
</SEVERAL LEVELS OF HTML>
</div>
</div>
The text I'm trying to get at is the XXX string, in the first <strong> tag inside the first <div> with attribute align="center", which is inside the <div> with id="posts". I'm not interested in the text in the <div align="center"> tags further down.
The "several levels of HTML" include messy nested tables etc.
My code: I'm using descendant selectors, and obviously I'm "skipping" through the several levels of HTML. Is this the reason why my print_r shows "Trying to get property of non-object"?
$html = file_get_html($page_1);
$es = $html->find('div#posts div[align=center] strong');
print_r($es->plaintext); die;
Strangely enough, the following statement also produces the same "Trying to get property of non-object" result:
$es = $html->find('div#posts');
What am I doing wrong?
Two possible reasons:
- In $html = file_get_html($page_1); the variable $page_1 may not be a URL. If it's a string containing HTML, use str_get_html instead, as in $html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
- The HTML contains more than one div#posts (which it shouldn't).
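One more thing worth checking, going by how find() behaves in Simple HTML DOM: called without an index it returns an array of elements, and reading ->plaintext from an array gives exactly the "Trying to get property of non-object" notice. Passing 0 as the second argument returns the first match (or null if nothing matched). A minimal sketch follows. The include path and the sample markup, including the table nesting, are assumptions made up for illustration, not taken from the real page.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script; adjust the path as needed.
require_once 'simple_html_dom.php';

// Stand-in markup mirroring the structure in the question (the table nesting is invented).
$markup = '<div id="posts">'
        . '<div align="center"><table><tr><td><strong>XXX</strong></td></tr></table></div>'
        . '<div align="center"><table><tr><td><strong>IGNORE</strong></td></tr></table></div>'
        . '</div>';

// str_get_html() parses a string; file_get_html() expects a URL or a file path.
$html = str_get_html($markup);
if (!$html) {
    die('Could not parse the HTML');
}

// Without an index, find() returns an array of elements, so ->plaintext on it fails.
$all = $html->find('div#posts div[align=center] strong');

// With an index, find() returns a single element, or null when nothing matched.
$first = $html->find('div#posts div[align=center] strong', 0);

// Guard against null before reading the property.
$text = ($first !== null) ? $first->plaintext : null;
echo $text, "\n";
```

The same null check applies when the selector is just 'div#posts': if the element is not in the fetched page, find('div#posts', 0) gives null rather than an object.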