Using simplehtmldom to grab a text snippet

Developer https://www.devze.com 2023-02-09 23:54 Source: Internet
I'm trying to use the simplehtmldom script to get at some text. The HTML structure is as follows

<div id="posts">
  <div align="center">
    <SEVERAL LEVELS OF HTML>
      <strong>XXX</strong>
    </SEVERAL LEVELS OF HTML>
  </div>
  <div align="center">
    <SEVERAL LEVELS OF HTML>
      <strong>IGNORE</strong>
    </SEVERAL LEVELS OF HTML>
  </div>
  <div align="center">
    <SEVERAL LEVELS OF HTML>
      <strong>IGNORE</strong>
    </SEVERAL LEVELS OF HTML>
  </div>
</div>

The text I'm trying to get at is the XXX string, in the first <strong> tags inside the first <div> with attribute align="center", which is inside the <div> with id="posts". I'm not interested in the text in <div align="center"> tags further down.

The "several levels of HTML" include messy nested tables etc.

My code is below. I'm using descendant selectors, so I'm "skipping" over the several levels of HTML. Is that why my print_r call produces "Trying to get property of non-object"?

$html = file_get_html($page_1);
$es = $html->find('div#posts div[align=center] strong');
print_r($es->plaintext); die;

Strangely enough this statement also returns the same "Trying to get property of non-object" result. What am I doing wrong?

$es = $html->find('div#posts');


Two possible reasons:

  1. In $html = file_get_html($page_1);, $page_1 may not be a URL or file path. If it is a string containing HTML, use str_get_html instead, as in $html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');.
  2. The HTML contains more than one div#posts (which it shouldn't, since an id should be unique within a page).
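
One more thing that may be worth checking, beyond the two points above: in simplehtmldom, find() called without an index returns an array of matching elements, and reading ->plaintext from an array triggers the same "Trying to get property of non-object" notice. The sketch below is only an illustration under a few assumptions: simple_html_dom.php is assumed to sit next to the script, and the inline $markup string is made-up sample data that mimics the structure in the question (a table stands in for the "several levels of HTML").

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup mirroring the structure from the question.
$markup = '<div id="posts">'
        . '<div align="center"><table><tr><td><strong>XXX</strong></td></tr></table></div>'
        . '<div align="center"><table><tr><td><strong>IGNORE</strong></td></tr></table></div>'
        . '</div>';

// str_get_html() parses a string; file_get_html() expects a URL or file path.
$html = str_get_html($markup);
if (!$html) {
    die('Could not parse the HTML');
}

// Without an index, find() returns an array, so $es->plaintext fails.
// Passing 0 as the second argument returns the first match, or null if none.
$first = $html->find('div#posts div[align=center] strong', 0);

// Guard against "no match" before touching ->plaintext.
$text = $first ? $first->plaintext : null;
var_dump($text);
```

If every match is needed rather than just the first, looping over the array returned by find() with foreach and reading ->plaintext from each element avoids the notice as well.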