Stuck on traversing an HTML DOM on a page

Developer https://www.devze.com 2023-03-28 05:58 Source: the web

Ok. I'm stuck once again, and it seems the internet has just run out of tutorials on traversing an HTML DOM. I have this page (http://www.nasdaqomxbaltic.com/market/?pg=news&news_id=250910), and what I'm trying to do is get the text "The statement of shareholders for shares sale and for shares purchase attached." and the attached files into variables. I'm trying to do it the most efficient way, so I'm not using simple_html_dom. I wouldn't use XPath if I had a choice or if something else were faster, but I'm not sure :)

EDIT: Tried Phil's code. Can't seem to figure out why it still doesn't work.

<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile("http://www.nasdaqomxbaltic.com/market/?pg=news&news_id=250910");

$xpath = new DOMXPath($dom);
$paragraph = $xpath->query('//table[@id="previewTable"]/tbody/tr[2]/td/p');//tried removing tbody, doesn't fix, why is it there?
if ($paragraph->length == 1) {//what is this?
    $sentence = $paragraph->nodeValue;
    print_r($sentence);//doesnt work (blank)
}
$links = $xpath->query('//table[@id="previewTable"]//td[@class="tdAttachment"]//a');
foreach ($links as $link) {
    $linkName = $link->nodeValue;
    $linkUrl = $link->getAttribute('href');
    echo $linkName;
    echo $linkUrl;//works
}
?>

It really depends on how fixed that markup is.

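Assuming the structure is fairly static, to retrieve the sentence, try the following. Note that query() returns a DOMNodeList, so the node has to be fetched with item(0) before reading nodeValue: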

$paragraphs = $xpath->query('//table[@id="previewTable"]/tr[2]/td/p');
if ($paragraphs->length > 0) { // check to make sure we got at least one node
    $sentence = $paragraphs->item(0)->nodeValue;
}
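
To try the query without hitting the live site, here is a minimal sketch that runs it against an inline HTML string. The markup is a hypothetical stand-in for the real page, which may be structured differently, and libxml_use_internal_errors() is used here in place of the @ operator to silence parser warnings. It also illustrates why tbody is left out of the path: browsers add that element when they build their DOM, while DOMDocument's libxml-based parser keeps the table as it appears in the source.

```php
<?php
// Hypothetical markup standing in for the real page; the actual structure may differ.
$html = <<<'HTML'
<table id="previewTable">
  <tr><td>Header row</td></tr>
  <tr><td><p>The statement of shareholders for shares sale and for shares purchase attached.</p></td></tr>
</table>
HTML;

$dom = new DOMDocument();
// Collect parser warnings internally instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// No tbody in the path: the source has none, and loadHTML() does not insert one.
$paragraphs = $xpath->query('//table[@id="previewTable"]/tr[2]/td/p');

// query() returns a DOMNodeList, so pick a node with item() before reading nodeValue.
$sentence = $paragraphs->length > 0 ? trim($paragraphs->item(0)->nodeValue) : null;

echo $sentence, PHP_EOL;
```

If the real page does not match, dumping $paragraphs->length for progressively shorter paths (for example just //table[@id="previewTable"]) is a quick way to find which step of the path fails.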

Retrieving the links is slightly more complex:

$links = $xpath->query('//table[@id="previewTable"]//td[@class="tdAttachment"]//a');
foreach ($links as $link) {
    $linkName = $link->nodeValue;
    $linkUrl = $link->getAttribute('href');

    // do something with these values
}
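
One possible way to "do something with these values" is to collect them into an array so they end up in a single variable. The sketch below assumes a hypothetical attachment cell; the file names and paths are made up for illustration and are not taken from the real page.

```php
<?php
// Hypothetical markup; file names and paths are invented for this example.
$html = <<<'HTML'
<table id="previewTable">
  <tr><td class="tdAttachment">
    <a href="/files/sale.pdf">sale.pdf</a>
    <a href="/files/purchase.pdf">purchase.pdf</a>
  </td></tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = $xpath->query('//table[@id="previewTable"]//td[@class="tdAttachment"]//a');

// Gather each link's text and href into one array of attachments.
$attachments = [];
foreach ($links as $link) {
    $attachments[] = [
        'name' => trim($link->nodeValue),
        'url'  => $link->getAttribute('href'),
    ];
}

print_r($attachments);
```

Note that @class="tdAttachment" only matches when the class attribute is exactly that string; if the cell carries several classes, a contains(@class, "tdAttachment") test would be needed instead.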