I'm trying to fetch the content of a div in an HTML page using XPath and DOMDocument. This is the structure of the page:
<div id="content">
<div class="div1"></div>
<span class="span1"></span>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div class="div2"></div>
</div>
I want to get only the content of the p elements, not the spans and divs. I came up with the XPath expression .//*[@id='content']/p, but something seems wrong because I'm getting only the first p. I also tried other expressions using following-sibling and node(), but they all return only the first p:
.//*[@id='content']/span/following-sibling::p
.//*[@id='content']/node()[self::p]
This is how I use XPath:
$domDocument=new DOMDocument();
$domDocument->encoding = 'UTF-8';
$domDocument->loadHTML($page);
$domXPath = new DOMXPath($domDocument);
$domNodeList = $domXPath->query($this->xpath);
$content = $this->GetHTMLFromDom($domNodeList);
And this is how I get the HTML from the nodes:
private function GetHTMLFromDom($domNodeList){
$domDocument = new DOMDocument();
$node = $domNodeList->item(0);
foreach($node->childNodes as $childNode)
$domDocument->appendChild($domDocument->importNode($childNode, true));
return $domDocument->saveHTML();
}
This XPath expression:
//div[@id='content']/p
selects the wanted node set (all five p elements).
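A minimal sketch to check this, assuming the sample markup from the question with placeholder text added to each p (the texts "one" to "five" are hypothetical; the original paragraphs are empty). It runs the expression and reports how many nodes the resulting DOMNodeList holds:

```php
<?php
// Sample markup from the question; the paragraph texts are placeholders.
$html = <<<HTML
<div id="content">
<div class="div1"></div>
<span class="span1"></span>
<p>one</p>
<p>two</p>
<p>three</p>
<p>four</p>
<p>five</p>
<div class="div2"></div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//div[@id='content']/p");

// The query matches every p child of the div, not just the first one.
echo $nodes->length, "\n";
foreach ($nodes as $p) {
    echo $p->textContent, "\n";
}
```

If this prints 5 followed by all five texts, the expression is fine and the "only the first p" symptom comes from how the DOMNodeList is consumed afterwards.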
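EDIT: Now your problem is clear. GetHTMLFromDom() only reads the first node of the list ($domNodeList->item(0)) and outputs its children, so everything after the first p is ignored. You need to iterate over the whole DOMNodeList: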
private function GetHTMLFromDom($domNodeList){
$domDocument = new DOMDocument();
foreach ($domNodeList as $node) {
$domDocument->appendChild($domDocument->importNode($node, true));
}
return $domDocument->saveHTML();
}
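An alternative sketch that avoids building a second document: DOMDocument::saveHTML() accepts an optional node argument (PHP 5.3.6 and later), so each matched node can be serialized through its own owner document and the strings concatenated. This swaps in a different technique from the answer's importNode approach; the function name getHtmlFromNodes is hypothetical, and the markup is the question's sample with placeholder text:

```php
<?php
// Hypothetical helper: serialize every node in a DOMNodeList, not just item(0).
function getHtmlFromNodes(DOMNodeList $nodes): string
{
    $html = '';
    foreach ($nodes as $node) {
        // saveHTML($node) outputs only this node and its descendants.
        $html .= $node->ownerDocument->saveHTML($node);
    }
    return $html;
}

// Question's markup with placeholder paragraph texts.
$doc = new DOMDocument();
$doc->loadHTML('<div id="content"><div class="div1"></div><span class="span1"></span>'
    . '<p>one</p><p>two</p><p>three</p><p>four</p><p>five</p>'
    . '<div class="div2"></div></div>');

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//div[@id='content']/p");

echo getHtmlFromNodes($nodes);
```

The output contains only the five p elements; the span and the inner divs are not part of the node set, so they never reach the serializer.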