I am trying to retrieve the body text located in this span, which I match by its class attribute.
<span id="" style="color:#525B64;">The quick brown fox jumped over the lazy dog.</span>
I tested it on my web server and I get no errors but the page is blank. I'm very new to this so I do not know where to go from here.
Here is my code.
<?php
// Load remote file, suppress parse errors
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://somewebpage.com');
libxml_clear_errors();
// use XPath to find all span nodes with a class attribute of msgBody
$xp = new DOMXpath($dom);
$nodes = $xp->query('//span[@class="msgBody"]');
// output first item's content
echo $nodes->item(0)->nodeValue;
?>
Everything seems fine in this code.
What I'd try to do is:
- remove the line which suppresses parse errors,
- load the remote file with file_get_contents to see if it loads properly,
- query the document with an XPath like //* and loop over the resulting DOMNodeList (with foreach) to see if the tree is built correctly.
By the way, to suppress parse errors reported by the ->loadHTMLFile() method, I use the @ operator.
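A rough sketch of those debugging steps, in case it helps. The inline $html string is a made-up stand-in for the remote page so the snippet runs without network access. With the real page you would fetch it with file_get_contents($url) and check that the result is not false before parsing.

```php
<?php
// Hypothetical sample markup standing in for the remote page.
$html = '<html><body><p>Intro</p>'
      . '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>'
      . '</body></html>';

// Collect parse errors instead of discarding them, so they can be inspected.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$loaded = $dom->loadHTML($html);
foreach (libxml_get_errors() as $error) {
    echo 'libxml: ', trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();

// Walk every element to see what the parser actually built.
$xp = new DOMXPath($dom);
$names = [];
foreach ($xp->query('//*') as $node) {
    $names[] = $node->nodeName;
}
echo implode(' ', $names), "\n";
```

If span does not show up in that list for the real page, the problem is in loading or parsing rather than in the XPath expression.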
The DOM creates nodes for everything: attributes, text, comments, elements, you name it. So you're not after the value of the span node even though it might seem that way, you actually want to get the TextNode inside of the span and get its value instead. Try something like:
echo $nodes->item(0)->childNodes->item(0)->nodeValue;
You can also get this directly from the xpath query:
$nodes = $xp->query('//span[@class="msgBody"]/text()');
(Though I've never had much luck with xpath, personally.)
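For comparison, here is a small sketch that reads the text three ways against a made-up inline document (the $html string is an assumption, not the real page). One thing worth knowing: in PHP, nodeValue on an element returns its text content, so the element itself should give the same string as its text node. If the page is still blank, the query is probably matching nothing.

```php
<?php
// Hypothetical sample markup standing in for the remote page.
$html = '<html><body>'
      . '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>'
      . '</body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// The span element itself.
$span = $xp->query('//span[@class="msgBody"]')->item(0);

// 1. Value of the first child (the text node) of the span.
$viaChild = $span->childNodes->item(0)->nodeValue;

// 2. Text node selected directly in the XPath expression.
$viaText = $xp->query('//span[@class="msgBody"]/text()')->item(0)->nodeValue;

// 3. nodeValue of the element, which PHP treats like textContent.
$viaElement = $span->nodeValue;

echo $viaChild, "\n", $viaText, "\n", $viaElement, "\n";
```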
Are you sure there is only one span element with this class in the document you are parsing? Maybe ->item(0) returns an empty element and the desired element is next in the list?
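A quick way to check this is to print how many nodes matched and what each one contains. The sketch below uses a made-up document with an empty span ahead of the real one. Note also that when nothing matches at all, ->item(0) returns null, so reading ->nodeValue from it gives no useful output.

```php
<?php
// Hypothetical markup: an empty span precedes the one holding the text.
$html = '<html><body>'
      . '<span class="msgBody"></span>'
      . '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>'
      . '</body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$nodes = $xp->query('//span[@class="msgBody"]');
echo 'matches: ', $nodes->length, "\n";

// Print every match with its index so empty ones are easy to spot.
$texts = [];
foreach ($nodes as $i => $node) {
    $texts[$i] = trim($node->nodeValue);
    echo $i, ': "', $texts[$i], "\"\n";
}
```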
Very often such behavior is due to a default namespace (check to see if there is something similar to this: xmlns="http://www.w3.org/1999/xhtml").
Using element names that are in a default namespace in XPath expressions is the most frequently asked question in the xpath tag; just search for "xpath default namespace" to find many good answers.
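To illustrate the namespace issue, here is a minimal sketch that assumes the page is XHTML and is loaded as XML with loadXML. The $xhtml string and the prefix x are made up for the example. As far as I know, the HTML parser behind loadHTML and loadHTMLFile does not put elements into a namespace, so this mostly applies when the document is parsed as XML.

```php
<?php
// Hypothetical XHTML document with a default namespace.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       . '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>'
       . '</body></html>';

$dom = new DOMDocument;
$dom->loadXML($xhtml);
$xp = new DOMXPath($dom);

// An unprefixed name only matches elements in no namespace.
$plain = $xp->query('//span[@class="msgBody"]');

// Bind a prefix (the name "x" is arbitrary) to the namespace URI and use it.
$xp->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$prefixed = $xp->query('//x:span[@class="msgBody"]');

echo 'unprefixed matches: ', $plain->length, "\n";
echo 'prefixed matches: ', $prefixed->length, "\n";
```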