Hey, consider I have the following HTML:
<p>xyz</p>
<p>abc</p>
I want to retrieve the text (xyz and abc) using DOM.
This is my code.
<?php
$link='http://www.xyz.com';
$ret= getLinks($link);
print_r ($ret);
function getLinks($link)
{
    /*** return array ***/
    $ret = array();
    /*** a new dom object ***/
    $dom = new DOMDocument;
    /*** get the HTML (suppress errors) ***/
    @$dom->loadHTML(file_get_contents($link));
    /*** remove silly white space ***/
    $dom->preserveWhiteSpace = false;
    /*** get the links from the HTML ***/
    $text = $dom->getElementsByTagName('p');
    /*** loop over the links ***/
    foreach ($text as $tag)
    {
        $ret[] = $tag->innerHTML;
    }
    return $ret;
}
?>
But I get an empty result. What am I missing here?
To suppress parsing errors, do not use
@$dom->loadHTML(file_get_contents($link));
but
libxml_use_internal_errors(TRUE);
Also, there is no reason to use file_get_contents. DOM can load from remote resources.
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile($link);
libxml_clear_errors();
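If you want to see what the parser complained about instead of just discarding it, something along these lines should work. This is only a sketch: the `$html` string is made-up malformed markup standing in for the remote page, and the earlier setting is restored at the end so other code is not affected.

```php
<?php
// Hypothetical malformed markup; stands in for the remote page.
$html = '<p>xyz</p><p>abc</b>';

// Collect parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Each entry is a LibXMLError object with level, code, line and message.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                 // empty the error buffer
libxml_use_internal_errors($previous); // restore the earlier setting
```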
Also, tag names are case sensitive. You are querying for <P> when the snippet contains <p>. Change to
$text = $dom->getElementsByTagName('p');
And finally, DOMElement has no innerHTML property. A userland solution to fetch it is described in "How to get innerHTML of DOMNode?"
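The usual userland approach is to serialize each child node and concatenate the results. A minimal sketch, assuming PHP 7+ for the type declarations and PHP 5.3.6+ for passing a node to saveHTML(); `innerHtml()` is a name I made up, not something the DOM extension provides, and the sample markup is invented:

```php
<?php
// innerHtml() is a hypothetical helper, not part of the DOM extension.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Serialize each child (text nodes and elements alike) and append it.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument;
$dom->loadHTML('<p>xyz <b>bold</b></p>');
$p = $dom->getElementsByTagName('p')->item(0);

echo innerHtml($p), "\n";
```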
You can fetch the outerHTML with
$ret[] = $dom->saveHtml($tag); // requires PHP 5.3.6+
or
$ret[] = $dom->saveXml($tag); // that will make it XML compliant though
To get the text content of the P tag, use
$ret[] = $tag->nodeValue;
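Putting the pieces together, the function from the question could look roughly like this. It is a sketch rather than a drop-in replacement: `getParagraphTexts()` is a hypothetical name, and it takes a markup string instead of a URL so it runs without network access (swap loadHTML() for loadHTMLFile($link) to load the remote page).

```php
<?php
// getParagraphTexts() is a hypothetical name; it accepts markup rather than
// a URL so that the example is self-contained.
function getParagraphTexts($html)
{
    $ret = array();

    // Keep parser warnings out of the output.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Lowercase tag name, and nodeValue instead of innerHTML.
    foreach ($dom->getElementsByTagName('p') as $tag) {
        $ret[] = $tag->nodeValue;
    }
    return $ret;
}

print_r(getParagraphTexts('<p>xyz</p><p>abc</p>'));
```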
First, case matters:
$dom->getElementsByTagName('P');
Should be:
$dom->getElementsByTagName('p');
Second, innerHTML is not a valid DOMElement property.
Try:
echo $tag->textContent;
echo $tag->nodeValue;
However, this won't return the inner HTML tags and will strip them. There are a few examples on how to make it work in the PHP manual.
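To illustrate the stripping, here is a small sketch with invented markup containing a nested tag. Only the text survives; the <b> element itself is gone.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<p>xyz <b>bold</b></p>');

$tag = $dom->getElementsByTagName('p')->item(0);

// textContent concatenates the text of all descendants; markup is dropped.
echo $tag->textContent, "\n"; // prints "xyz bold"
```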