Hello, good day. I am trying to scrape an XML feed that was given to us. I am using Simple HTML DOM to scrape it, but some of the content is wrapped in CDATA. How can I remove it? For example:
<date>
<weekday>
<![CDATA[ Friday
]]>
</weekday>
</date>
<?php
include('simple_html_dom.php');
include('phpQuery.php');

// Fetch the feed with file_get_html() if URL wrappers are allowed, otherwise fall back to cURL
if (ini_get('allow_url_fopen')) {
    $xml = file_get_html('http://www.link.com/url.xml');
} else {
    $ch = curl_init('http://www.link.com/url.xml');
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $src = curl_exec($ch);
    curl_close($ch);
    $xml = str_get_html($src, false); // false: do not lowercase tag names
}
?>
<?php
foreach($xml->find('weekday') as $e)
echo $e->innertext . '<br>';
?>
I believe Simple HTML DOM removes CDATA by default, but for some reason it doesn't here.
Kindly tell me if you need any other information that would help solve this issue.
Thank you so much for your help.
You can use another XML parser that is able to convert CDATA into a string, for example SimpleXML:
$innerText = '<![CDATA[ Friday
]]>';
$innerText = (string) simplexml_load_string("<x>$innerText</x>");
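A self-contained version of the snippet above that can be run from the command line. It assumes the CDATA section is well-formed. The trim() call is an addition not in the original answer: the CDATA payload here keeps its leading space and trailing newline, and casting to string preserves them.

```php
<?php
// CDATA section as it comes out of simple_html_dom's innertext
$innerText = "<![CDATA[ Friday\n]]>";

// Wrap it in a dummy root element so SimpleXML can parse it;
// casting the element to string returns the CDATA text without the markers.
$text = (string) simplexml_load_string("<x>$innerText</x>");

// The CDATA payload keeps its surrounding whitespace, so trim it.
$weekday = trim($text);
echo $weekday, PHP_EOL; // prints "Friday"
```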
Extended code example based on the OP's code:
# [...]
<?php
foreach($xml->find('weekday') as $e)
{
$innerText = $e->innertext;
$innerText = (string) simplexml_load_string("<x>$innerText</x>");
echo $innerText . '<br>';
}
?>
Usage instructions: Locate the line which contains the foreach
and then compare the original code with the new code (only the foreach
in question has been replaced).
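If you would rather not run a second parser for every element, a different technique (a regular expression, which is not something proposed in this thread) can strip the CDATA markers from innertext directly. This is a sketch that assumes the markers are well-formed and not nested; `strip_cdata` is a hypothetical helper name, not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): remove every
// <![CDATA[ ... ]]> wrapper from a string and trim the result.
function strip_cdata(string $s): string
{
    // The /s modifier lets "." match newlines inside the CDATA payload;
    // the non-greedy (.*?) stops at the first "]]>".
    return trim(preg_replace('/<!\[CDATA\[(.*?)\]\]>/s', '$1', $s));
}

echo strip_cdata("<![CDATA[ Friday\n]]>"), PHP_EOL; // prints "Friday"
```

Inside the OP's loop this would be `echo strip_cdata($e->innertext) . '<br>';`.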
I agree with the other answer: just allow the CDATA to be shown. I'd recommend SimpleXML:
$xml = simplexml_load_file('test.xml', 'SimpleXMLElement', LIBXML_NOCDATA);
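echo '<pre>', print_r($xml, true), '</pre>';
LIBXML_NOCDATA is important, so keep it in there: it merges CDATA sections into ordinary text nodes, which is what lets print_r() show their text. Note also the second argument to print_r(): without true it prints immediately and returns true, so the output would land outside the <pre> tags.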
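A self-contained sketch of this approach, using the sample structure from the question. It inlines the XML and uses simplexml_load_string() instead of simplexml_load_file() so no test.xml file is needed; that substitution is an assumption for the sake of a runnable example.

```php
<?php
// The sample structure from the question, inlined so the sketch needs no file.
$feed = "<date>\n<weekday>\n<![CDATA[ Friday\n]]>\n</weekday>\n</date>";

// LIBXML_NOCDATA merges the CDATA section into an ordinary text node,
// so print_r() shows the weekday text.
$xml = simplexml_load_string($feed, 'SimpleXMLElement', LIBXML_NOCDATA);

// Casting the element to string yields its text; trim the surrounding whitespace.
$weekday = trim((string) $xml->weekday);
echo $weekday, PHP_EOL; // prints "Friday"

// print_r(..., true) returns the dump as a string instead of printing it.
echo '<pre>', print_r($xml, true), '</pre>', PHP_EOL;
```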