
Parsing XML file

Developer (https://www.devze.com), 2023-03-14 21:33. Source: Internet

I have a problem parsing an XML file (a well-formed one, by the way).

Consider an XML file like this:

<?xml version="1.0" encoding="utf-8" ?>
<root>
    <list>
        <item no="1">
            <title>Item's 1 title</title>
            <content>Some long content with <special>tags</special> inside</content>
        </item>
        <item no="2">
            <title>Item's 2 title</title>
            <content>Some long content with <special>tags</special> inside</content>
        </item>
    </list>
</root>

I need to get the contents of each item in the list and put them in an array. Generally that's not a problem, but in this case I can't get my head around it.

The problem lies in the contents of <content>: it is a string with tags in between. I can't find a way to extract it. SimpleXML returns/echoes just the string, with the <special> tags and everything inside them stripped out, like this:

Some long content with inside.

I'd ideally want it to get a string like this:

Some long content with <special>tags</special> inside

How do I get it?


You could use DOMDocument, which is built into PHP.

<?php

$xml = <<<END
<?xml version="1.0" encoding="utf-8" ?>
<root>
    <list>
        <item no="1">
            <title>Item's 1 title</title>
            <content>Some long content with <special>tags</special> inside</content>
        </item>
        <item no="2">
            <title>Item's 2 title</title>
            <content>Some long content with <special>tags</special> inside</content>
        </item>
    </list>
</root>
END;

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);

$nodes = $doc->getElementsByTagName('content');

foreach ( $nodes as $node )
{
  $temp_doc = new DOMDocument('1.0', 'UTF-8');

  foreach ( $node->childNodes as $child )
    $temp_doc->appendChild($temp_doc->importNode($child, true));

  echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}

To select only the top-level "content" elements (in case other "content" elements are nested inside them), you can use DOMXPath.

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml); // $xml from the example above

$xpath = new DOMXPath($doc);

$nodes = $xpath->query('/root/list/item/content');

foreach ( $nodes as $node )
{
  $temp_doc = new DOMDocument('1.0', 'UTF-8');

  foreach ( $node->childNodes as $child )
    $temp_doc->appendChild($temp_doc->importNode($child, true));

  echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}
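
Since the question asks for the contents in an array rather than echoed, here is a minimal sketch of that variation. It swaps the temporary document for DOMDocument::saveXML() called on each child node, which serializes a single node as XML. The $contents variable name is an assumption of this sketch, not something from the question.

```php
<?php
// Sketch: collect the inner XML of every top-level <content> element into an array.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<root>
    <list>
        <item no="1">
            <title>Item's 1 title</title>
            <content>Some long content with <special>tags</special> inside</content>
        </item>
        <item no="2">
            <title>Item's 2 title</title>
            <content>Some long content with <special>tags</special> inside</content>
        </item>
    </list>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$contents = []; // hypothetical name for the result array
foreach ($xpath->query('/root/list/item/content') as $node) {
    $inner = '';
    // Serialize each child (text nodes and elements alike) and concatenate them.
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveXML($child);
    }
    $contents[] = $inner;
}

print_r($contents);
```

Each array entry should then hold the text together with the <special> tags, without the surrounding <content> wrapper.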


SimpleXML just doesn't support mixed content (text nodes with element nodes as siblings). I suggest you use XMLReader instead.
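
A rough sketch of what the XMLReader route might look like, assuming the XML is already in a string. It relies on XMLReader::readInnerXml(), which returns the markup inside the current element. The shortened sample document and the $contents name are assumptions made for illustration.

```php
<?php
// Sketch: walk the document with XMLReader and grab the inner XML of each <content>.
$xml = '<?xml version="1.0" encoding="utf-8" ?>'
     . '<root><list>'
     . '<item no="1"><title>Item 1</title>'
     . '<content>Some long content with <special>tags</special> inside</content></item>'
     . '<item no="2"><title>Item 2</title>'
     . '<content>Some long content with <special>tags</special> inside</content></item>'
     . '</list></root>';

$reader = new XMLReader();
$reader->XML($xml); // load from a string; open() would read from a file or URL instead

$contents = []; // hypothetical name for the result array
while ($reader->read()) {
    // Only react to opening <content> tags, not to text or closing tags.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'content') {
        $contents[] = $reader->readInnerXml();
    }
}
$reader->close();

print_r($contents);
```

Note that this matches every element named content, nested ones included, so a document with content elements inside content elements would need an extra depth check.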


You could use SimpleXML's asXML() method. It returns the node it is called on as an XML string:

$xml = simplexml_load_file($file); // $file: path to the XML file
foreach ($xml->list->item as $item) {
    // asXML() serializes the element including its own <content> tags
    $content = $item->content->asXML();
    echo $content."\n";
}

will print:

<content>Some long content with <special>tags</special> inside</content>
<content>Some long content with <special>tags</special> inside</content>

It's a little ugly, but you could then clip off the <content> and </content> wrapper with substr (9 and 10 are the lengths of the opening and closing tags):

$content = substr($content,9,-10);
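
The fixed offsets only hold while the opening tag is exactly <content>; an attribute on the element would shift them. A small sketch of a variant that finds the tag boundaries instead, assuming the input is the string returned by asXML() for a single element. The stripOuterTag() helper is hypothetical, not part of SimpleXML.

```php
<?php
// Hypothetical helper: remove the outermost opening and closing tag from
// a serialized element, whatever the tag name or attributes are.
function stripOuterTag(string $xml): string
{
    $start = strpos($xml, '>');   // end of the opening tag
    $end   = strrpos($xml, '</'); // start of the last closing tag

    // Self-closing elements such as <content/> have no closing tag and no inner content.
    if ($start === false || $end === false || $end < $start) {
        return '';
    }

    return substr($xml, $start + 1, $end - $start - 1);
}

$plain     = stripOuterTag('<content>Some long content with <special>tags</special> inside</content>');
$withAttr  = stripOuterTag('<content lang="en">Some long content with <special>tags</special> inside</content>');
$selfClose = stripOuterTag('<content/>');

var_dump($plain, $withAttr, $selfClose);
```

This assumes attribute values in the serialized string never contain a raw > character, which should be checked against your own data before relying on it.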