
PHP DomDocument XML Load with Broken XML Data

Source: https://www.devze.com, 2022-12-10 21:54 (origin: the web)

How do you deal with broken data in XML files? For example, if I had

<text>Some &improper; text here.</text>

I'm trying to do:

 $doc = new DOMDocument();
 $doc->validateOnParse = false;
 $doc->formatOutput = false;
 $doc->load('...xml');

and it fails miserably, because there's an unknown entity. Note, I can't use CDATA due to the way the software is written. I'm writing a module which reads and writes XML, and sometimes the user inserts improper text.

I've noticed that DOMDocument->loadHTML() nicely encodes everything, but how could I continue from there?


Use htmlspecialchars to escape special XML characters before pushing the input into your XML/XHTML DOM. Although its name is prefixed with "html", the only characters it replaces are ones that XML also treats as special, so it is truly useful for XML data serialization.
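To make the suggestion above concrete, here is a minimal sketch. It assumes the module controls the point where user text is placed into the XML string; the variable names and the sample input are made up for illustration. The ENT_XML1 flag asks htmlspecialchars for XML-style output, so a single quote becomes &apos; rather than an HTML numeric reference.

```php
<?php
// Hypothetical user input: an undefined entity plus markup characters.
$userText = 'Some &improper; text here, with <tags> & "quotes".';

// Escape &, <, >, " and ' so the text is safe inside an XML element.
$escaped = htmlspecialchars($userText, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$xml = '<text>' . $escaped . '</text>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// The parser decodes the entities again, so the original text comes back.
echo $doc->documentElement->textContent, "\n";
```

Because the ampersand in &improper; is itself escaped to &amp;, the parser no longer sees an entity reference there at all.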


Perhaps you can use preg_replace_callback to do the heavy lifting with entities for you:

http://php.net/manual/en/function.preg-replace-callback.php

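function fixEntities($matches) {
    // preg_replace_callback passes an array of matches, not a string:
    // $matches[0] is the whole entity (e.g. "&amp;"), $matches[1] is its name.
    switch ($matches[1]) {
        case 'amp':
        case 'lt':
        case 'gt':
        case 'apos':
        case 'quot': // etc., etc., etc.
            return $matches[0]; // known entity: keep it unchanged
    }
    return ''; // unknown entity: drop it
}
$xml = preg_replace_callback('/&([a-zA-Z0-9#]*);/', 'fixEntities', $xml);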
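A variant of the same idea, if you would rather keep the user's text than delete it: escape the stray ampersand instead of removing the whole entity. This is only a sketch, not the approach from the answer above. The helper name escapeStrayAmpersands is hypothetical, and the whitelist assumes only XML's five predefined entities and numeric character references should pass through.

```php
<?php
// Hypothetical helper: escape every "&" that does not start a predefined
// XML entity or a numeric character reference.
function escapeStrayAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

$broken = '<text>Some &improper; text &amp; a bare & here.</text>';
$fixed  = escapeStrayAmpersands($broken);

$doc = new DOMDocument();
$doc->loadXML($fixed);

echo $fixed, "\n";
echo $doc->documentElement->textContent, "\n";
```

The negative lookahead leaves valid references such as &amp; untouched, so running the helper twice on the same input does not double-escape anything. It only repairs ampersands; other kinds of broken markup would still make the parser fail.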


If you are the one who writes the xml, there should be no problem, as you can encode any user input into entities before putting it into xml.
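One way to get that encoding without doing it by hand is to let the DOM do it on the writing side. The sketch below assumes the module builds its output with DOMDocument; the element name and sample text are taken from the question, the variable names are made up. createTextNode() treats its argument as literal text, and saveXML() escapes it when serializing.

```php
<?php
// Hypothetical user input containing an undefined entity.
$userText = 'Some &improper; text here.';

$doc  = new DOMDocument('1.0', 'UTF-8');
$text = $doc->createElement('text');

// The text node stores the raw string; escaping happens on serialization.
$text->appendChild($doc->createTextNode($userText));
$doc->appendChild($text);

// Serialize only the <text> element (no XML declaration).
$xml = $doc->saveXML($text);
echo $xml, "\n";

// Reading it back yields the original text again.
$check = new DOMDocument();
$check->loadXML($xml);
echo $check->documentElement->textContent, "\n";
```

Written this way, the file on disk never contains an undefined entity, so the later load() call has nothing to choke on.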
