Load an invalid XML in PHP DOM

Developer https://www.devze.com 2023-03-26 00:44 Source: web
I have an input XML file that is not well-formed (i.e. it has '&' instead of '&amp;'). When I try to load this XML using PHP DOM with $doc->load("file.xml"), it throws an error and stops parsing.

Is there any way to load this malformed XML? And no, I can't edit the source XML file. I did try using $doc->loadHTML(), but it throws errors all over the place.

I wanted to know if there is a proper way to do this (like loading the file contents and changing them with a regex or something similar).


Try setting $doc->validateOnParse = false; before loading your XML via $doc->loadHTML(...).


First, check that it's the & that's causing the error and not something else.
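One way to run that check is to have libxml collect its errors rather than emit warnings, then print them. This is a minimal sketch; the sample string is a made-up stand-in for the real file, and the exact wording of the message depends on the installed libxml version.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical sample input containing a bare ampersand.
$sample = '<items><item>Fish & Chips</item></items>';

$doc = new DOMDocument();
$loaded = $doc->loadXML($sample);

// Turn each reported error into a readable "line, column: message" string.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf(
        'line %d, column %d: %s',
        $error->line,
        $error->column,
        trim($error->message)
    );
}
libxml_clear_errors();

if (!$loaded) {
    echo implode(PHP_EOL, $messages), PHP_EOL;
}
```

If every reported position points at a '&', the ampersands are the likely culprit; anything else in the list needs its own fix.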

One way or another, you'll have to modify the XML to get it parsed. Since loadHTML loads its input from a string, can't you just replace the invalid characters with the correct ones before passing the string in?

If your installation supports the PHP Tidy extension (http://php.net/manual/en/book.tidy.php) you could try to clean it up with that, though in my experience it's far from foolproof.


If you are sure that's the only thing making it not validate, then you could try loading the file into a string with the file_get_contents() function, then search and replace through the string to change the &'s into &amp;'s, then pass that string to SimpleXML like $xml = simplexml_load_string($cleaned_string);
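A rough sketch of that approach follows. The helper name escapeBareAmpersands and the sample string are made up for illustration, and the regex is only an assumption about what counts as a "bare" ampersand: it skips anything that already looks like an entity or character reference, so existing &amp; sequences are not escaped twice.

```php
<?php
/**
 * Escape ampersands that do not already start an entity
 * or a numeric character reference (hypothetical helper).
 */
function escapeBareAmpersands(string $xml): string
{
    $pattern = '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/';
    // preg_replace returns null on failure; fall back to the input in that case.
    return preg_replace($pattern, '&amp;', $xml) ?? $xml;
}

// In practice this would be: $raw = file_get_contents('file.xml');
$raw = '<items><item>Fish & Chips</item><item>Salt &amp; Vinegar</item></items>';

$cleaned = escapeBareAmpersands($raw);
$xml = simplexml_load_string($cleaned);

echo $xml->item[0], PHP_EOL; // prints "Fish & Chips"
```

This would also rewrite ampersands inside CDATA sections, and names such as &nbsp; that XML does not define will still fail to parse, so it only fits the case where bare ampersands are the sole problem.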
