
weird chars after getting value from XML php

Developer https://www.devze.com 2023-03-09 01:49 Source: Internet

I'm trying to get a value with a € sign out of XML, but when I try, I get weird characters back.

$xmlDate = $searchNode->getElementsByTagName( "kostenvoorverkoop" );
$valueKostenvoorverkoop = htmlentities($xmlDate->item(0)->nodeValue,ENT_QUOTES,"UTF-8");
//gives back Á€10,- instead of €10,-

I can't find the problem.

//XML
<?xml version="1.0" encoding="ISO-8859-1" ?>
<price>€10</price>

If I leave out htmlentities, it gives a completely weird string like ÁáÙ%10 <---- not exactly this, but you know what I mean.

If anyone can help me with this it would help me greatly, thanks in advance.

Edit: I found a small workaround: replace the € with &amp;euro;. I know it's not clean, but it works.


<?xml version="1.0" encoding="ISO-8859-1" ?>
<price>€10</price>

The character € does not exist in ISO-8859-1, so this XML declaration can't possibly be right.

The output Á€ suggests the file has actually been encoded in Windows code page 1252 (Western European), which is similar to ISO-8859-1 but has different characters in the range 0x80–0x9F, including the euro sign.

PHP has parsed the data as ISO-8859-1, where the CP1252 encoding of €, byte 0x80, maps to the control character U+0080. It then gives you the Unicode string containing U+0080 as a UTF-8-encoded byte string, the two bytes 0xC2 0x80. Outputting that to a browser in a page served as cp1252, as ISO-8859-1 (which browsers treat as cp1252 for tedious, confusing legacy reasons), or without a charset on a Western European machine, gives Á€. htmlentities() doesn't encode this in any way because there's no HTML entity for the control code U+0080.
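
To see this for yourself, here is a minimal sketch that feeds the question's XML to DOMDocument as raw bytes and dumps what comes back in hex. The inline string stands in for your file, the variable names are only illustrative, and the hex values assume a stock libxml2-based DOM extension.

```php
<?php
// The question's XML written out as raw bytes: \x80 is the cp1252 euro sign,
// but the declaration claims ISO-8859-1 (assumed stand-in for the real file).
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n<price>\x8010</price>";

$doc = new DOMDocument();
$doc->loadXML($xml);

// DOM hands strings back as UTF-8, so U+0080 arrives as the bytes C2 80.
$value = $doc->getElementsByTagName('price')->item(0)->nodeValue;
echo bin2hex($value), "\n"; // c2803130

// There is no HTML entity for U+0080, so htmlentities() passes it through.
$escaped = htmlentities($value, ENT_QUOTES, 'UTF-8');
echo bin2hex($escaped), "\n"; // c2803130
```

The `3130` at the end is just the ASCII "10"; the `c280` in front of it is the mangled euro sign.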

Here's how you should proceed:

  • If you must have your XML input file in cp1252, say so in the XML declaration with encoding="windows-1252" rather than the inaccurate ISO-8859-1. XML parsers aren't required to be able to read cp1252, though, so for better interoperability just use the default UTF-8 encoding and re-save the file to match.

  • Serve your output HTML page as UTF-8, using a Content-Type header or meta tag. Then use htmlspecialchars() instead of htmlentities() so you don't waste time encoding non-ASCII characters that don't need it.
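
Put together, the two steps above might look roughly like this. It's only a sketch: it assumes the iconv extension is available, uses an inline string in place of your file, and the variable names are made up for illustration. In practice you would re-save the file as UTF-8 once in your editor rather than converting it on every request.

```php
<?php
// Stand-in for the mislabelled file: cp1252 bytes, ISO-8859-1 declaration.
$cp1252Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n<price>\x8010</price>";

// Re-encode the bytes as UTF-8 and make the declaration match them.
$utf8Xml = iconv('CP1252', 'UTF-8', $cp1252Xml);
$utf8Xml = str_replace('encoding="ISO-8859-1"', 'encoding="UTF-8"', $utf8Xml);

$doc = new DOMDocument();
$doc->loadXML($utf8Xml);
$price = $doc->getElementsByTagName('price')->item(0)->nodeValue;

// Tell the browser the page is UTF-8 before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Only the HTML-special characters need escaping; the euro sign stays as-is.
$safe = htmlspecialchars($price, ENT_QUOTES, 'UTF-8');
echo $safe; // €10
```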


Did you try changing the encoding in the XML from ISO-8859-1 to UTF-8? Or just specify the ISO-8859-1 charset in PHP when you do the decoding.
