JSON to XML with Greek characters

I am using curl to fetch a JSON file (far too long to paste here) from: http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.json?localeId=el_GR

After that I use json_decode to get the associative array. Up to this point everything seems fine: when I var_dump the array, the strings inside it are in Greek. After that I use the following code:

    $JsonClass = new ArrayToXML();
    $mydata=$JsonClass->toXml($json);

    class ArrayToXML {

        public static function toXML( $data, $rootNodeName = 'ResultSet', &$xml=null ) {

            // turn off compatibility mode as simple xml throws a wobbly if you don't.
            // if ( ini_get('zend.ze1_compatibility_mode') == 1 ) ini_set ( 'zend.ze1_compatibility_mode', 0 );
            if ( is_null( $xml ) ) //$xml = simplexml_load_string( "" );
                $xml = simplexml_load_string("<?xml version='1.0' encoding='UTF-8'?><$rootNodeName />");

            // loop through the data passed in.
            foreach( $data as $key => $value ) {

                $numeric = false;

                // no numeric keys in our xml please!
                if ( is_numeric( $key ) ) {
                    $numeric = 1;
                    $key = $rootNodeName;
                }

                // delete any char not allowed in XML element names
                $key = preg_replace('/[^a-z0-9\-\_\.\:]/i', '', $key);

                // if there is another array found recursively call this function
                if ( is_array( $value ) ) {
                    $node = ArrayToXML::isAssoc( $value ) || $numeric ? $xml->addChild( $key ) : $xml;

                    // recursive call.
                    if ( $numeric ) $key = 'anon';
                    ArrayToXML::toXml( $value, $key, $node );
                } else {

                    // add single node.
                    $value = htmlentities( $value );
                    $xml->addChild( $key, $value );
                }
            }

            // pass back as XML
            return $xml->asXML();
        }

        public static function isAssoc( $array ) {
            return (is_array($array) && 0 !== count(array_diff_key($array, array_keys(array_keys($array)))));
        }

    }

And here comes the problem: all the Greek characters in the result come out as strange characters, for example Î?Î?Î¥Î?Î?Î¡Î©Î£Î?Î?. I really don't know what I am doing wrong; I am really bad with encoding/decoding things :(.

And to make this a bit clearer:

Here is what the associative array (one of the parts I have the problem with) looks like:

{ ["resources"]=> array(4) { ["team-4833"]=> string(24) "ΛΕΥΚΟΡΩΣΙΑ U21" ["t-429"]=> string(72) "ΠΡΟΚΡΙΜΑΤΙΚΑ ΕΥΡΩΠΑΪΚΟΥ ΠΡΩΤΑΘΛΗΜΑΤΟΣ" ["t-429-short"]=> string(6) "ΠΕΠ" ["team-15387"]=> string(16) "ΕΛΛΑΔΑ U21" } ["locale"]=> string(5) "el_GR" } ["relatedNum"]=> NULL }

And here is what I get after using SimpleXML:

<resources><team-4833>&Icirc;?&Icirc;?&Icirc;&yen;&Icirc;?&Icirc;?&Icirc;&iexcl;&Icirc;&copy;&Icirc;&pound;&Icirc;?&Icirc;? U21</team-4833><t-429>&Icirc;&nbsp;&Icirc;&iexcl;&Icirc;?&Icirc;?&Icirc;&iexcl;&Icirc;?&Icirc;?&Icirc;?&Icirc;&curren;&Icirc;?&Icirc;?&Icirc;? &Icirc;?&Icirc;&yen;&Icirc;&iexcl;&Icirc;&copy;&Icirc;&nbsp;&Icirc;?&Icirc;&ordf;&Icirc;?&Icirc;?&Icirc;&yen; &Icirc;&nbsp;&Icirc;&iexcl;&Icirc;&copy;&Icirc;&curren;&Icirc;?&Icirc;?&Icirc;?&Icirc;?&Icirc;?&Icirc;?&Icirc;&curren;&Icirc;?&Icirc;&pound;</t-429><t-429-short>&Icirc;&nbsp;&Icirc;?&Icirc;&nbsp;</t-429-short><team-15387>&Icirc;?&Icirc;?&Icirc;?&Icirc;?&Icirc;?&Icirc;? U21</team-15387></resources><locale>el_GR</locale></lexicon><relatedNum></relatedNum></betGames>

Thanks in advance for your replies.

PS: I also have <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in the page where I display the result, but it doesn't help.

I still haven't found a solution for that, so I used a different approach, something like what Yannis suggested. I saved the XML to a file using the class I found here: http://www.phpclasses.org/package/1826-PHP-Store-associative-array-data-on-file-in-XML.html

After that I loaded the XML with simplexml_load_file and used XSLT to access the data in all nodes and store it in my database. It worked fine that way. If anyone still wants to explain why it doesn't work the way I first tried, feel free (just for learning purposes :p). Thanks for your replies :).

There is no need: the same data is apparently also available in XML format here:

http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.xml?localeId=el_GR

I just had to play with the URL parameters a bit :)

This worked for me in Chrome using PHP version 5.3.6:

    $json = file_get_contents('http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.json?localeId=el_GR');
    $json = json_decode($json, true);
    $xml = new SimpleXMLElement('<ResultSet/>');
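    // The callback receives ($value, $key) but addChild expects ($name, $value), so swap them.
    // Integer keys get an "item" prefix because XML element names cannot start with a digit.
    array_walk_recursive($json, function ($value, $key) use ($xml) {
        $name = is_int($key) ? 'item' . $key : $key;
        $xml->addChild($name, htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8'));
    });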
    print $xml->asXML();
    exit();

Clearly your bug is that you are manipulating UTF‑8–encoded Unicode as though those bytes were ISO‐8859‑1.

I cannot see where this is happening; probably in your call to htmlentities, whatever that is.

It may need to use some sort of “multibyte” hack, perhaps including such things as this sort of pattern:

/([^\x00-\x7F])/u

with an explicit /u so it works on logical code points instead of 8‑bit code units (read: bytes). It might do this to grab one non-ASCII code point so it can replace it with a numeric entity. Without the easily forgotten /u, it would work on bytes, not code points, which matches what your description shows happening.
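To make the difference concrete, here is a minimal sketch (not taken from the question's code) that runs that pattern with and without /u and then uses it to emit numeric entities. The helper name codePoint and the sample string are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8, so "ΛΕΥΚΟ" is 5 code points in 10 bytes.
$s = 'ΛΕΥΚΟ U21';

// With /u the character class matches whole code points; without it, single bytes.
$withU    = preg_match_all('/[^\x00-\x7F]/u', $s, $m1);
$withoutU = preg_match_all('/[^\x00-\x7F]/', $s, $m2);

// Hypothetical helper: decode one UTF-8 encoded character into its code point.
function codePoint($ch)
{
    $bytes = array_values(unpack('C*', $ch));
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0];
    }
    $cp = $bytes[0] & (0xFF >> ($n + 1));       // strip the length marker bits
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F); // 6 payload bits per continuation byte
    }
    return $cp;
}

// Replace each non-ASCII code point with a numeric character reference.
$entities = preg_replace_callback('/[^\x00-\x7F]/u', function ($m) {
    return '&#' . codePoint($m[0]) . ';';
}, $s);

echo $withU, ' ', $withoutU, "\n"; // 5 10
echo $entities, "\n";              // &#923;&#917;&#933;&#922;&#927; U21
```

Numeric references like these are valid in XML without any DTD, which is one reason a routine might prefer them over named entities.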

It could be this sort of thing, or it might be that you have to swap over to some of the mb_*() functions instead of the normal ones. This is to work around the fundamental underlying PHP bug that there is no real Unicode support in the language, just a few band-aids here and there that seem to like to fall off from time to time for no good reason.

If you could use a clean language with not just proper Unicode support but also a clear separation between physical bytes and abstract characters, this sort of thing would not be happening. But I bet it’s a common problem that others must be having too, so I would be really surprised if it were a library bug instead of a (perfectly understandable!) oversight somewhere in your code.
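As a rough illustration of the htmlentities suspicion above, the sketch below escapes one of the strings from the question twice: once with the charset forced to ISO-8859-1 (which was the default for htmlentities before PHP 5.4) and once with htmlspecialchars and an explicit UTF-8 charset. The variable names are made up, and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8.
$value = 'ΛΕΥΚΟΡΩΣΙΑ U21';

// Treating the UTF-8 bytes as ISO-8859-1 turns every lead byte 0xCE into the
// entity for "Î"; the second byte of each letter becomes an entity or stays raw.
$asLatin1 = htmlentities($value, ENT_QUOTES, 'ISO-8859-1');

// XML only needs & < > and quotes escaped, so htmlspecialchars with an
// explicit UTF-8 charset leaves the Greek letters untouched.
$asUtf8 = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

echo $asLatin1, "\n"; // starts with &Icirc;, like the output in the question
echo $asUtf8, "\n";   // ΛΕΥΚΟΡΩΣΙΑ U21
```

Note that htmlentities with 'UTF-8' would emit HTML named entities such as &Lambda;, which plain XML does not define, so htmlspecialchars is probably the safer choice before calling addChild.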

An answer to your question, from Greece. The word "ΛΕΥΚΟ" has the single-byte Greek character codes (as in ISO-8859-7) 203-197-213-202-207. When you read it as UTF-8, however, a 206 byte is added in front of each letter, so the number of bytes doubles, and the codes also change as follows: 206-(203-48=155), 206-(197-48=149), 206-(213-48=165), 206-(202-48=154), 206-(207-48=159).

Consequently, the solution is to check each character: if you find 206, ignore it, and add 48 to the code of the next character to get the original character.

Since I also work on decoding OPAP data, any new knowledge is welcome by mail: bluegt03@in.gr
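The arithmetic in this answer can be checked with a short sketch. The function name below is hypothetical, the file is assumed to be saved as UTF-8, and the branch for lead byte 207 (letters from π onward, offset 112) is an addition of this sketch rather than part of the answer.

```php
<?php
// Assumption: this file is saved as UTF-8.
// Hypothetical helper: map UTF-8 encoded Greek letters to their single-byte
// Greek codes using the "skip the lead byte, add an offset" rule.
function greekUtf8ToSingleByteCodes($utf8)
{
    $bytes = array_values(unpack('C*', $utf8));
    $codes = array();
    for ($i = 0; $i < count($bytes); $i++) {
        if ($bytes[$i] === 206 && isset($bytes[$i + 1])) {
            $codes[] = $bytes[++$i] + 48;  // skip 206, add 48 to the next byte
        } elseif ($bytes[$i] === 207 && isset($bytes[$i + 1])) {
            $codes[] = $bytes[++$i] + 112; // skip 207, add 112 (not in the answer)
        } else {
            $codes[] = $bytes[$i];         // ASCII and anything else passes through
        }
    }
    return $codes;
}

$utf8Bytes = array_values(unpack('C*', 'ΛΕΥΚΟ'));
echo implode('-', $utf8Bytes), "\n";                          // 206-155-206-149-206-165-206-154-206-159
echo implode('-', greekUtf8ToSingleByteCodes('ΛΕΥΚΟ')), "\n"; // 203-197-213-202-207
```

This only covers plain Greek letters. If the iconv extension is available, iconv('UTF-8', 'ISO-8859-7', $s) should perform the same conversion more generally.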