The characters I am getting from the URL, for example www.mydomain.com/?name=john , were fine as long as they were not in Russian.
If they were in Russian, I was getting '����'.
So I added $name= iconv("cp1251","utf-8" ,$name); and now it works fine for Russian and English characters, but screws up other languages. :)))
For example 'Jānis' ( Latvian ) that worked fine before iconv, now turns into 'jДЃnis'.
Any idea if there's some universal encoder that would work with both the Cyrillic languages and not screw up other languages?
Why don't you just use UTF-8 with all files and processes?
Actually this comes down to the problem of how the URL is encoded. If you click a link on a given page, the browser will use that page's encoding to send the request. If you enter the URL directly into the address bar of your browser, the behavior is effectively undefined, as there is no standardized encoding to use (Firefox provides an about:config switch to use UTF-8 encoded URLs).
Besides using some encoding detection there is no way to know the encoding used with the URL in the given request.
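To illustrate what such detection could look like, here is a minimal sketch, not a definitive solution. It assumes the mbstring extension is available, and normalizeToUtf8() is a hypothetical helper name, not something PHP provides. It keeps the value untouched if it is already valid UTF-8 and only otherwise converts from an assumed legacy encoding (Windows-1251 here, matching the question). This is a heuristic: a legacy-encoded string can occasionally happen to be valid UTF-8 as well.

```php
<?php
// Hypothetical helper (not part of PHP): returns $raw as UTF-8.
// Assumption: anything that is not valid UTF-8 is in $fallback encoding.
function normalizeToUtf8(string $raw, string $fallback = 'Windows-1251'): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave it alone,
    // so names like 'Jānis' are not converted a second time.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise treat the bytes as the assumed legacy encoding.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}
```

It could then be used as $name = normalizeToUtf8($_GET['name'] ?? ''); in place of the unconditional iconv() call.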
EDIT:
Just to back up what I said above, I wrote a small test script that shows the default behavior of the five major browsers (running on Mac OS X in my case, with Windows Vista via Parallels for IE):
// read the raw parameter; fall back to an empty string if it is missing
$p = $_GET['p'] ?? '';
// display the binary data received via the URL, byte by byte, in hex format
for ($i = 0; $i < strlen($p); $i++) {
    echo sprintf('%02x', ord($p[$i])) . ' ';
}
Calling http://path/to/script.php?p=äöü
leads to
- Safari (4.0.5):
c3 a4 c3 b6 c3 bc
- Firefox (3.6.3):
c3 a4 c3 b6 c3 bc
- Google Chrome (5.0.375.38):
c3 a4 c3 b6 c3 bc
- Opera (10.10):
e4 f6 fc
- Internet Explorer (8.0.6001.18904):
e4 f6 fc
So obviously the first three use UTF-8 encoded URLs, while Opera and IE use ISO-8859-1 or one of its variants. Conclusion: you cannot be sure which encoding was used for textual data sent via a URL.
Seems like the issue is the file encoding. You should always use UTF-8 without a BOM as the preferred encoding for your .php files; code editors such as Intype let you easily specify this (UTF-8 Plain).
Also, add the following code to your files before any output:
header('Content-Type: text/html; charset=utf-8');
You should also read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.