
PHP UTF-encoded URL-string

开发者 https://www.devze.com 2023-01-09 18:04 Source: web
When I type a URL such as http://www.example.com/?query=Траливали into the Firefox address bar, it is automatically encoded to http://www.example.com/?query=%D2%F0%E0%EB%E8%E2%E0%EB%E8.

But a URL like http://www.example.com/#ajax_call?query=Траливали is not converted.

Other browsers, such as IE8, do not convert the query at all.

The question is: how can I detect (in PHP) whether the query is encoded, and how do I decode it?

I've tried:

  1. $str = iconv('cp1251', 'utf-8', urldecode($str) );

  2. $str = utf8_decode(urldecode($str));

  3. $str = (urldecode($str));

  4. many functions from http://php.net/manual/en/function.urldecode.php

Nothing works.

Test:

$str = $_GET['str'];

d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8'));

d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == $str);

d('Траливали' == $str);

d(urldecode($str));

d(utf8_decode(urldecode($str)));

!!! d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urlencode($str)); !!!

Returns:

[false] [false] [false] ��������� ???? [true]

A partial workaround: http://www.example.com/Траливали/, i.e. send the query as part of the URL path and parse it with mod_rewrite.


It is not converted because placing the query part of a URL after the fragment is not valid.

RFC 3986 defines a URI as composed of the following parts:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment

The order cannot be changed. Therefore,

URL1: http://www.example.com/?query=Траливали#ajax_call

will be handled properly while

URL2: http://www.example.com/#ajax_call?query=Траливали

will not. If we look at URL2, IE actually handles the URL properly: it detects the fragment as #ajax_call?query=Траливали, with no query at all. The fragment always comes last and is never sent to the server.

IE will properly encode the query component of URL1 because it recognizes it as a query.
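
The split between query and fragment can be checked with PHP's built-in parse_url(). This is only a minimal sketch; the ASCII value "test" stands in for the Cyrillic word (an assumption, so that the example does not depend on file encoding):

```php
<?php
// URL1: query before fragment (the order RFC 3986 requires).
$url1 = parse_url('http://www.example.com/?query=test#ajax_call');
// URL2: everything after "#" belongs to the fragment, so no query exists.
$url2 = parse_url('http://www.example.com/#ajax_call?query=test');

var_dump($url1['query']);        // the query component: "query=test"
var_dump($url1['fragment']);     // "ajax_call"
var_dump(isset($url2['query'])); // false: URL2 has no query component
var_dump($url2['fragment']);     // "ajax_call?query=test"
```

Since the browser never sends the fragment, a request for URL2 would reach the server as a plain request for "/".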

As for decoding in PHP, %D2 and similar sequences are automatically decoded in the $_GET['query'] variable. The $_GET variable was not populated because, according to the standard, URL2 has no query.

Also, one last thing... when doing 'Траливали' == $_GET['query'], this will only be true if your PHP script itself is encoded in UTF-8. Your text editor should be able to tell you the encoding of your file.


rawurldecode($_GET['query']);

but this should actually have been done already by PHP ;-)

Edit: you say "nothing works", but what exactly are you trying? If the text doesn't appear on screen as you want when you, for example, echo $_GET['query'];, your problem might be the encoding you are specifying for the page sent back to the browser.

Include a line

header("Content-Type: text/html; charset=utf-8");

and see if it helps.
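
Put together, that might look roughly like the sketch below. The helper name escape_for_utf8_page() is hypothetical, and the escaping step is an addition that assumes the value is echoed into HTML:

```php
<?php
// Hypothetical helper: escape a value for output in a UTF-8 HTML page.
// ENT_SUBSTITUTE replaces invalid UTF-8 sequences with U+FFFD instead of
// making htmlspecialchars() return an empty string.
function escape_for_utf8_page(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Declare the charset before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo escape_for_utf8_page($_GET['query'] ?? '');
```

If the value still shows up as replacement characters, the bytes that arrived are probably not UTF-8 in the first place.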


How the fragment is encoded is, unfortunately, browser-dependent:

Is fragment ID (hash) encoded by applying RFC-mandated URL escaping rules?
MSIE: NO
Firefox: PARTLY
Safari: YES
Opera: NO
Chrome: NO
Android: YES

As to the question of what encoding the browser uses to encode international (read: non-ASCII) characters before converting them to %nn escape sequences, "most browsers deal with this by sending UTF-8 data by default on any text entered in the URL bar by hand, and using page encoding on all followed links." (same source).
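
To see what difference the encoding makes, the same word can be percent-encoded from both byte representations. This sketch assumes the source file is saved as UTF-8 and that the mbstring extension is available:

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is loaded.
$word = 'Траливали';

// Percent-encoding of the UTF-8 bytes (two bytes per Cyrillic letter).
$utf8Encoded = rawurlencode($word);

// Percent-encoding after converting to Windows-1251 (one byte per letter).
$cp1251Encoded = rawurlencode(mb_convert_encoding($word, 'Windows-1251', 'UTF-8'));

echo $utf8Encoded, "\n";   // %D0%A2%D1%80%D0%B0%D0%BB%D0%B8%D0%B2%D0%B0%D0%BB%D0%B8
echo $cp1251Encoded, "\n"; // %D2%F0%E0%EB%E8%E2%E0%EB%E8
```

The second string matches the one in the question, which suggests the browser encoded the query as Windows-1251 rather than UTF-8 in that case.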


You could use UTF8::autoconvert_request() for this.

Take a look at http://code.google.com/p/php5-utf8/ for more information.


URLs are limited to certain ASCII characters. Characters that are not URL-safe are supposed to be URL-encoded (the %hh encoding you see). Some browsers automatically encode URLs typed into the address bar.


The answer is easy: the string is always encoded, as stated in the HTTP standard.
What Firefox displays doesn't matter.

Also, since PHP decodes the query string automatically, no decoding is required either.

Note that '%D2%F0%E0%EB%E8%E2%E0%EB%E8' is in a single-byte encoding, so your page is probably in Windows-1251 (cp1251). At least, that is what the HTTP header tells the browser.
AJAX, by contrast, always uses UTF-8.

So you just have to either use a single encoding (UTF-8) for your pages, or distinguish AJAX calls from regular ones.
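
If both encodings can arrive, one possible heuristic is to treat anything that is not valid UTF-8 as Windows-1251. The sketch below assumes exactly those two encodings and the mbstring extension; query_to_utf8() is a hypothetical helper name, and the check can misfire on the rare Windows-1251 strings that also happen to be valid UTF-8:

```php
<?php
// Hypothetical helper: return $value as UTF-8, assuming that any input which
// is not valid UTF-8 is Windows-1251. Requires the mbstring extension.
function query_to_utf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already UTF-8 (or plain ASCII)
    }
    return mb_convert_encoding($value, 'UTF-8', 'Windows-1251');
}

// $_GET values are already percent-decoded by PHP; rawurldecode() is used
// here only to simulate that step for the demonstration.
$fromAddressBar = rawurldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8');
echo query_to_utf8($fromAddressBar), "\n"; // Траливали
```

Using UTF-8 everywhere, as suggested above, avoids the guesswork entirely.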

As for the fragment: do not rely on the fragment value to send data to the server. Keep the value in a JS variable and use it twice: once to set the fragment and once to send it to the server as JSON.


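RFC 1738 states that only alphanumerics, the special characters "$-_.+!*'()," and the reserved characters ";/?:@=&" (when used for their reserved purposes) may appear unencoded within a URL. Everything else must be encoded, which the HTTP client, i.e. the web browser, normally does. You can call rawurldecode() even though PHP already decodes the query string automatically; double-decoding is harmless as long as the decoded value does not itself contain a literal % followed by two hex digits.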
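
One caveat with decoding twice is a value that legitimately contains a percent sign. A small sketch, assuming the user typed the literal text %41 and the browser percent-encoded it:

```php
<?php
// The literal text "%41" travels over the wire as "%2541".
$sent = '%2541';

// First decode (PHP does this automatically when filling $_GET).
$once = rawurldecode($sent);

// A second, manual decode changes the data.
$twice = rawurldecode($once);

var_dump($once);  // string(3) "%41"
var_dump($twice); // string(1) "A"
```

For values that contain no percent sign after the first decode, the second call leaves the string unchanged.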
