Java URI class: constructor determines whether or not query is encoded?

Developer https://www.devze.com 2023-03-01 06:25 Source: Internet

Is this behavior intentional?

//create the same URI using two different constructors

URI foo = null, bar = null;
try {
    //constructor: URI(String str)
    foo = new URI("http://localhost/index.php?token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB");
} catch (URISyntaxException e) {
    e.printStackTrace(); //do not silently swallow a malformed URI
}
try {
    //constructor: URI(scheme, authority, path, query, fragment)
    bar = new URI("http", "localhost", "/index.php", "token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB", null);
} catch (URISyntaxException e) {
    e.printStackTrace();
}

//the output:
//foo.getQuery() = token=4/4EzdsSBg_4vX6D5pzvdsMLDoyItB
//bar.getQuery() = token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB

The URI(String) constructor seems to be decoding the query part of the URI. I thought the query portion is supposed to be encoded? And why doesn't the other constructor decode the query part?


From the URI JavaDoc:

The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

Thus URI(String) expects you to encode everything correctly and assumes %2F is such an encoded octet, which getQuery() decodes to /.

The other constructors encode the % character (resulting in %252F for input %2F), and thus after decoding you still get %2F.
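The difference becomes visible when the raw (still quoted) query is printed next to the decoded one. The following is a minimal sketch using the URIs from the question; the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriQuotingDemo {
    static final String QUERY = "token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB";

    // Single-argument constructor: the input is treated as already quoted.
    static URI parsed() throws URISyntaxException {
        return new URI("http://localhost/index.php?" + QUERY);
    }

    // Multi-argument constructor: the '%' in the query is quoted again as %25.
    static URI assembled() throws URISyntaxException {
        return new URI("http", "localhost", "/index.php", QUERY, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI foo = parsed();
        URI bar = assembled();
        System.out.println("foo raw:     " + foo.getRawQuery()); // token=4%2F4Ezds...
        System.out.println("foo decoded: " + foo.getQuery());    // token=4/4Ezds...
        System.out.println("bar raw:     " + bar.getRawQuery()); // token=4%252F4Ezds...
        System.out.println("bar decoded: " + bar.getQuery());    // token=4%2F4Ezds...
    }
}
```

getRawQuery() returns the query exactly as it appears in the URI string, while getQuery() returns it with escaped octets decoded, which is why bar still shows %2F after one round of decoding.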

I assume the purpose of the deviation between the constructors is to allow things like new URI(otherUri.toString()) with toString() returning a fully encoded URI.
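That round trip can be checked with a short sketch. The class and method names here are hypothetical; the URI is the bar example from the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriRoundTripDemo {
    // Built from unquoted components; the constructor does the quoting.
    static URI original() throws URISyntaxException {
        return new URI("http", "localhost", "/index.php",
                "token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB", null);
    }

    // toString() yields the fully quoted form, which URI(String) parses as-is.
    static URI copyOf(URI uri) throws URISyntaxException {
        return new URI(uri.toString());
    }

    public static void main(String[] args) throws URISyntaxException {
        URI bar = original();
        URI copy = copyOf(bar);
        System.out.println(copy.equals(bar));                       // true
        System.out.println(copy.getQuery().equals(bar.getQuery())); // true
    }
}
```

Because URI(String) preserves escaped octets instead of quoting them again, the copy keeps %252F in its raw query and decodes to the same value as the original.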


A quick analysis:

foo

The constructor parses the input URI as already quoted, and getQuery() unquotes the literal %2F to /. This is what we expect.

bar

With the constructor used in the bar example, the query part is taken as a raw String with illegal chars and quoted first, with the effect that %2F is translated to %252F. Then it is parsed, and the unquoted query part is (again) %2F.

Lesson learned: With the first constructor we pass an RFC 2396 compliant URI. The other constructors take normal Strings (unquoted illegal chars) and URI constructs an RFC 2396 compliant representation.
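A small sketch of that lesson, using a path and query that contain spaces. The class name, the method names and the example strings are assumptions chosen for illustration, not part of the original question.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriIllegalCharsDemo {
    // Multi-argument constructor: accepts unquoted text and quotes the spaces.
    static String assembled() throws URISyntaxException {
        return new URI("http", "localhost", "/my file.php", "name=John Doe", null).toString();
    }

    // Single-argument constructor: reports whether the string is rejected.
    static boolean singleArgRejects(String text) {
        try {
            new URI(text);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // http://localhost/my%20file.php?name=John%20Doe
        System.out.println(assembled());
        // true: a raw space is illegal for URI(String)
        System.out.println(singleArgRejects("http://localhost/my file.php?name=John Doe"));
        // false: the quoted form parses fine
        System.out.println(singleArgRejects("http://localhost/my%20file.php?name=John%20Doe"));
    }
}
```

So the choice of constructor depends on what you hold: an already quoted URI string goes to URI(String), while unquoted components go to the multi-argument constructors.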

Here's a working example on IDEONE (with extra supporting output)
