Why is .NET unable to decode this string when Java can "Hi%E1"?

Developer https://www.devze.com 2022-12-20 11:17 Source: Internet

My problem is that the .NET HTTP/URI libraries cannot decode or unescape this character sequence: "Hi%E1". Neither Uri.UnescapeDataString nor HttpUtility.UrlDecode can do it.

Although I have a workaround for this problem (see "URL decoding confusion"), I would like to understand why it fails.

The first test here throws an exception! The second simply fails.

Assert.That(Uri.UnescapeDataString("Hi%E1"), Is.EqualTo("Hiá"));
HttpUtility.UrlDecode("Hi%E1").ShouldBe("Hiá");

There is nothing in the docs to indicate that UnescapeDataString or UrlDecode are restricted to particular character sets, or any other reason why these tests would fail. However, from testing, it appears that HttpUtility assumes UTF-8 (or some other encoding).

The Java equivalent works! Probably because it allows an encoding to be set.

URLDecoder.decode("Hi%E1","windows-1252");    // this works btw, ie passes tests

That looks like a very sensible design, considering the .NET workaround (see the link above).
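To make the charset dependence concrete, here is a minimal Java sketch of the same decode under two charsets. The class name PercentDecodeDemo and the decode wrapper are made up for illustration; only URLDecoder.decode(String, String) comes from the JDK, and the sketch assumes the JVM ships the windows-1252 charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative class; not part of the original post.
final class PercentDecodeDemo {

    // Decodes a percent-encoded string, interpreting the escaped bytes
    // in the named charset.
    static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Thrown when the JVM does not know the charset name.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // In windows-1252 the single byte 0xE1 is U+00E1 (a with acute).
        System.out.println(decode("Hi%E1", "windows-1252"));

        // In UTF-8 the byte 0xE1 only starts a three-byte sequence. With
        // nothing after it, Java substitutes U+FFFD instead of U+00E1.
        System.out.println(decode("Hi%E1", "UTF-8"));

        // The UTF-8 form of the same character is two bytes: 0xC3 0xA1.
        System.out.println(decode("Hi%C3%A1", "UTF-8"));
    }
}
```

So the escape itself is fine; what matters is which charset the decoder assumes for the bytes behind it.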

Are the .NET implementations of these methods just crap, so that .NET devs have to write their own, or am I missing something?

BTW, everything I know of in IIS is set to UTF-8, and Chinese/Japanese characters show fine, so I don't yet know how this URI could contain windows-1252 encoded characters. If I could fix the URI to contain UTF-8 encoding, that would be a better way of fixing this.

According to this, you can also set the encoding using an overload of HttpUtility.UrlDecode.

Although, that seems too simple if you're running into problems... just making sure you saw the overload.

Addendum

I discovered the underlying cause of this problem. I was using 'escape' in JavaScript. It's deprecated; don't use it.

escape('á') returns '%E1', which is the single-byte windows-1252 (and ISO-8859-1) encoding of á. The methods above, e.g. HttpUtility.UrlDecode, will fail or return the wrong character unless you are able to specify 'windows-1252' in the overload.

encodeURI('á') returns '%C3%A1', which is the UTF-8 encoding. That will work, and all your troubles will go away: the methods above will decode it without throwing exceptions or producing the wrong character.
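The two escapes are simply the bytes of the same character in two different charsets. A small Java sketch shows this; EscapeBytesDemo and percentEncodeAll are hypothetical names, and the sketch assumes the JVM provides the windows-1252 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class; not part of the original post.
final class EscapeBytesDemo {

    // Percent-encodes every byte of s in the given charset. This is for
    // illustration only: real URL encoders leave unreserved ASCII
    // characters unescaped.
    static String percentEncodeAll(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aAcute = "\u00e1"; // the character from the question

        // One byte in windows-1252, matching what escape() produced.
        System.out.println(percentEncodeAll(aAcute, Charset.forName("windows-1252")));

        // Two bytes in UTF-8, matching what encodeURI() produced.
        System.out.println(percentEncodeAll(aAcute, StandardCharsets.UTF_8));
    }
}
```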

Dreaming: Wouldn't it be nice if Uri.UnescapeDataString reported which escape sequence was the problem? My URI at the time of diagnosis was 23,000 characters long. "Invalid URI" is not such a helpful message in that scenario.
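As a rough idea of what such a diagnostic could look like, here is a Java sketch that scans a string and reports the index of the first percent-escape that is truncated, not hexadecimal, or not valid as strict UTF-8. BadEscapeFinder and firstBadEscape are hypothetical names, not part of any library, and this is only one possible approach under the assumption that the URI is meant to be UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name and behaviour are illustrative only.
final class BadEscapeFinder {

    // Value of one ASCII hex digit, or -1 if c is not a hex digit.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Returns the index of the '%' of the first escape that is truncated,
    // not hexadecimal, or not valid as strict UTF-8; -1 if there is none.
    static int firstBadEscape(String s) {
        int len = s.length();
        int i = 0;
        while (i < len) {
            if (s.charAt(i) != '%') { i++; continue; }
            int runStart = i;
            int syntaxError = -1;
            ByteArrayOutputStream run = new ByteArrayOutputStream();
            // Gather the bytes of a run of consecutive %XX escapes.
            while (i < len && s.charAt(i) == '%') {
                int hi = i + 2 < len ? hex(s.charAt(i + 1)) : -1;
                int lo = i + 2 < len ? hex(s.charAt(i + 2)) : -1;
                if (hi < 0 || lo < 0) { syntaxError = i; break; }
                run.write((hi << 4) | lo);
                i += 3;
            }
            // Strictly decode the run: REPORT makes the decoder stop at bad
            // bytes instead of silently substituting U+FFFD.
            byte[] bytes = run.toByteArray();
            CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            ByteBuffer in = ByteBuffer.wrap(bytes);
            CoderResult r = dec.decode(in, CharBuffer.allocate(bytes.length), true);
            // On error the buffer position marks the first bad byte, and each
            // byte in the run occupies three characters of the string.
            if (r.isError()) return runStart + 3 * in.position();
            if (syntaxError >= 0) return syntaxError;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstBadEscape("Hi%E1"));    // lone 0xE1 at index 2
        System.out.println(firstBadEscape("Hi%C3%A1")); // valid UTF-8
    }
}
```

On a 23,000-character URI, an index like this would at least point at the offending escape rather than leaving you with a bare "Invalid URI".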

Seems to work as specified...

HttpUtility.UrlDecode("Hi%E1", System.Text.Encoding.GetEncoding("windows-1252"));

Edit: Answer to comment.

If you use Reflector on HttpUtility.UrlDecode(string) you see that it uses UTF8 as the default Encoding. (As it should.)

//From Reflector (System.Web)
public static string UrlDecode(string str)
{
    if (str == null)
    {
        return null;
    }
    return UrlDecode(str, Encoding.UTF8);
}