开发者

How to find out if string has already been URL encoded?

开发者 https://www.devze.com 2022-12-21 02:51 出处:网络
How could I check if string has already been encoded开发者_如何学运维? For example, if I encode TEST==, I get TEST%3D%3D. If I again encode last string, I get TEST%253D%253D, I would have to know be

How could I check if string has already been encoded开发者_如何学运维?

For example, if I encode TEST==, I get TEST%3D%3D. If I again encode last string, I get TEST%253D%253D, I would have to know before doing that if it is already encoded...

I have encoded parameters saved, and I need to search for them. I don't know for input parameters, what will they be - encoded or not, so I have to know if I have to encode or decode them before search.


Decode, compare to original. If it does differ, original is encoded. If it doesn't differ, original isn't encoded. But still it says nothing about whether the newly decoded version isn't still encoded. A good task for recursion.

I hope one can't write a quine in urlencode, or this algorithm would get stuck.

Exception: When a string contains "+" character url decoder replaces it with a space even though the string is not url encoded


Use regexp to check if your string contains illegal characters (i.e. characters which cannot be found in URL-encoded string, like whitespace).


Try decoding the url. If the resulting string is shorter than the original then the original URL was already encoded, else you can safely encode it (either it is not encoded, or even post encoding the url stays as is, so encoding again will not result in a wrong url). Below is sample pseudo (inspired by ruby) code:

# Returns encoded URL for any given URL after determining whether it is already encoded or not
    def escape(url)
      unescaped_url = URI.unescape(url)
      if (unescaped_url.length < url.length)
        return url
      else
        return URI.escape(url)
      end
    end


You can't know for sure, unless your strings conform to a certain pattern, or you keep track of your strings. As you noted by yourself, a String that is encoded can also be encoded, so you can't be 100% sure by looking at the string itself.


Check your URL for suspicious characters[1]. List of candidates:

WHITE_SPACE ,", < , > , { , } , | , \ , ^ , ~ , [ , ] , . and `

I use:

private static boolean isAlreadyEncoded(String passedUrl) {
        boolean isEncoded = true;
        if (passedUrl.matches(".*[\\ \"\\<\\>\\{\\}|\\\\^~\\[\\]].*")) {
                isEncoded = false;
        }
        return isEncoded;
}

For the actual encoding I proceed with:

https://stackoverflow.com/a/49796882/1485527

Note: Even if your URL doesn't contain unsafe characters you might want to apply, e.g. Punnycode encoding to the host name. So there is still much space for additional checks.


[1] A list of candidates can be found in the section "unsafe" of the URL spec at Page 2. In my understanding '%' or '#' should be left out in the encoding check, since these characters can occur in encoded URLs as well.


Using Spring UriComponentsBuilder:

import java.net.URI;
import org.springframework.web.util.UriComponentsBuilder;

private URI getProperlyEncodedUri(String uriString) {
    try {
        return URI.create(uriString);
    } catch (IllegalArgumentException e) {
        return UriComponentsBuilder.fromUriString(uriString).build().toUri();
    }
}


If you want to be sure that string is encoded correctly (if it needs to be encoded) - just decode and encode it once again.

metacode:

100%_correctly_encoded_string = encode(decode(input_string))

already encoded string will remain untouched. Unencoded string will be encoded. String with only url-allowed characters will remain untouched too.


According to the spec (https://www.rfc-editor.org/rfc/rfc3986) all URLs MUST start with a scheme followed by a :

Since colons are required as the delimiter between a scheme and the rest of the URI, any string that contains a colon is not encoded.

(This assumes you will not be given an incomplete URI with no scheme.)

So you can test if the string contains a colon, if not, urldecode it, and if that string contains a colon, the original string was url encoded, if not, check if the strings are different and if so, urldecode again and if not, it is not a valid URI.

You can make this loop simpler if you know what schemes you can expect.


Thanks to this answer I coded a function (JS Language) that encodes the URL just once with encodeURI so you can call it to make sure is encoded just once and you don't need to know if the URL is already encoded.

ES6:

var getUrlEncoded = sURL => {
    if (decodeURI(sURL) === sURL) return encodeURI(sURL)
    return getUrlEncoded(decodeURI(sURL))
}

Pre ES6:

var getUrlEncoded = function(sURL) {
    if (decodeURI(sURL) === sURL) return encodeURI(sURL)
    return getUrlEncoded(decodeURI(sURL))
}

Here are some tests so you can see the URL is only encoded once:

getUrlEncoded("https://example.com/media/Screenshot27 UI Home.jpg")
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(encodeURI(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(decodeURI(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
0

精彩评论

暂无评论...
验证码 换一张
取 消