Apache Commons UrlValidator does not support Unicode. Is an alternative available?

Developer https://www.devze.com 2023-01-07 23:23 Source: Internet
I am trying to validate URLs, but UrlValidator does not support Unicode. Here is the code:

// Requires: import org.apache.commons.validator.routines.UrlValidator;
public static boolean isValidHttpUrl(String url) {
    String[] schemes = {"http", "https"};
    UrlValidator urlValidator = new UrlValidator(schemes);
    if (urlValidator.isValid(url)) {
        System.out.println("url is valid");
        return true;
    }
    System.out.println("url is invalid");
    return false;
}

String url = "ftp://hi.com";
boolean isValid = isValidHttpUrl(url);
assertFalse(isValid);

url = "http:// hi.com";
isValid = isValidHttpUrl(url);
assertFalse(isValid);

url = "http://hi.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);

// This is the problem: the assertion below fails because the validator rejects the Unicode host.
url = "http://안녕.com";
isValid = isValidHttpUrl(url);
assertTrue(isValid);

Do you know of an alternative URL validator that supports Unicode?

I am adding another case: http://seapy_hi.com is reported as invalid. Why? I thought an underscore was valid in a domain name, so why is it rejected?


It doesn't support IDN. You need to convert the URL to Punycode first. Try this:

  isValid = isValidHttpUrl(IDN.toASCII(url));
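
To make the conversion step concrete, here is a minimal sketch of what `java.net.IDN` does to a host name on its own. It assumes only the bare host (no scheme, port, or path) is passed in; the class name `IdnHostSketch` and the helper `toAsciiHost` are made up for illustration and are not part of any library.

```java
import java.net.IDN;

public class IdnHostSketch {
    // Converts a bare host name to its ASCII-compatible (Punycode) form.
    // Assumption: the input is a host only, with no scheme, port, or path.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        String ascii = toAsciiHost("안녕.com");
        // The non-ASCII label is rewritten as an ASCII label with the "xn--" prefix.
        System.out.println(ascii);
        // IDN.toUnicode reverses the conversion.
        System.out.println(IDN.toUnicode(ascii));
    }
}
```

The resulting ASCII host is the kind of input an ASCII-only validator should be able to handle.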


There may be a more recent RFC that supersedes this one, but technically speaking URLs do not support Unicode. See RFC 1738.

The relevant section in particular:

No corresponding graphic US-ASCII:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.
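
As an illustration of "these must be encoded", the following sketch uses `java.net.URI.toASCIIString()` to percent-encode non-ASCII characters in a path as UTF-8 octets. The class name `PercentEncodeSketch`, the helper `toAsciiUrl`, and the `example.com` URL are placeholders chosen for this example. Note that this covers the path, query, and fragment only; a non-ASCII host name still needs Punycode rather than percent-encoding.

```java
import java.net.URI;

public class PercentEncodeSketch {
    // Returns the URL with every non-US-ASCII character percent-encoded
    // as UTF-8 octets. Throws IllegalArgumentException if the URL cannot be parsed.
    static String toAsciiUrl(String url) {
        return URI.create(url).toASCIIString();
    }

    public static void main(String[] args) {
        // example.com and the path are placeholders for illustration.
        System.out.println(toAsciiUrl("http://example.com/안녕"));
    }
}
```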


As Kaerber mentioned in a comment on the accepted answer, that answer has a bug if the string starts with a scheme. Here is my solution, which fixes that:

// Requires: import java.net.IDN; import java.net.URI; import java.net.URISyntaxException;
public static String convertUnicodeURLToAscii(String url) throws URISyntaxException {
    if(url == null) {
        return null;
    }
    url = url.trim();
    URI uri = new URI(url);
    boolean includeScheme = true;

    // URI needs a scheme to work properly with authority parsing
    if(uri.getScheme() == null) {
        uri = new URI("http://" + url);
        includeScheme = false;
    }

    String scheme = uri.getScheme() != null ? uri.getScheme() + "://" : null;
    String authority = uri.getRawAuthority() != null ? uri.getRawAuthority() : ""; // includes domain and port
    String path = uri.getRawPath() != null ? uri.getRawPath() : "";
    String queryString = uri.getRawQuery() != null ? "?" + uri.getRawQuery() : "";
    String fragment = uri.getRawFragment() != null ? "#" + uri.getRawFragment() : "";

    // Must convert domain to punycode separately from the path
    url = (includeScheme ? scheme : "") + IDN.toASCII(authority) + path + queryString + fragment;

    // Convert path from unicode to ascii encoding
    return new URI(url).normalize().toASCIIString();
}