Converting and validating a URL from an untrusted source

Developer https://www.devze.com 2023-03-11 02:51 Source: Internet
I'm parsing a web page and collecting hrefs. Because a web page is an untrusted source, it can contain links with invalid syntax or non-ASCII symbols. So, as I understand it, I need to:

1) convert spaces, non-ASCII symbols, and any other problematic symbols

2) validate the string produced by step 1 (validity criteria: the URL can be typed into a browser and the browser will be able to retrieve the page it represents; such a URL can be passed to the URL/URI constructors and the appropriate page then retrieved. I can type some URLs into Firefox but can't construct instances of them in Java)

3) construct a java.net.URL/URI from the result of (1) if it is valid

I have found two validation libraries: 1 and 2 (which one do you prefer?), but no adequate library for the first step (tools like java.net.URLDecoder/URLEncoder aren't intended for this purpose).
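
For step 1, one possible sketch (not from the original post) uses only the JDK: parse the raw href leniently with java.net.URL, then rebuild it with the multi-argument java.net.URI constructor, which quotes illegal characters such as spaces. The class name HrefEscaper and the method escape are hypothetical. The sketch assumes the href is absolute and not already percent-encoded, because these constructors always quote '%', so an existing %20 would become %2520. Note also that new URL(String) is deprecated in recent JDKs.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Optional;

// Hypothetical helper: escapes illegal characters in an absolute, not-yet-encoded href.
class HrefEscaper {
    static Optional<URI> escape(String rawHref) {
        if (rawHref == null) {
            return Optional.empty();
        }
        try {
            // java.net.URL does not reject spaces or non-ASCII characters when parsing.
            URL url = new URL(rawHref.trim());
            // The multi-argument URI constructor quotes characters that are illegal
            // in each component (and always quotes '%').
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            return Optional.of(uri);
        } catch (MalformedURLException | URISyntaxException e) {
            // Missing or unknown protocol, or a component that cannot be represented.
            return Optional.empty();
        }
    }
}
```

Calling toASCIIString() on the returned URI additionally percent-encodes non-ASCII characters as UTF-8, for example HrefEscaper.escape("http://example.com/a b").get().toASCIIString().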


Can't you just try to construct a URL/URI from it inside a try/catch statement? I think the class's constructor handles the validation automatically.
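
A minimal sketch of that try/catch idea, using java.net.URI's single-argument constructor, which throws URISyntaxException on illegal syntax. The names HrefValidator and tryParse are hypothetical, and the extra checks (http/https scheme, non-null host) are an assumption about what "can be retrieved by a browser" means, not part of the original suggestion.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper: returns a URI only if the href parses and looks fetchable.
class HrefValidator {
    static Optional<URI> tryParse(String href) {
        if (href == null) {
            return Optional.empty();
        }
        try {
            // Throws URISyntaxException for illegal characters such as raw spaces.
            URI uri = new URI(href.trim());
            // Assumption: only absolute http/https links with a host are of interest.
            boolean isHttp = "http".equalsIgnoreCase(uri.getScheme())
                    || "https".equalsIgnoreCase(uri.getScheme());
            if (!isHttp || uri.getHost() == null) {
                return Optional.empty();
            }
            return Optional.of(uri);
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }
}
```

Links for which tryParse returns an empty Optional can simply be skipped while collecting hrefs. Be aware that java.net.URI accepts non-ASCII characters in a path, so passing this check does not by itself mean the string is pure ASCII.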
