Detecting International Characters In Regular Expressions

Developer https://www.devze.com 2023-03-06 01:03 Source: web
Here's a regular expression to detect product pages on Amazon. It works for URLs in plain English but not for URLs with international characters, so URL2 is not detected. How do I get around this? Thanks.

var URL1 = "www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
var URL2 = "www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";

var regex1 = RegExp("http://www.amazon.(com|co.uk|de|ca|it|fr|cn|co.jp)/([\\w-]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
m = URL1.match(regex1);


% doesn't match \w, so Larm%C3%A9e-furieuse-Fred-Vargas doesn't match [\w-]+. Why not just use [^/]+?
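A quick way to see the difference is to test both character classes against the slug from URL2. This is only a minimal sketch; the variable names are illustrative and not from the question. It also shows that decoding the slug first would not help, since \w in JavaScript covers only ASCII letters, digits and underscore, so "é" fails too.

```javascript
// Slug taken from URL2 in the question (percent-encoded "é").
var slug = "Larm%C3%A9e-furieuse-Fred-Vargas";

// Anchored versions of the two character classes under discussion.
var wordClass = /^[\w-]+$/;  // class used in the question
var notSlash = /^[^\/]+$/;   // class suggested in the answer

var wordOk = wordClass.test(slug);                          // "%" is not in [\w-]
var slashOk = notSlash.test(slug);                          // anything except "/"
var decodedOk = wordClass.test(decodeURIComponent(slug));   // "é" is not in [\w-] either

console.log(wordOk, slashOk, decodedOk); // false true false
```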

PS: an unescaped "." matches any character, so to match a literal dot you should use the pattern \., which is written as \\. inside the string literal.

RegExp("http://www\\.amazon\\.(ca|cn|co\\.(jp|uk)|com|de|fr|it)/([^/]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
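For reference, here is a sketch of how the corrected pattern behaves on both URLs. It assumes the URLs carry the "http://" prefix that the pattern requires (the ones quoted in the question omit it), and productId is a hypothetical helper introduced only for this example.

```javascript
// Pattern from the question: unescaped dots, slug limited to [\w-]+.
var original = new RegExp("http://www.amazon.(com|co.uk|de|ca|it|fr|cn|co.jp)/([\\w-]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");

// Corrected pattern from the answer: dots escaped, slug matched with [^/]+.
var corrected = new RegExp("http://www\\.amazon\\.(ca|cn|co\\.(jp|uk)|com|de|fr|it)/([^/]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");

// Assumption: the scheme is present, since both patterns start with "http://".
var url1 = "http://www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
var url2 = "http://www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";

// Hypothetical helper: returns the 10-character product ID, or null if there is no match.
// The ID is capture group 6 because co\.(jp|uk) adds a nested group.
function productId(url) {
  var m = url.match(corrected);
  return m ? m[6] : null;
}

console.log(productId(url1));      // "0393338827"
console.log(productId(url2));      // "2878583760"
console.log(original.test(url2));  // false: "%" breaks [\w-]+

// The unescaped "." in the original also accepts hosts that are not Amazon's.
var lookalike = "http://wwwXamazon.com/dp/0393338827";
console.log(original.test(lookalike));   // true
console.log(corrected.test(lookalike));  // false
```

Note that the nested (jp|uk) group shifts the numbering, so any code that read the product ID from group 5 of the original pattern needs to read group 6 instead.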