Here's a regular expression to detect product pages on Amazon. It works for pages in standard English but not for URLs with international characters, so URL2 is not detected. How do I get around this? Thanks.
var URL1 = "www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
var URL2 = "www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";
var regex1 = RegExp("http://www.amazon.(com|co.uk|de|ca|it|fr|cn|co.jp)/([\\w-]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
var m = URL1.match(regex1);
% doesn't match \w, so Larm%C3%A9e-furieuse-Fred-Vargas doesn't match [\w-]+. Why not just use [^/]+ instead?
PS: "." matches any character, so you should use the pattern \. for a literal dot, which would appear as \\. in the string literal.
RegExp("http://www\\.amazon\\.(ca|cn|co\\.(jp|uk)|com|de|fr|it)/([^/]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
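For what it's worth, here is a minimal sketch of how that pattern behaves on both URLs from the question. The helper name getProductId, the optional scheme group (the sample URLs have no "http://" prefix), and the non-capturing groups are my own assumptions for illustration, not part of the original code.

```javascript
// Sketch only: getProductId and the optional "https?://" prefix are
// assumptions added for this example.
var productRegex = new RegExp(
  "^(?:https?://)?www\\.amazon\\.(ca|cn|co\\.(?:jp|uk)|com|de|fr|it)/" +
  "(?:[^/]+/)?" +          // optional title slug: anything except "/"
  "(?:dp|gp/product)/" +   // product path marker
  "(?:\\w+/)?" +           // optional extra path segment
  "(\\w{10})"              // 10-character product ID (capture group 2)
);

// Returns the 10-character product ID, or null if the URL is not a product page.
function getProductId(url) {
  var m = url.match(productRegex);
  return m ? m[2] : null;
}

var URL1 = "www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
var URL2 = "www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";

console.log(getProductId(URL1)); // "0393338827"
console.log(getProductId(URL2)); // "2878583760"
```

Decoding the URL first would not rescue the \w version, because \w in JavaScript only covers ASCII letters, digits, and underscore, so it matches neither "%" nor "é".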