Regexing URLs with and without a protocol in PHP

Developer https://www.devze.com 2023-03-25 13:28 Source: the web
So I've got this URL regex:

/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux

What it's currently matching:

  • http://www.google.com
  • http://google.com

I need it to also match:

  • www.google.com
  • google.com

I tried making the protocol part of the regex optional by slapping a ? at the end "(https?:\/\/)?" but that didn't do anything.

Ideas?


I'd look for something built into the language you are using to do this. URLs are tough to match with a regex. If you insist, I changed yours to make the (https?://) group optional. I have not tested it, though.

/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)?((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux
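
To make the "optional protocol" idea easier to see in isolation, here is a minimal sketch with a much smaller pattern. It is not the regex above: `LOOSE_URL_PATTERN` and `looks_like_url` are hypothetical names, the pattern is anchored to the whole string rather than searching inside text, and it only assumes ASCII host names with an optional port and path.

```php
<?php
// Hypothetical simplified pattern (not the poster's regex):
// optional http/https scheme, dot-separated host labels, a TLD of 2+ letters,
// then an optional port and an optional path.
// "~" is used as the delimiter so "/" needs no escaping.
const LOOSE_URL_PATTERN = '~^(?:https?://)?(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(?::[0-9]+)?(?:/\S*)?$~i';

function looks_like_url(string $candidate): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match(LOOSE_URL_PATTERN, $candidate) === 1;
}

$samples = ['http://www.google.com', 'http://google.com', 'www.google.com', 'google.com'];
foreach ($samples as $sample) {
    echo $sample, ' => ', var_export(looks_like_url($sample), true), PHP_EOL;
}
```

The only part that makes the scheme optional is `(?:https?://)?`; everything after it has to be able to match a bare host on its own.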

I got this example from RFC 3986 (Appendix B), after being pointed there by a comment. Note that it splits a URI reference into its components rather than validating it. Even so, I'd still recommend using something built into your language rather than a regex.

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
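
A rough sketch of how the RFC 3986 expression behaves in PHP, assuming `~` as the delimiter. `RFC3986_SPLIT` and `split_uri` are hypothetical names chosen for illustration. The group numbers follow the RFC: 2 is the scheme, 4 the authority, 5 the path, 7 the query and 9 the fragment.

```php
<?php
// Regex from RFC 3986, Appendix B. "~" is the delimiter, so "/" and "#"
// can appear unescaped inside the pattern.
const RFC3986_SPLIT = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

function split_uri(string $uri): array
{
    preg_match(RFC3986_SPLIT, $uri, $m);
    // Groups that did not take part in the match may be empty or absent,
    // so every lookup falls back to an empty string.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

print_r(split_uri('http://www.google.com/search?q=php#top'));
print_r(split_uri('www.google.com'));
```

Without a scheme and `//`, the whole of `www.google.com` lands in the path component, so this expression alone will not tell you that the input is a host name.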

Since you are using PHP, have you considered parse_url? It returns false on seriously malformed URLs.
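
One possible way to use parse_url for input that may lack a protocol is sketched below. `host_of` is a hypothetical helper, and treating scheme-less input as `http` is an assumption. The prefix step is there because parse_url on a bare `www.google.com` reports the whole string as the path, not the host.

```php
<?php
// Hypothetical helper: returns the host of a URL, tolerating a missing scheme.
function host_of(string $input): ?string
{
    // Assumption: input without a leading "scheme://" is meant as http.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }

    // parse_url returns false on seriously malformed URLs.
    $parts = parse_url($input);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }

    return $parts['host'];
}

$samples = ['http://www.google.com', 'www.google.com', 'google.com/search?q=php'];
foreach ($samples as $sample) {
    echo $sample, ' => ', var_export(host_of($sample), true), PHP_EOL;
}
```

parse_url only splits the string; it does not check that the host is a real or well-formed domain, so any further validation is up to the caller.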
