So I've got this URL regex:
/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux
What it's currently matching:
- http://www.google.com
- http://google.com
I need it to also match:
- www.google.com
- google.com
I tried making the protocol part of the regex optional by slapping a ? at the end "(https?:\/\/)?" but that didn't do anything.
Ideas?
I'd look for something built into the language you are using to do this. URLs are tough to match with a regex. If you insist, I changed yours to make the (https?://) group optional. I have not tested it, though.
/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)?((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux
I got this example from RFC 3986 (Appendix B) and was directed there by this comment. That said, I'd still recommend using something built into whatever language you are using rather than a regex.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
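A minimal PHP sketch of how that RFC 3986 pattern could be applied with preg_match. The function name split_uri and the constant URI_PATTERN are hypothetical, and the group numbering (2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment) follows the RFC's own description. Note that the pattern splits a string into components rather than validating it, so a scheme-less input such as www.google.com ends up entirely in the path group.

```php
<?php
// RFC 3986, Appendix B pattern. '~' is the delimiter so '/' and '#' need no escaping.
const URI_PATTERN = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

// Hypothetical helper: split a URI reference into its five components.
function split_uri(string $uri): array
{
    preg_match(URI_PATTERN, $uri, $m);

    // Trailing groups that did not take part in the match are dropped
    // from $m, hence the '??' fallbacks.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

$full = split_uri('http://www.google.com/a?b=1#c');
$bare = split_uri('www.google.com');
```

Here $full['authority'] holds the host, while $bare['authority'] is empty and $bare['path'] holds the whole input, which is why this pattern alone will not pick bare domains out of text.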
Since you are using PHP, did you consider using parse_url? It returns false on seriously malformed URLs.
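A rough sketch of the parse_url route, assuming you already have a single candidate string rather than a block of text to search. The helper name extract_host is hypothetical, and treating scheme-less input as http is an assumption made here because parse_url puts a bare www.google.com into the path component instead of host.

```php
<?php
// Hypothetical helper: return the host of a candidate URL, or null if
// parse_url() rejects it or finds no host.
function extract_host(string $candidate): ?string
{
    // Assumption: input without an explicit http(s) scheme is treated as http,
    // so that parse_url() reports a 'host' rather than only a 'path'.
    if (!preg_match('~^https?://~i', $candidate)) {
        $candidate = 'http://' . $candidate;
    }

    $parts = parse_url($candidate);
    if ($parts === false || empty($parts['host'])) {
        return null;
    }

    return $parts['host'];
}

$a = extract_host('http://www.google.com');
$b = extract_host('www.google.com');
$c = extract_host('google.com/search?q=regex');
```

This only checks that the string parses and has a host; it does not confirm that the domain exists or that the TLD is real.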