Regex to validate URL - Not checking for HTTP?

Developer https://www.devze.com 2022-12-20 16:01 Source: Internet
I know there are tons of questions on here about validating a web address with something like this:

/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i

The only problem is that not everybody types the http:// (or whatever scheme comes first), so I wanted to keep using preg_match() but treat the scheme as optional rather than required. I modified the pattern to this, but now it rejects the URL if it does have http:// in it:

/^[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i

I was hoping to validate it on these conditions instead:

  • If it has http:// or www, just ignore that part
  • If the .extension is longer than 9 characters, reject it
  • If it contains no full stops, reject it

Anybody got an idea, thanks :)


Can't you just use the built-in filter_var function?

filter_var('example.com', FILTER_VALIDATE_URL);

Not sure about the nine chars extension limit, but I guess you could easily check this in an additional step.
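
One thing to watch with this approach: FILTER_VALIDATE_URL only accepts input that includes a scheme, so a bare 'example.com' returns false. A minimal sketch of how the two steps might be combined follows. The function name looksLikeWebAddress is made up for illustration, and defaulting to http:// when no scheme is given is an assumption, not something filter_var does for you.

```php
<?php
// Hypothetical helper: the name and the default-to-http behaviour are assumptions.
function looksLikeWebAddress(string $input): bool
{
    $input = trim($input);

    // FILTER_VALIDATE_URL needs a scheme, so add one when it is missing.
    if (!preg_match('~^https?://~i', $input)) {
        $input = 'http://' . $input;
    }

    if (filter_var($input, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // The host must contain at least one full stop.
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host) || strpos($host, '.') === false) {
        return false;
    }

    // Additional step: the text after the last dot must be 1 to 9 characters long.
    $extension = substr($host, strrpos($host, '.') + 1);
    return $extension !== '' && strlen($extension) <= 9;
}
```

With this, looksLikeWebAddress('example.com') and looksLikeWebAddress('http://www.example.com/page') both pass, while 'localhost' fails because its host has no full stop.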


Why not have a stage before the regexp that simply removes the http:// if present? The same would apply to the www. That may make your life a bit easier.


/^(http\://|www\.)/

/^.+?\.\S{0,9}\./

/\./

Those should work for your bullet points?
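
For what it's worth, here is one way the three checks could be wired together with preg_replace() and preg_match(). Treat it as a rough sketch: the function name passesBulletPoints is invented, accepting https:// as well as http:// is an assumption, and the extension test uses a slightly different pattern that is anchored to the end of the host.

```php
<?php
// Hypothetical helper illustrating the three checks; the name is an assumption.
function passesBulletPoints(string $input): bool
{
    // Ignore a leading http:// or https:// and/or a leading www. prefix.
    $rest = preg_replace('~^(?:https?://)?(?:www\.)?~i', '', trim($input));

    // Only the host part matters for the remaining checks,
    // so cut at the first port, path, query or fragment delimiter.
    $host = preg_split('~[/?#:]~', $rest, 2)[0];

    // Reject when there is no full stop at all.
    if (!preg_match('/\./', $host)) {
        return false;
    }

    // Reject when the extension (text after the last dot) is empty
    // or longer than 9 characters.
    return preg_match('/\.[^.]{1,9}$/', $host) === 1;
}
```

So 'http://www.example.com' and 'example.com/path.html' are accepted, while 'localhost' and 'example.verylongextension' are rejected.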


not everybody uses the http://

They should. Without a scheme it simply isn't a URL, and omitting it can cause weird problems. For example:

www.example.com:8080/file.txt

This is a valid URL with the nonexistent scheme www.example.com:.

If you are sure that the normal scheme should be http:, you could try automatically prepending http:// to ‘fix up’ any URL that doesn't begin with https?:, before validation. But you shouldn't allow/keep/return schemeless URLs over the longer term.

Incidentally the current regex you are using is a long way from accurate according to the official URI syntax (see RFC 3986). It will disallow many valid URI characters, not to mention Unicode characters in IRI. If you want a proper validation you should use a real URL-parser; if you just want a quick check for obvious problems you should use something much more permissive. For example just checking for the absence of categorically-invalid characters like space and ".
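
As a rough illustration of the "much more permissive" option, a quick check might look like the sketch below. The function name hasObviousProblems is made up, and the exact set of rejected characters (whitespace, control characters, double quote, angle brackets) is an assumption that goes a little beyond the two examples named above.

```php
<?php
// Hypothetical quick check; the name and the exact character set are assumptions.
function hasObviousProblems(string $url): bool
{
    // Flag empty input, or any whitespace, control character,
    // double quote or angle bracket, none of which may appear raw in a URI.
    return $url === '' || preg_match('/[\s"<>\x00-\x1F\x7F]/', $url) === 1;
}
```

This deliberately says nothing about whether the rest of the string is a well-formed URL; it only catches input that cannot possibly be one.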
