I know there are tons of questions on here about validating a web address with something like this:
/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i
The only problem is that not everybody types the http:// (or whatever comes before the address), so I wanted to find a way to use preg_match() that treats http:// as optional rather than as a must-have. I modified the pattern to this, but now it rejects the URL if it does have http:// in it:
/^[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i
I was hoping more to validate it on these conditions
- If it has http:// or www, just ignore that part
- If the .extension is longer than 9 characters, reject
- If it contains no full stops, reject
Has anybody got an idea? Thanks :)
Can't you just use the built-in filter_var function?
filter_var('example.com', FILTER_VALIDATE_URL);
I'm not sure about the nine-character extension limit, but I guess you could easily check that in an additional step.
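As a rough sketch of that two-step idea: the helper below runs filter_var first and then checks the extension length separately. The function name isAcceptableUrl and the choice to treat the last dot-separated label of the host as the "extension" are assumptions made for illustration. Note that FILTER_VALIDATE_URL requires a scheme, so a bare 'example.com' is rejected by this check.

```php
<?php
// Hypothetical helper: validate with filter_var, then apply the extension rule separately.
function isAcceptableUrl(string $url, int $maxExtensionLength = 9): bool
{
    // FILTER_VALIDATE_URL requires a scheme, so 'example.com' alone fails here.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || strpos($host, '.') === false) {
        return false; // no full stop in the host name
    }

    // Assumption: the "extension" is the last dot-separated label of the host.
    $labels = explode('.', $host);
    $extension = end($labels);

    return $extension !== '' && strlen($extension) <= $maxExtensionLength;
}

var_dump(isAcceptableUrl('http://example.com'));
var_dump(isAcceptableUrl('example.com'));
```

Inputs without a scheme would need to be normalised before this check if they are meant to be accepted.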
Why not have a stage before the regexp that simply removes the http:// if present? The same would apply to the www. That may make your life a bit easier.
/^(http\://|www\.)/
/^.+?\.\S{0,9}\./
/\./
Those should work for your bullet points?
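One possible way to combine the strip-first stage with the three conditions from the question is sketched below. The function name looksLikeWebAddress is hypothetical, and the patterns are a loose interpretation of the bullet points rather than the exact expressions listed above: a leading scheme and "www." are removed, then the host must contain a full stop and end in an extension of 1 to 9 characters.

```php
<?php
// Hypothetical helper implementing the three conditions from the question.
function looksLikeWebAddress(string $input): bool
{
    // 1. Ignore a leading scheme (e.g. http://) and/or "www." if present.
    $rest = preg_replace('~^(?:[a-z][a-z0-9+.\-]*://)?(?:www\.)?~i', '', trim($input));

    // Only the host part matters for the remaining checks.
    $host = preg_split('~[/?#:]~', (string) $rest, 2)[0];

    // 3. Reject if there is no full stop.
    if (strpos($host, '.') === false) {
        return false;
    }

    // 2. Reject if the extension (last label) is empty or longer than 9 characters.
    $labels = explode('.', $host);
    $extension = end($labels);

    return (bool) preg_match('/^[a-z0-9\-]{1,9}$/i', $extension);
}

var_dump(looksLikeWebAddress('http://www.example.com/path'));
var_dump(looksLikeWebAddress('example'));
```

Because the prefix is stripped before the other checks run, the same rules apply whether or not the visitor typed http:// or www.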
not everybody uses the http://
They should. Without a scheme it simply isn't a URL, and omitting it can cause weird problems. For example:
www.example.com:8080/file.txt
This is a valid URL with the non-existent scheme www.example.com: (the part before the first colon).
If you are sure that the normal scheme should be http:, you could try automatically prepending http:// to ‘fix up’ any URL that doesn't begin with https?:, before validation. But you shouldn't allow/keep/return schemeless URLs over the longer term.
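A minimal sketch of that fix-up step, assuming http is the right default and that only http and https addresses are expected (the function name addDefaultScheme is made up for this example):

```php
<?php
// Hypothetical helper: prepend a default scheme unless the input already starts with http: or https:.
function addDefaultScheme(string $url, string $defaultScheme = 'http'): string
{
    $url = trim($url);

    // Same test as described above: does the input begin with https?: (case-insensitive)?
    if (preg_match('~^https?:~i', $url)) {
        return $url;
    }

    return $defaultScheme . '://' . $url;
}

echo addDefaultScheme('www.example.com:8080/file.txt'), "\n";
echo addDefaultScheme('https://example.com'), "\n";
```

The fixed-up string is what should then be validated and stored. An input with some other scheme, such as ftp://, would also get http:// put in front of it, so this only makes sense when web addresses are the only expected input.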
Incidentally, the current regex you are using is a long way from accurate according to the official URI syntax (see RFC 3986). It will disallow many valid URI characters, not to mention Unicode characters in IRI. If you want a proper validation you should use a real URL-parser; if you just want a quick check for obvious problems you should use something much more permissive. For example, just checking for the absence of categorically invalid characters like space and ".
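A permissive check of that kind might look like the sketch below. The function name is hypothetical, and the exact set of rejected characters (any whitespace, the double quote, and ASCII control characters) is an assumption that goes slightly beyond the two characters named above.

```php
<?php
// Hypothetical quick check: reject only characters that can never appear raw in a URL.
function hasNoObviouslyInvalidCharacters(string $url): bool
{
    // Assumed reject list: whitespace, double quote, ASCII control characters.
    return $url !== '' && !preg_match('/[\s"\x00-\x1F\x7F]/', $url);
}

var_dump(hasNoObviouslyInvalidCharacters('http://example.com/a?b=c'));
var_dump(hasNoObviouslyInvalidCharacters('http://example.com/a b'));
```

This says nothing about whether the string is a well-formed URL; it only filters out input that is obviously broken.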