Okay, this may be a dumb question, but I’m pretty new to regular expressions and I honestly have no idea how to do this.
I don’t know how to tell if a regex will work with PHP’s preg_match() or not.
For example, I would like to use the following regex with PHP’s preg_match():
\b
# Match the leading part (proto://hostname, or just hostname)
(
# ftp://, http://, or https:// leading part
(ftp|https?)://[-\w]+(\.\w[-\w]*)+
|
# or, try to find a hostname with our more specific sub-expression
(?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
# Now ending .com, etc. For these, require lowercase
(?-i: com\b
| edu\b
| biz\b
| gov\b
| in(?:t|fo)\b # .int or .info
| mil\b
| net\b
| org\b
| [a-z][a-z]\b # two-letter country codes
)
)
# Allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
/
# The rest are heuristics for what seems to work well
[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
(?:
[.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
)*
)?
preg_match($regex, $url); doesn’t work when the above regex is used as-is. Why not?
What are the steps to take here to ‘convert’ it so that it will work?
Note that the regex I’m providing here is just an example; I’d love to learn how to convert any regex to its preg_match-compatible equivalent.
Thanks in advance!
P.S. I’m asking because I’m collecting and comparing different URL regexes on this test page: http://mathiasbynens.be/demo/url-regex People keep sending me regexes in other languages, and I don’t know how to make them work :(
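You can use the x modifier flag in PHP to allow the use of whitespace and comments. See http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
Also, you need to wrap the regex in a set of delimiters, in the form /regex/modifiers, like this:
/[abc]/xi
…the i modifier being for case insensitivity.
Note that your regex (and its comments) contain literal / characters, so with / as the delimiter you would have to escape every one of them as \/. Picking a delimiter that does not occur anywhere in the pattern, such as ~, avoids that.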
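As a minimal sketch of how the regex from the question could be wrapped for preg_match(): the variable names and sample strings below are my own, ~ is just one possible delimiter, and the indentation was added for readability. A nowdoc string is assumed here because the pattern contains both single and double quotes, and a nowdoc leaves quotes and backslashes untouched.

```php
<?php
// The question's pattern, pasted unchanged between "~" delimiters.
// The trailing "x" enables free-spacing mode, so the whitespace and the
// "#" comments inside the pattern are ignored by the regex engine.
$regex = <<<'REGEX'
~
\b
# Match the leading part (proto://hostname, or just hostname)
(
  # ftp://, http://, or https:// leading part
  (ftp|https?)://[-\w]+(\.\w[-\w]*)+
|
  # or, try to find a hostname with our more specific sub-expression
  (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
  # Now ending .com, etc. For these, require lowercase
  (?-i: com\b
      | edu\b
      | biz\b
      | gov\b
      | in(?:t|fo)\b # .int or .info
      | mil\b
      | net\b
      | org\b
      | [a-z][a-z]\b # two-letter country codes
  )
)
# Allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
  /
  # The rest are heuristics for what seems to work well
  [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
  (?:
    [.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
  )*
)?
~x
REGEX;

// Hypothetical sample inputs, only there to exercise the pattern.
$samples = ['http://www.example.com/path', 'see example.org, please', 'hello world'];

foreach ($samples as $text) {
    if (preg_match($regex, $text, $m) === 1) {
        echo "match: {$m[0]}\n";
    } else {
        echo "no match\n";
    }
}
```

If preg_match() returns false rather than 0 or 1, the pattern itself failed to compile, which is usually a sign of a delimiter or modifier problem rather than a non-matching subject.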
I highly recommend the 3rd edition of Mastering Regular Expressions (it includes a whole chapter on PHP, but the whole book is very enlightening!).
P.S. RegexBuddy (Windows application) can convert regexes between languages for you: http://cl.ly/050z3e1Z3e050M3W2u2a Sadly, there’s no Mac version.
Unfortunately I can only post one link per reply!?
http://www.php.net/manual/en/regexp.reference.delimiters.php
The above is a link to find out more about delimiters for regexes.
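To make the delimiter point concrete, here is a small sketch with a made-up pattern and subject string (neither comes from the question). It shows the same pattern written once with / as the delimiter, where each literal slash has to be escaped, and once with ~, which is an arbitrary alternative.

```php
<?php
// Hypothetical subject string for illustration.
$subject = 'http://example.com/';

// With "/" as the delimiter, every literal slash must be written as "\/".
$withSlashes = '/^https?:\/\/[-\w.]+\//';

// With a different delimiter ("~" here), the slashes can stay as they are.
$withTilde = '~^https?://[-\w.]+/~';

var_dump(preg_match($withSlashes, $subject)); // int(1)
var_dump(preg_match($withTilde, $subject));   // int(1)
```

Both forms compile to the same expression; the second is simply easier to read when the pattern is full of slashes, as URL patterns tend to be.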
Please forgive me for going off topic, but that regexp does not include all TLDs; e.g., it is missing .museum and .aero.
There is always talk about adding new TLDs, or even allowing anything as a TLD, so I advise against using a regexp that enumerates them.
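One way to act on that advice, sketched under my own assumptions rather than taken from the question's regex, is to replace the enumerated list with a generic "two or more letters" rule. The function name hostnameLooksValid and the pattern below are hypothetical and only illustrate the idea.

```php
<?php
// Hypothetical helper: accepts any alphabetic TLD of two or more letters
// instead of enumerating com, edu, org, and so on.
function hostnameLooksValid(string $host): bool
{
    // One or more dot-terminated labels, then a purely alphabetic TLD.
    $regex = '~^(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\.)+[a-z]{2,}$~i';
    return preg_match($regex, $host) === 1;
}

var_dump(hostnameLooksValid('example.museum')); // bool(true)
var_dump(hostnameLooksValid('example.aero'));   // bool(true)
var_dump(hostnameLooksValid('localhost'));      // bool(false)
```

The trade-off is that such a pattern also accepts TLDs that do not exist, so it checks the shape of a hostname, not whether it is registered.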