I'm trying to create a regex that matches a URL only if it has a capital letter somewhere before the query string. I want to capture the non-query-string part in one group and the query string, including the question mark, in a second group. If there is no query string but there is a capital letter, only the non-query-string part should be captured.
A few examples:
/contextroot/page.html?param1=value1&param2=value2 NO MATCH
/contextroot/page.html?param=VALUE&param2=value2 NO MATCH
/contextroot/Page.html?param=value MATCH
/contextroot/Page.html GROUP 1
?param=value GROUP 2
/contextroot/page.HTML MATCH
/contextroot/page.HTML GROUP 1
Here's my first cut at the regex:
^(.*[A-Z].*)(\??.*)$
It's busted. This never captures the query string.
^/contextroot/([^?]*[A-Z][^?]*)(\?.*)?$
Explanation:
^/contextroot/ # literal start of URL
( # match group 1
[^?]* # anything except `?` (zero or more)
[A-Z] # one capital letter
[^?]* # see above
)
( # match group 2
\? # one ?
.* # anything that follows
)? # optionally
$ # end of string
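To sanity-check this pattern against the examples from the question, here is a minimal Python sketch. The names `PATTERN` and `split_url` are made up for illustration and are not part of the answer. Note that because the literal `/contextroot/` sits outside the first group, group 1 holds only the part after that prefix, not the full path shown in the question.

```python
import re

# The pattern from the answer above, unchanged.
# PATTERN and split_url are illustrative names only.
PATTERN = re.compile(r"^/contextroot/([^?]*[A-Z][^?]*)(\?.*)?$")


def split_url(url):
    """Return (group 1, group 2) on a match, or None when there is no match."""
    m = PATTERN.match(url)
    if m is None:
        return None
    # group(2) is None when the URL has no query string.
    return m.group(1), m.group(2)


print(split_url("/contextroot/page.html?param=VALUE&param2=value2"))  # None
print(split_url("/contextroot/Page.html?param=value"))  # ('Page.html', '?param=value')
print(split_url("/contextroot/page.HTML"))              # ('page.HTML', None)
```

If the full path is wanted in group 1, moving the opening parenthesis in front of `/contextroot/` should do it.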
(^/contextroot/(?=[^?A-Z]*[A-Z])[^?]*)(\?.*)?
Explanation:
( # match group 1
^/contextroot/ # literal start of URL (optional, remove if not needed)
(?= # positive look-ahead...
[^?A-Z]* # anything but a question mark or upper-case letters
[A-Z] # a mandatory upper-case letter
) # end look-ahead
[^?]* # match anything but a question mark
) # end group 1
( # match group 2
\?.* # a question mark and the rest of the query string
)? # end group 2, make optional
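The commented layout above maps fairly directly onto Python's `re.VERBOSE` mode, so here is a small sketch of the look-ahead version run against the question's examples. `LOOKAHEAD_PATTERN` and `capture` are hypothetical names chosen for this example. Here group 1 includes the `/contextroot/` prefix, which matches what the question asked for.

```python
import re

# Same pattern as above, written in verbose mode so the comments can stay inline.
# LOOKAHEAD_PATTERN and capture are illustrative names only.
LOOKAHEAD_PATTERN = re.compile(
    r"""
    (                      # group 1: the non-query part
      ^/contextroot/       # literal start of URL
      (?=[^?A-Z]*[A-Z])    # look-ahead: an upper-case letter comes before any question mark
      [^?]*                # everything up to (not including) a question mark
    )
    (\?.*)?                # group 2: optional query string, question mark included
    """,
    re.VERBOSE,
)


def capture(url):
    """Return (group 1, group 2) on a match, or None when there is no match."""
    m = LOOKAHEAD_PATTERN.match(url)
    return (m.group(1), m.group(2)) if m else None


print(capture("/contextroot/page.html?param=VALUE&param2=value2"))  # None
print(capture("/contextroot/Page.html?param=value"))  # ('/contextroot/Page.html', '?param=value')
print(capture("/contextroot/page.HTML"))              # ('/contextroot/page.HTML', None)
```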
Note that this is intended to check a single URL and does not work when run against a multi-line string.
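To make it work with multi-line input (one URL per line), exclude line breaks from the character classes, and enable multi-line mode if your regex engine needs it for ^ to match at the start of each line: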
(^/contextroot/(?=[^?A-Z\r\n]*[A-Z])[^?\r\n]*)(\?.*)?
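As a rough illustration of the multi-line variant, the following Python sketch scans a block of text with one URL per line. It assumes `\n` line endings and uses `re.MULTILINE` so that `^` matches at the start of every line; `MULTILINE_PATTERN`, `find_urls` and the sample text are invented for this example.

```python
import re

# Multi-line variant from above. re.MULTILINE lets ^ match at each line start.
# MULTILINE_PATTERN and find_urls are illustrative names only.
MULTILINE_PATTERN = re.compile(
    r"(^/contextroot/(?=[^?A-Z\r\n]*[A-Z])[^?\r\n]*)(\?.*)?",
    re.MULTILINE,
)


def find_urls(text):
    """Return a list of (group 1, group 2) tuples, one per matching line."""
    return [(m.group(1), m.group(2)) for m in MULTILINE_PATTERN.finditer(text)]


# Sample input, assumed to use "\n" line endings.
sample = (
    "/contextroot/page.html?param1=value1&param2=value2\n"
    "/contextroot/plain.html\n"
    "/contextroot/Page.html?param=value\n"
    "/contextroot/page.HTML\n"
)

for path, query in find_urls(sample):
    print(path, query)
# /contextroot/Page.html ?param=value
# /contextroot/page.HTML None
```

Excluding `\r\n` from the character classes is what stops the look-ahead on one line from finding a capital letter on the next line.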