I am rejigging some URLs that have been defined in JavaScript:
var x = "http:\/\/example.com\/test.aspx?v=12.1&x=2&p=3";
var y = "http:\/\/example.com\/test.aspx?v=92.1&x=2&p=4";
My regex to capture the domain part, the path and the query string into capture groups works great:
(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)(.+)
However, the sand in the vaseline is that the last double quote is being matched as well. How do I stop matching just before the end quote?
As it happens, this is for IIS7's UrlRewriter, so I can't use any code to strip the end quote off.
I assume you don't allow quotes in the URL body, so you could just change the (.+) to ([^"]+).
Edit: It occurs to me you might need to allow for " or ', so you could just change the above to ([^"']+). If you want to be more thorough, you can go with
(["'])(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)(.+?)\1
...and ignore the first capture group. This way, it takes everything up to the next matching quote. That's probably unnecessary, though. I can't imagine that you'd want to allow ' or " in your URL string, but the . already matches several characters that aren't supposed to be in URLs, so I thought I'd leave it up to you.
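To make the quote-matching idea concrete, here is a small JavaScript sketch. It assumes the first group is meant to capture the opening quote, i.e. (["']), so that \1 can demand the same character at the end. The variable names and sample lines are illustrative only, and IIS may treat some regex details differently from JavaScript.

```javascript
// The pattern from the question, with the last group made lazy: (.+?)
const body = String.raw`(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)(.+?)`;

// (["']) captures the opening quote; \1 requires the same quote to close.
const quoted = new RegExp(String.raw`(["'])` + body + "\\1");

// Sample source lines, kept raw so the backslashes stay as written.
const doubleQuoted = String.raw`var x = "http:\/\/example.com\/test.aspx?v=12.1&x=2&p=3";`;
const singleQuoted = String.raw`var y = 'http:\/\/example.com\/test.aspx?v=92.1&x=2&p=4';`;

// Group 1 is the quote itself, so the query string moves to group 5.
const queryFromDouble = doubleQuoted.match(quoted)[5];
const queryFromSingle = singleQuoted.match(quoted)[5];

console.log(queryFromDouble); // v=12.1&x=2&p=3
console.log(queryFromSingle); // v=92.1&x=2&p=4
```

Because group 1 now holds the quote, every other group number shifts up by one, which matters if the rewrite rule refers to groups by index.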
Instead of matching anything one or more times with .+
try matching anything that is not a quote one or more times:
[^"]+
The [] creates a character class, and a leading ^ inside it means "not", which makes it a negated character class. This will literally match anything that is not a quote.
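A quick way to see the difference is to run both versions against one of the source lines. The following is a sketch in JavaScript rather than in the rewrite rule itself; the variable names are made up for the illustration, and the pattern prefix is copied from the question.

```javascript
// One line of the source being rewritten, kept raw so the backslashes
// survive exactly as they appear in the question.
const line = String.raw`var x = "http:\/\/example.com\/test.aspx?v=12.1&x=2&p=3";`;

// Everything up to the final capture group, copied from the question.
const prefix = String.raw`(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)`;

// Original: (.+) is greedy and runs to the end of the line, quote included.
const greedy = new RegExp(prefix + "(.+)");

// Fix: ([^"]+) stops at the first double quote it meets.
const bounded = new RegExp(prefix + String.raw`([^"]+)`);

const greedyQuery = line.match(greedy)[4];
const boundedQuery = line.match(bounded)[4];

console.log(greedyQuery);  // v=12.1&x=2&p=3";
console.log(boundedQuery); // v=12.1&x=2&p=3
```

The other three groups are unaffected by the change: both patterns still capture the domain, test.aspx and the ? in groups 1 to 3.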