
How do I prevent my regex from matching a trailing quote in a string?

Developer https://www.devze.com 2023-02-15 09:43 Source: web

I am rejigging some URLs that have been defined in JavaScript:

var x = "http:\/\/example.com\/test.aspx?v=12.1&x=2&p=3";
var y = "http:\/\/example.com\/test.aspx?v=92.1&x=2&p=4";

My regex to capture the domain part, the path and the query string into separate capture groups works great:

(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)(.+)

However, the sand in the vaseline is that the last double quote is being matched as well. How do I stop matching just before the end quote?

As it happens, this is for IIS7's UrlRewriter, so I can't use any code to strip the end quote off.


I assume you don't allow quotes in the URL body, so you could just change the (.+) to ([^"]+).

Edit: It occurs to me you might need to allow for " or ', so you could just change the above to ([^"']+). If you want to be more thorough, you can go with

(["'])(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)(.+?)\1

...and ignore the first capture group. This way, it takes everything up to the next matching quote. That's probably unnecessary, though. I can't imagine that you'd want to allow ' or " in your URL string, but the . already matches several characters that aren't supposed to be in URLs, so I thought I'd leave it up to you.
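
For anyone who wants to try the backreference idea, here is a rough sketch in JavaScript. It assumes the first group is meant to capture the opening quote, i.e. (["']), so that \1 can match the same quote character at the end. The single-quoted line and the helper name queryOf are made up for illustration, and the rewriter's own regex engine may behave differently from JavaScript's.

```javascript
// Pattern under test: group 1 is the opening quote, \1 requires the same
// quote at the end, and the lazy (.+?) stops at the first such quote.
const pattern = new RegExp(
  String.raw`(["'])(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)(.+?)\1`
);

// Raw source lines, with the backslashes exactly as they appear in the file.
// The single-quoted variant is a hypothetical input, not from the question.
const doubleQuoted = String.raw`var x = "http:\/\/example.com\/test.aspx?v=12.1&x=2&p=3";`;
const singleQuoted = String.raw`var y = 'http:\/\/example.com\/test.aspx?v=92.1&x=2&p=4';`;

// Hypothetical helper: returns the query string (group 5), or null if no match.
function queryOf(line) {
  const m = line.match(pattern);
  return m ? m[5] : null;
}

console.log(queryOf(doubleQuoted)); // v=12.1&x=2&p=3
console.log(queryOf(singleQuoted)); // v=92.1&x=2&p=4
```

Note that the group numbers shift by one compared with the original pattern, so any rewrite target that refers to the groups by number would need adjusting.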


Instead of matching anything one or more times with .+, try matching anything that is not a quote one or more times:

[^"]+

The [] creates a character class, and the ^ at the start negates it, making it a negated character class. This will literally match any character that is not a double quote.
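
To see the difference, here is a small JavaScript sketch that runs the question's pattern against one of the source lines, once with the greedy (.+) and once with ([^"]+) as the last group. JavaScript is used only because it is easy to run; the variable names are made up, and the rewriter's regex engine is assumed to treat these constructs the same way.

```javascript
// Raw source text, backslashes included, as the regex would see it.
const line = String.raw`var x = "http:\/\/example.com\/test.aspx?v=12.1&x=2&p=3";`;

// Original pattern from the question: the last group is the greedy (.+).
const greedy = new RegExp(
  String.raw`(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)(.+)`
);

// Same pattern with the last group changed to a negated character class.
const bounded = new RegExp(
  String.raw`(http:\\/\\/example.com\\/)([0-9a-zA-Z-\\\/\._]+)([\?]?)([^"]+)`
);

const greedyQuery = line.match(greedy)[4];
const boundedQuery = line.match(bounded)[4];

console.log(greedyQuery);  // v=12.1&x=2&p=3";
console.log(boundedQuery); // v=12.1&x=2&p=3
```

The greedy version runs to the end of the line and picks up the closing quote (and the semicolon here), while the negated class stops at the first double quote it meets.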
