Regex for URL C#

Developer https://www.devze.com 2023-01-26 18:10 Source: Internet
In my C# program I wrote a Google search function, which works by fetching the source of each results page and extracting the URLs via regex.

My current regex is:

(?:(?:(?:http)://)(?:w{3}\\.)?(?:[a-zA-Z0-9/;\\?&=:\\-_\\$\\+!\\*'\\(\\|\\\\~\\[\\]#%\\.])+)

This works well at the moment, but I get URLs like, for example, http://www.example.com/forums/arcade.php?efdf=332

In this case I just want the URL without the ?efdf=332 at the end.

So how should I change the regex?


http://(?:www\.)?[a-zA-Z0-9/;&=:_$+!*'()|~\[\]#%.\\-]+

This pattern does the same as your regex (I've removed a lot of unnecessary cruft) but stops matching a link before a ?, since ? is no longer in the character class.

In C#:

using System.Text.RegularExpressions;
Regex regexObj = new Regex(@"http://(?:www\.)?[a-zA-Z0-9/;&=:_$+!*'()|~\[\]#%.\\-]+");
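A minimal sketch of running that pattern over a fetched page, assuming the page's HTML is already in a string; the `UrlExtractor` class and `ExtractUrls` method are hypothetical names. `Regex.Matches` returns every non-overlapping match, and because `?` is excluded from the character class, each match ends just before the query string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Same pattern as in the answer: '?' is not in the character class,
    // so every match stops right before the query string.
    private static readonly Regex UrlRegex =
        new Regex(@"http://(?:www\.)?[a-zA-Z0-9/;&=:_$+!*'()|~\[\]#%.\\-]+");

    // Hypothetical helper: collects every URL-like match found in the HTML source.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in UrlRegex.Matches(html))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

For example, passing `<a href="http://www.example.com/forums/arcade.php?efdf=332">` should yield `http://www.example.com/forums/arcade.php`. One caveat: `'` and `)` are inside the character class, so a URL in a single-quoted attribute or inside parentheses can pick up the trailing quote or bracket.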

That said, I'm not sure this is a good way of matching URLs (what about https, ftp, mailto, etc.?).
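If https and ftp links should be picked up as well, one possible variant (an assumption, not part of the original answer) is to widen the scheme part to `(?:https?|ftp)://` and keep the same character class. `mailto:` links have no `//` after the scheme, so this sketch does not cover them. The `SchemeAwareUrlExtractor` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SchemeAwareUrlExtractor
{
    // Assumed variant: accepts http, https and ftp (case-insensitively).
    // mailto: links have no "//" and would need a separate pattern.
    private static readonly Regex UrlRegex =
        new Regex(@"(?:https?|ftp)://(?:www\.)?[a-zA-Z0-9/;&=:_$+!*'()|~\[\]#%.\\-]+",
                  RegexOptions.IgnoreCase);

    // Collects every match; as before, each one stops before a '?'.
    public static List<string> ExtractUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match m in UrlRegex.Matches(text))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

With this variant, `https://example.com/a?x=1` would come back as `https://example.com/a`.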


You can use the Uri class to access the individual parts of the URL, and then either remove the query string from the end or concatenate only the parts you want.
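A short sketch of the Uri approach: `Uri.GetLeftPart(UriPartial.Path)` keeps the scheme, authority and path and drops the query string (and any fragment), and the second method shows the concatenation alternative. The `QueryStripper` class and its method names are hypothetical, and returning malformed input unchanged is an assumed policy.

```csharp
using System;

public static class QueryStripper
{
    // Keeps scheme, authority and path; drops the query string and fragment.
    // new Uri(...) would throw UriFormatException on malformed input,
    // so TryCreate is used and the input is returned unchanged on failure.
    public static string WithoutQuery(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
        {
            return url;
        }
        return uri.GetLeftPart(UriPartial.Path);
    }

    // Alternative: concatenate only the parts you want.
    public static string FromParts(string url)
    {
        var uri = new Uri(url);
        return uri.Scheme + "://" + uri.Authority + uri.AbsolutePath;
    }
}
```

Combined with the question's original regex, each match could be passed through `WithoutQuery`, which turns `http://www.example.com/forums/arcade.php?efdf=332` into `http://www.example.com/forums/arcade.php`.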
