Regex to match subdomain?

Developer https://www.devze.com 2023-03-22 06:47 Source: Internet

Related topics: asp.net regex

I have the following so far:

^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$

Been testing against these:

https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash 
https://google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash 
https://google.com:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash 
http://www.foo.com
http://www.foo.com/
http://blog.foo.com/
http://blog.foo.com.ar/
http://foo.com
http://blog.foo.com
http://foo.com.ar

I'm using the following tool to test the regexes: regex tester

So far I've been able to yield the following groups:

1. full protocol
2. reduced protocol
3. full domain name
4. subdomain?
5. top level domain
6. port
7. port number
8. rest of the url
9. rest of the "directory"
10. no idea how to drop this group
11. page name
12. argument string
13. argument string
14. hash tag
15. hash tag
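
On the group marked "no idea how to drop this group": in .NET regexes a group written as (?:...) still groups but is not captured. The sketch below applies that to the tenth group, (\/\w+), and leaves the rest of the pattern from the question unchanged. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPattern
{
    // Same pattern as in the question, except that the repeated
    // directory group (\/\w+)* is written as (?:\/\w+)* so it is
    // no longer captured. Every group after it shifts down by one.
    public static readonly Regex Url = new Regex(
        @"^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?" +
        @"(((?:\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$");

    // Returns capture group 4 (the "subdomain?" group),
    // or null when the URL does not match the pattern at all.
    public static string FirstLabel(string url)
    {
        Match m = Url.Match(url);
        return m.Success ? m.Groups[4].Value : null;
    }
}
```

With this change the page name moves from group 11 to group 10. Note that for http://foo.com group 4 is "foo", so the pattern alone cannot tell a subdomain from the domain name itself.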

I will be using this regex to change the subdomain for my application for cross-domain redirect hyperlinks.

Using Request.Url as a parameter, I want to redirect from

http://example.com or http://www.example.com to http://blog.example.com

How can I achieve this?

I can't really tell what the current subdomain actually is, if there is one (it could be nothing, www, blog, or forum, for instance).

What would be the best way to make this replacement?

What I actually need is some way to find out what the top level domain is. For any of http://www.example.com, http://blog.example.com, or http://example.com, I want to get example.com.

This may not be the answer you're looking for... but IMO the best way would be to make use of the System.Uri class.

The Uri class will easily extract the Host for you, and you can then split the host on the "." delimiter, which should give you access to the current subdomain.
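
A minimal sketch of that idea, assuming the input is an absolute URL. HostParts, Labels, and Subdomain are hypothetical names, and the Subdomain helper assumes a two-label base domain such as example.com, which does not hold for hosts like foo.com.ar.

```csharp
using System;

public static class HostParts
{
    // Splits the host of an absolute URL into its dot-separated labels,
    // e.g. "http://blog.foo.com/" gives { "blog", "foo", "com" }.
    public static string[] Labels(string url)
    {
        Uri uri = new Uri(url);
        return uri.Host.Split('.');
    }

    // Hypothetical helper: treats everything before the last two labels
    // as the subdomain. ASSUMPTION: the base domain has exactly two
    // labels (example.com); this is wrong for hosts such as foo.com.ar.
    public static string Subdomain(string url)
    {
        string[] labels = Labels(url);
        if (labels.Length <= 2) return string.Empty;
        return string.Join(".", labels, 0, labels.Length - 2);
    }
}
```

Because Uri.Host excludes the scheme, port, and path, the split never has to deal with "http://" or ":8080".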

This is just my opinion, and it's formed especially because I find it hard to maintain regex code like ^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$

You can use the Uri class to parse the strings. There are many properties available in addition to Segments:

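// Parse the URL once; Segments exposes the path split after each "/".
Uri MyUri = new Uri("https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash");

// Writes "/", "dir/", "1/", "2/" and "search.html", one per line.
foreach (String Segment in MyUri.Segments)
    Response.Write(Segment + "<br />");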
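
As a rough illustration of those other properties, the sketch below collects the ones that correspond most closely to the regex groups in the question. UriTour and Describe are hypothetical names; Scheme, Host, Port, AbsolutePath, Query, and Fragment are properties of System.Uri.

```csharp
using System;

public static class UriTour
{
    // Returns the main components of an absolute URL, one per line,
    // using only properties of System.Uri.
    public static string Describe(string url)
    {
        Uri uri = new Uri(url);
        return string.Join(Environment.NewLine, new[]
        {
            "Scheme: " + uri.Scheme,          // "reduced protocol"
            "Host: " + uri.Host,              // "full domain name"
            "Port: " + uri.Port,              // "port number"
            "Path: " + uri.AbsolutePath,      // directories plus page name
            "Query: " + uri.Query,            // includes the leading "?"
            "Fragment: " + uri.Fragment       // includes the leading "#"
        });
    }
}
```

For the test URL from the question, this reports the host as www.google.com.ar and the port as 8080 without any pattern matching.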

I think you should reconsider whether a regex is really needed in this case.

Extracting the top level domain from a URL is quite simple: for "http://www.example.com/?blah=111" you can take the part before the third slash, perform a String.Split('.'), and concatenate the last two array items. For "http://www.example.com" it is even easier.

Regex patterns are very error-prone and quite hard to maintain, and in my opinion you won't gain any advantage from one here. I recommend getting rid of the regex. The result may be 2 or 3 more lines of code, but it will work, and your code will be much more readable and easier to understand.
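
A hedged sketch of the "last two items" approach, applied to the redirect described in the question. It uses Uri.Host rather than cutting the string at the third slash, so the scheme and port never end up in the split. SubdomainRedirect, BaseDomain, and WithSubdomain are hypothetical names, and the two-label assumption is stated in the comments.

```csharp
using System;

public static class SubdomainRedirect
{
    // Joins the last two dot-separated labels of the host.
    // ASSUMPTION: the base domain has exactly two labels (example.com).
    // For a host such as foo.com.ar this returns "com.ar", which is wrong.
    public static string BaseDomain(Uri uri)
    {
        string[] labels = uri.Host.Split('.');
        if (labels.Length < 2) return uri.Host;
        return labels[labels.Length - 2] + "." + labels[labels.Length - 1];
    }

    // Returns the same URL with the host replaced by
    // "<subdomain>.<base domain>", keeping scheme, port, path and query.
    public static Uri WithSubdomain(Uri uri, string subdomain)
    {
        UriBuilder builder = new UriBuilder(uri);
        builder.Host = subdomain + "." + BaseDomain(uri);
        return builder.Uri;
    }
}
```

In an ASP.NET page this could be used along the lines of Response.Redirect(SubdomainRedirect.WithSubdomain(Request.Url, "blog").AbsoluteUri). Hosts with multi-label suffixes such as .com.ar would need a list of known suffixes, which this sketch does not attempt.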