Regex to match subdomain?

Developer https://www.devze.com 2023-03-22 06:47 Source: Internet

I have the following so far:

^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$

Been testing against these:

https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash 
https://google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash 
https://google.com:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash 
http://www.foo.com
http://www.foo.com/
http://blog.foo.com/
http://blog.foo.com.ar/
http://foo.com
http://blog.foo.com
http://foo.com.ar

I'm using the following tool to test the regexes: regex tester

So far I've been able to yield the following groups:

  1. full protocol
  2. reduced protocol
  3. full domain name
  4. subdomain?
  5. top level domain
  6. port
  7. port number
  8. rest of the url
  9. rest of the "directory"
  10. no idea how to drop this group
  11. page name
  12. argument string (with the leading "?")
  13. argument string (without the "?")
  14. hash tag (with the leading "#")
  15. hash tag (without the "#")
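
For reference, here is a minimal sketch of how these groups can be read in C#, using the pattern exactly as posted and the first test URL. The class name GroupSketch is made up for illustration. On the group 10 question: writing the inner repetition as a non-capturing group, (?:\/\w+)*, should drop it, at the cost of renumbering every group after it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupSketch
{
    // The pattern from the question, unchanged. A verbatim string (@"...")
    // avoids having to double every backslash.
    public const string Pattern =
        @"^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$";

    public static void Main()
    {
        Match m = Regex.Match(
            "https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash",
            Pattern);

        // Group 0 is the whole match; groups 1 to 15 follow the numbered list.
        for (int i = 1; i < m.Groups.Count; i++)
            Console.WriteLine(i + ": " + m.Groups[i].Value);
    }
}
```

For this URL, group 4 is "www" and group 5 is "google.com.ar", so group 5 holds everything after the first dot rather than only the top level domain.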

I will be using this regex to change the subdomain for my application for cross-domain redirect hyperlinks.

Using Request.Url as a parameter, I want to redirect from

http://example.com or http://www.example.com to http://blog.example.com

How can I achieve this?

I can't really tell what the current subdomain actually is, if there is one (it could be nothing, www, blog, or forum, for instance)...

What would be the best way to make this replacement?

What I actually need is some way to find out what the top level domain is. Given http://www.example.com, http://blog.example.com, or http://example.com, I want to get example.com.


What would be the best way to make this replacement?

This may not be the answer you're looking for... but IMO the best way would be to make use of the System.Uri class.

The Uri class will easily extract the Host for you, and you can then split the host on the "." delimiter, which should give you access to the current subdomain.
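
A minimal sketch of that idea, applied to the redirect in the question. The helper names BaseDomain and WithSubdomain are made up for illustration, and the code assumes the base domain is always the last two labels of the host. That assumption does not hold for hosts like foo.com.ar, which would need a public-suffix list.

```csharp
using System;

public static class SubdomainSketch
{
    // Hypothetical helper: returns the last two labels of the host,
    // e.g. "example.com" for "www.example.com".
    // Assumption: the base domain has exactly two labels.
    public static string BaseDomain(Uri url)
    {
        string[] labels = url.Host.Split('.');
        if (labels.Length < 2) return url.Host;
        return labels[labels.Length - 2] + "." + labels[labels.Length - 1];
    }

    // Hypothetical helper: rebuilds the URL with the given subdomain placed
    // in front of the base domain; scheme, port, path and query are kept.
    public static Uri WithSubdomain(Uri url, string subdomain)
    {
        var builder = new UriBuilder(url);
        builder.Host = subdomain + "." + BaseDomain(url);
        return builder.Uri;
    }

    public static void Main()
    {
        Console.WriteLine(WithSubdomain(new Uri("http://www.example.com/"), "blog"));
        Console.WriteLine(WithSubdomain(new Uri("http://example.com/"), "blog"));
    }
}
```

In an ASP.NET page, Request.Url is already a Uri, so it could be passed to WithSubdomain directly.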


This is just my opinion, formed mainly because I find it hard to maintain regex code like ^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$


You can use the Uri class to parse the strings. There are many properties available in addition to Segments:

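// Parse the URL once; Uri exposes each component as a property.
Uri MyUri = new Uri("https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash");

// Segments holds the path pieces: "/", "dir/", "1/", "2/", "search.html".
foreach (String Segment in MyUri.Segments)
    Response.Write(Segment + "<br />");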
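
As an illustration of those other properties, the sketch below prints the parts of the same URL that roughly correspond to the regex groups in the question. It is written as a console program (the class name UriPartsSketch is made up) so that it runs outside an ASP.NET page.

```csharp
using System;

public static class UriPartsSketch
{
    public static void Main()
    {
        var uri = new Uri("https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash");

        Console.WriteLine(uri.Scheme);       // https
        Console.WriteLine(uri.Host);         // www.google.com.ar
        Console.WriteLine(uri.Port);         // 8080
        Console.WriteLine(uri.AbsolutePath); // /dir/1/2/search.html
        Console.WriteLine(uri.Query);        // ?arg=0-a&arg1=1-b&arg3-c
        Console.WriteLine(uri.Fragment);     // #hash
    }
}
```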


I think you should reconsider whether a regex is really needed in this case:

  • I think extracting the top level domain from a URL is quite simple; in the case of "http://www.example.com/?blah=111" you can simply take the part before the 3rd slash, perform a String.Split('.') and concatenate the last two array items. In the case of "http://www.example.com", it is even easier.

  • Regex patterns are very error-prone and quite hard to maintain, and in my opinion you won't gain anything from using one here. I recommend getting rid of the regex. The result may be 2 - 3 more lines of code, but it will work, and your code will be much more readable and easier to understand.
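
The string-splitting steps from the first point could look roughly like the sketch below. The helper name LastTwoLabels is made up, and the code assumes the input always starts with a scheme followed by "//" and carries no user info. Taking the last two labels also gives "com.ar" rather than "foo.com.ar" for country-code hosts, so those would need extra handling.

```csharp
using System;

public static class SplitSketch
{
    // Hypothetical helper following the steps described above.
    // Assumption: url looks like "scheme://host[:port][/path...]".
    public static string LastTwoLabels(string url)
    {
        // Splitting on '/' gives: [0] "http:", [1] "", [2] host (plus port, if any).
        string authority = url.Split('/')[2];

        // Drop an optional ":port" suffix before splitting into labels.
        string host = authority.Split(':')[0];

        string[] labels = host.Split('.');
        if (labels.Length < 2) return host;
        return labels[labels.Length - 2] + "." + labels[labels.Length - 1];
    }

    public static void Main()
    {
        Console.WriteLine(LastTwoLabels("http://www.example.com/?blah=111")); // example.com
        Console.WriteLine(LastTwoLabels("http://www.example.com"));           // example.com
        Console.WriteLine(LastTwoLabels("http://foo.com.ar"));                // com.ar
    }
}
```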