Possible Duplicate:
The Hostname Regex
I'm trying开发者_StackOverflow to use pcrepp (PCRE) to extract hostname from url. the pcre regular expression is as same as Perl 5 regular expression.
for example:
url = "http://www.pandora.com/#/volume/73";
// the match will be "http://www.pandora.com/".
I can't find the correct syntax of the regex for this example.
- Needs to work for any url:
amazon.com/sds/
should return:amazon.com
. orabebooks.co.uk/isbn="62345627457245"/blabla/
should returnabebooks.co.uk
- I don't need to check if the url is valid. just to get the hostname.
Something like this:
^(?:[a-z]+://)?[^/]+/?
See Regexp::Common::URI::http which uses sub-patterns defined in Regexp::Common::URI::RFC2396. Examining the source code of those modules should give you a good idea how to put together a decent pattern.
Here is one possibility:
^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$
And another:
^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$
These and other URL related regular expressions can be found here: Regular Expression Library
string regex1, regex2, finalRegex;
regex1 = "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";
regex2 = "([^#]+)?#?(\\w*)";
//concatenation
finalRegex= regex1+regex2;
the result will be at the sixth place. answered in another question I asked: Details.
精彩评论