Pcrepp - Perl Regular Expression syntax to match host name [duplicate]


Developer https://www.devze.com 2022-12-22 01:48 Source: Internet


This question already has answers here. Closed 12 years ago.

Possible Duplicate:
The Hostname Regex

I'm trying to use pcrepp (PCRE) to extract the hostname from a URL. PCRE regular expressions use (almost) the same syntax as Perl 5 regular expressions.

For example:

url = "http://www.pandora.com/#/volume/73";
// the match should be "http://www.pandora.com/"

I can't figure out the correct regex syntax for this example.

It needs to work for any URL: amazon.com/sds/ should return amazon.com, and abebooks.co.uk/isbn="62345627457245"/blabla/ should return abebooks.co.uk.
I don't need to check whether the URL is valid; I just need to get the hostname.

Something like this:

^(?:[a-z]+://)?[^/]+/?
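The pattern above can be tried from C++. This is a minimal sketch that assumes std::regex (default ECMAScript grammar) as a stand-in for pcrepp, since the constructs used here behave the same in PCRE. It wraps [^/]+ in a capture group so that only the hostname is returned instead of the whole match, and drops the trailing /? because only the group is read. The helper name extract_host is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the host part of `url`, or "" if nothing matches.
// Pattern from the answer above, with [^/]+ wrapped in a capture group:
//   (?:[a-z]+://)?  optional lower-case scheme such as "http://" (not captured)
//   ([^/]+)         everything up to the first '/', i.e. the host (group 1)
std::string extract_host(const std::string& url) {
    static const std::regex host_re("^(?:[a-z]+://)?([^/]+)");
    std::smatch m;
    if (std::regex_search(url, m, host_re)) {
        return m[1].str();
    }
    return "";
}
```

Because the group stops only at '/', anything else before the first slash, such as a :port or user@ prefix, stays in the captured text.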

See Regexp::Common::URI::http, which uses sub-patterns defined in Regexp::Common::URI::RFC2396. Examining the source code of those modules should give you a good idea of how to put together a decent pattern.
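RFC 2396 itself (Appendix B) also gives a general regular expression that splits a URI reference into its components. This pattern comes from the RFC, not from the Regexp::Common modules. The sketch below assumes std::regex as a stand-in for PCRE and uses a hypothetical helper name, uri_authority. In that expression the authority is group 4. It is only found when the URL contains "//", so a scheme-less input like amazon.com/sds/ lands in the path group instead, and the authority can still include user info and a port.

```cpp
#include <regex>
#include <string>

// Regular expression from RFC 2396, Appendix B. Groups of interest:
//   2 = scheme, 4 = authority (host, possibly with user info and port),
//   5 = path, 7 = query, 9 = fragment.
// Hypothetical helper: returns the authority, or "" when the URL has no "//" part.
std::string uri_authority(const std::string& url) {
    static const std::regex uri_re(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");
    std::smatch m;
    if (std::regex_match(url, m, uri_re) && m[4].matched) {
        return m[4].str();
    }
    return "";
}
```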

Here is one possibility:

^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$

And another:

^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$

These and other URL-related regular expressions can be found in the Regular Expression Library.
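Both patterns in that answer are anchored with ^ and $, so they check whether a whole string has the expected shape rather than searching inside it. The following is a sketch in C++ that assumes std::regex (ECMAScript grammar) as a stand-in for PCRE. It writes the escaped class [a-zA-Z0-9\-\.] as the equivalent [a-zA-Z0-9.-] and http\:// as http://. In the second pattern it wraps the host in a capture group so it can be returned. The helper names is_listed_host and http_host are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for the first pattern: true only if the whole string is a
// hostname ending in .com, .org, .net, .mil or .edu (all-lower or all-upper case).
bool is_listed_host(const std::string& s) {
    static const std::regex re(
        R"(^[a-zA-Z0-9.-]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$)");
    return std::regex_match(s, re);
}

// Hypothetical helper for the second pattern, with the host wrapped in group 1:
//   ^http://                          required literal scheme
//   ([a-zA-Z0-9.-]+\.[a-zA-Z]{2,3})   host whose last label has 2 or 3 letters
//   (/\S*)?$                          optional path made of non-whitespace chars
// Returns "" when the whole URL does not match.
std::string http_host(const std::string& url) {
    static const std::regex re(
        R"(^http://([a-zA-Z0-9.-]+\.[a-zA-Z]{2,3})(/\S*)?$)");
    std::smatch m;
    if (std::regex_match(url, m, re)) {
        return m[1].str();
    }
    return "";
}
```

Note that the first pattern rejects country-code domains such as abebooks.co.uk. The second accepts only URLs that start with http:// and whose last host label has two or three letters, so scheme-less inputs like amazon.com/sds/ are not matched.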

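#include <string>

// regex1 captures the scheme, an optional user:password@ prefix, the host (group 6),
// the port and the path; regex2 captures the remaining query part and the #fragment.
const std::string regex1 = "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";
const std::string regex2 = "([^#]+)?#?(\\w*)";

// Concatenate the two parts into the final pattern.
const std::string finalRegex = regex1 + regex2;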

The hostname is captured by the sixth group. This was answered in another question I asked: Details.
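Here is a sketch of reading that sixth group. It assumes std::regex (ECMAScript grammar) as a stand-in for pcrepp; all the escapes in the pattern are also valid there. It builds the pattern exactly as regex1 + regex2 from the answer above and returns group 6. The function name host_from_url is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the pattern from the answer above (regex1 + regex2)
// and returns capture group 6, which holds the host; "" if there is no match.
std::string host_from_url(const std::string& url) {
    static const std::string regex1 =
        "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?"
        "(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";
    static const std::string regex2 = "([^#]+)?#?(\\w*)";
    static const std::regex url_re(regex1 + regex2);
    std::smatch m;
    if (std::regex_search(url, m, url_re) && m[6].matched) {
        return m[6].str();
    }
    return "";
}
```

The optional scheme and user-info groups come first, so group 6 starts at the host whether or not the URL has an http:// prefix.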