
Don't match URLs that contain "togl" [Regex]

Developer https://www.devze.com 2023-03-31 20:31 Source: web
I have a great URL-matching regex, but I have a problem: I don't want it to match URLs from togl.me. My regex is:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))

Here is the same pattern in verbose form, with comments:

(?xi)
\b
(                       # Capture 1: entire matched URL
  (?:
    https?://               # http or https protocol
    |                       #   or
    www\d{0,3}[.]           # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                       # One or more:
    [^\s()<>]+                  # Run of non-space, non-()<>
    |                           #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                       # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                               #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
  )
)

It should not match URLs from http://togl.me. I could check the domain name with parse_url after matching the URLs, but why should I need to?
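For comparison, the post-matching check mentioned in the question could look roughly like this. It is a Python sketch, assuming `urllib.parse.urlsplit` as a stand-in for PHP's `parse_url`; `find_urls_excluding` is a hypothetical helper name, and treating subdomains such as `www.togl.me` as blocked is an assumption about what "from togl.me" means.

```python
import re
from urllib.parse import urlsplit

# The asker's compact pattern, unchanged, split across lines for readability.
URL_RE = re.compile(
    r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)"
    r"(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+"
    r"(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
)


def find_urls_excluding(text, blocked_host="togl.me"):
    """Hypothetical helper: match URLs, then drop those from blocked_host.

    urlsplit stands in for PHP's parse_url here (an assumption).
    """
    kept = []
    for match in URL_RE.finditer(text):
        url = match.group(1)
        # urlsplit only recognizes a host after "//", so prepend a scheme
        # to bare "www..." or "example.com/..." matches before parsing.
        if re.match(r"(?i)https?://", url):
            url_for_parsing = url
        else:
            url_for_parsing = "http://" + url
        host = urlsplit(url_for_parsing).hostname or ""  # already lowercased
        # Assumption: subdomains of the blocked host are blocked too.
        if host == blocked_host or host.endswith("." + blocked_host):
            continue
        kept.append(url)
    return kept
```

With `text = "See http://togl.me/abc and http://example.com/page, or www.togl.me/x and www.example.org."`, the unmodified pattern matches all four URLs, while `find_urls_excluding(text)` keeps only `http://example.com/page` and `www.example.org`. Because the check runs on the parsed host, `nottogl.me/xy` is kept.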


After matching the domain, you can use a negative lookbehind to check that it was not togl.me:

[a-z0-9.\-]+[.][a-z]{2,4}(?<!/togl\.me)/

Edit: since the domain can also be matched in places other than the one the comments suggest (for example, right after "http://" or "www."), let's move the check for togl.me to just after the prefix group:

…
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
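  (?<!togl\.me/)            # not right after "togl.me/" (bare-domain prefix)
  (?!togl\.me)              # not followed by "togl.me" (after http:// or www.)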
  (?:                       # One or more:
    [^\s()<>]+
…
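Assembled into one pattern, the edited answer can be tried out as below. This sketch assumes Python's `re` module rather than whatever engine the asker uses; `URL_EXCEPT_TOGL` and `urls_except_togl` are hypothetical names. Python requires fixed-width lookbehinds, which `(?<!togl\.me/)` is (8 characters).

```python
import re

# The asker's verbose pattern with the answer's two lookarounds inserted
# directly after the prefix group (names here are hypothetical).
URL_EXCEPT_TOGL = re.compile(r"""
    \b
    (                                   # group 1: the entire matched URL
      (?:
        https?://                       # http or https protocol
        | www\d{0,3}[.]                 # www., www1., ... www999.
        | [a-z0-9.\-]+[.][a-z]{2,4}/    # looks like a domain name followed by a slash
      )
      (?<!togl\.me/)                    # reject if the prefix itself ended in togl.me/
      (?!togl\.me)                      # reject if togl.me follows http:// or www.
      (?:                               # one or more:
        [^\s()<>]+                      #   run of non-space, non-()<>
        | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
      )+
      (?:                               # end with:
        \(([^\s()<>]+|(\([^\s()<>]+\)))*\)     # balanced parens, up to 2 levels
        | [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punctuation chars
      )
    )
""", re.IGNORECASE | re.VERBOSE)


def urls_except_togl(text):
    """Return the full URL (group 1) of every match in text."""
    return [m.group(1) for m in URL_EXCEPT_TOGL.finditer(text)]
```

On `"See http://togl.me/abc and http://example.com/page, or www.togl.me/x and www.example.org."` this returns `["http://example.com/page", "www.example.org"]`, and a bare `togl.me/abc` is rejected as well. Because the lookarounds inspect raw text positions rather than the parsed host, `http://www.togl.me/abc` is still matched and a bare `nottogl.me/xy` is rejected; if those cases matter, checking the host after matching (the parse_url route from the question) is more precise.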

More help: http://www.regular-expressions.info/lookaround.html

