Regex to return all characters until "/" searching backwards

Developer https://www.devze.com 2023-03-14 22:38 Source: Internet

I'm having trouble with this regex and I think I'm almost there.

import re
m = re.findall(r'[a-z]{6}\.[a-z]{3}\.[a-z]{2} (?=" target)', 'http://domain.com.uy " target')

This gives me the "exact" output that I want, that is, domain.com.uy. But obviously this only works for this example: [a-z]{6} matches exactly the six characters before the first dot, which is not what I want.

I want it to return domain.com.uy, so basically the instruction would be: match any character until "/" is encountered (searching backwards).

Edit:

m = re.findall(r'\w+\.[a-z]{3}\.[a-z]{2} (?=" target)', 'http://domain.com.uy " target')

This is very close to what I want, but it won't match "-" (\w already covers "_", so the hyphen is what is missing).
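A minimal sketch of one way to close that gap, assuming hyphens are the only extra characters needed: since \w already matches "_", adding "-" to a character class ([\w-]) is enough. The hostname my-domain_x.com.uy is made up for illustration, and the space is moved inside the lookahead so it is checked but not included in the result.

```python
import re

# [\w-] matches letters, digits, "_" and "-".
# The space before '" target' sits inside the lookahead, so it is not returned.
pattern = r'[\w-]+\.[a-z]{3}\.[a-z]{2}(?= " target)'

m = re.findall(pattern, 'http://my-domain_x.com.uy " target')
print(m)  # ['my-domain_x.com.uy']
```

This still hard-codes the .xxx.yy shape of the ending, so it only generalises the first label of the domain.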

For the sake of completeness: I do not need the http:// part.

I hope the question is clear enough; if I left anything open to interpretation, please ask for clarification!

Thanks in advance!

Another option is to use a positive lookbehind such as (?<=//):

>>> import re
>>> re.search(r'(?<=//).+(?= \" target)',
...           'http://domain.com.uy " target').group(0)
'domain.com.uy'

Note that this will also match slashes within the URL itself, if that's desired:

>>> re.search(r'(?<=//).+(?= \" target)',
...           'http://example.com/path/to/whatever " target').group(0)
'example.com/path/to/whatever'

If you just wanted the bare domain, without any path or query parameters, you could use r'(?<=//)([^/]+)(/.*)?(?= \" target)' and capture group 1:

>>> re.search(r'(?<=//)([^/]+)(/.*)?(?= \" target)',
...           'http://example.com/path/to/whatever " target').groups()
('example.com', '/path/to/whatever')
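Because re.search returns None when nothing matches, calling .groups() directly raises AttributeError on input that lacks the expected shape. A small wrapper around the same pattern might look like this (the function name bare_domain is hypothetical, just for illustration):

```python
import re

# The answer's pattern: group 1 is the host (everything after "//" up to
# the first "/"), group 2 is the optional path.
DOMAIN_RE = re.compile(r'(?<=//)([^/]+)(/.*)?(?= \" target)')


def bare_domain(text):
    """Return group 1 for a '<url> " target' fragment, or None if no match."""
    match = DOMAIN_RE.search(text)
    if match is None:
        return None
    return match.group(1)


print(bare_domain('http://domain.com.uy " target'))                 # domain.com.uy
print(bare_domain('http://example.com/path/to/whatever " target'))  # example.com
print(bare_domain('no url here'))                                   # None
```

Compiling the pattern once with re.compile is optional; the re module caches recent patterns anyway, but a named constant keeps the regex in one place.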

Try this (note that / needs no escaping in a Python regex):

/([^/]*)$
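Used from Python, this pattern anchors at the end of the string and captures whatever follows the last "/", which is the "searching backwards" behaviour asked for. A quick sketch (the helper name after_last_slash is hypothetical) shows that on the question's input it also keeps the trailing ' " target', and that on a URL with a path it returns the last path segment rather than the domain:

```python
import re

# "/" followed by non-slash characters up to the end of the string,
# so group 1 is everything after the last slash.
LAST_SEGMENT = re.compile(r'/([^/]*)$')


def after_last_slash(s):
    """Return the text after the last "/" in s, or None if s has no "/"."""
    m = LAST_SEGMENT.search(s)
    return m.group(1) if m else None


print(after_last_slash('http://domain.com.uy " target'))        # domain.com.uy " target
print(after_last_slash('http://example.com/path/to/whatever'))  # whatever
```

Chaining .split()[0] onto the first result would drop the ' " target' tail.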

If regular expressions are not a requirement and you simply wish to extract the FQDN from the URL in Python, use urlparse and str.split():

>>> from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
>>> url = 'http://domain.com.uy " target'
>>> urlparse(url)
ParseResult(scheme='http', netloc='domain.com.uy " target', path='', params='', query='', fragment='')

This has broken up the URL into its component parts. We want netloc:

>>> urlparse(url).netloc
'domain.com.uy " target'

Split on whitespace:

>>> urlparse(url).netloc.split()
['domain.com.uy', '"', 'target']

Just the first part:

>>> urlparse(url).netloc.split()[0]
'domain.com.uy'
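The steps above can be packaged as a single Python 3 function (a sketch; the name fqdn is hypothetical, and it assumes the input starts with a scheme such as http://). Since urlparse ends netloc at the next "/", "?" or "#", the same call also copes with a URL that has a path:

```python
from urllib.parse import urlparse


def fqdn(fragment):
    """Return the host of a '<url> " target' fragment using urlparse."""
    # netloc may still contain the trailing ' " target';
    # split() breaks on whitespace and [0] keeps only the host.
    return urlparse(fragment).netloc.split()[0]


print(fqdn('http://domain.com.uy " target'))                 # domain.com.uy
print(fqdn('http://example.com/path/to/whatever " target'))  # example.com
```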

It's as simple as this:

[^/]+(?= " target)

But be aware that for http://domain.com/folder/site.php this will not return the domain. Also remember to quote and escape the regex properly when you put it in a string.
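As a quick check in Python (a sketch; the ' " target' suffix on the second URL is assumed, mirroring the question's input), the pattern fits in a single-quoted raw string without extra escaping, and the second call shows the caveat: for a URL with a path it returns the last path segment.

```python
import re

# One or more non-slash characters immediately followed by ' " target'.
PATTERN = r'[^/]+(?= " target)'

print(re.search(PATTERN, 'http://domain.com.uy " target').group(0))
# domain.com.uy
print(re.search(PATTERN, 'http://domain.com/folder/site.php " target').group(0))
# site.php  (the last path segment, not the domain)
```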
