
Regex to return all characters until "/" searching backwards

Developer https://www.devze.com 2023-03-14 22:38 Source: web

I'm having trouble with this regex and I think I'm almost there.

m = re.findall(r'[a-z]{6}\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

This gives me the "exact" output that I want, that is, domain.com.uy. But obviously this is just an example, since [a-z]{6} only matches exactly six characters before the first dot, and that is not what I want.

I want it to return domain.com.uy, so basically the instruction would be: match any character, searching backwards, until a "/" is encountered.

Edit:

m = re.findall(r'\w+\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

This is very close to what I want, but it won't match "-" (\w does cover "_", but not "-").

For the sake of completeness I do not need the http://

I hope the question is clear enough, if I left anything open to interpretation please ask for any clarification needed!

Thanks in advance!


Another option is to use a positive lookbehind such as (?<=//):

>>> re.search(r'(?<=//).+(?= \" target)', 
...           'http://domain.com.uy " target').group(0)
'domain.com.uy'

Note that this will match slashes within the url itself, if that's desired:

>>> re.search(r'(?<=//).+(?= \" target)',
...           'http://example.com/path/to/whatever " target').group(0)
'example.com/path/to/whatever'

If you just wanted the bare domain, without any path or query parameters, you could use r'(?<=//)([^/]+)(/.*)?(?= \" target)' and capture group 1:

>>> re.search(r'(?<=//)([^/]+)(/.*)?(?= \" target)',
...           'http://example.com/path/to/whatever " target').groups()
('example.com', '/path/to/whatever')
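
As a rough sketch, the lookbehind pattern above could be wrapped in a small helper so that a non-matching input returns None instead of raising AttributeError on .group(). The names BARE_DOMAIN and bare_domain are made up for illustration, and the pattern still assumes the literal ' " target' suffix from the question:

```python
import re

# Same pattern as in the answer above; the constant and function names
# are hypothetical and only serve this illustration.
BARE_DOMAIN = re.compile(r'(?<=//)([^/]+)(/.*)?(?= " target)')


def bare_domain(text):
    """Return the host that follows '//', or None if the pattern does not match."""
    match = BARE_DOMAIN.search(text)
    return match.group(1) if match else None


print(bare_domain('http://domain.com.uy " target'))
print(bare_domain('http://example.com/path/to/whatever " target'))
print(bare_domain('no url here'))
```

Because group 1 is [^/]+, it is not limited to \w characters, so hosts containing "-" or "_" should be captured as well.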


Try this (in Python the "/" does not need escaping):

/([^/]*)$
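
A quick way to see what this pattern does in Python, as a sketch only: it captures everything after the last "/" up to the end of the string, so it yields the bare domain only when nothing follows the URL. The names LAST_SEGMENT and last_segment are hypothetical:

```python
import re

# Pattern from the answer above; the names are made up for illustration.
LAST_SEGMENT = re.compile(r'/([^/]*)$')


def last_segment(text):
    """Return whatever follows the last '/' in text, or None if there is no '/'."""
    match = LAST_SEGMENT.search(text)
    return match.group(1) if match else None


# Works when the string ends right after the domain.
print(last_segment('http://domain.com.uy'))
# With the trailing text from the question, that text is captured too.
print(last_segment('http://domain.com.uy " target'))
```

For the exact input in the question, the trailing ' " target' would therefore still need to be stripped or excluded with a lookahead.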


If regular expressions are not a requirement and you simply wish to extract the FQDN from the URL in Python, use urlparse and str.split():

>>> from urllib.parse import urlparse  # on Python 2: from urlparse import urlparse
>>> url = 'http://domain.com.uy " target'
>>> urlparse(url)
ParseResult(scheme='http', netloc='domain.com.uy " target', path='', params='', query='', fragment='')

This has broken up the URL into its component parts. We want netloc:

>>> urlparse(url).netloc
'domain.com.uy " target'

Split on whitespace:

>>> urlparse(url).netloc.split()
['domain.com.uy', '"', 'target']

Just the first part:

>>> urlparse(url).netloc.split()[0]
'domain.com.uy'
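
The steps above could be folded into one function. This sketch targets Python 3 (urllib.parse) and swaps the order slightly: it splits on whitespace first and parses only the first token, so a path after the domain does not end up in the result. The function name domain_from_text is an assumption for illustration:

```python
from urllib.parse import urlparse


def domain_from_text(text):
    """Return the network location of the first whitespace-separated token, or None."""
    parts = text.split()
    if not parts:
        return None
    # netloc is the empty string when the token has no '//' authority part.
    return urlparse(parts[0]).netloc or None


print(domain_from_text('http://domain.com.uy " target'))
print(domain_from_text('http://example.com/path/to/whatever " target'))
```

Note that netloc would still include a port or user info if the URL has them.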


It's as simple as this:

[^/]+(?= " target)

But be aware that for http://domain.com/folder/site.php this will not return the domain; it returns the last path segment (site.php) instead. And remember to escape the regex properly in a string.
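
A minimal sketch of this pattern in Python, showing both the working case and the caveat. A raw string avoids any extra escaping; the names BEFORE_TARGET and before_target are hypothetical:

```python
import re

# Pattern from the answer above, written as a raw string.
BEFORE_TARGET = re.compile(r'[^/]+(?= " target)')


def before_target(text):
    """Return the run of non-slash characters directly before ' " target', or None."""
    match = BEFORE_TARGET.search(text)
    return match.group(0) if match else None


# No path: the run before ' " target' is the domain.
print(before_target('http://domain.com.uy " target'))
# With a path: only the last segment is returned.
print(before_target('http://domain.com/folder/site.php " target'))
```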
