Trying to match what is before /../ but after / with regular expressions

Developer https://www.devze.com 2022-12-26 09:27 Source: Internet
I am trying to match what is before /../ but after / with a regular expression, but I want it to look back and stop at the first /

I feel like I am close, but it just finds the first slash and then takes everything after it. The input is this:

this/is/a/./path/that/../includes/face/./stuff/../hat

and my regular expression is:

#\/(.*)\.\.\/#

matching /is/a/./path/that/../includes/face/./stuff/../ instead of just that/../ and stuff/../

How should I change my regex to make it work?


.* means "match any number of any character at all[1]". This is not what you want. You want to match any number of non-/ characters, which is written [^/]*.

Any time you are tempted to use .* or .+ in a regex, be very suspicious. Stop and ask yourself whether you really mean "any character at all[1]" or not - most of the time you don't. (And, yes, non-greedy quantifiers can help with this, but character classes are both more efficient for the regex engine to match against and more clear in their communication of your intent to human readers.)

[1] OK, OK... . isn't exactly "any character at all" - it doesn't match newline (\n) by default in most regex flavors - but close enough.
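
To make the difference concrete, here is a minimal sketch in Python. It assumes Python's re module (the # delimiters in the question suggest a PCRE-style flavor, but the two patterns behave the same way here), and the variable names are only illustrative:

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Greedy .* starts at the first slash and runs to the last "../".
greedy = re.findall(r"/(.*)\.\./", path)

# [^/]* cannot cross a slash, so each capture stays within one path segment.
segment = re.findall(r"([^/]*)/\.\./", path)

print(greedy)   # one long capture spanning most of the path
print(segment)  # one capture per "<name>/../" occurrence
```

On the sample input, the greedy pattern yields a single capture that swallows both occurrences, while the character-class pattern yields that and stuff separately.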


Change your pattern so that only characters other than / ([^/]) get matched:

#([^/]*)/\.\./#


Alternatively, you can use a lookahead.

#(\w+)(?=/\.\./)#

Explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    /                        '/'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    /                        '/'
--------------------------------------------------------------------------------
  )                        end of look-ahead
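
As a rough illustration of the lookahead approach, the following Python sketch (Python's re module is an assumption; the names are illustrative) shows that the match itself is only the directory name, because the lookahead checks for /../ without consuming it. It also shows one limitation of \w+: it stops at characters such as a hyphen, which a negated class like [^/]+ would keep.

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# (?=/\.\./) asserts that "/../" follows but leaves it out of the match.
names = [m.group(0) for m in re.finditer(r"\w+(?=/\.\./)", path)]
print(names)

# \w does not match "-", so only the tail of a hyphenated name is found.
hyphenated = "a/my-dir/../b"  # hypothetical input, not from the question
word_only = re.findall(r"\w+(?=/\.\./)", hyphenated)
any_segment = re.findall(r"[^/]+(?=/\.\./)", hyphenated)
print(word_only, any_segment)
```

Whether \w+ or [^/]+ is the better fit depends on which characters the directory names are expected to contain.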


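I think you're essentially right; you just need to change the (.*) so that it does not allow slashes: #/([^/]*)/\.\./# (Making the match non-greedy on its own is not enough, because the match still starts at the first slash in the input.)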
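
A quick check in Python (assuming the re module; variable names are illustrative) suggests that a non-greedy quantifier and a negated character class are not interchangeable on this input. A lazy quantifier only makes the match end sooner; it does not move the starting point away from the first slash:

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Non-greedy: still begins at the first "/", stops at the nearest "/../".
lazy = re.findall(r"/(.*?)/\.\./", path)

# Negated class: the capture cannot contain a slash at all.
no_slash = re.findall(r"/([^/]*)/\.\./", path)

print(lazy)
print(no_slash)
```

Only the second pattern isolates that and stuff; the lazy version still captures the preceding path segments.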


In your favourite language, do a few splits and some string manipulation, e.g. in Python:

>>> s="this/is/a/./path/that/../includes/face/./stuff/../hat"
>>> a=s.split("/../")[:-1]  # the last item is not required.
>>> for item in a:
...   print(item.split("/")[-1])
...
that
stuff


In Python:

>>> import re
>>> test = 'this/is/a/./path/that/../includes/face/./stuff/../hat'
>>> regex = re.compile(r'/\w+?/\.\./')
>>> regex.findall(test)
['/that/../', '/stuff/../']

Or if you just want the text without the slashes:

>>> regex = re.compile(r'/(\w+?)/\.\./')
>>> regex.findall(test)
['that', 'stuff']


([^/]+) will capture all the text between slashes.

([^/]+)*/\.\. matches that/.. and stuff/.. in your string this/is/a/./path/that/../includes/face/./stuff/../hat. It captures that or stuff, and you can change that, obviously, by changing the placement of the capturing parentheses and your program logic.

You didn't state whether you want to capture or just match. A single (non-global) match with this regex stops at the first occurrence and captures only that; used in a global match, it returns that and then stuff.

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1 (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^/]+                    any character except: '/' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of \1 (NOTE: because you're using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  \.                       '.'
--------------------------------------------------------------------------------
  \.                       '.'
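
The difference between a single match and a global match can be sketched in Python as follows. Python's re module is an assumption here (the explanation table above appears to come from a Perl tool), with search standing in for a single match and findall for a global one:

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"
pattern = re.compile(r"([^/]+)*/\.\.")

# Single match: the leftmost occurrence wins.
first = pattern.search(path)
print(first.group(0), first.group(1))

# Global match: one capture per occurrence, in order.
captures = pattern.findall(path)
print(captures)

# Dropping the outer * gives the same captures on this input and avoids
# a quantifier nested inside another quantifier.
simpler = re.findall(r"([^/]+)/\.\.", path)
print(simpler)
```

The nested quantifier in ([^/]+)* is not needed to get these results, and in backtracking engines such a construct can become slow on long segments that are not followed by /.., so the simpler form may be preferable.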