How do I search from the bottom up using a regular expression?

Developer https://www.devze.com 2023-01-07 02:04 Source: Internet
Here is an example of the type of text file I am trying to search (named usefile):

DOCK onomatopoeia DOCK blah blah

blah DOCK blah

DOCK

blah blah blah

onomatopoeia

blah blah blah

blah blah DOCK

DOCK blah blah

DOCK blah

onomatopoeia

I am using a finditer statement to find everything between DOCK and onomatopoeia as follows:

re.finditer(r'((dock)(.+?)(onomatopoeia))', usefile, re.I|re.DOTALL)

Obviously Dock is a much more common word than onomatopoeia, and I only want the text between each onomatopoeia and the closest Dock before it. The regex above starts at the first Dock it finds and only stops when it hits onomatopoeia, so I might get Dock Dock Dock Dock onomatopoeia when I really only wanted Dock onomatopoeia.
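To make the problem concrete, here is a small sketch that reproduces it. It assumes the file contents have already been read into a string (the name text and the shortened sample are illustrative, not part of the original post), and it collapses whitespace only to make the result easier to read.

```python
import re

# Hypothetical stand-in for the contents of usefile, already read into a string.
text = (
    "DOCK onomatopoeia DOCK blah blah\n"
    "blah DOCK blah\n"
    "DOCK\n"
    "blah blah blah\n"
    "onomatopoeia\n"
)

# The pattern from the question: lazy, but it still starts at the FIRST dock it sees.
pattern = re.compile(r'((dock)(.+?)(onomatopoeia))', re.I | re.DOTALL)

# Collapse runs of whitespace so each match prints on one line.
matches = [" ".join(m.group(1).split()) for m in pattern.finditer(text)]

for match in matches:
    print(match)
```

The second match begins at the earliest remaining DOCK and therefore carries the later DOCKs along with it, which is the unwanted behaviour described above.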

To be clear what I want from above is:

1. DOCK onomatopoeia

2. DOCK blah blah blah onomatopoeia

3. DOCK blah onomatopoeia

Is there a way to search for onomatopoeia and go UP to the first instance of Dock, or a better way to solve my problem?

Thanks!


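A negative lookahead assertion will do the trick. Each character between the two words is matched only if another DOCK does not start at that position, so a match can never contain a second DOCK: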

DOCK((?!DOCK).)+?onomatopoeia
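As a rough check, the pattern above can be plugged into the finditer call from the question. This is a sketch under a few assumptions: the sample is held in a string named text (a hypothetical name), the same re.I and re.DOTALL flags are kept, and whitespace is collapsed afterwards purely for display.

```python
import re

# Hypothetical stand-in for the contents of usefile, already read into a string.
text = (
    "DOCK onomatopoeia DOCK blah blah\n"
    "blah DOCK blah\n"
    "DOCK\n"
    "blah blah blah\n"
    "onomatopoeia\n"
    "blah blah blah\n"
    "blah blah DOCK\n"
    "DOCK blah blah\n"
    "DOCK blah\n"
    "onomatopoeia\n"
)

# (?!DOCK). consumes one character only when "DOCK" does not begin there,
# so a match attempt that starts at an earlier DOCK fails and the engine
# moves on until it reaches the DOCK nearest to onomatopoeia.
pattern = re.compile(r'DOCK((?!DOCK).)+?onomatopoeia', re.I | re.DOTALL)

# group(0) is the whole match; collapse whitespace so each prints on one line.
matches = [" ".join(m.group(0).split()) for m in pattern.finditer(text)]

for match in matches:
    print(match)
```

Note that +? requires at least one character between the two words; if DOCKonomatopoeia with nothing in between should also match, *? would be the variant to try.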


Here's an algorithmic approach:

  • Set pushing = false.
  • Break your text apart into words (e.g. spans of letters) and loop over those.
  • Upon hitting a DOCK while pushing == false, push it onto a stack and set pushing = true.
  • If you hit ono... while pushing == true, print out whatever's on the stack plus ono..., then clear the stack and set pushing = false.
  • For any other word, if pushing == true, push it.
  • On a DOCK while pushing == true, clear the stack, then push your new DOCK.
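The steps above might be sketched in Python roughly as follows. The function name, its parameters, and the choice to treat words as runs of ASCII letters compared case-insensitively are all assumptions made for illustration; the two DOCK cases are merged because both end with a stack that holds only the new DOCK.

```python
import re

def spans_from_nearest_start(text, start="DOCK", end="onomatopoeia"):
    """Collect 'start ... end' word runs, restarting at every new start word."""
    results = []
    stack = []
    pushing = False
    # Assumption: a "word" is a run of ASCII letters.
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        if lowered == start.lower():
            # Covers both DOCK steps: drop anything collected so far
            # and begin again from this DOCK.
            stack = [word]
            pushing = True
        elif lowered == end.lower() and pushing:
            # Emit the collected words plus the end word, then reset.
            results.append(" ".join(stack + [word]))
            stack = []
            pushing = False
        elif pushing:
            stack.append(word)
    return results

# Hypothetical stand-in for the contents of usefile.
sample = (
    "DOCK onomatopoeia DOCK blah blah\nblah DOCK blah\nDOCK\n"
    "blah blah blah\nonomatopoeia\nblah blah blah\nblah blah DOCK\n"
    "DOCK blah blah\nDOCK blah\nonomatopoeia\n"
)

for span in spans_from_nearest_start(sample):
    print(span)
```

Unlike the regex, this version discards punctuation and the original spacing, so it suits cases where only the words themselves matter.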