RegEx How to find text between two strings

Developer (开发者) https://www.devze.com 2023-04-03 14:35 Source: web
I have this text

XXX
text 
XXX

XXX
text 
XXX

XXX
text 
XXX

and I want to capture the text between the XXX and XXX. (I am trying to get chapters out of a book.)

 /XXX.*XXX/

This captures everything from the first opening XXX to the last closing XXX.

 /XXX.*?XXX/

This skips every second chapter.

Thanks ahead, Barak


If your text contains line feeds (\n), you'll need to add the "dot matches newline" switch to your regex, as well as making your match non-greedy:

/(?s)XXX.*?XXX/

Edited: Thanks to Alan's comment - I had the wrong switch: (?s) is correct
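
As a minimal sketch of the same idea in Python (an assumption, since the question does not say which regex flavor is in use): the inline (?s) flag lets . match newlines, and a capture group returns only the text between the delimiters. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample mirroring the layout shown in the question.
sample = "XXX\ntext one\nXXX\n\nXXX\ntext two\nXXX\n\nXXX\ntext three\nXXX\n"

# (?s) lets "." match newlines; ".*?" stops at the nearest closing XXX.
between = re.findall(r"(?s)XXX(.*?)XXX", sample)

# Drop the newlines that surround each captured block.
chapters = [chunk.strip() for chunk in between]
print(chapters)  # ['text one', 'text two', 'text three']
```

This only pairs things up correctly when every block has both an opening and a closing XXX.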


Solution using sed (this works for the sample above, where each block holds a single line of text):

$ sed -n '/XXX/,/XXX/{n;p}' text
text 

text 

text 


If these XXX strings are always on separate lines, I would suggest simply iterating through the lines and picking out the text "by hand". It should be faster than a multi-line regexp.

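Python:

delim = "XXX"
inside = False
chapters = []  # one list of lines per XXX ... XXX block

with open("text") as f:  # "text" is the input file name used in the sed example
    for line in f:
        if line.strip() == delim:
            # Each delimiter line toggles between inside and outside a block.
            inside = not inside
            if inside:
                chapters.append([])
        elif inside:
            chapters[-1].append(line)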


Your description doesn't really match your examples. If XXX is supposed to represent a chapter heading, there would only be one at the beginning of each chapter. To detect the end of a chapter, you would need to do a lookahead for the next chapter heading:

/XXX.*?(?=XXX)/s

That should work for all but the last chapter; to match that you can use \z, the end anchor:

/XXX.*?(?=XXX|\z)/s

It really would help if we knew which regex flavor you're using. For example, in Ruby you would have to use /m instead of /s to allow . to match linefeeds.
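
For what it's worth, here is a rough sketch of the lookahead version in Python, which is only one possible flavor. It assumes XXX marks the start of each chapter and nothing else. The sample text and the variable names are invented for the example. Python's re module accepts \Z as the end-of-string anchor and re.DOTALL in place of the /s modifier.

```python
import re

# Hypothetical book where XXX appears only as a chapter heading.
book = "XXX\nchapter one\nXXX\nchapter two\nXXX\nchapter three\n"

# The lookahead ends each match just before the next XXX,
# or at the very end of the string for the last chapter.
pattern = re.compile(r"XXX(.*?)(?=XXX|\Z)", re.DOTALL)

chapters = [m.group(1).strip() for m in pattern.finditer(book)]
print(chapters)  # ['chapter one', 'chapter two', 'chapter three']
```

Because the lookahead does not consume the next XXX, that heading is still available to start the following match, so no chapter gets skipped.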
