RegEx selecting more than I want (PHP)

Developers https://www.devze.com 2023-01-24 04:42 Source: the web
I have the following string:

blah blah yo<desc>some text with description - unwanted 
text</desc>um hey now some words yah<desc>some other description text 
stuff - more unwanted here</desc>random word and ; things. Now a hyphen 
outside of desc tag - with other text<desc>yet another description - unwanted
</desc>and that's about it.

(Note: In reality there are no newline/carriage returns in the string. I only added them here for readability.)

I want to select only the text inside the desc tag from the hyphen onward, including the preceding space and the closing desc tag. That part was simple; I just did this:

\s-.*?<\/desc>

Now, the problem is that the hyphen outside the desc tag is getting selected too. So my selections are as follows:

- unwanted text</desc>
- more unwanted here</desc>
- with other text<desc>yet another description - unwanted</desc>

So the first two are perfect but see how that last line is messed up because of the - outside the desc tag?

Just FYI, if interested, in my code I am doing a replace like this:

$text = preg_replace('/\s-.*?<\/desc>/', '</desc>', $text);

I tried doing some Lookbehind stuff but could not get it to work.

Any ideas?

Thanks! Mark


You could try [^-<>]* instead of .*?. This restricts what the regex can select and effectively treats angle brackets and the hyphen as tokens.
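A minimal sketch of that suggestion, run against the sample string from the question (joined into a single line). The variable names are only for illustration. Note one assumption: the unwanted text itself must not contain a hyphen or an angle bracket, otherwise this pattern will skip it or trim only part of it.

```php
<?php
// Sample string from the question, as one line with no newlines.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . '<desc>yet another description - unwanted</desc>and that\'s about it.';

// [^-<>]* cannot cross a hyphen or an angle bracket, so an attempt that
// starts at the hyphen outside the tag stops at the next "<desc>" and fails.
$cleaned = preg_replace('/\s-[^-<>]*<\/desc>/', '</desc>', $text);

echo $cleaned, PHP_EOL;
```

The hyphen outside the tag ("desc tag - with other text") should be left alone, while the three trailing " - ..." fragments inside the desc tags are removed.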


What about:

\s-[^-]*?<\/desc>
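This works on the sample string only because the last desc block happens to contain its own hyphen, which stops the match that started outside the tag. A quick sketch of where it can go wrong, using a made-up input (not from the question) in which the desc block after the outer hyphen has no hyphen in it:

```php
<?php
// Hypothetical input: the hyphen is outside the tag, and the desc block
// that follows contains no hyphen of its own.
$text = 'intro - aside<desc>plain description</desc>tail';

// [^-]*? is free to cross "<desc>", so the match starts at the outer
// hyphen and runs through the entire desc block.
$result = preg_replace('/\s-[^-]*?<\/desc>/', '</desc>', $text);

echo $result, PHP_EOL; // prints: intro</desc>tail
```

So the whole desc block is swallowed, and an unmatched closing tag is left behind. Excluding "<" as well (as in the other answers) avoids this case.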


If desc is the only tag that can appear in this block, you could use a horrible hack like this:

$text = preg_replace('/\s-[^<]*?<\/desc>/', '</desc>', $text);

But if this needs to be bulletproof, you can't reliably do this with a regular expression. You might try using an XML parser and processing the resultant DOM.
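A rough sketch of the DOM route, assuming the DOM extension is available and that the string becomes well-formed XML once it is wrapped in a single root element (so no stray "&" or "<" in the text). The `<root>` wrapper name is made up for this example. The regex here only ever sees the text inside one desc element, so hyphens outside the tags are never considered.

```php
<?php
// Sample string from the question, as one line with no newlines.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . '<desc>yet another description - unwanted</desc>and that\'s about it.';

$doc = new DOMDocument();
// Wrap in a hypothetical <root> element so the fragment has a single root.
if (!$doc->loadXML('<root>' . $text . '</root>')) {
    throw new RuntimeException('Input is not well-formed XML after wrapping');
}

foreach ($doc->getElementsByTagName('desc') as $desc) {
    // Inside this element only: drop everything from the first
    // whitespace-plus-hyphen to the end of the element's text.
    $desc->textContent = preg_replace('/\s-.*$/s', '', $desc->textContent);
}

// Serialize the children of <root> back into a string, without the wrapper.
$cleaned = '';
foreach ($doc->documentElement->childNodes as $node) {
    $cleaned .= $doc->saveXML($node);
}

echo $cleaned, PHP_EOL;
```

One thing to watch with this approach: serializing through the DOM escapes characters such as "&" and ">" in text nodes, so the output may not be byte-for-byte identical to the input outside the trimmed parts.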
