Grouped Regex to match a line that *sometimes* starts with spaces?

Developer https://www.devze.com 2023-02-05 13:14 Source: Internet
RegEx flavor: wxRegEx.

I am trying to create a "grouped" regex that matches a string that sometimes begins with whitespace. When it doesn't begin with whitespace, it begins with the target group (the second parenthesized expression in the following sample). It is a relatively simple line made of a few predictable tokens and one portion of arbitrary text, e.g.

"good: Sed ut perspiciatis unde omnis iste natus error "
"better: Sit voluptatem accusantium doloremque laudantium "
"best: Nemo enim ipsam voluptatem quia voluptas "
" ok: Sit voluptatem accusantium doloremque laudantium "

Note: The quoted characters are not part of my input. By introducing the quotes in my posting I am trying to make the boundaries of each line/string clearer.

The regex that I came up with to match the above in a "grouped" manner (i.e. that I can address each group separately for further processing) is:

(^\s*)(good|better|best|ok)(: )(.*)( $)

Note: \s is wxRegEx's class-shorthand escape for [[:space:]].

The problem is that this regex works only when the line actually begins with a space. Why? Doesn't the '*' right after '\s' mean "0 or more occurrences of \s"?

I know I am missing something fundamental here, but what is it?
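As a sanity check outside wxRegEx, here is a minimal sketch that runs the same pattern through Python's re module. This is an assumption-laden comparison: Python's engine is not wxRegEx, and the helper name split_line is made up for illustration. It only shows how the pattern behaves in an engine where \s* is known to accept zero characters.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"(^\s*)(good|better|best|ok)(: )(.*)( $)")

# Sample lines from the question, without the surrounding quotes.
LINES = [
    "good: Sed ut perspiciatis unde omnis iste natus error ",
    "better: Sit voluptatem accusantium doloremque laudantium ",
    "best: Nemo enim ipsam voluptatem quia voluptas ",
    " ok: Sit voluptatem accusantium doloremque laudantium ",
]


def split_line(line):
    """Return the five captured groups, or None if the line does not match.

    Hypothetical helper, not part of any wxWidgets API.
    """
    m = PATTERN.match(line)
    return m.groups() if m else None


for line in LINES:
    print(split_line(line))
```

If all four lines match here but not in wxRegEx, the difference most likely lies in how the expression is written or compiled on the wxRegEx side rather than in the meaning of '*'.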


Have you tried this with (^ *) instead of (^\s*)? Is it possible you're wrong about the \s syntax? I don't know wxRegEx myself.


I'm not familiar with wxRegEx, but if it is PCRE, I think you may want (^\s*)?(good|...

The '?' modifies the entire zero-or-more capture to make it zero-or-one.


That's weird. You are right that * should match 0 or more occurrences. Does moving the caret (^) outside the group make any difference?
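To compare the alternatives suggested so far, here is a rough sketch in Python's re module (again an assumption: this is not wxRegEx, and the names VARIANTS and keyword_found are invented for the example). In this engine every variant finds the keyword both with and without a leading space, so the rewrites are equivalent here; whether wxRegEx treats them differently would have to be tested in wxRegEx itself.

```python
import re

# Everything after the leading-whitespace group, shared by all variants.
TAIL = r"(good|better|best|ok)(: )(.*)( $)"

# Hypothetical labels for the rewrites proposed in the replies.
VARIANTS = {
    "original": r"(^\s*)" + TAIL,
    "literal space": r"(^ *)" + TAIL,
    "optional group": r"(^\s*)?" + TAIL,
    "caret outside": r"^(\s*)" + TAIL,
}

SAMPLES = [
    "good: Sed ut perspiciatis unde omnis iste natus error ",
    " ok: Sit voluptatem accusantium doloremque laudantium ",
]


def keyword_found(pattern, line):
    """Return the text of group 2 (the keyword), or None if there is no match."""
    m = re.match(pattern, line)
    return m.group(2) if m else None


for name, pattern in VARIANTS.items():
    print(name, [keyword_found(pattern, s) for s in SAMPLES])
```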


I see no obvious error in your regex. Your interpretation of the * is also correct, of course. Do you maybe have some actual spaces in your expression? The space ( like -> <- ) has no special meaning in regex and the engine will try to match it. If your first capturing group looked like (^ \s*) this would have the effect you describe.
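The stray-space hypothesis is easy to reproduce. The sketch below uses Python's re module as a stand-in (an assumption, since the question is about wxRegEx), and STRAY_SPACE is a hypothetical mistyped version of the pattern with a literal space after the caret.

```python
import re

# Everything after the leading-whitespace group.
TAIL = r"(good|better|best|ok)(: )(.*)( $)"

# The pattern as the asker intended it.
INTENDED = re.compile(r"(^\s*)" + TAIL)

# Hypothetical typo: a literal space between ^ and \s*, so one space is required.
STRAY_SPACE = re.compile(r"(^ \s*)" + TAIL)

NO_LEADING_SPACE = "good: Sed ut perspiciatis unde omnis iste natus error "
LEADING_SPACE = " ok: Sit voluptatem accusantium doloremque laudantium "


def matches(regex, line):
    """Return True if the compiled regex matches at the start of the line."""
    return regex.match(line) is not None


for label, regex in (("intended", INTENDED), ("stray space", STRAY_SPACE)):
    print(label, matches(regex, NO_LEADING_SPACE), matches(regex, LEADING_SPACE))
```

The mistyped pattern matches only the line that begins with a space, which is exactly the symptom described in the question.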
