Problem with logical OR (regex) not being greedy

开发者 https://www.devze.com 2022-12-25 20:31 Source: web
This is part of a string: "21xy5". I want to insert " * " (surrounded by whitespace) between a digit and a letter, a letter and a digit, and a letter and a letter. I use the regex pattern "\d[a-z]|[a-z]\d|[a-z][a-z]" to find the indexes where I am going to insert the string " * ". The problem is that when the regex OR (|) tries to match 21-x|x-y|y-5 in the string 21xy5, the first condition 21-x succeeds, the second x-y is not checked, and the third succeeds. So I get 21 * xy * 5 instead of 21 * x * y * 5. If the input string is xy21, then x-y succeeds and I get x * y21. The problem seems to be that logical OR is not greedy.

    Regex reg = new Regex(@"\d[a-z]|[a-z]\d|[a-z][a-z]" );
    MatchCollection matchC;
    matchC = reg.Matches(input);
    int ii = 1;
    foreach (Match element in matchC)
    {
        input = input.Insert(element.Index + ii, " * ");
        ii += 3;
    }
    return input;


You want lookarounds.

    Regex reg = new Regex(@"(\d(?=[a-z])|[a-z](?=[a-z\d]))");

(Then replace each match with "$1 * ", for example reg.Replace(input, "$1 * ").)

The problem with your original regex is not greediness; each alternative actually consumes two characters. That means that once 1x has been matched, only y5 is left available, so the regex engine never sees xy. A lookahead, on the other hand, is just a zero-width assertion, so the next character is not consumed. For example, while 1x together satisfies \d(?=[a-z]), only 1 is consumed, so xy5 is still available.
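To make the difference concrete, here is a minimal sketch that lists what each pattern actually matches and then performs the replacement. The class name StarInserter and the helper names ListMatches and InsertStars are made up for illustration; only Regex.Matches and Regex.Replace come from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class StarInserter
{
    // The asker's pattern: every alternative consumes two characters.
    public const string Consuming = @"\d[a-z]|[a-z]\d|[a-z][a-z]";

    // The lookahead pattern: one character is consumed, the next is only inspected.
    public const string Lookahead = @"(\d(?=[a-z])|[a-z](?=[a-z\d]))";

    // Returns the text of every match, in order, so the two patterns can be compared.
    public static List<string> ListMatches(string pattern, string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            result.Add(m.Value);
        }
        return result;
    }

    // Inserts " * " after every character that the lookahead pattern matches.
    public static string InsertStars(string input)
    {
        return Regex.Replace(input, Lookahead, "$1 * ");
    }
}
```

For "21xy5", ListMatches with the consuming pattern yields "1x" and "y5", so the x-y boundary is skipped, while the lookahead pattern yields "1", "x" and "y". InsertStars("21xy5") then returns "21 * x * y * 5", and InsertStars("xy21") returns "x * y * 21".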
