How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?

Developer https://www.devze.com 2023-01-18 14:44 Source: Internet

Example - I'm trying to have the expression match all the b characters following any number of a characters:

Regex expression = new Regex("(?<=a).*");

foreach (Match result in expression.Matches("aaabbbb"))
  MessageBox.Show(result.Value);

This returns aabbbb, because the lookbehind matches only a single a. How can I make it match all the a's at the beginning?

I've tried

Regex expression = new Regex("(?<=a+).*");

and

Regex expression = new Regex("(?<=a)+.*");

with no results...

What I'm expecting is bbbb.


Are you looking for a repeated capturing group?

(.)\1*

This will return two matches.

Given:

aaabbbb

This will result in:

aaa
bbbb

This:

(?<=(.))(?!\1).*

This uses the same principle: the lookbehind captures the previous character into a backreference, and the negative lookahead then asserts that the next character is not that same character.

That matches:

bbbb
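
A minimal C# sketch of the two patterns above, assuming a console program. The class and method names (RepeatedRunDemo, Runs, AfterFirstRun) are illustrative and not part of the original answer. The second helper uses Regex.Match rather than Regex.Matches, since .* can also match the empty string at the very end of the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RepeatedRunDemo   // hypothetical name, for illustration only
{
    // Splits the input into runs of one repeated character using (.)\1*.
    public static string[] Runs(string input) =>
        Regex.Matches(input, @"(.)\1*").Cast<Match>().Select(m => m.Value).ToArray();

    // Returns the first match of (?<=(.))(?!\1).* : everything from the first
    // position whose preceding character differs from the character that follows.
    public static string AfterFirstRun(string input) =>
        Regex.Match(input, @"(?<=(.))(?!\1).*").Value;

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Runs("aaabbbb")));  // aaa, bbbb
        Console.WriteLine(AfterFirstRun("aaabbbb"));            // bbbb
    }
}
```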


I figured it out eventually:

Regex expression = new Regex("(?<=a+)[^a]+");

foreach (Match result in expression.Matches(@"aaabbbb"))
   MessageBox.Show(result.Value);

I must not allow the a's to be matched by the part of the pattern outside the lookbehind. This way, the expression matches only the runs of non-a characters that follow one or more a's.

Matching aaabbbb yields bbbb and matching aaabbbbcccbbbbaaaaaabbzzabbb results in bbbbcccbbbb, bbzz and bbb.
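
To check the results quoted above, here is a short sketch that collects every match of the pattern into an array. The names LookbehindDemo and FindAfterA are hypothetical and used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo   // hypothetical name, for illustration only
{
    // Returns every run of non-'a' characters that is immediately preceded by an 'a'.
    public static string[] FindAfterA(string input) =>
        Regex.Matches(input, "(?<=a+)[^a]+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FindAfterA("aaabbbb")));
        // bbbb
        Console.WriteLine(string.Join(", ", FindAfterA("aaabbbbcccbbbbaaaaaabbzzabbb")));
        // bbbbcccbbbb, bbzz, bbb
    }
}
```

Since a lookbehind only tests what precedes the current position, (?<=a) should behave the same as (?<=a+) here. The part that actually excludes the a's from the result is [^a]+.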


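The look-behind skips only the first "a" because a look-behind consumes nothing: the match starts at the earliest position that is preceded by an "a" (right after the first one), and .* then matches everything that remains, including the other a's.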
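
One way to see where the original pattern starts matching is to print the index of the match. This is only a sketch; MatchPositionDemo and FirstMatch are hypothetical names introduced for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchPositionDemo   // hypothetical name, for illustration only
{
    // Returns the first match of the question's original pattern.
    public static Match FirstMatch(string input) =>
        Regex.Match(input, "(?<=a).*");

    public static void Main()
    {
        Match m = FirstMatch("aaabbbb");
        Console.WriteLine(m.Index);   // 1: the first position preceded by an 'a'
        Console.WriteLine(m.Value);   // aabbbb
    }
}
```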

Would this pattern work for you instead?

\ba+(.+)\b

It uses word boundaries (\b) to anchor both ends of the word. It matches at least one "a" and then the rest of the characters up to the closing word boundary. The remaining characters are captured in a group so you can reference them easily.

string pattern = @"\ba+(.+)\b";

foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
    Console.WriteLine("Match: " + m.Value);
    Console.WriteLine("Group capture: " + m.Groups[1].Value);
}

UPDATE: If you want to skip the leading run of any repeated character and then match the rest of the string, you could do this:

string pattern = @"\b(.)(\1)*(?<Content>.+)\b";

foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
    Console.WriteLine("Match: " + m.Value);
    Console.WriteLine("Group capture: " + m.Groups["Content"].Value);
}