Regular Expressions: Match up to an optional word

Developer https://www.devze.com 2023-01-13 03:58 Source: Internet
Need to match the first part of a sentence, up to a given word. However, that word is optional, in which case I want to match the whole sentence. For example:

I have a sentence with a clause I don't want.

I have a sentence and I like it.

In the first case, I want "I have a sentence". In the second case, I want "I have a sentence and I like it."

Lookarounds will give me the first case, but as soon as I try to make it optional, to cover the second case, I get the whole first sentence. I've tried making the expression lazy... no dice.

The code that works for the first case:

var regEx = new Regex(@".*(?=with)");
string matchstr = @"I have a sentence with a clause I don't want";

if (regEx.IsMatch(matchstr)) {
    Console.WriteLine(regEx.Match(matchstr).Captures[0].Value);
    Console.WriteLine("Matched!");
}
else {
    Console.WriteLine("Not Matched : (");
}

The expression that I wish worked:

var regEx = new Regex(@".*(?=with)?");

Any suggestions?


There are several ways to do this. You could do something like this:

^(.*?)(with|$)

The first group is matched reluctantly (lazily), i.e. it consumes as few characters as possible. There is an overall match if this group is followed by either the literal with or the end-of-line anchor $.

Given this input:

I have a sentence with a clause I don't want.
I have a sentence and I like it.

Then there are two matches (as seen on rubular.com):

  • Match 1:
    • Group 1: "I have a sentence "
    • Group 2: "with"
  • Match 2:
    • Group 1: "I have a sentence and I like it."
    • Group 2: "" (empty string)

You can make the grouped alternation non-capturing with (?:with|$) if you don't need to distinguish the two cases.
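Since the question uses C#, here is a minimal sketch of the same pattern in .NET, using the non-capturing form. The class and method names (UpToWord, Before, BeforeEachLine) are hypothetical and not part of the answer. One thing to watch: rubular.com runs Ruby, where ^ and $ always match at line boundaries, whereas .NET needs RegexOptions.Multiline for that. The per-line variant assumes "\n" line endings, no blank lines, and no trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UpToWord
{
    // Hypothetical helper (not from the answer): returns the text before the
    // first occurrence of `word`, or the whole input when `word` is absent.
    public static string Before(string input, string word)
    {
        // Lazy group 1, followed by either the escaped word or the end of input.
        string pattern = "^(.*?)(?:" + Regex.Escape(word) + "|$)";
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : input;
    }

    // Same pattern applied line by line. RegexOptions.Multiline makes ^ and $
    // match at line boundaries. Assumes "\n" endings and no blank lines.
    public static string[] BeforeEachLine(string text, string word)
    {
        string pattern = "^(.*?)(?:" + Regex.Escape(word) + "|$)";
        MatchCollection matches = Regex.Matches(text, pattern, RegexOptions.Multiline);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Groups[1].Value;
        return result;
    }
}
```

Calling UpToWord.Before(sentence, "with") returns the first group, which keeps the space before the word, so a Trim() may be wanted. The bare word also matches inside longer words such as "without"; wrapping it in \b on both sides is one way to require a whole word.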

Related questions

  • Difference between .*? and .* for regex


If I understand your need correctly, you want to match either the sentence up to the word 'with', or, if it's not there, match the entire thing? Why not write the regexp to explicitly look for the two cases?

/(.*) with |(.*)/

Wouldn't this get both cases?
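As a rough sketch of how that alternation could be used from C#: the result lands in group 1 when " with " is present and in group 2 otherwise, so the caller has to check which group took part in the match. The class and method names below are hypothetical. Note that (.*) is greedy here, so if " with " occurs more than once, group 1 runs up to the last occurrence rather than the first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EitherCase
{
    // Hypothetical helper: applies the two-branch pattern from the answer and
    // returns whichever capture group participated in the match.
    public static string Before(string input)
    {
        Match m = Regex.Match(input, @"(.*) with |(.*)");

        // Group 1 is set only when the first branch (ending in " with ") matched.
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }
}
```

Because the spaces around the word are part of the pattern, the returned text has no trailing space, and words such as "without" do not trigger the first branch.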


string optional = "with a clause I don't want";
string rx = "^(.*?)" + Regex.Escape(optional) + ".*$";

// displays "I have a sentence " (with a trailing space)
string foo = "I have a sentence with a clause I don't want.";
Console.WriteLine(Regex.Replace(foo, rx, "$1"));

// displays "I have a sentence and I like it." (no match, so the input is returned unchanged)
string bar = "I have a sentence and I like it.";
Console.WriteLine(Regex.Replace(bar, rx, "$1"));

If you don't need the complex matching provided by a regex then you could use a combination of IndexOf and Remove. (And obviously you could abstract the logic away into a helper and/or extension method or similar):

string optional = "with a clause I don't want";

// displays "I have a sentence " (with a trailing space)
string foo = "I have a sentence with a clause I don't want.";
int idxFoo = foo.IndexOf(optional);
Console.WriteLine(idxFoo < 0 ? foo : foo.Remove(idxFoo));

// displays "I have a sentence and I like it."
string bar = "I have a sentence and I like it.";
int idxBar = bar.IndexOf(optional);
Console.WriteLine(idxBar < 0 ? bar : bar.Remove(idxBar));
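The helper mentioned above might look something like the following sketch. The names TextHelpers and UpTo are made up for illustration. It passes StringComparison.Ordinal explicitly, which is an assumption about the desired behaviour: the plain IndexOf(string) overload used above performs a culture-sensitive search instead.

```csharp
using System;

public static class TextHelpers
{
    // Hypothetical helper: returns everything before the first occurrence of
    // `marker`, or the whole string when `marker` does not occur.
    public static string UpTo(string source, string marker)
    {
        // Ordinal comparison is an assumption; pick another StringComparison
        // if culture-aware or case-insensitive matching is needed.
        int idx = source.IndexOf(marker, StringComparison.Ordinal);

        // Remove(idx) drops everything from idx to the end of the string.
        return idx < 0 ? source : source.Remove(idx);
    }
}
```

Declaring the first parameter as "this string source" in a top-level static class would turn it into an extension method, so it could be called as foo.UpTo(optional).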