
C# word boundary regex instead of .Contains() needed

Developer https://www.devze.com 2023-01-21 04:24 Source: the web

I have a list:

var myList = new List<string> { "red", "blue", "green" };

I have a string:

var myString = "Alfred has a red and blue tie";

I am trying to count how many of the words in myList occur in myString. Currently I am using .Contains(), which gets me a count of 3 because it picks up the "red" in "Alfred". I need to be able to isolate whole words instead. How can this be achieved?

var count = myList.Count(ml => myString.Contains(ml)); // Contains is a substring test, so "red" also matches inside "Alfred"; want whole words only (2)
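Where the extra match comes from can be checked directly. A Contains call only reports whether the characters occur somewhere in the string, so it cannot tell the "red" inside "Alfred" apart from the standalone word. The sketch below (SubstringCounter is a hypothetical name, used only for illustration) counts raw, case-sensitive substring occurrences of each list word; for the example sentence that comes to 3, while only 2 of those hits are whole words.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SubstringCounter
{
    // Counts every ordinal (case-sensitive) substring occurrence of each word,
    // which is what a Contains-style test "sees": "red" inside "Alfred" counts too.
    public static int CountSubstringOccurrences(string text, IEnumerable<string> words)
    {
        int total = 0;
        foreach (string word in words)
        {
            if (string.IsNullOrEmpty(word)) continue; // an empty word would never advance the search
            int index = text.IndexOf(word, StringComparison.Ordinal);
            while (index >= 0)
            {
                total++;
                // Resume after this occurrence so hits do not overlap.
                index = text.IndexOf(word, index + word.Length, StringComparison.Ordinal);
            }
        }
        return total;
    }
}
```

This is not meant as a fix, only as a way to see that substring matching, not the list filtering itself, is what overcounts.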


        var myList = new List<string> { "red", "blue", "green" };
        Regex r = new Regex("\\b(" + string.Join("|", myList.ToArray()) + ")\\b");
        MatchCollection m = r.Matches("Alfred has a red and blue tie");

m.Count gives you the number of times red, blue or green is found as a whole word. \b specifies a word boundary.

Each element of m is a Match, and you can index into m to get more information (i.e. m[0].Value gives you the matched string ("red") and m[0].Index gives its position in the original string (13)).
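A slightly more defensive version of the same regex approach might look like the sketch below. It assumes two things the answer does not address: list entries may contain regex metacharacters (so each one is passed through Regex.Escape), and the group only needs to group, not capture (so it uses (?:...)). WholeWordMatcher, BuildPattern and FindMatches are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class WholeWordMatcher
{
    // Builds \b(?:w1|w2|...)\b. Regex.Escape makes characters such as "." or "+"
    // in a list entry match literally instead of acting as regex syntax.
    public static Regex BuildPattern(IEnumerable<string> words)
    {
        string alternation = string.Join("|", words.Select(Regex.Escape));
        return new Regex(@"\b(?:" + alternation + @")\b");
    }

    // Returns each whole-word hit with its matched text and its position in the input.
    public static List<(string Value, int Index)> FindMatches(string text, IEnumerable<string> words)
    {
        var results = new List<(string Value, int Index)>();
        foreach (Match m in BuildPattern(words).Matches(text))
        {
            results.Add((m.Value, m.Index));
        }
        return results;
    }
}
```

For the example sentence this yields ("red", 13) and ("blue", 21). Note that \b is defined relative to word characters, so an entry that begins or ends with punctuation (for example "c++") will not behave like a plain word under this pattern.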


var count = (from s in myList
            join ms in myString.Split() on s equals ms
            select new { s, ms }).Count();
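The join produces one row per matching pair, so a word repeated in the text is counted each time it occurs, but splitting on whitespace leaves punctuation attached to the tokens. The sketch below wraps the same query in a hypothetical JoinCounter class so both behaviors can be checked on made-up sentences.

```csharp
using System.Linq;

// Hypothetical wrapper around the join query above, for illustration only.
public static class JoinCounter
{
    // One result row per (list word, split token) pair that compares equal,
    // so every occurrence of a list word in the text is counted.
    public static int CountByJoin(string[] list, string text)
    {
        return (from s in list
                join ms in text.Split() on s equals ms
                select new { s, ms }).Count();
    }
}
```

With "red red blue" this returns 3. With "Alfred has a red, and blue tie." it returns 1, because the token is "red," rather than "red"; the \b regex approach still finds both words there.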


Something like this?

var numMatches = myString.Split().Intersect(myList).Count();

Note that this doesn't consider duplicate occurrences.
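The reason is that Intersect is a set operation: each word present in both sequences is yielded once, however often it appears in the text. A minimal sketch (IntersectCounter is a hypothetical name) that makes this visible on a sentence with a repeated word:

```csharp
using System.Linq;

// Hypothetical wrapper, for illustration only.
public static class IntersectCounter
{
    // Intersect yields distinct elements, so a list word is counted at most once.
    public static int CountDistinctMatches(string text, string[] list)
    {
        return text.Split().Intersect(list).Count();
    }
}
```

For "red red blue" this returns 2, although "red" occurs twice.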

If you do want to count duplicates, go with @Justin Niessner's technique. Here's an alternative, using an intermediate lookup:

var words = myString.Split().ToLookup(word => word);
var numMatches = myList.Sum(interestingWord => words[interestingWord].Count());
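The lookup groups identical tokens together, and indexing an ILookup with a key that has no group returns an empty sequence rather than throwing, which is why a word such as "green" simply contributes 0. The variant below is a sketch that additionally passes StringComparer.OrdinalIgnoreCase to ToLookup; that case-insensitivity is an assumption, since the question does not say whether "Red" should match "red". LookupCounter is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class LookupCounter
{
    public static int CountWithLookup(string text, string[] list)
    {
        // Group whitespace-split tokens case-insensitively (an assumption, see above).
        ILookup<string, string> words =
            text.Split().ToLookup(word => word, StringComparer.OrdinalIgnoreCase);
        // The ILookup indexer yields an empty sequence for a missing key,
        // so list words absent from the text add 0 instead of throwing.
        return list.Sum(interestingWord => words[interestingWord].Count());
    }
}
```

For "Red red BLUE tie" this returns 3; with the default comparer, as in the answer, it would return 1.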


This works: \bred\b|\bblue\b|\bgreen\b. I am not sure it is the most optimized approach, though.
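For plain words, the per-word form and the grouped form \b(?:red|blue|green)\b should match the same whole words; the grouped form simply states the boundaries once. The sketch below (PatternForms and its methods are hypothetical names) builds both patterns from the list and compares their matches. It does not measure performance, so it only shows that the two forms are interchangeable here, not which one is faster.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PatternForms
{
    // One \b...\b pair per word, as in the answer: \bred\b|\bblue\b|\bgreen\b
    public static string PerWordPattern(string[] words) =>
        string.Join("|", words.Select(w => @"\b" + Regex.Escape(w) + @"\b"));

    // Boundaries factored out around a non-capturing group: \b(?:red|blue|green)\b
    public static string GroupedPattern(string[] words) =>
        @"\b(?:" + string.Join("|", words.Select(Regex.Escape)) + @")\b";

    // Returns the matched values in order, so the two forms can be compared.
    public static string[] MatchValues(string pattern, string text) =>
        Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```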
