Check if one collection of values contains another_问答_开发者

Suppose I have two collections as follows:

Collection1: "A1" "A1" "M1" "M2"

Collection2: "M2" "M3" "M1" "A1" "A1" "A2"

all the values are string values. I want to know if all the elements in Collection1 are contained in Collection2, but I have no guara开发者_开发问答ntee on the order and a set may have multiple entries with the same value. In this case, Collection2 does contain Collection1 because Collection2 has two A1's, M1 and M2. Theres the obvious way: sorting both collections and popping off values as i find matches, but I was wondering if there's a faster more efficient way to do this. Again with the initial collections I have no guarantee on the order or how many times a given value will appear

EDIT: Changed set to collection just to clear up that these aren't sets as they can contain duplicate values

The most concise way I know of:

//determine if Set2 contains all of the elements in Set1
bool containsAll = Set1.All(s => Set2.Contains(s));

Yes, there is a faster way, provided you're not space-constrained. (See space/time tradeoff.)

The algorithm:

Just insert all the elements in Set2 into a hashtable (in C# 3.5, that's a HashSet<string>), and then go through all the elements of Set1 and check if they're in the hashtable. This method is faster (Θ(m + n) time complexity), but uses O(n) space.

Alternatively, just say:

bool isSuperset = new HashSet<string>(set2).IsSupersetOf(set1);

Edit 1:

For those people concerned about the possibility of duplicates (and hence the misnomer "set"), the idea can easily be extended:

Just make a new Dictionary<string, int> representing the count of each word in the super-list (add one to the count each time you see an instance of an existing word, and add the word with a count of 1 if it's not in the dictionary), and then go through the sub-list and decrement the count each time. If every word exists in the dictionary and the count is never zero when you try to decrement it, then the subset is in fact a sub-list; otherwise, you had too many instances of a word (or it didn't exist at all), so it's not a real sub-list.

Edit 2:

If the strings are very big and you're concerned about space efficiency, and an algorithm that works with (very) high probability works for you, then try storing a hash of each string instead. It's technically not guaranteed to work, but the probability of it not working is pretty darn low.

The problem I see with the HashSet, Intersect, and other Set theory answers is that you do contain duplicates, and "A set is a collection that contains no duplicate elements". Here's a way to handle the duplicate cases.

var list1 = new List<string> { "A1", "A1", "M1", "M2" };
var list2 = new List<string> { "M2", "M3", "M1", "A1", "A1", "A2" };

// Remove returns true if it was able to remove it, and it won't be there to be matched again if there's a duplicate in list1
bool areAllPresent = list1.All(i => list2.Remove(i));

EDIT: I renamed from Set1 and Set2 to list1 and list2 to appease Mehrdad.

EDIT 2: The comment implies it, but I wanted to explicitly state that this does alter list2. Only do it this way if you're using it as a comparison or control but don't need the contents afterwards.

Check out linq...

string[] set1 = {"A1", "A1", "M1", "M2" };
string[]  set2 = { "M2", "M3", "M1", "A1", "A1", "A2" };

var matching = set1.Intersect(set2);

foreach (string x in matching)
{
    Console.WriteLine(x);
}

Similar one

string[] set1 = new string[] { "a1","a2","a3","a4","a5","aa","ab" };
string[] set2 = new string[] {"m1","m2","a4","a6","a1" };

var a = set1.Select(set => set2.Contains(set));