I need to compare two lists where each list contains about 60,000 objects. what would be the most efficient way of doing this? I want to select all the items that are in the source list that do not exist in the destination list.
I am creating a sync application where c# scans a directory and places the attributes of each file in a list. therefore there is a list for the source directory and another list for the destination directory. Then instead of copying all the files I will just compare the list and see which ones are different. If both list have the same file then I will not copy that file. Here is the Linq query that I use and it works when I scan a small folder but it does not when I scan a large folder.
// s.linst is the list of the source files
// d.list is the list of the files contained in the destination folder
var q = from a in s.lstFiles
from b in d.lstFiles
where
a.compareName == b.compareName &&
a.size == b.size &&
a.dateCreated == b.dateCreated
select a;
// create a list to hold the items that are the same later select the outer join
List<Classes.MyPathInfo.MyFile> tempList = new List<Classes.MyPathInfo.MyFile>();
foreach (Classes.MyPathInfo.MyFile file in q)
{
tempList.Add(file);
}
I don't know why this query takes forever. Also there are other things that I can take advantage. For example, I know that i开发者_StackOverflowf the source file matches a destination file, then it is impossible to have another duplicate with that file because it is not possible to have to file names with the same name and same path.
Create an equality comparer for the type, then you can use that to efficiently compare the sets:
public class MyFileComparer : IEqualityComparer<MyFile> {
public bool Equals(MyFile a, MyFile b) {
return
a.compareName == b.compareName &&
a.size == b.size &&
a.dateCreated == b.dateCreated;
}
public int GetHashCode(MyFile a) {
return
(a.compareName.GetHashCode() * 251 + a.size.GetHashCode()) * 251 +
a.dateCreated.GetHashCode();
}
}
Now you can use this with methods like Intersect
to get all items that exist in both lists, or Except
to get all items that exist in one list but not the other:
List<MyFile> tempList =
s.lstFiles.Intersect(d.lstFiles, new MyFileComparer()).ToList();
As the methods can use the hash code to divide the items into buckets, there are a lot less comparisons that needs to be done compared to a join where it has to compare all items in one list to all items in the other list.
LINQ has an Except()
method for this purpose. You can just use a.Except(b);
Use Except()
and read more about set operations with linq and set operations with HashSet
.
精彩评论