How to compare two folders for non-identical files based on name?

Developer https://www.devze.com 2023-01-29 20:36 Source: Internet
I have two folders, A and B. Each of them contains multiple files. I have to check the files in A against the files in B and find the ones that are not identical by name. I tried it like this, but it gives back the whole search result:

var filesnotinboth = from f1 in dir1.GetFiles("*", SearchOption.AllDirectories)
                     from f2 in dir2.GetFiles("*",SearchOption.AllDirectories)
                     where f1.Name != f2.Name
                     select f1.Name;

Any suggestions?


Well, for one thing that approach is very inefficient - it's going to call dir2.GetFiles each time you start with a new f1. It's then going to yield f1 once for every f2 which doesn't match the current f1. So even if f1 does have a match somewhere in dir2, it'll still be output for each of the other files. Imagine that dir1 contains A, B and C, and dir2 contains C and D. You'll end up like this:

f1    f2    Result of where?
 A     C    True
 A     D    True
 B     C    True
 B     D    True
 C     C    False
 C     D    True

So the result would be A, A, B, B, C - you'd still have C (which you didn't want) - just not quite as often as A and B.
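
To see this without touching the file system, here is a minimal sketch that runs the same query shape over plain lists of names. The class and method names (CrossJoinDemo, CrossJoinNotEqual, OnlyInFirst) are made up for illustration; only the LINQ calls come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrossJoinDemo
{
    // Same shape as the query in the question, but over plain name lists
    // instead of DirectoryInfo.GetFiles results (illustrative helper).
    public static List<string> CrossJoinNotEqual(IEnumerable<string> names1,
                                                 IEnumerable<string> names2)
    {
        var query = from f1 in names1
                    from f2 in names2
                    where f1 != f2
                    select f1;
        return query.ToList();
    }

    // Set-based alternative: names in the first list but not in the second.
    public static List<string> OnlyInFirst(IEnumerable<string> names1,
                                           IEnumerable<string> names2)
    {
        return names1.Except(names2).ToList();
    }

    public static void Main()
    {
        var dir1Names = new[] { "A", "B", "C" };
        var dir2Names = new[] { "C", "D" };

        // Prints: A, A, B, B, C
        Console.WriteLine(string.Join(", ", CrossJoinNotEqual(dir1Names, dir2Names)));

        // Prints: A, B
        Console.WriteLine(string.Join(", ", OnlyInFirst(dir1Names, dir2Names)));
    }
}
```

The first line of output reproduces the table above row by row; the second is what the question seems to be after.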

You want to use set operations, like this:

var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var onlyIn1 = dir1Files.Except(dir2Files);

Now that should work, and more efficiently...

EDIT: I've assumed you want files in A but not in B, based on possibly an earlier version of the question. (I'm not sure whether it was edited in the first five minutes. Obviously the current code isn't going to return anything in B but not A.)

If you want the symmetric difference, use HashSet<T>.SymmetricExceptWith:

var inExactlyOneDirectory = new HashSet<string>(dir1Files);
inExactlyOneDirectory.SymmetricExceptWith(dir2Files);

(Note that I dislike the fact that SymmetricExceptWith is a void method which mutates the existing set, instead of returning a new set or just a sequence. Aside from anything else, it means the variable name is only appropriate after the second statement, not the first.)
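
If the mutation bothers you too, one possible workaround is a small helper that returns the symmetric difference as a new sequence, built from two Except calls. This is only a sketch: SymmetricDifference is a hypothetical extension method, not part of the framework, and it uses the default equality comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetHelpers
{
    // Hypothetical helper: elements that are in exactly one of the two sequences.
    // Neither input is modified.
    public static IEnumerable<T> SymmetricDifference<T>(this IEnumerable<T> first,
                                                        IEnumerable<T> second)
    {
        // Materialize both inputs once, since each is read twice below.
        var a = first.ToList();
        var b = second.ToList();

        // Except already removes duplicates, and the two halves cannot overlap,
        // so a plain Concat is enough.
        return a.Except(b).Concat(b.Except(a));
    }

    public static void Main()
    {
        var dir1Names = new[] { "A", "B", "C" };
        var dir2Names = new[] { "C", "D" };

        var inExactlyOneDirectory = dir1Names.SymmetricDifference(dir2Names);

        // Prints: A, B, D
        Console.WriteLine(string.Join(", ", inExactlyOneDirectory));
    }
}
```

With this shape the variable name is accurate from the first statement, at the cost of building a couple of temporary lists.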

EDIT: If you need uniqueness by name and size, you really need an anonymous type representing both. Unfortunately, it's then hard to create a HashSet<T> based on it. So you'll want an extension method like this:

public static HashSet<T> ToHashSet<T>(this IEnumerable<T> set)
{
    return new HashSet<T>(set);
}

Then:

var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => new { x.Name, x.Length });

var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => new { x.Name, x.Length });

var difference = dir1Files.ToHashSet();
difference.SymmetricExceptWith(dir2Files);
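
The reason this works is that anonymous types compare by value: two instances with the same Name and Length are equal and hash the same, so the set treats them as one entry. A minimal in-memory sketch of that behaviour, with made-up file names and sizes and a hypothetical generic helper standing in for the two lines above:

```csharp
using System;
using System.Collections.Generic;

public static class NameAndSizeDemo
{
    // Hypothetical helper: wraps the "copy into a HashSet, then
    // SymmetricExceptWith" pattern. T is inferred, so it also works
    // for anonymous types.
    public static HashSet<T> SymmetricDifference<T>(IEnumerable<T> first,
                                                    IEnumerable<T> second)
    {
        var result = new HashSet<T>(first);
        result.SymmetricExceptWith(second);
        return result;
    }

    public static void Main()
    {
        // Stand-ins for the projected GetFiles results (invented sample data).
        var dir1Files = new[]
        {
            new { Name = "a.txt", Length = 10L },
            new { Name = "b.txt", Length = 20L },
        };
        var dir2Files = new[]
        {
            new { Name = "a.txt", Length = 10L }, // same name and size: drops out
            new { Name = "b.txt", Length = 25L }, // same name, different size: kept
        };

        var difference = SymmetricDifference(dir1Files, dir2Files);

        // Two entries remain, one for each version of b.txt.
        foreach (var file in difference)
        {
            Console.WriteLine($"{file.Name} ({file.Length} bytes)");
        }
    }
}
```

Note that a file with the same name but a different size shows up twice in the result, once per directory.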


Jon Skeet's answer should help you with understanding why your current solution won't work, and is fundamentally inefficient.

As for solving the problem, one option would be to use the HashSet.SymmetricExceptWith method, which "modifies the current HashSet(Of T) object to contain only elements that are present either in that object or in the specified collection, but not both."

// Thanks to Jon Skeet for template
var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var filesNotInBoth = new HashSet<string>(dir1Files);

filesNotInBoth.SymmetricExceptWith(dir2Files);


var files2 = dir2.GetFiles("*", SearchOption.AllDirectories);
var filesnotinboth = dir1.GetFiles("*", SearchOption.AllDirectories)
                         .Where(f1 => !files2.Any(f2 => f2.Name == f1.Name));