开发者

Logic for a file compare

开发者 https://www.devze.com 2022-12-11 19:24 出处:网络
I trying to write a programm for file compare. For example: file1 1 2 3 4 5 file2 1 2 @ 3 4 5 If I do it line by line, I get:

I trying to write a programm for file compare. For example:

file1

1
2
3
4
5

file2

1
2
@
3
4
5

If I do it line by line, I get:

1 == 1; 
2 == 2;
3 != @;
4 != 3;
5 != 4;
  != 5;

But, the truth is that the only difference between the files is @. I want get开发者_运维百科 something like this:

1 == 1;
2 == 2;
  != @;
3 == 3;
4 == 4;
5 == 5;

Which is the best way to do it? without using any external application, such as diff, fc, etc.


I wonder if Levenshtein Distance would help you in this situation. It would give you how similar the two files are but I don't know if you could zero in on the @. Something to look at none the less.


Python has a very handy library for comparing sequences called difflib. The underlying SequenceMatcher class takes two python sequences and gives you (among other things) a sequence of opcodes telling you how you would get from the first sequence to the second (i.e. the differences). These are of the form:

  • Replace this block with that one
  • Insert a block
  • Delete a block
  • Copy a block (called 'equal')

These reference blocks by giving indices into the original sequences. This can be applied to lines in a file or characters in a string or anything else you can turn into a sequence in python.


I believe what you're looking for is the distance between 2 strings, maybe this can help you.


If you are not writing the program to learn something about diff algorithms but are simply looking for a solution, you should try diff-match-patch. It contains implementations of diff and patch algorithms in different programming languages (cpp, c#, java, javascript, python).

I tried its java version and it worked like a charm.


A bit out of date, I suppose :) but I came across this post because I was looking for help on the same problem: I have two files which I display side by side, and I have to mark the lines that don't match in red.

Mine is a little bit of a special case, though, because 1) order is not important, and 2) each line is guaranteed to occur only once (the text is a license file with definitions, line by line).

It turned out that the easiest way of doing it was just to make lists of the two files, ls1 and ls2, and do the following (in pseudocode):

i = 0;
while (i < ls1.count) {
    n = ls2.find(ls1[i]);
    if (n >= 0) {
        // found match in ls2
        ls1.Delete(i);
        ls2.Delete(n);
    } else
        i++;
}

Explained, for each line is ls1, see if there is a corresponding line in ls2. If so, delete both. What you're left with is simply the differences, and you can easily mark up those lines in the original text.

Extremely easy, no libraries included. Just my two cents...

0

精彩评论

暂无评论...
验证码 换一张
取 消