I have two G.729-encoded files, and I took the PCM version of them. I want to measure the similarity between these two files. They are binary files, so how can one measure the similarity between binary files? I wrote code in C that takes patterns from the first file and searches for similar ones in the second, but I want an actual similarity measure. I searched the literature a lot and found Jaccard and other measures, but I still can't determine which of them is suitable for my case. Thanks in advance for your help.
Since you mention the files are audio files, it would be better to define a similarity measure based on audio characteristics rather than simply doing a binary comparison. A quick search brought up a research project called MusicMiner that you may want to look into for further ideas.
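As a rough illustration of what an audio-based measure could look like, here is a minimal sketch of a normalized correlation between the two sample streams. It is not what MusicMiner does; it assumes both files are raw 16-bit signed little-endian mono PCM at the same sample rate and already time-aligned, and the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmSimilarity {

    /** Decodes raw bytes into samples, assuming 16-bit signed little-endian PCM. */
    static short[] decode(byte[] raw) {
        short[] samples = new short[raw.length / 2];
        ByteBuffer.wrap(raw)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }

    /**
     * Normalized correlation at zero lag over the overlapping part of the
     * two signals. The result lies in [-1, 1]; 1 means the signals are
     * identical up to a positive scale factor.
     */
    static double correlation(short[] a, short[] b) {
        int n = Math.min(a.length, b.length);
        double dot = 0, energyA = 0, energyB = 0;
        for (int i = 0; i < n; i++) {
            dot += (double) a[i] * b[i];
            energyA += (double) a[i] * a[i];
            energyB += (double) b[i] * b[i];
        }
        if (energyA == 0 || energyB == 0) {
            return 0.0; // at least one signal is silent; correlation is undefined
        }
        return dot / Math.sqrt(energyA * energyB);
    }
}
```

The bytes could be loaded with `java.nio.file.Files.readAllBytes` and passed through `decode`. If the two recordings are not aligned, the same computation would have to be repeated over a range of lags and the maximum taken.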
I had the same need and came up with a solution that works in my case, but I cannot guarantee it is universal:
I took a library that creates diff files. Given fileA and fileB, this library creates a third file, fileDiff, that tells how to get from fileA to fileB: which bytes to copy and which to add. (For more information about the format, see http://www.w3.org/TR/NOTE-gdiff-19970901.html )
- I was working in Java so I used javaxdelta: http://javaxdelta.sourceforge.net/
- It lets you implement an interface called DiffWriter: http://javaxdelta.sourceforge.net/javadoc/com/nothome/delta/DiffWriter.html
- At the end you know how many bytes are copied and how many are added to go from fileA to fileB
From those two counts, a function gives me a percentage. I know this is not fully accurate: for example, if fileB is equal to half of fileA, the function reports a similarity of 100%, because every byte of fileB can be copied from fileA.
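One possible way to soften that asymmetry is to divide the copied byte count by the length of the longer file instead of by the length of fileB. The sketch below is only an illustration of that idea, not part of javaxdelta; the class name, method name and `sourceLength` parameter are hypothetical, and it relies on the fact that in the diff every byte of fileB is either copied or added.

```java
final class SymmetricSimilarity {

    /**
     * @param copied       bytes of fileB that the diff copies from fileA
     * @param added        bytes of fileB that the diff adds as new data
     * @param sourceLength length of fileA in bytes
     * @return similarity in percent, relative to the longer of the two files
     */
    static double similarity(long copied, long added, long sourceLength) {
        long targetLength = copied + added; // every byte of fileB is copied or added
        long longer = Math.max(sourceLength, targetLength);
        if (longer == 0) {
            return 100.0; // two empty files: treated as identical by convention
        }
        return 100.0 * copied / longer;
    }
}
```

With this variant, a fileB that is exactly half of fileA scores 50% instead of 100%, while two identical files still score 100%.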
This is the DiffWriter implementation:
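import java.io.IOException;
import com.nothome.delta.DiffWriter;

// Counts how many bytes the delta copies from fileA and how many it adds as new data.
public class Distance implements DiffWriter {
    private long newData = 0;
    private long copiedData = 0;

    @Override
    public void flush() throws IOException {}

    @Override
    public void close() throws IOException {}

    // Called for each byte of fileB that has to be added as new data.
    @Override
    public void addData(byte b) throws IOException {
        newData++;
    }

    // Called for each run of `length` bytes copied from fileA.
    @Override
    public void addCopy(long offset, int length) throws IOException {
        copiedData += length;
    }

    // Percentage of fileB that could be copied from fileA.
    public double getSimilarity() {
        double a = (double) newData;
        double c = (double) copiedData;
        return (c / (c + a)) * 100.0; // NaN if nothing was copied or added
    }
}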
Here is how I call it:
import java.io.File;
import com.nothome.delta.Delta;

File f1 = new File(...);
File f2 = new File(...);
Distance dw = new Distance();
try {
    new Delta().compute(f1, f2, dw);
    // Use the result instead of discarding it.
    System.out.println("Similarity: " + dw.getSimilarity() + "%");
} catch (Exception e) {
    e.printStackTrace();
}