Algorithm for similarity?

Developer https://www.devze.com 2023-03-14 06:20 Source: web

If this seems like a duplicate, I apologize, but my previous question seems to have caused some confusion, so here is another attempt.

I have 2 base arrays:

float[] baseArr1 = new float[3] {0.430651724f, 0.137407839f, 0.177024469f};
float[] baseArr2 = new float[3] {0.718210936f, 0.001312795f, 0.009634903f};

And another 2 arrays for comparison:

float[] compArr1 = new float[3] {1, 1, 1};
float[] compArr2 = new float[3] {1, 0, 0};

compArr1 and compArr2 are then compared with baseArr1 and baseArr2. I know which answer I should get, but I am having difficulty coming up with an algorithm that produces it. When comparing against baseArr1, the answer should be compArr1, and when comparing against baseArr2, the answer should be compArr2.

Please note that the values in each baseArr do not necessarily add up to 1. Additionally, here are two more concise arrays to try to make my point clearer:

float[] extraArr1 = new float[3] {.5f, .3f, .3f};
float[] extraArr2 = new float[3] {.75f, 0, 0};

Here extraArr1 is 'closer' to compArr1 and extraArr2 is 'closer' to compArr2. I've tried the cosine similarity algorithm, as some suggested, but sometimes it gives the wrong answer.
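For comparison, here is a minimal C# sketch of cosine similarity, assuming equal-length arrays that are not all zeros. The helper name `CosineSimilarity` is hypothetical; the data is the question's baseArr1, compArr1 and compArr2 (with `f` suffixes so the literals compile as float).

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|). A value of 1 means the arrays
// point in the same direction; scaling either array by a positive factor
// leaves the score unchanged. Assumes equal lengths and no all-zero arrays.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Data from the question ("f" makes the literals float)
float[] baseArr1 = { 0.430651724f, 0.137407839f, 0.177024469f };
float[] compArr1 = { 1, 1, 1 };
float[] compArr2 = { 1, 0, 0 };

double sim1 = CosineSimilarity(baseArr1, compArr1);
double sim2 = CosineSimilarity(baseArr1, compArr2);
Console.WriteLine($"baseArr1 vs compArr1: {sim1:F4}");
Console.WriteLine($"baseArr1 vs compArr2: {sim2:F4}");
```

By my arithmetic, with these numbers cosine similarity scores compArr2 slightly higher than compArr1 for baseArr1 (about 0.887 vs 0.886), the opposite of the expected answer. Because cosine similarity only compares the direction of the vectors, this may be the kind of 'incorrect' result described above.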

The criterion is having 'more' of the value per element. For example, compArr1 has more values that are close to baseArr1 than compArr2 does, and compArr2 is 'closer' to baseArr2 than compArr1 is.

Thank you!

UPDATE:

I found the answer! I'm posting it here for future reference. I admit I had a lot of trouble and confused other people along the way, so thanks for trying to help me! Here is what I came up with: normalize each array so its elements sum to 1, then compare squared Euclidean distances.

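// Scale an array so its elements sum to 1 (assumes the sum is non-zero)
static float[] Normalize(float[] arr)
{
  float sum = 0;
  for (int i = 0; i < arr.Length; i++)
    sum += arr[i];

  float[] result = new float[arr.Length];
  for (int i = 0; i < arr.Length; i++)
    result[i] = arr[i] / sum;
  return result;
}

// Squared Euclidean distance (in C#, ^ is XOR, not a power, so square by multiplying)
static float SquaredDistance(float[] a, float[] b)
{
  float dist = 0;
  for (int i = 0; i < a.Length; i++)
  {
    float diff = a[i] - b[i];
    dist += diff * diff;
  }
  return dist;
}

//Normalize baseArrX (baseArr1 or baseArr2), then get the distance to both compArrs
float[] pbaseArrX = Normalize(baseArrX);
float dist1 = SquaredDistance(Normalize(compArr1), pbaseArrX);
float dist2 = SquaredDistance(Normalize(compArr2), pbaseArrX);

//Then just use a conditional to determine which is 'closer'
string closer = dist1 < dist2 ? "compArr1" : "compArr2";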
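To check this approach against every example in the question, here is a self-contained C# sketch. The helper names `Normalize`, `SquaredDistance` and `Closer` are my own, not the poster's. It normalizes each array to sum to 1, compares squared Euclidean distances, and reports which comparison array is closer.

```csharp
using System;
using System.Linq;

// Scale an array so its elements sum to 1.
// Assumes a non-zero sum (an all-zero array would divide by zero).
static float[] Normalize(float[] arr)
{
    float sum = arr.Sum();
    return arr.Select(v => v / sum).ToArray();
}

// Squared Euclidean distance between two equal-length arrays.
static float SquaredDistance(float[] a, float[] b)
{
    float dist = 0;
    for (int i = 0; i < a.Length; i++)
    {
        float diff = a[i] - b[i];
        dist += diff * diff;
    }
    return dist;
}

// Returns "compArr1" if cand1 is closer to target after normalization, else "compArr2".
static string Closer(float[] target, float[] cand1, float[] cand2)
{
    float[] p = Normalize(target);
    float dist1 = SquaredDistance(Normalize(cand1), p);
    float dist2 = SquaredDistance(Normalize(cand2), p);
    return dist1 <= dist2 ? "compArr1" : "compArr2";
}

// Data from the question ("f" makes the literals float)
float[] compArr1 = { 1, 1, 1 };
float[] compArr2 = { 1, 0, 0 };
float[] baseArr1 = { 0.430651724f, 0.137407839f, 0.177024469f };
float[] baseArr2 = { 0.718210936f, 0.001312795f, 0.009634903f };
float[] extraArr1 = { .5f, .3f, .3f };
float[] extraArr2 = { .75f, 0, 0 };

string r1 = Closer(baseArr1, compArr1, compArr2);
string r2 = Closer(baseArr2, compArr1, compArr2);
string r3 = Closer(extraArr1, compArr1, compArr2);
string r4 = Closer(extraArr2, compArr1, compArr2);
Console.WriteLine($"{r1} {r2} {r3} {r4}");
```

This should print `compArr1 compArr2 compArr1 compArr2`, matching the expected answers for baseArr1, baseArr2, extraArr1 and extraArr2.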


You want to find which of the compArrX arrays is closest to baseArr1.

There are various distances you can use. The most common are:

  • Euclidean distance

  • Minkowski distance

  • Taxi-cab or Manhattan distance (this is Minkowski with p=1)

  • Chebyshev distance (this is Minkowski with p=infinity)

and many others, such as:

  • Mahalanobis distance, which is scale-invariant. If you search for statistics and correlation, you'll find more complex algorithms that may fit your problem; see Wikipedia's "Correlation and dependence" article.

We can't know which one best fits your data model.
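A minimal C# sketch of the Minkowski family listed above (Manhattan, Euclidean, Chebyshev), plus a helper that picks the closest candidate. The names `Minkowski`, `Chebyshev`, `Normalize` and `Closest` are hypothetical. It compares baseArr1 against compArr1 and compArr2 both on the raw values and after scaling every array to sum to 1, as the question's update does.

```csharp
using System;
using System.Linq;

// Minkowski distance: (sum of |a_i - b_i|^p)^(1/p).
// p = 1 is Manhattan (taxi-cab) distance, p = 2 is Euclidean distance.
static double Minkowski(float[] a, float[] b, double p)
{
    double sum = 0;
    for (int i = 0; i < a.Length; i++)
        sum += Math.Pow(Math.Abs(a[i] - b[i]), p);
    return Math.Pow(sum, 1.0 / p);
}

// Chebyshev distance: Minkowski as p -> infinity, i.e. the largest
// absolute difference between corresponding elements.
static double Chebyshev(float[] a, float[] b) =>
    a.Zip(b, (x, y) => (double)Math.Abs(x - y)).Max();

// Scale an array so its elements sum to 1 (assumes a non-zero sum).
static float[] Normalize(float[] v)
{
    float total = v.Sum();
    return v.Select(x => x / total).ToArray();
}

// Index of the candidate with the smallest distance to target.
static int Closest(float[] target, float[][] candidates, Func<float[], float[], double> distance)
{
    int best = 0;
    for (int i = 1; i < candidates.Length; i++)
        if (distance(target, candidates[i]) < distance(target, candidates[best]))
            best = i;
    return best;
}

float[] baseArr1 = { 0.430651724f, 0.137407839f, 0.177024469f };
float[][] compArrs = { new float[] { 1, 1, 1 }, new float[] { 1, 0, 0 } };

// Raw values, compared as given
int rawEuclidean = Closest(baseArr1, compArrs, (a, b) => Minkowski(a, b, 2));

// Every array scaled to sum to 1 first
float[] pBase = Normalize(baseArr1);
float[][] pComps = compArrs.Select(c => Normalize(c)).ToArray();
int manhattan = Closest(pBase, pComps, (a, b) => Minkowski(a, b, 1));
int euclidean = Closest(pBase, pComps, (a, b) => Minkowski(a, b, 2));
int chebyshev = Closest(pBase, pComps, Chebyshev);

Console.WriteLine($"raw Euclidean: {rawEuclidean}");
Console.WriteLine($"normalized Manhattan: {manhattan}, Euclidean: {euclidean}, Chebyshev: {chebyshev}");
```

By my arithmetic, on the raw values Euclidean distance picks compArr2 (index 1) for baseArr1, while after normalization Manhattan, Euclidean and Chebyshev all pick compArr1 (index 0). For this data, normalizing may matter more than which distance you choose.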


Another similarity (or dissimilarity) measure: Earth Mover's Distance.
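Earth Mover's Distance in general needs a ground-distance matrix and a transportation-problem solver. For the special case of one-dimensional histograms over the same ordered, evenly spaced bins, it reduces to the sum of absolute differences between the two cumulative distributions. The C# sketch below assumes exactly that: the three array positions are treated as ordered bins one unit apart (the question does not say they are ordered), and each array is scaled to sum to 1. `Emd1D` is a hypothetical name.

```csharp
using System;
using System.Linq;

// 1-D Earth Mover's Distance between two histograms over the same ordered,
// unit-spaced bins. Each array is scaled to sum to 1 (assumes non-zero sums);
// the EMD is then the sum of absolute differences of the running totals.
static double Emd1D(float[] a, float[] b)
{
    double sumA = a.Sum(), sumB = b.Sum();
    double cumA = 0, cumB = 0, emd = 0;
    for (int i = 0; i < a.Length; i++)
    {
        cumA += a[i] / sumA;
        cumB += b[i] / sumB;
        emd += Math.Abs(cumA - cumB);
    }
    return emd;
}

// Data from the question ("f" makes the literals float)
float[] baseArr1 = { 0.430651724f, 0.137407839f, 0.177024469f };
float[] baseArr2 = { 0.718210936f, 0.001312795f, 0.009634903f };
float[] compArr1 = { 1, 1, 1 };
float[] compArr2 = { 1, 0, 0 };

double b1c1 = Emd1D(baseArr1, compArr1);
double b1c2 = Emd1D(baseArr1, compArr2);
double b2c1 = Emd1D(baseArr2, compArr1);
double b2c2 = Emd1D(baseArr2, compArr2);
Console.WriteLine($"baseArr1: compArr1 {b1c1:F3}, compArr2 {b1c2:F3}");
Console.WriteLine($"baseArr2: compArr1 {b2c1:F3}, compArr2 {b2c2:F3}");
```

With the question's arrays this gives a smaller distance from baseArr1 to compArr1 (about 0.34 vs 0.66) and from baseArr2 to compArr2 (about 0.03 vs 0.97).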
