I am trying to calculate the divergence of characters within a string, but I don't really know how to apply the Kullback-Leibler divergence algorithm to a problem like this. Can anyone explain how KLD could be used to tackle a problem like this?
Thanks
KL divergence gives you something of a pseudo-distance between one distribution and another, assuming they have similar domains (as in, they assign probabilities to similar things: a Bernoulli distribution gives probabilities to 0/1 coin flips, a normal distribution to real numbers, etc.).
KL(distribution A, distribution B) is, roughly, a measure of how surprised I will be to get samples drawn from A when I was expecting samples drawn from B.
It's not really a distance metric because it's not symmetric. For example, take the domain [1, 2, 3, 4, 5], and let distribution A give equal probability to every number while distribution B puts all of its probability on 2 alone. Then KL(B, A) should be much lower than KL(A, B): I will be only a little surprised to see the distribution I expected to be uniform always return the same number, but I will be flabbergasted to see my only-2 distribution return something from [1, 3, 4, 5], because those outcomes were deemed impossible (probability 0) by distribution B.
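To make the asymmetry concrete, here is a minimal Python sketch (the kl_divergence helper and the dict representation of distributions are just illustrative) that computes KL(p || q) = sum over x of p(x) * log(p(x) / q(x)) for the two distributions above:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) * log(p(x) / q(x)).
    Infinite if q assigns probability 0 to something p can produce."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # terms with p(x) == 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0:
            return float('inf')  # q deemed this outcome impossible
        total += px * math.log(px / qx)
    return total

# Distribution A: uniform over [1..5]; distribution B: all mass on 2
A = {x: 0.2 for x in range(1, 6)}
B = {2: 1.0}

print(kl_divergence(B, A))  # log(5) ~ 1.609: only mildly surprising
print(kl_divergence(A, B))  # inf: A produces outcomes B called impossible
```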
It's not immediately clear to me how you are trying to use KL divergence to measure differences between strings. Please elaborate on your question so that I can help you figure this out.
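If the intent is something like comparing the character-frequency distributions of two strings, a rough sketch might look like the one below. The char_distribution helper and the add-one smoothing (used so that no character gets probability 0, which would make the divergence infinite) are assumptions on my part, not a standard recipe:

```python
import math
from collections import Counter

def char_distribution(s, alphabet):
    """Character frequencies of s over a fixed alphabet, with add-one
    smoothing so every character has nonzero probability."""
    counts = Counter(s)
    total = len(s) + len(alphabet)
    return {c: (counts.get(c, 0) + 1) / total for c in alphabet}

def kl_divergence(p, q):
    return sum(px * math.log(px / q[c]) for c, px in p.items() if px > 0)

alphabet = set("hello world" + "held the world")
p = char_distribution("hello world", alphabet)
q = char_distribution("held the world", alphabet)
print(kl_divergence(p, q))
```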
Wikipedia Article about KL - http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence