I'm using GSL 1.14 and its Ruby wrapper (gsl) for some math calculations. One thing I need is the Pearson correlation, but I run into a problem when there is a 0 in my array.
For example, I have this snippet of code:
x = [1,2,2,2,12]
y = [1,2,1,3,33]
puts GSL::Stats::correlation(
GSL::Vector.alloc(x),GSL::Vector.alloc(y)
)
=> 0.9967291641974002
But when I try to calculate it with the following array values, I get NaN:
x = [1,1,1]
y = [1,1,1]
or
x = [0,1,1]
y = [1,1,1]
puts GSL::Stats::correlation(
GSL::Vector.alloc(x),GSL::Vector.alloc(y)
)
=> NaN
And when I try with these values, it works:
x = [0,1,1]
y = [1,0,1]
puts GSL::Stats::correlation(
GSL::Vector.alloc(x),GSL::Vector.alloc(y)
)
=> -0.5
Does anybody know why? This seems very strange, doesn't it?
I do not know the GSL implementation, but in general, calculating the Pearson correlation coefficient involves dividing by the product of both standard deviations, so if either of them is 0, the result is undefined: the covariance is then 0 as well, and 0/0 gives NaN. The standard deviation is 0 if all vector elements are equal, and all of your failing examples have one vector whose elements are all equal. I hope this answers your question.
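A minimal pure-Ruby sketch of the formula r = cov(x, y) / (sd(x) * sd(y)) makes the failure visible. It does not call GSL, and the `pearson` helper is hypothetical, written only for illustration (it assumes Ruby 2.4+ for `Array#sum`). The 1/n normalization factors cancel between numerator and denominator, so plain sums of deviations are enough. Presumably GSL hits the same 0/0 internally, but that is an assumption, not something checked against its source.

```ruby
# Hypothetical helper: Pearson correlation computed by hand, no GSL needed.
def pearson(x, y)
  raise ArgumentError, "vectors must have the same length" unless x.size == y.size

  n = x.size.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n

  # Deviations from the mean
  dx = x.map { |v| v - mean_x }
  dy = y.map { |v| v - mean_y }

  cov   = dx.zip(dy).sum { |a, b| a * b } # sum of co-deviations
  var_x = dx.sum { |a| a * a }            # 0.0 when all x are equal
  var_y = dy.sum { |b| b * b }            # 0.0 when all y are equal

  # If one vector is constant, cov is also 0, so this is 0.0 / 0.0 => NaN
  cov / Math.sqrt(var_x * var_y)
end

p pearson([1, 2, 2, 2, 12], [1, 2, 1, 3, 33]) # about 0.9967
p pearson([0, 1, 1], [1, 0, 1])               # about -0.5
p pearson([0, 1, 1], [1, 1, 1]).nan?          # true: y is constant
```

The zero in `x` plays no role; the NaN appears only because every element of `y` is equal. If NaN is not acceptable in your application, one option is to check `var_x` and `var_y` for zero before dividing and handle that case explicitly.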
Theoretically, correlation means finding the relationship between two data sets. It can be positive or negative depending on the pattern of the data sets. But what I wanted to convey is that when you have 0 as one of the elements of your data sets, you cannot correlate the quantity 0 with the non-zero elements of the other data set. That is why it is giving NaN.