开发者

MySQL Math - Is it possible to calculate a correlation in a query?

开发者 https://www.devze.com 2022-12-22 22:49 出处:网络
In a MySQL (5.1) database table开发者_开发百科 there is data that represents: how long a user takes to perform a task and

In a MySQL (5.1) database table开发者_开发百科 there is data that represents:

  • how long a user takes to perform a task and
  • how many items the user handled during the task.

Would MySQL support correlating the data or do I need to use PHP/C# to calcuate?

Where would I find a good formula to calculate correlation (it's been a long time since I last did this)?


Here's a rough implementation of the sample correlation coefficient as described in:

Wikipedia - Correlation and Dependence

create table sample( x float not null, y float not null );
insert into sample values (1, 10), (2, 4), (3, 5), (6,17);

select @ax := avg(x), 
       @ay := avg(y), 
       @div := (stddev_samp(x) * stddev_samp(y))
from sample;

select sum( ( x - @ax ) * (y - @ay) ) / ((count(x) -1) * @div) from sample;
+---------------------------------------------------------+
| sum( ( x - @ax ) * (y - @ay) ) / ((count(x) -1) * @div) |
+---------------------------------------------------------+
|                                       0.700885077729073 |
+---------------------------------------------------------+


Single-Pass Solution

There are two flavors of the Pearson correlation coefficient, one for a Sample and one for an entire Population. These are single-pass and, I believe, correct formulas for both:

-- Methods for calculating the two Pearson correlation coefficients
SELECT  
        -- For Population
        (avg(x * y) - avg(x) * avg(y)) / 
        (sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y))) 
        AS correlation_coefficient_population,
        -- For Sample
        (count(*) * sum(x * y) - sum(x) * sum(y)) / 
        (sqrt(count(*) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(*) * sum(y * y) - sum(y) * sum(y))) 
        AS correlation_coefficient_sample
    FROM your_table;

I developed and tested this as T-SQL. The code that generated the test data didn't translate to MySQL but the formulas should. Make sure your x and y are decimals values; integer math can significantly impact these calcs.

0

精彩评论

暂无评论...
验证码 换一张
取 消