I am trying to do a multivariable (9 variables) linear regression on data in my MySQL 5.0 database (the result value field only has 2 possible values, 1 and 0).
I've done some searching and found I can use:
mysql> SELECT
-> @n := COUNT(score) AS N,
-> @meanX := AVG(age) AS "X mean",
-> @sumX := SUM(age) AS "X sum",
-> @sumXX := SUM(age*age) "X sum of squares",
-> @meanY := AVG(score) AS "Y mean",
-> @sumY := SUM(score) AS "Y sum",
-> @sumYY := SUM(score*score) "Y sum of square",
-> @sumXY := SUM(age*score) AS "X*Y sum"
This gets at many of the basic regression quantities, but I really don't want to type this out for every combination of the 9 variables. All of the sources I can find on multivariable regression require matrix operations. Can I do matrix operations in MySQL, or are there other ways to do a 9-variable linear regression?
Should I export the data out of MySQL first? It's ~80,000 rows, so it would be fine to move it; I'm just not sure what else I should use.
Thanks, Dan
Storing this data in MySQL is fine, but you could process it from a language that has access to the database. Pseudocode:
variables = [ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I' ];
for X in $variables do
for Y in $variables do
query = 'SELECT
@n'+$X+$Y+' := COUNT('+$Y+') AS N,
@mean'+$X+' := AVG('+$X+') AS "X mean",
@sum'+$X+' := SUM('+$X+') AS "X sum",
@sum'+$X+$X+' := SUM('+$X+'*'+$X+') AS "X sum of squares",
@mean'+$Y+' := AVG('+$Y+') AS "Y mean",
@sum'+$Y+' := SUM('+$Y+') AS "Y sum",
@sum'+$Y+$Y+' := SUM('+$Y+'*'+$Y+') AS "Y sum of squares",
@sum'+$X+$Y+' := SUM('+$X+'*'+$Y+') AS "X*Y sum"
FROM measurements';
db_execute(query);
done
done
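As a variation on the loop above, here is a minimal Python sketch that builds a single query returning every sum and cross-product in one pass over the table, rather than one query per pair. The column names A..I, the response column `score`, and the table name `measurements` are placeholders taken from this thread, not a known schema, and `COUNT(*)` assumes there are no NULLs in the data.

```python
from itertools import combinations_with_replacement

def pairwise_sum_query(columns, table):
    """Return one SELECT that computes N, SUM(x) and SUM(x*y) for all pairs.

    Column and table names are pasted into the SQL text, so they must be
    trusted identifiers, never user input.
    """
    parts = ["COUNT(*) AS n"]  # assumes no NULLs in any column
    parts += [f"SUM({c}) AS sum_{c}" for c in columns]
    parts += [
        f"SUM({x}*{y}) AS sum_{x}_{y}"
        for x, y in combinations_with_replacement(columns, 2)
    ]
    return "SELECT\n  " + ",\n  ".join(parts) + f"\nFROM {table}"

# Hypothetical names: nine predictors A..I plus the 0/1 response `score`.
columns = list("ABCDEFGHI") + ["score"]
query = pairwise_sum_query(columns, "measurements")
print(query)
```

The resulting string can be passed to whatever database client you already use. Each unordered pair appears once, because SUM(x*y) and SUM(y*x) are the same number.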
But why not store the results in a table? That is more appropriate for a database.
for X in $variables do
for Y in $variables do
query = 'INSERT INTO regression SELECT
"'+$X+'" AS X,
"'+$Y+'" AS Y,
COUNT('+$Y+') AS N,
AVG('+$X+') AS meanX,
SUM('+$X+') AS sumX,
SUM('+$X+'*'+$X+') AS squareX,
AVG('+$Y+') AS meanY,
SUM('+$Y+') AS sumY,
SUM('+$Y+'*'+$Y+') AS squareY,
SUM('+$X+'*'+$Y+') AS sumXY
FROM measurements';
db_execute(query);
done
done
Put a separate index on each of the X and Y columns.
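To connect these pairwise sums back to the matrix operations the question mentions: N, SUM(x), SUM(x*x') and SUM(x*y) are exactly the entries of the normal equations (X'X) b = X'y for ordinary least squares. The following is only a sketch in plain Python with made-up data; the function names `solve` and `fit_from_rows` are illustrative, and it fits a linear model, not a logistic one.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_from_rows(rows):
    """rows: list of (x_1, ..., x_p, y). Returns [intercept, b_1, ..., b_p]."""
    p = len(rows[0]) - 1
    # Prepend a constant 1 so the intercept is part of the same system.
    design = [[1.0] + list(r[:p]) for r in rows]
    ys = [r[p] for r in rows]
    # These entries are the N, SUM(x), SUM(x*x') and SUM(x*y) aggregates.
    xtx = [[sum(d[j] * d[k] for d in design) for k in range(p + 1)]
           for j in range(p + 1)]
    xty = [sum(d[j] * y for d, y in zip(design, ys)) for j in range(p + 1)]
    return solve(xtx, xty)

# Made-up data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]
coef = fit_from_rows(rows)
print(coef)
```

With real data you would fill `xtx` and `xty` from the sums stored in the table instead of recomputing them from rows; for 9 variables plus an intercept the system is only 10 by 10.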
I would recommend moving the data out of MySQL and into R. With 1/0 response data, a logistic regression is much more appropriate, and fitting it is not the simple sum-of-squares calculation you are implementing.
http://en.wikipedia.org/wiki/Logistic_regression
This seems to do a good job of showing how to solve the logistic regression:
http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm#_Toc147483467
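To show why a logistic fit is iterative rather than a closed-form sum of squares, here is a minimal sketch in plain Python that maximizes the log-likelihood by gradient ascent. This is a stand-in for illustration, not what R or the linked page does (statistical packages typically use Newton-type methods); the data, learning rate, and step count are made up, and the names `fit_logistic` and `predict` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """xs: list of feature lists, ys: list of 0/1 labels.

    Returns [intercept, w_1, ..., w_p] after a fixed number of
    gradient-ascent steps on the mean log-likelihood.
    """
    p = len(xs[0])
    n = len(xs)
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, y in zip(xs, ys):
            row = [1.0] + list(x)  # leading 1 for the intercept
            err = y - sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
            for j in range(p + 1):
                grad[j] += err * row[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted probability that the response is 1."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Made-up, non-separable data with one predictor.
xs = [[0], [1], [2], [3], [4], [5], [6], [7]]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
w = fit_logistic(xs, ys)
print(w, predict(w, [0]), predict(w, [7]))
```

Unlike the linear case, each step needs a pass over all rows with the current coefficients, which is why this is easier outside of SQL. With unscaled predictors of very different magnitudes the fixed learning rate would likely need tuning.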