As you know, with the CORRB option you can have logistic regression or linear regression in SAS output a correlation matrix of the estimates. I am not sure how to read this matrix. I have two variables that are clearly strongly positively correlated. From PROC CORR, I can see that the Pearson correlation coefficient of these two variables is a little over 0.7. But the estimates matrix from both logistic regression and linear regression gives me -0.7. The strength of the correlation is about the same, but the sign is reversed. Can anyone explain this? Many thanks.
You are reading the values correctly; they just mean different things. PROC CORR gives you the correlation between the variables themselves, while CORRB gives the correlation between the estimated coefficients of those variables in the model.
Here is an intuitive explanation of why positively correlated predictors will have negatively correlated coefficients. Suppose y = a + b1*x1 + b2*x2 + eps. If you increase b1 a little from its best value obtained from the regression, then the predicted value for y will also increase (for positive x1) and will make the overall fit worse. One way to compensate for that and move the predicted values closer to the observed ones is to decrease b2: since high values of x1 are associated with high values of x2, you will get back close to the original fit. This shows that the uncertainty in b2 is negatively correlated with the uncertainty in b1: increasing one while decreasing the other will lead to similar fits.
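The argument above can be checked with a small simulation. This is only a minimal sketch, not SAS output: it assumes ordinary least squares, fixed predictors with a target correlation of about 0.7, and made-up true coefficients (a = 1, b1 = 2, b2 = 3). The helper names `pearson` and `fit_slopes` are hypothetical and use only the Python standard library. The model is refit on many noisy copies of y, and the correlation of the resulting (b1, b2) estimates is compared with the correlation of the predictors.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def fit_slopes(x1, x2, y):
    """OLS slopes (b1, b2) for y = a + b1*x1 + b2*x2, via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200

# Assumed setup: x2 is built to have correlation of roughly 0.7 with x1.
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.7 * a + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0, 1) for a in x1]

# Refit the model on many noisy responses and collect the slope estimates.
b1s, b2s = [], []
for _ in range(1000):
    y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    b1, b2 = fit_slopes(x1, x2, y)
    b1s.append(b1)
    b2s.append(b2)

r_x = pearson(x1, x2)    # what PROC CORR reports: correlation of the variables
r_b = pearson(b1s, b2s)  # analogue of CORRB: correlation of the estimates

print(f"correlation of predictors:   {r_x:.3f}")
print(f"correlation of coefficients: {r_b:.3f}")
```

Under these assumptions the first number should come out positive and the second negative with a similar magnitude, which mirrors the 0.7 versus -0.7 pattern in the question.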
It might be instructive to look at the extreme case of perfect correlation: x2 = x1 (call the common value x). Then all of the following will give you exactly the same predictions:
y = 1 + 2*x + 3*x
y = 1 + 3*x + 2*x
y = 1 + 9*x + (-4)*x
and so on. In each case b1 + b2 = 5, so b2 = 5 - b1 and the coefficients have a perfect negative correlation.
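The perfect-correlation case is the limit of a more general identity. As a sketch, assuming ordinary least squares with an intercept and exactly two predictors, the correlation of the two slope estimates is exactly minus the sample correlation of the predictors. The symbols below are introduced here for illustration and are not in the original post: S_{jk} are the centered sums of squares and cross-products of x1 and x2, sigma^2 is the error variance, and r is the Pearson correlation of x1 and x2.

```latex
\begin{aligned}
\operatorname{Cov}\begin{pmatrix}\hat b_1\\ \hat b_2\end{pmatrix}
  &= \sigma^2 \begin{pmatrix} S_{11} & S_{12}\\ S_{12} & S_{22}\end{pmatrix}^{-1}
   = \frac{\sigma^2}{S_{11}S_{22}-S_{12}^2}
     \begin{pmatrix} S_{22} & -S_{12}\\ -S_{12} & S_{11}\end{pmatrix},\\[4pt]
\operatorname{Corr}(\hat b_1,\hat b_2)
  &= \frac{-S_{12}}{\sqrt{S_{11}\,S_{22}}} = -r_{x_1 x_2}.
\end{aligned}
```

This is consistent with a PROC CORR value of about 0.7 showing up as about -0.7 in the CORRB output of the linear model. For logistic regression the covariance matrix involves observation weights, so the relation should hold only approximately, and with more than two predictors the simple sign flip no longer holds exactly.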