In one of my PROC LOGISTIC reports I found that the estimate for a certain variable is highly correlated with the intercept estimate. How should I interpret this? What should I change to reduce this correlation?
EDIT: Let me try to ask this question from a more theoretical point of view. In the correlation-of-estimates output from most logistic regression packages, what does it mean if the intercept estimate is highly correlated with the estimate for a certain variable? How would you deal with such a situation? Hopefully this is a clearer way of asking the question. Thank you very much, everyone.
A positive correlation between the estimated intercept and the estimated coefficient of a covariate means that the bulk of your covariate values are negative (and vice versa: a negative correlation will be seen when the values are mostly positive).
This is not restricted to logistic regression, and might be easier to see with linear regression. Think of the scatterplot of your values as a blob to the right of the y-axis, and draw the best-fitting linear regression line. Now increase both its y-intercept and its slope a bit: if the "blob" is far enough from the y-axis, the line will miss it completely. So you can't move both parameters in the same direction and still get a reasonably well-fitting line; raising one has to be compensated by lowering the other. In other words, the estimates are negatively correlated.
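To make the picture above concrete, here is the standard result for simple linear regression, written out as a sketch. The notation is introduced here for illustration and is not from the original post: the model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with error variance $\sigma^2$, $\bar{x}$ is the sample mean of the covariate, and $S_{xx} = \sum_i (x_i - \bar{x})^2$.

```latex
\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -\frac{\sigma^2\,\bar{x}}{S_{xx}},
\qquad
\operatorname{Corr}(\hat\beta_0, \hat\beta_1)
  = -\frac{\bar{x}}{\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^2}} .
```

The sign of the correlation is the opposite of the sign of $\bar{x}$, and its magnitude approaches 1 when the mean is large relative to the spread of the $x$ values, which is the "blob far from the y-axis" case.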
In practice, this is not a big deal. It is true that the estimate of the intercept will have high variability, but that is not surprising if the bulk of your data is far from 0. Often x=0 is not meaningful, so you don't even care about the intercept. If you just can't bear to see those large correlations, center your x variable by subtracting its mean. The y-axis will move to the middle of your data, and the correlation will vanish (exactly in linear regression; in logistic regression it will typically shrink to near zero rather than disappear entirely, because the observations are weighted unequally). Of course, the meaning of the intercept changes as well: it now refers to the average x rather than to x=0, but that is often desirable.
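The effect of centering can be checked with a small simulation. The following is a minimal sketch in Python with NumPy rather than PROC LOGISTIC; the simulated data, the helper name `logistic_coef_corr`, and the hand-written Newton-Raphson fit are all assumptions made for illustration, not anything from the original post.

```python
import numpy as np

def logistic_coef_corr(x, y, n_iter=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson and return the
    estimated correlation between the intercept and slope estimates."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])          # Fisher information X'WX
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Covariance of the estimates is the inverse information at the fit
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Hypothetical data: covariate values far to the right of 0
rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=1.0, size=500)
true_p = 1.0 / (1.0 + np.exp(-(x - 10.0)))     # assumed outcome model
y = rng.binomial(1, true_p)

corr_raw = logistic_coef_corr(x, y)                  # strongly negative
corr_centered = logistic_coef_corr(x - x.mean(), y)  # close to zero
print(corr_raw, corr_centered)
```

With the raw covariate the two estimates should come out almost perfectly negatively correlated, while after subtracting the mean the correlation should be small. The slope estimate itself is unchanged by centering; only the intercept and its interpretation move.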