I have the following setup to analyse: We have about 150 subjects, and for each subject we perf开发者_StackOverflowormed a pair of tests (under different conditions) 18 times. The 18 different conditions of the test are complementary, in such a way so that if we where to average over the tests (for each subject), we would get no correlation between the tests (between subjects). What we wish to know is the correlation (and P value) between the tests, in within subjects, but over all the subjects.
The way I did this by now was to perform the correlation for each subject, and then look at the distribution of the correlations received so to see if it's mean is different then 0. But I suspect there might be a better way for answering the same question (someone said to me something about "geographical correlation", but a shallow search didn't help).
p.s: I understand there might be a place here to do some sort of mixed model, but I would prefer to present a "correlation", and am not sure how to extract such an output from a mixed model.
Also, here is a short dummy code to give an idea of what I am talking about:
attach(longley)
N <- length(Unemployed)
block <- c(
rep( "a", N),
rep( "b", N),
rep( "c", N)
)
Unemployed.3 <- c(Unemployed + rnorm(1),
Unemployed + rnorm(1),
Unemployed + rnorm(1))
GNP.deflator.3 <- c(GNP.deflator + rnorm(1),
GNP.deflator + rnorm(1),
GNP.deflator + rnorm(1))
cor(Unemployed, GNP.deflator)
cor(Unemployed.3, GNP.deflator.3)
cor(Unemployed.3[block == "a"], GNP.deflator.3[block == "a"])
cor(Unemployed.3[block == "b"], GNP.deflator.3[block == "b"])
cor(Unemployed.3[block == "c"], GNP.deflator.3[block == "c"])
(I would like to somehow combine the last three correlations...)
Any ideas will be welcomed.
Best, Tal
I agree with Tristan - you are looking for ICC. The only difference from standard implementations is that the two raters (tests) evaluate each subject repeatedly. There might be an implementation that allows that. In the meanwhile here is another approach to get the correlation.
You can use "general linear models", which are generalizations of linear models that explicitly allow correlation between residuals. The code below implements this using the gls
function of the nlme
package. I am sure there are other ways as well. To use this function we have to first reshape the data into a "long" format. I also changed the variable names to x
and y
for simplicity. I also used +rnorm(N)
instead of +rnorm(1)
in your code, because that's what I think you meant.
library(reshape)
library(nlme)
dd <- data.frame(x=Unemployed.3, y=GNP.deflator.3, block=factor(block))
dd$occasion <- factor(rep(1:N, 3)) # variable denoting measurement occasions
dd2 <- melt(dd, id=c("block","occasion")) # reshape
# fit model with the values within a measurement occasion correlated
# and different variances allowed for the two variables
mod <- gls(value ~ variable + block, data=dd2,
cor=corSymm(form=~1|block/occasion),
weights=varIdent(form=~1|variable))
# extract correlation
mod$modelStruct$corStruct
In the modeling framework you can use a likelihood ratio test to get a p-value. nlme
can also give you a confidence interval:
mod2 <- gls(value ~ variable + block, data=dd2,
weights=varIdent(form=~1|variable))
anova(mod, mod2) # likelihood-ratio test for corr=0
intervals(mod)$corStruct # confidence interval for the correlation
If I understand your question correctly, you are interested in computing the intraclass correlation between multiple tests. There is an implementation in the psy package, although I have not used it.
If you want to perform inference on the correlation estimate, you could bootstrap the subjects. Just make sure to keep the tests together for each sample.
I'm no expert, but this looks to me like what you want. It's automated, short to code, gives the same correlations as your example above, and produces p-values.
> df = data.frame(block=block, Unemployed=Unemployed.3,
+ GNP.deflator=GNP.deflator.3)
> require(plyr)
Loading required package: plyr
> ddply(df, "block", function(x){
+ as.data.frame(
+ with(x,cor.test(Unemployed, GNP.deflator))[c("p.value","estimate")]
+ )})
block p.value estimate
1 a 0.01030636 0.6206334
2 b 0.01030636 0.6206334
3 c 0.01030636 0.6206334
To see all the details, do this:
> dlply(df, "block", function(x){with(x,cor.test(Unemployed, GNP.deflator))})
$a
Pearson's product-moment correlation
data: Unemployed and GNP.deflator
t = 2.9616, df = 14, p-value = 0.01031
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1804410 0.8536976
sample estimates:
cor
0.6206334
$b
Pearson's product-moment correlation
data: Unemployed and GNP.deflator
t = 2.9616, df = 14, p-value = 0.01031
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1804410 0.8536976
sample estimates:
cor
0.6206334
$c
Pearson's product-moment correlation
data: Unemployed and GNP.deflator
t = 2.9616, df = 14, p-value = 0.01031
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1804410 0.8536976
sample estimates:
cor
0.6206334
attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
block
1 a
2 b
3 c
精彩评论