I have two dataframes:
df1<- as.data.frame(matrix(1:15, ncol=5))
df2<- as.data.frame(matrix(30:44,ncol=5))
Using the two data frames, I want to calculate z-scores. The formula is:
z = (X - μ) / σ
df1 contains all the X values, and each row of df2 contains the values used to calculate the mean and the sd. I wrote a loop that calculates the z-score for each value in the first column of df1. My question is: how can I calculate the z-score for the whole data frame?
test <- list()
for (i in seq_len(nrow(df1))) {
  ## z-score of df1[i, 1] using the mean and sd of row i of df2
  zscore <- (df1[i, 1] - apply(df2[i, ], 1, mean)) / apply(df2[i, ], 1, sd)
  test[[i]] <- matrix(zscore)
}
Thank you all!
[I think you have the row/cols backwards here. z-scores are usually applied to variables, which R would expect to be in columns. What I write below follows the usual convention. Change accordingly if you really want to standardise by rows.]
sweep() is your general purpose friend. We compute the means and standard deviations and then sweep (subtract in this case) them out of the data frame df1:
## compute column means and sd
mns <- colMeans(df2) ## rowMeans if by rows
sds <- apply(df2, 2, sd) ## 2 -> 1 if by rows
## Subtract the respective mean from each column
df3 <- sweep(df1, 2, mns, "-") ## 2 -> 1 if by rows
## Divide by the respective sd
df3 <- sweep(df3, 2, sds, "/") ## 2 -> 1 if by rows
which gives:
R> df3
   V1  V2  V3  V4  V5
1 -30 -30 -30 -30 -30
2 -29 -29 -29 -29 -29
3 -28 -28 -28 -28 -28
We can check this has worked by doing the computations for the first column of df3 in a vectorised fashion:
R> (df1[,1] - mean(df2[,1])) / sd(df2[,1])
[1] -30 -29 -28
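If you really do want to standardise by rows, as the question describes, the same idea should carry over by switching the margin from 2 to 1. The following is only a sketch under that assumption; the names row_mns, row_sds, z_rows and row1 are made up for illustration and are not from the question.

```r
df1 <- as.data.frame(matrix(1:15, ncol = 5))
df2 <- as.data.frame(matrix(30:44, ncol = 5))

## one mean and one sd per row of df2 (hypothetical names)
row_mns <- rowMeans(df2)
row_sds <- apply(df2, 1, sd)

## MARGIN = 1 sweeps the statistics out row by row
z_rows <- sweep(df1, 1, row_mns, "-")
z_rows <- sweep(z_rows, 1, row_sds, "/")

## spot check: row 1 computed directly from the formula
row1 <- (unlist(df1[1, ]) - mean(unlist(df2[1, ]))) / sd(unlist(df2[1, ]))
```

Here z_rows[i, j] is df1[i, j] standardised with the mean and sd of row i of df2, which is what the loop in the question computes for the first column only.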
For this particular situation, one can also use the scale() function and supply your own center and scale, the respective means and standard deviations:
R> scale(df1, center = mns, scale = sds)
      V1  V2  V3  V4  V5
[1,] -30 -30 -30 -30 -30
[2,] -29 -29 -29 -29 -29
[3,] -28 -28 -28 -28 -28
attr(,"scaled:center")
V1 V2 V3 V4 V5
31 34 37 40 43
attr(,"scaled:scale")
V1 V2 V3 V4 V5
 1  1  1  1  1
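Note that scale() returns a matrix rather than a data frame. If you need a data frame back, something along these lines should work; it is a minimal sketch, and the names df4 and same are only illustrative.

```r
df1 <- as.data.frame(matrix(1:15, ncol = 5))
df2 <- as.data.frame(matrix(30:44, ncol = 5))

mns <- colMeans(df2)
sds <- apply(df2, 2, sd)

## scale() gives a matrix; convert it back to a data frame (hypothetical name df4)
df4 <- as.data.frame(scale(df1, center = mns, scale = sds))

## compare the values with the two-step sweep() approach
df3 <- sweep(sweep(df1, 2, mns, "-"), 2, sds, "/")
same <- isTRUE(all.equal(as.matrix(df3), as.matrix(df4), check.attributes = FALSE))
```

If same is TRUE, the two approaches agree on the values and you can pick whichever reads better in your code.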