R: mean when conditions got NA

Developer https://www.devze.com 2023-03-08 13:40 Source: Internet
Thanks for the previous posts and professional responses. I can almost do my analysis, except for conditions that involve NA. Here is my data.frame and the code I used. Could you show me how to handle the case where the condition contains NA values?

 df1 <- data.frame(A = c(1,2,4, 5), B=c(1,3,NA,1), C=c(1,1,3, NA), D=c(1,1,2,2))

Using this code, I get df1 as follows:

  A  B  C D
1 1  1  1 1
2 2  3  1 1
3 4 NA  3 2
4 5  1 NA 2

With help from Andrie, Sacha Epskamp and Chase (R: get average column A based on a range of values in column B), I got the mean of A for rows where D is strictly between 1 and 3 (i.e. D equals 2 in this case) with this code:

mean(df1$A[df1$D>1 & df1$D<3])

I got 4.5 as expected (the average of 4 and 5 in column A).

However, when I replace column D with column C, which contains NA, the answer is NA. I was expecting the average of 1 and 2, ignoring the 3rd row (where C is 3, outside the range) and the 4th row (where C is NA).

mean(df1$A[df1$C>0 & df1$C<2])

R> NA (I expected the mean to be 1.5)

I know na.omit can remove every row of df1 that has NA in any column. However, I prefer not to do that, because I would also like to get the mean and counts for each column when another column's entry is NA (e.g. I also want to run mean(df1$A[is.na(df1$C)])).

I also tried putting na.rm=T inside the condition, but R did not accept it, because the NA is now in the condition itself. For instance:

mean(df1$A[df1$C>0 & df1$C<2, na.rm=T])

Error in df1$A[df1$C > 0 & df1$C < 2, na.rm = T] :
  incorrect number of dimensions

I believe there is a smarter way of doing this. Please advise.


You got the "incorrect number of dimensions" error because na.rm=TRUE was inside the square brackets. There, R treats it as an extra index, as if df1$A had a second dimension like a matrix or data frame, but df1$A is a plain vector. na.rm is an argument of mean(), so it belongs outside the brackets, where it works fine: the NA in the condition produces an NA element in the subset, and na.rm=TRUE drops it before averaging.

mean(df1$A[df1$C>0 & df1$C<2],na.rm=TRUE)
[1] 1.5
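
Note that na.rm=TRUE would also drop any genuine NA values in column A. As a possible alternative, here is a minimal sketch that filters the condition itself with which(), so that only rows where the condition is TRUE are selected. It also shows one way to get the row count and the mean of A where C is NA, as asked in the question. The variable names cond, mean_A, n_A and mean_A_C_na are illustrative and not from the original post.

```r
# Same data frame as in the question
df1 <- data.frame(A = c(1, 2, 4, 5), B = c(1, 3, NA, 1),
                  C = c(1, 1, 3, NA), D = c(1, 1, 2, 2))

# The condition itself is NA wherever C is NA
cond <- df1$C > 0 & df1$C < 2
cond
# [1]  TRUE  TRUE FALSE    NA

# An NA index returns an NA element, which is why mean() gave NA
df1$A[cond]
# [1]  1  2 NA

# which() keeps only the positions where the condition is TRUE
mean_A <- mean(df1$A[which(cond)])   # 1.5
n_A <- length(which(cond))           # 2 rows matched

# Mean of A for the rows where C is NA
mean_A_C_na <- mean(df1$A[is.na(df1$C)])   # 5
```

Writing the condition as !is.na(df1$C) & df1$C > 0 & df1$C < 2 should give the same selection without which(); which form reads better is largely a matter of taste.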