Summary in R for frequency tables?

Developer https://www.devze.com 2023-02-08 12:25 Source: Internet
I have a set of user recommendations:

review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))

and wanted to use summary(review) to show basic properties: mean, median, quartiles, and min/max.

But it returns a summary of each column separately. I would rather not use a data.frame because the 'Star' factor is ordered. How can I tell R that Star is an ordered numeric score and that Votes holds the frequency of each score?


I'm not exactly sure what you mean by taking the mean in general if Star is supposed to be an ordered factor. However, in the example you give where Star is actually a set of numeric values, you can use the following:

library(Hmisc)

R> review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))

R> wtd.mean(review[, 1], weights = review[, 2])
[1] 4.0625

R> wtd.quantile(review[, 1], weights = review[, 2])
  0%  25%  50%  75% 100% 
1.00 3.75 5.00 5.00 5.00 
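
If only the mean is needed, base R's weighted.mean (from the stats package) should give the same figure without loading Hmisc. A minimal sketch, reusing the matrix from the question; the variable name star_mean is only illustrative:

```r
review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))

# Weighted mean of the star scores, with the vote counts as weights
star_mean <- weighted.mean(review[, "Star"], w = review[, "Votes"])
star_mean
```

Base R has no weighted quantile function, so wtd.quantile from Hmisc (or expanding the table into one value per vote) is still needed for the quartiles.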


I don't understand what's the problem. Why shouldn't you use data.frame?

rv <- data.frame(star = ordered(review[, 1]), votes = review[, 2])

You should expand your data.frame into a vector, repeating each star value once per vote:

( vts <- with(rv, rep(star, votes)) )
 [1] 5 5 5 5 5 5 5 5 5 5 4 4 3 2 1 1
Levels: 1 < 2 < 3 < 4 < 5

Then do the summary... I just don't know what kind of summary, since summary will bring you back to the start. O_o

summary(vts)
 1  2  3  4  5 
 2  1  1  2 10 

EDIT (on @Prasad's suggestion)

Since vts is an ordered factor, you should convert it to numeric and then calculate the summary (for the moment I will disregard the underlying statistical issues):

nvts <- as.numeric(levels(vts)[vts])  ## numeric conversion
summary(nvts)  ## "ordinary" summary
fivenum(nvts)  ## Tukey's five number summary
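
When the scores are already numeric, as in the question's matrix, the factor round trip can be skipped. The sketch below wraps the expand-then-summarise idea in a small helper; the name freq_summary is hypothetical and not part of the thread or of any package:

```r
# Hypothetical helper: expand a frequency table into one value per
# observation, then apply the ordinary numeric summary
freq_summary <- function(values, counts) {
  expanded <- rep(values, times = counts)  # e.g. 5 repeated 10 times, 4 twice, ...
  summary(expanded)
}

review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))

res <- freq_summary(review[, "Star"], review[, "Votes"])
res
```

Expanding is fine for small vote counts like these; with very large counts a weighted approach such as wtd.quantile avoids building the long vector.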


Just to clarify -- when you say you would like "mean, median, quartiles and min/max", you're talking in terms of number of stars, e.g. mean = 4.062 stars? Then, using aL3xa's code, would something like summary(as.numeric(as.character(vts))) be what you want?
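
The detour through as.character (or levels) matters because as.numeric applied directly to a factor returns the internal level codes, not the labels. With stars labelled 1 to 5 the two happen to coincide, so the sketch below uses a made-up factor whose labels differ from its codes:

```r
# Made-up ordered factor (not from the thread) with labels 10 and 20
f <- factor(c(10, 20, 20), ordered = TRUE)

codes   <- as.numeric(f)                # internal codes: 1 2 2
via_chr <- as.numeric(as.character(f))  # the labels themselves: 10 20 20
via_lev <- as.numeric(levels(f)[f])     # same labels, looked up via the levels

codes
via_chr
via_lev
```

Both conversions recover the labels; the levels(f)[f] form converts each distinct level only once, which can be somewhat faster on long vectors.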
