I love the ability of plyr to split a data frame into multiple data sets and then perform identical operations on each set. The best part is when it shows you the result as a neat, compact, well-labeled table. I love throwing a bunch of calculations into a single line using each(). However, I do not understand why using the summarise function in the ddply call scuttles the output and makes it come out long and unlabeled. Have a look here to see what I mean. Can you tell me what I am doing wrong? I prefer to use summarise.
Let us first set up an example data frame. Imagine that you had 60 participants in a study. 20 of them were funny, 20 were clever and 20 were nice. Then each subject received a score.
type<-rep(c("funny","clever", "nice"),20)
score<-rnorm(60)+10
data<-data.frame(type,score)
Now I want a table showing the mean score, median score, minimum score and maximum score for each of the 3 types of people:
ddply(data,.(type), summarise, each(mean,median,min,max)(score))
The line above should have given a nice table (3 rows - 1 for each type, and 4 columns of data). Alas it gives a whole long table with only one column of numbers, none of which are labeled.
ddply(data,.(type), function(jjkk) each(mean,median,min,max)(jjkk$score))
The above line gives me what I want. Can you explain what I am not understanding about the syntax of ddply?
Spelling out the functions, as in:
ddply(data,"type", summarise, mean=mean(score),median=median(score),max=max(score),min=min(score))
produces output in the format you desired.
I think your problem is that each() is returning a vector, which summarize() isn't really handling in the way you intend it to.
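To make that point concrete, here is a minimal sketch of what each() hands back and why wrapping it in a function gives the wide table. The seed, the data frame name dat, and the variable names stats and wide are assumptions added for illustration; only type, score and the each() call come from the question.

```r
library(plyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
dat   <- data.frame(type, score)  # hypothetical name, to avoid masking data()

# each() builds one function that returns a single named vector,
# with one element per statistic.
stats <- each(mean, median, min, max)(dat$score)
print(names(stats))

# When the function passed to ddply returns a named vector, ddply
# treats each piece's result as one row, so the names become columns.
wide <- ddply(dat, .(type), function(piece) {
  each(mean, median, min, max)(piece$score)
})
print(wide)
```

Inside summarise, by contrast, the whole vector is assigned to a single output column, which would explain the long, unlabeled result described in the question.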
Hmmm... I'm too tired to think about a one-liner, but reshape will do the trick:
library(reshape)
library(plyr)
mdtf <- melt(data)
cast(mdtf, type ~ ., each(min, max, mean, median))
    type      min      max      mean   median
1 clever 7.808648 12.08930 10.125563 10.27269
2  funny 8.302777 12.04066  9.941331 10.07333
3   nice 8.442508 11.80132 10.085667 10.07261