How does one make summarise output from plyr wide rather than long?

Developer https://www.devze.com 2023-02-15 18:06 Source: Internet
I love the ability of plyr to split a data frame into multiple data sets and then perform identical operations on each set. The best part is when it shows you the result as a neat, compact, well-labeled table. I love throwing a bunch of calculations into a single line using each(). However, I do not understand why passing the summarise function to ddply scuttles the output and makes it come out long and unlabeled. Have a look here to see what I mean. Can you tell me what I am doing wrong? I would prefer to use summarise.

Let us first set up an example data frame. Imagine that you had 60 participants in a study. 20 of them were funny, 20 were clever and 20 were nice. Then each subject received a score.

type <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data <- data.frame(type, score)

Now I want a table showing the mean, median, minimum and maximum score for each of the 3 types of people.

ddply(data,.(type), summarise, each(mean,median,min,max)(score))

The line above should have given a nice table (3 rows, 1 for each type, and 4 columns of data). Alas, it gives one long table with only a single column of numbers, none of which are labeled.

ddply(data,.(type), function(jjkk) each(mean,median,min,max)(jjkk$score))

The above line gives me what I want. Can you explain what I am not understanding about the syntax of ddply?
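
To make the working version less mysterious, here is a rough base R sketch of the split, apply and combine steps. It is not plyr's actual implementation, only an illustration under assumptions: the seed and the names pieces, rows and result are made up for this example, and the four statistics are written out by hand instead of using each().

```r
# Rebuild the example data; the seed is arbitrary and only makes the run repeatable.
set.seed(42)
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data  <- data.frame(type, score)

# Split: one data frame per type.
pieces <- split(data, data$type)

# Apply: each piece yields ONE named vector of four statistics.
rows <- lapply(pieces, function(piece) {
  c(mean = mean(piece$score), median = median(piece$score),
    min = min(piece$score), max = max(piece$score))
})

# Combine: stack the named vectors as rows, so the names become column headers.
result <- data.frame(type = names(rows), do.call(rbind, rows), row.names = NULL)
result
```

Because every group hands back a single named vector, stacking those vectors yields one row per type and one labeled column per statistic, which is the shape the anonymous-function call produces.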


Spelling out the functions, as in:

ddply(data,"type", summarise, mean=mean(score),median=median(score),max=max(score),min=min(score))

produces output in the format you desired.

I think your problem is that each() returns a single vector of four values, and summarise() puts that vector into one column, one value per row, rather than spreading it across four columns as you intend.
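
The difference between the two shapes can be sketched in base R without plyr. The vector stats below is a hand-built stand-in for what each(mean, median, min, max)(score) returns, and the names long and wide are made up for this illustration.

```r
# Stand-in for the result of each(mean, median, min, max)(score):
# a single named numeric vector of length 4.
set.seed(1)                       # arbitrary seed, only for repeatability
score <- rnorm(20) + 10
stats <- c(mean = mean(score), median = median(score),
           min = min(score), max = max(score))

# The whole vector used as ONE column: four rows, labels dropped (long).
long <- data.frame(value = stats, row.names = NULL)

# The same vector spread into one column per name: a single row (wide).
wide <- as.data.frame(as.list(stats))

dim(long)
dim(wide)
names(wide)
```

Giving summarise one named argument per statistic, as in the call above, amounts to asking for the wide shape directly.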


Hmmm... I'm too tired to think about a one-liner, but reshape will do the trick:

library(reshape)
library(plyr)
mdtf <- melt(data)
cast(mdtf, type ~ ., each(min, max, mean, median))
    type      min      max      mean   median
1 clever 7.808648 12.08930 10.125563 10.27269
2  funny 8.302777 12.04066  9.941331 10.07333
3   nice 8.442508 11.80132 10.085667 10.07261