Does anyone know a slick way to order the results coming out of a ddply summarise operation?
This is what I'm doing to get the output ordered by descending depth.
ddims <- ddply(diamonds, .(color), summarise, depth = mean(depth), table = mean(table))
ddims <- ddims[order(-ddims$depth),]
With output...
> ddims
  color    depth    table
7     J 61.88722 57.81239
6     I 61.84639 57.57728
5     H 61.83685 57.51781
4     G 61.75711 57.28863
1     D 61.69813 57.40459
3     F 61.69458 57.43354
2     E 61.66209 57.49120
Not too ugly, but I'm hoping for a way to do it nicely within ddply(). Anyone know how?
Hadley's ggplot2 book has this example for ddply and subset, but it's not actually sorting the output, just selecting the two smallest diamonds per group.
ddply(diamonds, .(color), subset, order(carat) <= 2)
I'll use this occasion to advertise a bit for data.table, which is faster to run and (in my view) at least as elegant to write:
library(data.table)
ddims <- data.table(diamonds)
system.time(ddims <- ddims[, list(depth=mean(depth), table=mean(table)), by=color][order(-depth)])
user system elapsed
0.003 0.000 0.004
By contrast, without ordering, your ddply code already takes about 30 times longer:
user system elapsed
0.106 0.010 0.119
With all the respect I have for Hadley's excellent work, e.g. on ggplot2, and his general awesomeness, I must confess that for me data.table has entirely replaced ddply, for speed reasons.
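For anyone who wants to try the chaining without loading diamonds, here is a minimal sketch on a made-up table. The names toy and toy_means and all the values are my own assumptions for illustration; the pattern is the same as above, with a second [ ] doing the descending sort.

```r
library(data.table)

# Toy stand-in for ggplot2::diamonds (values are invented for illustration)
toy <- data.table(
  color = c("D", "D", "E", "E", "J", "J"),
  depth = c(61, 62, 60, 61, 63, 64),
  table = c(57, 58, 56, 57, 58, 59)
)

# Summarise per colour, then chain a second [ ] to sort by descending mean depth
toy_means <- toy[, list(depth = mean(depth), table = mean(table)), by = color][order(-depth)]
print(toy_means)
```

Writing order(-depth) only works for numeric columns; for a character column, data.table's setorder() with a minus sign in front of the column name is probably the easier route.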
Yes, to sort you can just nest the ddply in another ddply. Here's how you would use ddply to sort (in ascending order) on one column, for example your table column:
ddimsSortedTable <- ddply(ddply(diamonds, .(color),
summarise, depth = mean(depth), table = mean(table)), .(table))
  color    depth    table
1     G 61.75711 57.28863
2     D 61.69813 57.40459
3     F 61.69458 57.43354
4     E 61.66209 57.49120
5     H 61.83685 57.51781
6     I 61.84639 57.57728
7     J 61.88722 57.81239
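The nested call only gives an ascending sort. If you want descending order, as in the question, one option that stays within plyr is to wrap the summary in arrange() with desc(). This is a rough sketch on a made-up data frame (toy, toy_means and toy_sorted are hypothetical names, and the values are invented), not something taken from the question:

```r
library(plyr)

# Toy stand-in for ggplot2::diamonds (values are invented for illustration)
toy <- data.frame(
  color = c("D", "D", "E", "E", "J", "J"),
  depth = c(61, 62, 60, 61, 63, 64),
  table = c(57, 58, 56, 57, 58, 59)
)

# Per-colour means, exactly as in the question
toy_means <- ddply(toy, .(color), summarise, depth = mean(depth), table = mean(table))

# plyr::arrange() reorders the rows; desc() flips the sort to descending
toy_sorted <- arrange(toy_means, desc(depth))
print(toy_sorted)
```

Note that arrange() renumbers the rows, whereas indexing with order() keeps the original row names.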
If you are using dplyr, I would recommend taking advantage of the %>% operator (spelled %.% in the very first dplyr releases, since removed), which leads to more readable code.
data(diamonds, package = 'ggplot2')
library(dplyr)
diamonds %>%
  group_by(color) %>%
  summarise(
    depth = mean(depth),
    table = mean(table)
  ) %>%
  arrange(desc(depth))
A bit late to the party, but things might be a bit different with dplyr. Borrowing crayola's solution for data.table:
dat1 <- microbenchmark(
  dtbl <- data.table(diamonds)[, list(depth = mean(depth), table = mean(table)), by = color][order(-depth)],
  dplyr_dtbl <- arrange(summarise(group_by(tbl_dt(diamonds), color), depth = mean(depth), table = mean(table)), -depth),
  dplyr_dtfr <- arrange(summarise(group_by(tbl_df(diamonds), color), depth = mean(depth), table = mean(table)), -depth),
  times = 20,
  unit = "ms"
)
The results (timings in milliseconds) show that dplyr with tbl_dt is a bit slower than the plain data.table approach. However, dplyr on a data frame (tbl_df) is the fastest of the three:
             expr       min        lq    median        uq       max neval
       data.table  9.606571 10.968881 11.958644 12.675205 14.334525    20
 dplyr_data.table 13.553307 15.721261 17.494500 19.544840 79.771768    20
 dplyr_data.frame  4.643799  5.148327  5.887468  6.537321  7.043286    20
Note: I have renamed the expressions so that the microbenchmark results are more readable.
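As an aside, the renaming can be avoided by passing the expressions to microbenchmark() as named arguments, since the names then show up in the expr column. The following is only a sketch under my own assumptions: toy_df and toy_dt are hypothetical made-up tables rather than diamonds, and the tbl_dt variant is left out, so the timings will not be comparable to the ones above.

```r
library(microbenchmark)
library(data.table)
library(dplyr)

# Toy stand-in for ggplot2::diamonds (values are invented for illustration)
toy_df <- data.frame(
  color = rep(c("D", "E", "J"), each = 2),
  depth = c(61, 62, 60, 61, 63, 64),
  table = c(57, 58, 56, 57, 58, 59)
)
toy_dt <- as.data.table(toy_df)

# Named arguments become the labels in the expr column of the result
timings <- microbenchmark(
  data.table = toy_dt[, list(depth = mean(depth), table = mean(table)), by = color][order(-depth)],
  dplyr_data.frame = arrange(
    summarise(group_by(toy_df, color), depth = mean(depth), table = mean(table)),
    desc(depth)
  ),
  times = 20
)

# Report the timings in milliseconds
print(timings, unit = "ms")
```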