R: using ddply to apply functions to subsets of data

Developer https://www.devze.com 2023-02-15 06:02 Source: Internet
I'm trying to use the ddply method to take a dataframe with various info about 3000 movies and then calculate the mean gross of each genre. I'm new to R, and I've read all the questions on here relating to ddply, but I still can't seem to get it right. Here's what I have now:

> attach(movies)
> ddply(movies, Genre, mean(Gross))
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress,  : 
.fun is not a function.

How am I supposed to write a function that takes the mean of the values in the "Gross" column for each set of movies, grouped by genre? I know this seems like a simple question, but the documentation is really confusing to me, and I'm not too familiar with R syntax yet.

Is there a method other than ddply that would make this easier?

Thanks!!


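Here is an example using the tips dataset. It ships with the reshape2 package; as far as I know, older versions of ggplot2 made it (and plyr) available automatically, but current versions do not, so load both explicitly:

library(plyr)
data(tips, package = "reshape2")
# mean tip as a fraction of the bill, for each day
mean_tip_by_day <- ddply(tips, .(day), summarize, mean_tip = mean(tip / total_bill))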

Hope this is useful
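
Applied to the question itself, the same pattern might look like the sketch below. The original call failed because ddply expects the grouping variable as .(Genre) or "Genre", and a function as its third argument; mean(Gross) is evaluated immediately and hands ddply a number, not a function. The movies data frame here is a made-up stand-in with only the Genre and Gross columns named in the question, and the names mean_gross_by_genre and mean_gross are illustrative.

```r
library(plyr)

# Hypothetical stand-in for the asker's data: only Genre and Gross matter here.
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Drama"),
  Gross = c(100, 300, 50, 150, 80)
)

# Group by Genre, then let summarize() compute one mean per group.
# Everything after the function name is passed on to summarize().
mean_gross_by_genre <- ddply(movies, .(Genre), summarize,
                             mean_gross = mean(Gross))

# Equivalent call with a hand-written function: it receives the
# sub-data-frame for one genre and returns a one-row data frame.
mean_gross_by_genre2 <- ddply(movies, "Genre", function(df) {
  data.frame(mean_gross = mean(df$Gross))
})

print(mean_gross_by_genre)
```

Note that attach(movies) is not needed: ddply looks up Genre and Gross inside the data frame you pass it.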


You probably don't need plyr for a simple operation like that. tapply() does the job easily and you won't need to load additional packages. The syntax also seems simpler than Ramnath's:

tapply(tips$tip, tips$day, mean)

Note that plyr is a fantastic tool for many tasks. To me, it just seems like overkill here.
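
For the movies case in the question, a base-R version could be sketched as follows. The movies data frame is again a made-up stand-in (the real one has about 3000 rows), and aggregate() is shown only as another base option that returns a data frame rather than a named vector.

```r
# Hypothetical stand-in for the asker's data frame.
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Drama"),
  Gross = c(100, 300, 50, 150, 80)
)

# tapply: values, grouping factor, function. Returns one named
# element per genre.
mean_gross <- tapply(movies$Gross, movies$Genre, mean)

# aggregate with a formula: returns a data frame with columns
# Genre and Gross (the per-genre mean).
mean_gross_df <- aggregate(Gross ~ Genre, data = movies, FUN = mean)

print(mean_gross)
print(mean_gross_df)
```

If Gross contains missing values, passing na.rm = TRUE as an extra argument to tapply() forwards it to mean().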
