I have a data frame that looks like this, for example, with 2 columns and multiple rows:
A 2
A 7
B 1
B 3
B 6
C 2
I want to perform some operations on the values in column 2, separately for each unique value in column 1.
So far I have:
unique.values <- sort(unique(mydata[,1]))
This works for getting each unique value, but I don't know how to associate each unique value with the values it takes in column 2. I need to be able to operate on each group entirely independently and to count rows, etc. I tried using grep, but couldn't make that work.
Thank you for any help you can give!
I'm not entirely following your question, but I think this is what you want:
df <- data.frame(read.table(textConnection("
A 2
A 7
B 1
B 3
B 6
C 2")))
library(plyr)
ddply(df, .(V1), nrow)
There are numerous ways to do this kind of thing, so you will need to provide more detail about what you're trying to do if you want a better answer.
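If all you need is the number of rows per group, base R can do it without plyr. A minimal sketch, rebuilding the same example data with `read.table(text = ...)` (which, like the code above, names the columns V1 and V2 by default):

```r
# Rebuild the example data; read.table() names the columns V1 and V2
df <- read.table(text = "
A 2
A 7
B 1
B 3
B 6
C 2")

# table() counts the rows for each unique value of the first column
counts <- table(df$V1)
print(counts)
```

This prints a count of 2 for A, 3 for B and 1 for C.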
Edit
In general, if you have a set of unique values and want to apply a function to the data belonging to each of them, you can do this with some version of an apply function. For example, using the data above, here are a few different ways to get the mean of the second column for each value of the first column:
ddply(df, .(V1), function(x) data.frame(mean=mean(x[,2])))
do.call("rbind", by(df, df[,1], function(x) data.frame(mean=mean(x[,2]))))
do.call("rbind", lapply(unique(df[,1]), function(a) data.frame(V1=a, mean=mean(df[df[,1]==a,2]))))
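Another base R option, which maps directly onto "associate each unique value with the values it takes in column 2", is `split()`. The sketch below (same example data, default column names V1 and V2) turns the data into a named list with one vector per group, which can then be processed independently, for instance with `sapply()`:

```r
df <- read.table(text = "
A 2
A 7
B 1
B 3
B 6
C 2")

# split() returns a named list: one vector of column-2 values per unique value of column 1
groups <- split(df$V2, df$V1)
print(groups$B)   # the values belonging to group "B"

# Each element can then be handled on its own, e.g. summarised with sapply()
group_means <- sapply(groups, mean)
group_sizes <- sapply(groups, length)
print(group_means)
print(group_sizes)
```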
The ave() or tapply() functions will do what you want; which one to use depends on the output you want. If you want an output vector as long as the input vector, use ave(); if you want to reduce the data to one value per level of your grouping vector, use tapply().
ave(mydata[,2], mydata[,1], FUN = length) #FUN can be any function
Or, for the reduced version...
tapply(mydata[,2], mydata[,1], FUN = length) #FUN can be any function
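To see the difference in output shape, here is a small sketch on the example data (column names V1 and V2 as produced by `read.table`), using `length` as the function in both calls:

```r
df <- read.table(text = "
A 2
A 7
B 1
B 3
B 6
C 2")

# ave(): one entry per row; each row gets the statistic of its own group
per_row <- ave(df$V2, df$V1, FUN = length)

# tapply(): one entry per group, named by the group levels
per_group <- tapply(df$V2, df$V1, FUN = length)

print(per_row)    # 2 2 3 3 3 1
print(per_group)  # A: 2, B: 3, C: 1
```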
Another possibility, using Shane's df:
aggregate(df[,2],list(df[,1]),FUN=length)
Again, replace length with any other function that works on vectors. You can specify more than one factor in the list; aggregate() will then compute the result for every combination of factor levels.
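A sketch of the multi-factor case. The third column V3 is hypothetical (it is not in the original data) and only serves as a second grouping factor. The formula interface of `aggregate()` is shown as an equivalent alternative:

```r
# Hypothetical data: V3 is an extra grouping column added only for illustration
df <- read.table(text = "
A 2 x
A 7 y
B 1 x
B 3 x
B 6 y
C 2 x")

# Two factors in the list: one output row per observed (V1, V3) combination
res <- aggregate(df[, 2], list(group = df[, 1], sub = df[, 3]), FUN = length)
print(res)

# Same counts using the formula interface
res2 <- aggregate(V2 ~ V1 + V3, data = df, FUN = length)
print(res2)
```

Only combinations that actually occur in the data appear in the result (here 5 of the 6 possible ones, since C has no "y" row).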
The difference from ave() is that ave() gives a vector as long as the original data frame, whereas aggregate() returns a data frame in which one variable is the group indicator. By comparison, tapply() returns a vector whose length equals the number of groups, and ddply() returns a data frame with a variable for every specified factor.
The by() construct is especially useful if you need to operate on multiple columns, as it is essentially a loop over the sub-data-frames for each group. It returns an object of class "by" (essentially a list with one element per group) that can be converted using Shane's do.call("rbind", ...) construct, or by calling matrix() or rbind() directly. Each approach gives a somewhat different structure, but all of them are useful.
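A sketch of by() working on more than one column at once. The numeric third column V3 is hypothetical (not in the original data); the function receives each group's rows as a data frame, so it can combine columns freely. Shane's `do.call("rbind", ...)` then stacks the per-group results:

```r
# Hypothetical data: V3 is an extra numeric column added only for illustration
df <- read.table(text = "
A 2 10
A 7 20
B 1 30
B 3 40
B 6 50
C 2 60")

# by() passes each group's sub-data-frame to the function
res_by <- by(df, df$V1, function(d) {
  data.frame(n = nrow(d),                       # rows in this group
             weighted_sum = sum(d$V2 * d$V3))   # uses two columns together
})

# Combine the one-row data frames into a single data frame
res <- do.call("rbind", res_by)
print(res)
```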
Depending on the output format you want, you can choose any of these options.