I'm writing my first program in R and, as a newbie, I'm having some trouble. I hope you can help me.
I've got a data frame like this:
> v1<-c(1,1,2,3,3,3,4)
> v2<-c(13,5,15,1,2,7,4)
> v3<-c(0,3,6,13,8,23,5)
> v4<-c(26,25,11,2,8,1,0)
> datos<-data.frame(v1,v2,v3,v4)
> names(datos)<-c("Position","a1","a2","a3")
> datos
  Position a1 a2 a3
1 1 13 0 26
2 1 5 3 25
3 2 15 6 11
4 3 1 13 2
5 3 2 8 8
6 3 7 23 1
7 4 4 5 0
What I need is to sum the data in a1, a2 and a3 (in my real case, from a1 to a51) grouped by Position. I've been trying the function aggregate(), but it only works for means, not for sums, and I don't know why.
Thanks in advance.
You need to tell aggregate() to use sum. It applies whatever function you pass as its FUN argument, so if you passed mean you got means; pass sum instead. For example:
aggregate(datos[, c("a1", "a2", "a3")], by = list(Position = datos$Position), FUN = sum)
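With a1 to a51, typing every column name gets tedious. A minimal sketch, assuming all the value columns are named "a" followed by digits: pick them with a regular expression and hand them to aggregate() in one go. The variable names value_cols and sums are just illustrative.

```r
# Rebuild the example data from the question
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Assumption: every column to be summed is named "a<number>" (a1, ..., a51)
value_cols <- grep("^a[0-9]+$", names(datos), value = TRUE)

# Naming the list element keeps the grouping column called "Position"
# instead of the default "Group.1"
sums <- aggregate(datos[value_cols],
                  by = list(Position = datos$Position),
                  FUN = sum)
print(sums)
```

If the real column names follow a different pattern, only the regular expression needs to change.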
This is fairly straightforward with the plyr package:
library("plyr")
ddply(datos, .(Position), colwise(sum))
If you have additional non-numeric columns that shouldn't be summed, you can use numcolwise() instead:
ddply(datos, .(Position), numcolwise(sum))
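ddply() follows a split-apply-combine pattern. If you would rather not install plyr, the same idea can be sketched in base R: split the value columns by Position, apply colSums() to each piece, and stack the results. This is a base-R alternative, not what plyr does internally.

```r
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Split: one data frame of value columns (everything except Position) per group
pieces <- split(datos[, -1], datos$Position)

# Apply: column sums within each group; Combine: stack them row by row
sums <- do.call(rbind, lapply(pieces, colSums))

# sums is a matrix whose row names are the Position values
print(sums)
```

The result is a matrix rather than a data frame, so convert it with as.data.frame() if you need the group as a proper column.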
ag_df <- aggregate(. ~ Position, data = datos, FUN = sum)
should give you a data frame containing the sums of the "a" values for each position. The trick here is that the . on the left-hand side of the formula stands for every column of datos not already used on the right-hand side, i.e. a1, a2 and a3 here (a1 to a51 in your real data).
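One caveat with the formula interface, worth checking on a 51-column data set: its default na.action is na.omit, so a row with an NA in any column is dropped entirely before summing. A minimal sketch, using a hypothetical NA injected into the example data, of how na.action = na.pass together with na.rm = TRUE (forwarded to sum()) keeps the rest of that row:

```r
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))
datos$a2[1] <- NA  # hypothetical missing value, for illustration only

# Default na.action = na.omit: the whole first row is discarded,
# so its a1 and a3 values are lost as well
dropped <- aggregate(. ~ Position, data = datos, FUN = sum)

# na.pass keeps the row; na.rm = TRUE is passed on to sum()
kept <- aggregate(. ~ Position, data = datos, FUN = sum,
                  na.rm = TRUE, na.action = na.pass)

print(dropped)
print(kept)
```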
Note that you can get much the same result with:
sumdf <- rowsum(datos, datos$Position, na.rm = TRUE)
except that the result also contains a column of summed Position values, and the group labels end up as row names rather than in a column.
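Both quirks of rowsum() can be worked around: leave Position out of the columns being summed and turn the row names back into a column. A short sketch; the names s and res are illustrative, and converting the row names with as.numeric() assumes Position is numeric, as in the example.

```r
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Sum only the value columns; Position serves purely as the grouping key
s <- rowsum(datos[, -1], datos$Position, na.rm = TRUE)

# rowsum() stores the group values as row names; move them into a column
res <- data.frame(Position = as.numeric(rownames(s)), s, row.names = NULL)
print(res)
```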
If you DON'T want all non-group columns aggregated, you can use cbind() on the left-hand side of the formula, as in:
sumdf1 <- aggregate(cbind(a1, a3) ~ Position, data = datos, FUN = sum)
That sums only the a1 and a3 columns.
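When the set of columns is long or chosen at run time, writing out cbind(...) by hand is awkward. One sketch, assuming the names to sum are held in a character vector (wanted is a hypothetical name): keep the . ~ Position formula and subset the data frame instead, so the dot expands to just those columns.

```r
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Hypothetical selection; any character vector of column names works here
wanted <- c("a1", "a3")

# Only Position plus the wanted columns are visible to the formula,
# so "." expands to cbind(a1, a3)
sums <- aggregate(. ~ Position, data = datos[c("Position", wanted)], FUN = sum)
print(sums)
```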