dx <- data.frame(CMPD = c("cmpd1","cmpd1","cmpd1","cmpd1","cmpd2","cmpd2",
"cmpd2","cmpd2","cmpd3","cmpd3","cmpd3","cmpd3"),
MRM = c("309.0/121.1","309.0/121.1","309.0/90.1",
"309.0/90.1","305.2/140.3","305.2/140.3","300.5/107.3",
"300.5/107.3","404.8/126.0","404.8/126.0","401.5/91.0",
"401.5/91.0"),
RESP = c(123.4,234.5,345.6,456.7,567.8,678.9,789.0,12.4,
23.5,34.6,45.7,56.8))
> dx
CMPD MRM RESP
1 cmpd1 309.0/121.1 123.4
2 cmpd1 309.0/121.1 234.5
3 cmpd1 309.0/90.1 345.6
4 cmpd1 309.0/90.1 456.7
5 cmpd2 305.2/140.3 567.8
6 cmpd2 305.2/140.3 678.9
7 cmpd2 300.5/107.3 789.0
8 cmpd2 300.5/107.3 12.4
9 cmpd3 404.8/126.0 23.5
10 cmpd3 404.8/126.0 34.6
11 cmpd3 401.5/91.0 45.7
12 cmpd3 401.5/91.0 56.8
I would like to be able to work with this data based on each unique combination of CMPD and MRM (e.g. rows 1 and 2, then rows 3 and 4, etc.).
Let me introduce you to my friend, the package plyr.
This package makes it easy to apply the generic strategy of splitting, applying, and combining data. One of its most useful functions is ddply, which takes a data frame as input and returns a data frame as output. You specify the variables whose unique combinations define the split, as well as the function you want to apply, and ddply does the rest.
A good place to learn about plyr is Hadley's website or his article in the Journal of Statistical Software. There are also hundreds of answers about plyr on StackOverflow; just follow the plyr tag or the ddply tag.
Here are some examples:
library(plyr)
To extract the mean:
> ddply(dx, .(CMPD, MRM), numcolwise(mean))
CMPD MRM RESP
1 cmpd1 309.0/121.1 178.95
2 cmpd1 309.0/90.1 401.15
3 cmpd2 300.5/107.3 400.70
4 cmpd2 305.2/140.3 623.35
5 cmpd3 401.5/91.0 51.25
6 cmpd3 404.8/126.0 29.05
Or the sum:
> ddply(dx, .(CMPD, MRM), numcolwise(sum))
CMPD MRM RESP
1 cmpd1 309.0/121.1 357.9
2 cmpd1 309.0/90.1 802.3
3 cmpd2 300.5/107.3 801.4
4 cmpd2 305.2/140.3 1246.7
5 cmpd3 401.5/91.0 102.5
6 cmpd3 404.8/126.0 58.1
If you want to process entire subsets of the data frame, the common approach is to use ddply from the plyr package:
ddply(dx, .(CMPD, MRM), .fun = doStuff)
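doStuff is not defined in the answer above, so here is a minimal sketch of what such a function might look like. The body and the output column names (n, mean_resp, ratio) are assumptions chosen for illustration; the only real requirement is that the function accepts one subset as a data frame and returns a data frame.

```r
library(plyr)

# Same example data as in the question
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3", "300.5/107.3",
               "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9, 789.0, 12.4,
           23.5, 34.6, 45.7, 56.8)
)

# Hypothetical stand-in for doStuff: receives the rows of one CMPD/MRM
# combination as a data frame and returns a one-row summary
doStuff <- function(df) {
  data.frame(
    n         = nrow(df),                  # number of rows in the subset
    mean_resp = mean(df$RESP),             # mean response of the subset
    ratio     = df$RESP[1] / df$RESP[2]    # first response over second
  )
}

# ddply prepends the grouping columns (CMPD, MRM) to each returned row
res <- ddply(dx, .(CMPD, MRM), .fun = doStuff)
```

Because the function sees the whole subset, it can use any of its columns, which is the main difference from column-wise helpers such as numcolwise.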
Alternatives are ave, by, and aggregate. For the specific example of calculating the ratio, using summarise can help a lot:
ddply(dx, .(CMPD, MRM), .fun = summarise, ratio = RESP[1]/RESP[2])
This type of task is commonly referred to as 'split-apply-combine' in the R world.
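For readers who prefer to stay in base R, the aggregate and ave alternatives mentioned above could be used roughly as follows. This is only a sketch on the question's data; the names means, ratios, and GROUP_MEAN are made up for the example.

```r
# Same example data as in the question
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3", "300.5/107.3",
               "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9, 789.0, 12.4,
           23.5, 34.6, 45.7, 56.8)
)

# aggregate: one output row per CMPD/MRM combination
means <- aggregate(RESP ~ CMPD + MRM, data = dx, FUN = mean)

# The same call with a custom function gives the first/second ratio per group
ratios <- aggregate(RESP ~ CMPD + MRM, data = dx,
                    FUN = function(x) x[1] / x[2])

# ave: returns a vector as long as the input, so the group mean
# can be stored next to every original row
dx$GROUP_MEAN <- ave(dx$RESP, dx$CMPD, dx$MRM, FUN = mean)
```

aggregate collapses each group to a single row, whereas ave keeps the original row count, which is convenient when the group statistic is needed alongside the raw values.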
You can use the by function:
by(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)
It returns a by object, which is not necessarily easy to work with, but it is possible.
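One possible way to make the by result easier to handle is to flatten it into an ordinary data frame. The sketch below assumes the result is a two-dimensional array (CMPD by MRM) with NA in the cells for combinations that never occur, which is what by produces for this call; the names res, grid, and out are illustrative.

```r
# Same example data as in the question
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3", "300.5/107.3",
               "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9, 789.0, 12.4,
           23.5, 34.6, 45.7, 56.8)
)

res <- by(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)

# Build every CMPD x MRM label pair; expand.grid varies the first
# variable fastest, matching the column-major order of the array
grid <- expand.grid(dimnames(res), stringsAsFactors = FALSE)

# Attach the cell values as a plain numeric vector
grid$RESP <- as.vector(res)

# Drop the combinations that do not occur in the data
out <- grid[!is.na(grid$RESP), ]
```

After this step, out has one row per observed CMPD/MRM pair and can be used like any other data frame.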