How to select rows based on 2 columns?

Developer https://www.devze.com 2023-03-30 04:44 Source: Internet
dx <- data.frame(CMPD = c("cmpd1","cmpd1","cmpd1","cmpd1","cmpd2","cmpd2",
                          "cmpd2","cmpd2","cmpd3","cmpd3","cmpd3","cmpd3"),
                 MRM = c("309.0/121.1","309.0/121.1","309.0/90.1",
                         "309.0/90.1","305.2/140.3","305.2/140.3","300.5/107.3",
                         "300.5/107.3","404.8/126.0","404.8/126.0","401.5/91.0",
                         "401.5/91.0"),
                 RESP = c(123.4,234.5,345.6,456.7,567.8,678.9,789.0,12.4,
                          23.5,34.6,45.7,56.8))

> dx
    CMPD         MRM  RESP
1  cmpd1 309.0/121.1 123.4
2  cmpd1 309.0/121.1 234.5
3  cmpd1  309.0/90.1 345.6
4  cmpd1  309.0/90.1 456.7
5  cmpd2 305.2/140.3 567.8
6  cmpd2 305.2/140.3 678.9
7  cmpd2 300.5/107.3 789.0
8  cmpd2 300.5/107.3  12.4
9  cmpd3 404.8/126.0  23.5
10 cmpd3 404.8/126.0  34.6
11 cmpd3  401.5/91.0  45.7
12 cmpd3  401.5/91.0  56.8

I would like to work with this data grouped by each unique combination of CMPD and MRM (e.g. rows 1 and 2, then rows 3 and 4, etc.).


Let me introduce you to my friend, the package plyr.

This package makes it easy to use a generic strategy of splitting, applying and combining data. One of the most useful functions is ddply, which takes a data frame as input and returns a data frame as output. You specify the columns whose unique combinations define the groups, as well as the function you want to apply to each group, and ddply does the rest.
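To make the three steps concrete, here is a rough base-R sketch of what split-apply-combine amounts to when done by hand. It is not how plyr is implemented internally; the names pieces, summaries and result are just illustrative, and the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so the sketch is self-contained
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3",
               "300.5/107.3", "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9, 789.0, 12.4,
           23.5, 34.6, 45.7, 56.8)
)

# Split: one data frame per unique CMPD/MRM combination.
# drop = TRUE discards combinations that never occur in the data.
pieces <- split(dx, list(dx$CMPD, dx$MRM), drop = TRUE)

# Apply: reduce each piece to a one-row summary (here, the mean of RESP)
summaries <- lapply(pieces, function(piece) {
  data.frame(CMPD = piece$CMPD[1],
             MRM  = piece$MRM[1],
             RESP = mean(piece$RESP))
})

# Combine: stack the one-row summaries back into a single data frame
result <- do.call(rbind, summaries)
rownames(result) <- NULL
result
```

ddply wraps these three steps in a single call, which is why it is usually the more convenient choice.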

A good place to learn about plyr is Hadley's website or his article in the Journal of Statistical Software. There are also hundreds of answers about plyr on Stack Overflow. Just follow the plyr tag or the ddply tag.

Here are some examples:

library(plyr)

To extract the mean:

> ddply(dx, .(CMPD, MRM), numcolwise(mean))
   CMPD         MRM   RESP
1 cmpd1 309.0/121.1 178.95
2 cmpd1  309.0/90.1 401.15
3 cmpd2 300.5/107.3 400.70
4 cmpd2 305.2/140.3 623.35
5 cmpd3  401.5/91.0  51.25
6 cmpd3 404.8/126.0  29.05

Or the sum:

> ddply(dx, .(CMPD, MRM), numcolwise(sum))
   CMPD         MRM   RESP
1 cmpd1 309.0/121.1  357.9
2 cmpd1  309.0/90.1  802.3
3 cmpd2 300.5/107.3  801.4
4 cmpd2 305.2/140.3 1246.7
5 cmpd3  401.5/91.0  102.5
6 cmpd3 404.8/126.0   58.1
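If you would rather not load plyr, the same per-group mean and sum can probably be had from base R's aggregate with a formula. This is only a sketch; the names group_means and group_sums are made up for the example, and the rows may come back in a different order than with ddply.

```r
# Rebuild the example data so the sketch is self-contained
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3",
               "300.5/107.3", "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9, 789.0, 12.4,
           23.5, 34.6, 45.7, 56.8)
)

# RESP ~ CMPD + MRM reads as "summarise RESP for each CMPD/MRM combination"
group_means <- aggregate(RESP ~ CMPD + MRM, data = dx, FUN = mean)
group_sums  <- aggregate(RESP ~ CMPD + MRM, data = dx, FUN = sum)

group_means
group_sums
```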


If you want to process entire subsets of the data frame, the common thing to do is to use ddply from the plyr package:

ddply(dx, .(CMPD, MRM), .fun = doStuff)

Alternatives are ave, by, and aggregate. For the specific example of calculating the ratio, using summarise can help a lot:

ddply(dx, .(CMPD, MRM), .fun = summarise, ratio = RESP[1]/RESP[2])

This type of task is commonly referred to as 'split-apply-combine' in the R world.
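For comparison, here is a sketch of the base-R alternatives mentioned above. ave keeps every row and fills in a per-group value, while aggregate collapses each group to one row. The column name GROUP_MEAN and the object ratios are just names picked for this example, and the ratio assumes each group has at least two rows in the order you want.

```r
# Rebuild the example data so the sketch is self-contained
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3",
               "300.5/107.3", "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9, 789.0, 12.4,
           23.5, 34.6, 45.7, 56.8)
)

# ave: same number of rows as dx, each row gets the mean of its own group
dx$GROUP_MEAN <- ave(dx$RESP, dx$CMPD, dx$MRM, FUN = mean)

# aggregate: one row per group, here the ratio of the first to the second RESP
ratios <- aggregate(RESP ~ CMPD + MRM, data = dx,
                    FUN = function(r) r[1] / r[2])
names(ratios)[names(ratios) == "RESP"] <- "ratio"

dx
ratios
```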


You can use the by function:

by(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)

It returns a by object, which is not always easy to work with, but it can be done.
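If the by object gets in the way, one option is to swap in tapply, a close relative that returns a plain matrix for this kind of call, and then flatten that matrix into a data frame. This is a sketch of that substitute, not a way of converting the by object itself; the names means and long are just for illustration.

```r
# Rebuild the example data so the sketch is self-contained
dx <- data.frame(
  CMPD = rep(c("cmpd1", "cmpd2", "cmpd3"), each = 4),
  MRM  = rep(c("309.0/121.1", "309.0/90.1", "305.2/140.3",
               "300.5/107.3", "404.8/126.0", "401.5/91.0"), each = 2),
  RESP = c(123.4, 234.5, 345.6, 456.7, 567.8, 678.9, 789.0, 12.4,
           23.5, 34.6, 45.7, 56.8)
)

# A CMPD-by-MRM matrix of means; combinations that never occur are NA
means <- tapply(dx$RESP, list(CMPD = dx$CMPD, MRM = dx$MRM), mean)

# Look up a single combination by name
means["cmpd1", "309.0/121.1"]

# Flatten to long format and drop the combinations that do not exist
long <- as.data.frame(as.table(means), responseName = "RESP")
long <- long[!is.na(long$RESP), ]
long
```

Note that CMPD and MRM come back as factors in long; wrap them in as.character if you need plain strings.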
