Create a new column in data.frame using conditions of each row

Developer https://www.devze.com 2023-01-12 12:55 Source: Internet
I have an R data frame:

> tab1
  pat  t conc
1  P1  0  788
2  P1  5  720
3  P1 10  655
4  P2  0  644
5  P2  5  589
6  P2 10  544

I am trying to create a new column for conc as a percentage of conc at t=0 for each patient. As well as many other things, I have tried:

tab1$conct0 <- tab1$conc / tab1$conc[tab1$t == 0  & tab1$pat == tab1$pat]

But I am clearly miles off the correct code, which should mean "conc WHERE t == 0 AND pat == pat for this particular row".

I am sure I could use a for loop or something, but hoped there was something easier?

Thanks


With plyr:

library(plyr)
ddply(tab1, "pat", transform, conct0 = conc / conc[t == 0])


I would find the starting concentration for each patient with:

startConc <- tab1[tab1$t == 0,]

which gives (from your example data)

  pat t conc
1  P1 0  788
4  P2 0  644

After that you can use apply to divide each row's conc by the starting value of the matching patient:

newconc <- apply(tab1, 1, function(x){as.numeric(x[3])/startConc[startConc$pat==x[1],3]})

which gives you

[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
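The same lookup can also be done without apply. The following is only a sketch, assuming every patient has exactly one row with t == 0; the names idx and conct0 are illustrative. It uses match() to find, for each row, the position of that row's patient in startConc, and then divides in one vectorised step.

```r
# Example data copied from the question
tab1 <- data.frame(
    pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
    t    = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

# Baseline rows: assumed to be exactly one row per patient at t == 0
startConc <- tab1[tab1$t == 0, ]

# For every row of tab1, the position of its patient within startConc
idx <- match(tab1$pat, startConc$pat)

# Divide each concentration by the baseline of the same patient
tab1$conct0 <- tab1$conc / startConc$conc[idx]
```

Because match() looks patients up by value, this does not depend on the rows of tab1 being sorted by patient.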


A slightly makeshift way to do it, but it works in this case:

xt <- xtabs(conc~t+pat,tab1)
tab1$conct0 <- as.numeric(t(t(xt)/xt[1,])) # need to use transpose because of the way matrix vector indexing works

The xt[1,] represents the row for t=0; you could also use xt["0",].
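If the double transpose is hard to read, sweep() may express the same column-wise division more directly. This is a sketch under the same assumptions as above (tab1 sorted by patient and then by time, and every patient measured at every time point); ratio is an illustrative name.

```r
# Example data copied from the question
tab1 <- data.frame(
    pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
    t    = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

# Cross-tabulate: one row per time point, one column per patient
xt <- xtabs(conc ~ t + pat, tab1)

# Divide every column (margin 2) by that patient's value in the t = 0 row
ratio <- sweep(xt, 2, xt["0", ], "/")

# Flatten column by column, which matches the row order of tab1 here
tab1$conct0 <- as.numeric(ratio)
```

Like the transpose version, this relies on the layout of the table matching the row order of tab1, so it is no more robust, only easier to read.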

Edit

A more robust way:

tabt <- subset(tab1,t==0)
names(tabt)[3] <- "conct0"
tab1 <- merge(tab1,tabt[,c(1,3)])
tab1$conct0 <- tab1$conc/tab1$conct0


I would use tapply. Given your data:

tab1 <- data.frame(
    pat = c(rep("P1", 3), rep("P2", 3)),
    t = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

this one-liner will do it for you in the way you are hinting at in your post:

> tab1$conc / tab1$conc[tab1$t == 0][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205

Calling tapply without a function creates a row index that matches each row to its patient ID (as a number). I find this method rather fast and useful, but it assumes that your patient IDs are ordered. If that is an issue, we can make sure the starting values follow the patient ID order:

> tab1$conc / tab1$conc[tab1$t == 0][order(unique(tab1$pat))][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
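To make the indexing step easier to follow, here is a small sketch of what the function-less tapply call returns on the example data. The names idx and baseline are illustrative only.

```r
# Example data copied from the question
tab1 <- data.frame(
    pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
    t    = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

# With no FUN, tapply returns the group number of each row
idx <- tapply(tab1$pat, tab1$pat)
print(idx)            # 1 1 1 2 2 2

# One starting value per patient, in order of appearance
baseline <- tab1$conc[tab1$t == 0]

# Indexing with idx stretches the baselines to one value per row
print(baseline[idx])  # 788 788 788 644 644 644
```

The group numbers follow the sorted patient IDs, which is why the baselines have to be in the same order before they are indexed.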

If you are using this often, I would write a function for it, for example:

myFract <- function(obj, x = "conc", id = "pat", time = "t", start = NULL) {
    if (is.null(start)) start <- min(obj[, time])
    ii <- which(obj[, time] == start)
    ii <- ii[order(unique(obj[, id]))][tapply(obj[, id], obj[, id])]
    obj[, x] / obj[ii, x]
}

Such that:

> myFract(tab1)
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205


If you can safely assume that a patient's concentration never rises above its value at t = 0, then the shortest and fastest answer is:

tab1$concp <- ave(tab1$conc, tab1$pat, FUN = function(x) x/max(x))
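If that assumption is in doubt, ave can still be used by passing row numbers instead of concentrations, so that the function can look up the t == 0 value within each patient. This is a sketch, assuming exactly one t == 0 row per patient; rows is an illustrative name.

```r
# Example data copied from the question
tab1 <- data.frame(
    pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
    t    = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

# Row numbers as doubles, so ave can store fractional results in their place
rows <- as.numeric(seq_len(nrow(tab1)))

# For each patient, i holds that patient's row numbers; divide the patient's
# concentrations by the one measured at t == 0
tab1$concp <- ave(rows, tab1$pat, FUN = function(i) {
    tab1$conc[i] / tab1$conc[i][tab1$t[i] == 0]
})
```

On the example data this gives the same values as the max-based version, since each patient's highest concentration is the one at t = 0.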