I have an R data frame:
> tab1
pat t conc
1 P1 0 788
2 P1 5 720
3 P1 10 655
4 P2 0 644
5 P2 5 589
6 P2 10 544
I am trying to create a new column for conc as a percentage of conc at t=0 for each patient. As well as many other things, I have tried:
tab1$conct0 <- tab1$conc / tab1$conc[tab1$t == 0 & tab1$pat == tab1$pat]
But I am clearly miles off with the correct code that means "conc WHERE t==0 AND pat == pat for this particular row"
I am sure I could use a for loop or something, but I hoped there was something easier?
Thanks
With plyr:
library(plyr)
ddply(tab1, "pat", transform, conct0 = conc / conc[t == 0])
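For readers without plyr, the same split / transform / combine pattern can be sketched in base R. This is an alternative to ddply, not a description of how ddply works internally, and like the ddply call it assumes each patient has exactly one row with t == 0. The names pieces and res are illustrative.

```r
tab1 <- data.frame(
  pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(788, 720, 655, 644, 589, 544)
)

# Split by patient, add conct0 within each piece, then stack the pieces again.
pieces <- lapply(split(tab1, tab1$pat), function(d) {
  transform(d, conct0 = conc / conc[t == 0])
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL  # drop the "P1.1"-style row names that rbind creates
res
```

Like ddply, this returns the rows grouped by patient, so assign the result back (tab1 <- res) if you want to keep it.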
I would find the starting concentration for each patient with:
startConc <- tab1[tab1$t == 0,]
which gives (from your example data)
pat t conc
1 P1 0 788
4 P2 0 644
After that you can use apply
newconc <- apply(tab1, 1, function(x){as.numeric(x[3])/startConc[startConc$pat==x[1],3]})
which gives you
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
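The per-row lookup in that apply() call can also be done in one vectorised step with match(), which avoids apply() converting the data frame into a character matrix first. A sketch, assuming one t == 0 row per patient (idx is an illustrative name):

```r
tab1 <- data.frame(
  pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(788, 720, 655, 644, 589, 544)
)
startConc <- tab1[tab1$t == 0, ]

# For every row, find the startConc row that belongs to the same patient.
idx <- match(tab1$pat, startConc$pat)
newconc <- tab1$conc / startConc$conc[idx]
newconc
```

Because the lookup goes through the patient id, this does not depend on the order of the rows in tab1.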
A slightly makeshift way to do it, but it works in this case:
xt <- xtabs(conc~t+pat,tab1)
tab1$conct0 <- as.numeric(t(t(xt)/xt[1,])) # need to use transpose because of the way matrix vector indexing works
xt[1,] represents the row for t=0; you could also use xt["0",].
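as.numeric() reads the matrix column by column, i.e. patient by patient with t increasing, so the result only lines up with tab1 when tab1 is sorted by pat and then t (and every patient has a value at every time point). A sketch that sorts first, using the example data with its rows shuffled (a hypothetical ordering):

```r
tab1 <- data.frame(
  pat  = c("P2", "P1", "P2", "P1", "P2", "P1"),
  t    = c(10, 0, 0, 10, 5, 5),
  conc = c(544, 788, 644, 655, 589, 720)
)
# Sort so the row order matches the column-major order of the xtabs matrix.
tab1 <- tab1[order(tab1$pat, tab1$t), ]
xt <- xtabs(conc ~ t + pat, tab1)
# Divide each patient's column by its t = 0 value, then flatten column-wise.
tab1$conct0 <- as.numeric(t(t(xt) / xt["0", ]))
tab1
```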
Edit
A more robust way:
tabt <- subset(tab1,t==0)
names(tabt)[3] <- "conct0"
tab1 <- merge(tab1,tabt[,c(1,3)])
tab1$conct0 <- tab1$conc/tab1$conct0
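merge() joins on the shared column pat and returns the rows sorted by it. A variant of the same idea that names the join key explicitly, keeps the baseline in its own column, and reports a percentage as the question asks (baseline, out, conc0 and pct are illustrative names):

```r
tab1 <- data.frame(
  pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(788, 720, 655, 644, 589, 544)
)
# Baseline (t == 0) concentration per patient, renamed so it does not clash with conc.
baseline <- subset(tab1, t == 0, select = c(pat, conc))
names(baseline)[names(baseline) == "conc"] <- "conc0"

out <- merge(tab1, baseline, by = "pat")
out$pct <- 100 * out$conc / out$conc0
out
```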
I would use tapply. Given your data:
tab1 <- data.frame(
pat = c(rep("P1", 3), rep("P2", 3)),
t = c(0, 5, 10, 0, 5, 10),
conc = c(788, 720, 655, 644, 589, 544))
this one-liner will do it for you in the way you are hinting at in your post:
> tab1$conc / tab1$conc[tab1$t == 0][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
tapply without any function creates a row index matching the patient id (as a number) for each row. I find this method rather fast and useful. But it assumes that the patients appear in sorted order of their ids, which is the order tapply uses for its index. If that is an issue, we can make sure the baselines follow the patient id order:
> tab1$conc / tab1$conc[tab1$t == 0][order(unique(tab1$pat))][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
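The reordering matters when patients do not appear in sorted order. A small check using the same values with P2's rows listed first (a hypothetical reordering; tab2, start, grp, naive and fixed are illustrative names):

```r
tab2 <- data.frame(
  pat  = c("P2", "P2", "P2", "P1", "P1", "P1"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(644, 589, 544, 788, 720, 655)
)
start <- tab2$conc[tab2$t == 0]      # baselines in order of appearance: P2, P1
grp   <- tapply(tab2$pat, tab2$pat)  # 2 2 2 1 1 1 (levels are sorted: P1, P2)

naive <- tab2$conc / start[grp]                           # picks the wrong baselines
fixed <- tab2$conc / start[order(unique(tab2$pat))][grp]  # baselines in level order
fixed
```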
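If you are using this often I would write a function for it, e.g. like this:
myFract <- function(obj, x = "conc", id = "pat", time = "t", start = NULL) {
  # default baseline time: the earliest time in the data
  if (is.null(start)) start <- min(obj[, time])
  # rows holding each patient's baseline value, in order of appearance
  ii <- which(obj[, time] == start)
  # reorder to sorted patient order, then expand to one baseline row per data row
  ii <- ii[order(unique(obj[, id]))][tapply(obj[, id], obj[, id])]
  # each value divided by its patient's baseline value
  obj[, x] / obj[ii, x]
}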
Such that:
> myFract(tab1)
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
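The x, id and time arguments let the same function work on differently named columns. A usage sketch with a hypothetical data frame whose columns are called subject, day and level, multiplying by 100 to get the percentage the question asks for:

```r
myFract <- function(obj, x = "conc", id = "pat", time = "t", start = NULL) {
  if (is.null(start)) start <- min(obj[, time])
  ii <- which(obj[, time] == start)
  ii <- ii[order(unique(obj[, id]))][tapply(obj[, id], obj[, id])]
  obj[, x] / obj[ii, x]
}

# Hypothetical data: two subjects measured on days 1 and 3.
d <- data.frame(
  subject = c("A", "A", "B", "B"),
  day     = c(1, 3, 1, 3),
  level   = c(50, 40, 80, 60)
)
d$pct <- 100 * myFract(d, x = "level", id = "subject", time = "day")
d
```

Here the baseline time defaults to the earliest day, so each subject's day 1 value becomes 100.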
If you can safely assume that concentration never rises over time (so each patient's maximum is the value at t=0), then the shortest and fastest answer is:
tab1$concp <- ave(tab1$conc, tab1$pat, FUN = function(x) x/max(x))
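If concentration can rise over time, max() no longer picks the baseline. A variant of the same ave() idea, sketched here as an alternative rather than the method above, passes row indices to FUN so it can look up the t == 0 row within each patient (it assumes one t == 0 row per patient):

```r
tab1 <- data.frame(
  pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(788, 720, 655, 644, 589, 544)
)
# ave() splits the row indices by patient, applies FUN to each group,
# and puts the results back in the original row order.
tab1$concp <- ave(seq_len(nrow(tab1)), tab1$pat, FUN = function(i) {
  tab1$conc[i] / tab1$conc[i][tab1$t[i] == 0]
})
tab1
```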