I want to add a column to a data frame that encodes the levels of a factor, e.g.:
subject rate
1 12
1 10
1 13
4 4
4 6
4 12
2 9
2 2
2 5
6 17
6 10
6 1
In the above data frame I wish to add a third column called "treatment", where each subject is assigned to one of two levels, "a" or "b", e.g. below:
subject rate treatment
1 12 a
1 10 a
1 13 a
4 4 b
4 6 b
4 12 b
2 9 b
2 2 b
2 5 b
6 17 a
6 10 a
6 1 a
Thanks in advance for any help.
Here's another approach using the plyr package:
library(plyr)
#Make some fake data
set.seed(1)
dat <- data.frame(subject = rep(c(1,4,2,6), each = 3), rate = sample(1:20, 12, TRUE))
set.seed(1)
#Assign treatment based on the subject ID. This does not ensure that you will get
#at least one subject in each treatment group.
ddply(dat, "subject", transform, treatment = sample(letters[1:2], 1))
EDIT - to address your comment
Given that you want to specify which subject gets assigned to which treatment, Gavin's suggestion of merge is spot on. I would first make a new data.frame that contains one record for each unique subject, assign their treatment, and then merge them together:
treatments <- data.frame(subject = unique(dat$subject), treats = c("a", "b", "b", "a"))
merge(dat, treatments)
Note that the order of unique(dat$subject) is 1, 4, 2, 6, which corresponds to the order of the values in the original data.frame. If your real problem contains more than four subjects, you may want to consider a more automated way of assigning treatment groups. One approach I've used in the past is to assign a random number to each subject, and then assign groups based on a given threshold of that random number. It is essentially the same as the approach above, but it gives equal numbers in each group when the number of subjects is even and every subject has the same number of rows. For example:
dat <- ddply(dat, "subject", transform, treatment = runif(1))
dat <- within(dat, treatment <- ifelse(treatment < quantile(treatment, 0.5),"a", "b"))
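If an exactly balanced split matters, one alternative to the threshold approach is to shuffle a pre-built vector of labels, one per subject. This is only a sketch in base R: the example data and the names labels and assignment are assumptions made for illustration.

```r
# Hypothetical example data with the same shape as the question's data frame
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

set.seed(1)  # only for reproducibility
subj <- unique(dat$subject)

# Build a balanced pool of labels (one per subject), then shuffle it
labels <- rep(c("a", "b"), length.out = length(subj))
assignment <- data.frame(subject = subj, treatment = sample(labels),
                         stringsAsFactors = FALSE)

# Look up each row's subject in the assignment table
dat$treatment <- assignment$treatment[match(dat$subject, assignment$subject)]
```

With an odd number of subjects the group sizes differ by one.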
If you want to assign treatments at random, this will do it:
## subject IDs
subj <- with(dat, unique(subject))
## how many treatment levels?
ntreat <- 2
## sample an identifier for the treatments
set.seed(47)
treats <- sample(letters[seq_len(ntreat)], length(subj), replace = TRUE)
## stick this into a subject/treatment data frame
Treat <- data.frame(subject = subj, treatment = treats)
This gives:
R> Treat
subject treatment
1 1 b
2 4 a
3 2 b
4 6 b
Edit:
If the treatments have been pre-assigned, then just create the Treat data frame by hand:
Treat <- data.frame(subject = c(1,4,2,6), treatment = c("a","b","b","a"))
If you have lots of these to do, you can use functions like seq() and rep(), plus the built-in letters constant, to speed up the "data entry".
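As a rough illustration of that kind of shortcut, suppose (hypothetically) there were 20 subjects numbered 1 to 20, with the first ten pre-assigned to "a" and the rest to "b". The subject numbering and group sizes here are assumptions, not part of the original data.

```r
# Hypothetical layout: subjects 1..20, first ten in "a", last ten in "b"
subj <- seq(1, 20)
Treat <- data.frame(subject = subj,
                    treatment = rep(letters[1:2], each = 10),
                    stringsAsFactors = FALSE)
head(Treat)
```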
End edit
We can now use this data frame in a merge with the original data to insert the treatment for the respective subject, using merge():
R> merge(dat, Treat)
subject rate treatment
1 1 12 b
2 1 10 b
3 1 13 b
4 2 9 b
5 2 2 b
6 2 5 b
7 4 4 a
8 4 6 a
9 4 12 a
10 6 17 b
11 6 10 b
12 6 1 b
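Note that merge() returns the rows sorted by subject, as the output above shows. If the original row order needs to be kept, a lookup with match() is one possible alternative. The sketch below assumes the data from the question and the pre-assigned treatments (1 and 6 get "a", 4 and 2 get "b").

```r
# Data as given in the question
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

# Pre-assigned treatments, one row per subject
Treat <- data.frame(subject = c(1, 4, 2, 6),
                    treatment = c("a", "b", "b", "a"),
                    stringsAsFactors = FALSE)

# match() gives, for each row of dat, the row of Treat with the same subject,
# so the new column lines up with dat without reordering it
dat$treatment <- Treat$treatment[match(dat$subject, Treat$subject)]
dat
```

A subject missing from Treat would get NA here, which makes gaps in the lookup table easy to spot.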
I will assume you have some key for how to transform this data, for instance 1,6 => a and 4,2 => b. Then a mix of ifelse and %in% should do the job:
df$treatment <- factor(ifelse(df$subject %in% c('1', '6'), 'a', 'b'))
The more general option is to copy this factor and alter its levels, but the details depend on how your dictionary is stored. A simple example (the factor levels are sorted as 1, 2, 4, 6, so the new labels are given in that order):
x <- df$subject
levels(x) <- c('a', 'b', 'b', 'a')
df$treatment <- x
(In both examples I assume that subject is a factor)
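If the dictionary happens to be stored as a named character vector, the lookup can be written directly by indexing with the subject labels. This is a sketch under that assumption. The name key and the example data frame are made up for illustration.

```r
# Example data with subject stored as a factor, as assumed above
df <- data.frame(subject = factor(rep(c(1, 4, 2, 6), each = 3)),
                 rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

# Hypothetical dictionary: names are subject IDs, values are treatments
key <- c("1" = "a", "6" = "a", "4" = "b", "2" = "b")

# Index by the subject labels (as character, not the integer factor codes)
df$treatment <- factor(unname(key[as.character(df$subject)]))
```

Converting with as.character() matters: indexing with the factor itself would use its integer codes rather than the subject labels.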
Another approach is to write a small function that decides the treatment for a given subject, and then apply it to the subject column to create a new treatment column.
Here is the code:
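# Four subjects, repeated four times (16 rows)
data <- data.frame(subject = rep(c(1, 2, 4, 6), times = 4), rate = sample(1:20, 16, TRUE))
# Named assign_treatment rather than cat, to avoid masking base::cat()
assign_treatment <- function(x) {
  if (x == 1 || x == 4) return("a")
  if (x == 2 || x == 6) return("b")
  NA_character_
}
# vapply() returns a character vector; lapply() would create a list column
data$treat <- vapply(data$subject, assign_treatment, character(1))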
head(data)
Output:
> head(data)
subject rate treat
1 1 15 a
2 2 20 b
3 4 8 a
4 6 16 b
5 1 19 a
6 2 5 b