R: transforming "short form" data to "long form" data without for loops?

Posted on https://www.devze.com, 2023-03-02 20:34. Source: the web.
Suppose I have an R dataframe like this:

  Subject Session  Property.A Property.B Property.C
1     100       1 -1.22527548 -0.9193751 -1.7501693
2     100      10  2.30627980  1.8940830 -0.8443976
3     100       2  2.33243332 -0.5860868 -4.2074489
4     100       3  0.38130810 -0.7336206  4.8016230
5     100       4  1.44685875  0.5066249  2.0138624
6     100       5  0.08907721 -0.3715202  1.4983700

I have heard this style of data frame referred to as "short form" or "wide form". Now suppose I want to make it look like this, which I have heard called "long form":

  Subject Session  Property    Value
1     100       1         A   -1.2252754
2     100       1         B   -0.9193751
3     100       1         C   -1.7501693
4     100       2         A    2.3324333
5     100       2         B   -0.5860868
6     100       2         C   -4.2074489

That is, I have N columns that I want to reduce to just two "name/value" columns, with any other columns in the dataframe extended with repeated values as necessary.

Obviously I could perform this conversion with a bunch of for loops, but that seems really ugly, and it would be a pain to maintain if/when I add more property columns.

Is there a way to do this in R with just a few lines of code? Some magic combination of functions I haven't discovered yet?


Use the melt function in package reshape2:

library(reshape2)
dat.m <- melt(dat, id.vars = c("Subject", "Session"))

If you need to clean up the column names and/or values for the variable column:

#change "variable" to "Property"
names(dat.m)[3] <- "Property"
#Drop "Property." from the column values
dat.m$Property <- gsub("Property\\.", "", dat.m$Property)
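
If reshape2 is not available, roughly the same result can be sketched in base R with stack(). The snippet below is only an illustration and not part of the original answer: it rebuilds the question's data frame as dat, stacks the three property columns, and sorts by Session and Property so the rows come out in the order shown in the question.

```r
# Rebuild the data frame from the question
dat <- data.frame(
  Subject    = rep(100, 6),
  Session    = c(1, 10, 2, 3, 4, 5),
  Property.A = c(-1.22527548, 2.30627980, 2.33243332, 0.38130810, 1.44685875, 0.08907721),
  Property.B = c(-0.9193751, 1.8940830, -0.5860868, -0.7336206, 0.5066249, -0.3715202),
  Property.C = c(-1.7501693, -0.8443976, -4.2074489, 4.8016230, 2.0138624, 1.4983700)
)

# stack() returns a data frame with columns "values" and "ind",
# holding all of Property.A first, then Property.B, then Property.C
stacked <- stack(dat[3:5])

# Repeat the id columns once per property column so they line up with "values"
dat.long <- data.frame(
  Subject  = rep(dat$Subject, times = 3),
  Session  = rep(dat$Session, times = 3),
  Property = sub("Property.", "", as.character(stacked$ind), fixed = TRUE),
  Value    = stacked$values
)

# Group the rows by Session, then Property, as in the question's target layout
dat.long <- dat.long[order(dat.long$Session, dat.long$Property), ]
rownames(dat.long) <- NULL
```

Unlike melt, this hard-codes the number of property columns (3), so it would need adjusting when columns are added.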


I like using plyr functions, but the reshape function from base is quite powerful, as illustrated by the solution below.

# create a dummy data frame
dat = data.frame(
  subject = rep(100, 5),
  session = sample(5, 5, replace = TRUE),
  property.a = rnorm(5),
  property.b = rnorm(5),
  property.c = rnorm(5)
)

# convert wide to long, varying columns are 3:5, separator is "."
dat.long = reshape(dat, direction = 'long', varying = 3:5, sep = ".")
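
The call above lets reshape() guess the value and time names from the column names. As a sketch using the question's own column names, the same conversion can also be spelled out explicitly. The v.names, timevar, times, and idvar values below are choices made for this illustration, only the first three rows of the question's data are used, and it assumes Subject and Session together identify each row.

```r
# First three rows of the data frame from the question
dat <- data.frame(
  Subject    = c(100, 100, 100),
  Session    = c(1, 10, 2),
  Property.A = c(-1.22527548, 2.30627980, 2.33243332),
  Property.B = c(-0.9193751, 1.8940830, -0.5860868),
  Property.C = c(-1.7501693, -0.8443976, -4.2074489)
)

dat.long <- reshape(
  dat,
  direction = "long",
  varying   = c("Property.A", "Property.B", "Property.C"),  # columns to stack
  v.names   = "Value",                # name of the stacked value column
  timevar   = "Property",             # name of the column holding the labels
  times     = c("A", "B", "C"),       # label for each stacked column, in order
  idvar     = c("Subject", "Session") # columns that identify a wide row
)
```

The result has the columns Subject, Session, Property, and Value, with all the A rows first, then B, then C.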


The reshape package is great for this, but even without it, a bunch of loops is not the only alternative.

Perhaps this example is instructive:

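# build one long-form piece per property column (3:5), then stack the pieces
longDF <- lapply( 3:5, function(x) cbind(wideDF[1:2], Property = names(wideDF)[x], Value = wideDF[[x]]) )
longDF <- do.call( rbind, longDF )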

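Or this one:

# repeat the id columns once per property column and concatenate the property values
longDF <- data.frame( Subject = rep(wideDF[[1]], 3), Session = rep(wideDF[[2]], 3), Property = rep(names(wideDF)[3:5], each = nrow(wideDF)), Value = c(wideDF[[3]], wideDF[[4]], wideDF[[5]]) )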
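
Since the question worries about maintenance when more property columns are added, the lapply idea can be wrapped so that the property columns are discovered rather than hard-coded. This is only a sketch: wide_to_long and its id_cols argument are hypothetical names made up for this example, and it assumes every non-id column holds values of the same type.

```r
# Hypothetical helper (not from the original answers): treats every column
# that is not listed in id_cols as a property column to be stacked.
wide_to_long <- function(wideDF, id_cols) {
  value_cols <- setdiff(names(wideDF), id_cols)
  pieces <- lapply(value_cols, function(col) {
    data.frame(
      wideDF[id_cols],          # id columns, kept as they are
      Property = col,           # recycled to one label per row
      Value = wideDF[[col]],    # the values of this property column
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)        # stack the per-column pieces
}

# Two rows taken from the question's data
wideDF <- data.frame(
  Subject    = c(100, 100),
  Session    = c(1, 2),
  Property.A = c(-1.2252754, 2.3324333),
  Property.B = c(-0.9193751, -0.5860868),
  Property.C = c(-1.7501693, -4.2074489)
)

longDF <- wide_to_long(wideDF, id_cols = c("Subject", "Session"))
```

Adding a Property.D column to wideDF would then need no change to the call.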