
Is there a command similar to colsplit that splits a cell into multiple rows of the same column?

Developer https://www.devze.com 2023-02-22 06:48 Source: Internet

I have a table with the following rows:

df <- data.frame(Time=c(1,3),date=c(23,12),
       people=c("Apple&June&Peter","Apple&May&Mary"),stringsAsFactors=FALSE)

Time date people
1    23   Apple&June&Peter
3    12   Apple&May&Mary

I need to separate them into different rows:

Time date people
1    23   Apple
1    23   June
1    23   Peter
3    12   Apple
3    12   May
3    12   Mary

I know reshape + colsplit can be used to split the people column into different columns on the same row.

What about rows? How can I split them into different rows of the same column?


A base R way of doing this, using strsplit:

as.data.frame(
  t(
    do.call(cbind,
      # for each row x, split people on "&" and pair every name
      # with that row's Time and date
      lapply(1:nrow(df),function(x){
        sapply(unlist(strsplit(df[x,3],"&")),c,df[x,1:2],USE.NAMES=FALSE)
      })
    )
  )
)

     V1 Time date
1 Apple    1   23
2  June    1   23
3 Peter    1   23
4 Apple    3   12
5   May    3   12
6  Mary    3   12
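The same per-row idea can also be written so that each input row produces a small data frame, and the pieces are then stacked with rbind. A minimal sketch, assuming every people entry is a plain string separated by a literal "&" (hence fixed = TRUE); the names rows and res are only for illustration:

```r
df <- data.frame(Time = c(1, 3), date = c(23, 12),
                 people = c("Apple&June&Peter", "Apple&May&Mary"),
                 stringsAsFactors = FALSE)

# One small data frame per input row: the single Time and date values
# are recycled to match the number of names returned by strsplit().
rows <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(Time = df$Time[i],
             date = df$date[i],
             people = strsplit(df$people[i], "&", fixed = TRUE)[[1]],
             stringsAsFactors = FALSE)
})

# Stack the per-row pieces into one long data frame.
res <- do.call(rbind, rows)
res
```

Unlike the sapply/c version, this keeps the original column names and order (Time, date, people).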


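Another base R approach: split the strings once, then repeat Time and date as many times as each row has names.

df <- data.frame(Time=c(1,3),date=c(23,12),
           people=c("Apple&June&Peter","Apple&May&Mary"),stringsAsFactors=FALSE)
long.people=strsplit(df$people,"&")   # list with one character vector per row
el.len=sapply(long.people,length)     # number of names in each row
new.df=data.frame(Time=rep(df$Time,el.len),date=rep(df$date,el.len),people=unlist(long.people))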
new.df
  Time date people
1    1   23  Apple
2    1   23   June
3    1   23  Peter
4    3   12  Apple
5    3   12    May
6    3   12   Mary
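The rep() idea above generalises to any number of other columns by repeating row indices instead of individual columns. A minimal sketch; split_to_rows is a hypothetical helper name, and it assumes the separator is a literal string (fixed = TRUE):

```r
# Hypothetical helper: split one delimited column into rows,
# repeating every other column once per piece.
split_to_rows <- function(data, col, sep) {
  pieces <- strsplit(as.character(data[[col]]), sep, fixed = TRUE)
  # each original row index is repeated once per piece it produced
  out <- data[rep(seq_len(nrow(data)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

df <- data.frame(Time = c(1, 3), date = c(23, 12),
                 people = c("Apple&June&Peter", "Apple&May&Mary"),
                 stringsAsFactors = FALSE)
split_to_rows(df, "people", "&")
```

Because it indexes whole rows rather than naming Time and date explicitly, any extra columns in the table are carried along automatically.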


A variation on the reshape solution, using stringr for more convenient splitting of the name strings.

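library(reshape)
library(stringr)

# split people into exactly 3 columns (the largest number of names in a cell)
wide_df <- cbind(df[, 1:2], str_split_fixed(df[, 3], "&", 3))
# stack the 3 name columns into one, keeping Time and date as identifiers
long_df <- melt(wide_df, id.vars = c("Time", "date"))
# drop the column recording which name column each value came from
long_df$variable <- NULL
names(long_df)[3] <- "people"
long_df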


You can use colsplit and then reshape the resulting data.frame back to long form, then just drop the ID column that the reshape creates:

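library(reshape)
df <- data.frame(time=c(1,3),date=c(23,12),people=c("Apple&June&Peter","Apple&May&Mary"))
pnames <- paste("people",seq(3),sep=".")
df.new <- cbind(df[,seq(2)],colsplit(df$people,"&",pnames))
# reshape() calls its index column "time" by default, which would overwrite
# the existing time column, so give the index a different name
df.new <- reshape(df.new,varying=pnames,direction="long",timevar="piece")
# drop the id and piece index columns created by reshape
df.new <- subset(df.new,select=-c(id,piece))

df.new
    time date people
1.1    1   23  Apple
2.1    3   12  Apple
1.2    1   23   June
2.2    3   12    May
1.3    1   23  Peter
2.3    3   12   Mary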
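The same colsplit-then-reshape round trip can be done with base R only (stats::reshape) if the reshape package is not available. A sketch under the assumption that cells may hold different numbers of names: shorter rows are padded with NA in the wide step and those NA rows are dropped afterwards; the index column name piece is arbitrary.

```r
df <- data.frame(Time = c(1, 3), date = c(23, 12),
                 people = c("Apple&June&Peter", "Apple&May&Mary"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$people, "&", fixed = TRUE)
n <- max(lengths(parts))

# Wide step: one column per name position; p[seq_len(n)] pads with NA.
wide <- do.call(rbind, lapply(parts, function(p) p[seq_len(n)]))
pnames <- paste("people", seq_len(n), sep = ".")
colnames(wide) <- pnames
wide_df <- cbind(df[c("Time", "date")],
                 as.data.frame(wide, stringsAsFactors = FALSE))

# Long step: timevar is renamed so it cannot clash with an existing column.
long <- reshape(wide_df, varying = pnames, direction = "long",
                timevar = "piece")

# Drop padding rows and the helper columns added by reshape().
long <- long[!is.na(long$people), c("Time", "date", "people")]
rownames(long) <- NULL
long
```

The rows come out grouped by name position (all first names, then all second names), so sort afterwards if the original row order matters.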
