Find columns with different values in duplicate rows

Developer (https://www.devze.com), 2022-12-07 22:02. Source: Internet
I have a data set that has some duplicate records. For those records, most of the column values are the same, but a few are different.

I need to identify the columns where the values are different, and then subset those columns.

This would be a sample of my dataset:

library(data.table)

dat <- "ID location date status observationID observationRep observationVal latitude longitude setSource
FJX8KL loc1 2018-11-17 open 445 1 17.6 -52.7 -48.2 XF47
FJX8KL loc2 2018-11-17 open 445 2 1.9  -52.7 -48.2 LT12"

dat <- setDT(read.table(textConnection(dat), header=T))

And this is the output I would expect:

   observationRep observationVal setSource
1:              1           17.6      XF47
2:              2            1.9      LT12

One detail is: my original dataset has 189 columns, so I need to check all of them.

How to achieve this?


Two issues: first, use the text= argument rather than textConnection; second, use as.data.table, since setDT modifies an object in place, and here the object does not exist yet.

dat1 <- data.table::as.data.table(read.table(text=dat, header=TRUE))
dat1[, c('observationRep', 'observationVal', 'setSource')]
#    observationRep observationVal setSource
# 1:              1           17.6      XF47
# 2:              2            1.9      LT12
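
With 189 columns, typing the column names by hand is not practical, so here is a minimal sketch of one way to find them programmatically. It assumes that ID is what marks rows as duplicates of each other, and the names txt, per_id, value_cols, differs and diff_cols are made up for illustration. On the sample data it also returns location, because loc1 and loc2 differ, even though the expected output in the question leaves that column out.

```r
library(data.table)

# Sample data from the question, read with text= as suggested above
txt <- "ID location date status observationID observationRep observationVal latitude longitude setSource
FJX8KL loc1 2018-11-17 open 445 1 17.6 -52.7 -48.2 XF47
FJX8KL loc2 2018-11-17 open 445 2 1.9  -52.7 -48.2 LT12"

dat1 <- as.data.table(read.table(text = txt, header = TRUE))

# Count distinct values per column within each ID
# (assumption: ID is the key that defines a set of duplicate rows)
per_id <- dat1[, lapply(.SD, uniqueN), by = ID]

# A column "differs" if any ID has more than one distinct value in it
value_cols <- setdiff(names(per_id), "ID")
differs <- vapply(value_cols, function(nm) any(per_id[[nm]] > 1L), logical(1))
diff_cols <- value_cols[differs]

# Subset to the differing columns only
dat1[, ..diff_cols]
```

If some other combination of columns defines a duplicate in the real data, by = ID would need to list those columns instead.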