Input
row.no column2 column3 column4
1 bb ee up
2 bb ee down
3 bb ee up
4 bb yy down
5 bb zz up
I have a rule to remove rows 1, 2 and 3: although column2 and column3 are the same for these rows, contradictory data (up and down) are found in column4.
How can I ask R to remove those rows that share the same values in column2 and column3 but have contradicting values in column4, so that the result is as follows:
row.no column2 column3 column4
4 bb yy down
5 bb zz up
The functions in package plyr really shine at this type of problem. Here is a solution using two lines of code.
Set up the data (kindly provided by @GavinSimpson)
dat <- structure(list(row.no = 1:5, column2 = structure(c(1L, 1L, 1L,
1L, 1L), .Label = "bb", class = "factor"), column3 = structure(c(1L,
1L, 1L, 2L, 3L), .Label = c("ee", "yy", "zz"), class = "factor"),
column4 = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("down",
"up"), class = "factor")), .Names = c("row.no", "column2",
"column3", "column4"), class = "data.frame", row.names = c(NA,
-5L))
Load the plyr package
library(plyr)
Use ddply to split, analyse and combine dat. The following line of code splits dat by each unique combination of column2 and column3 and analyses each group separately. I then add a column called unique, which holds the number of unique values of column4 within each group. Finally, simple subsetting returns only those rows where unique==1 and drops column 5 (the helper column unique).
df <- ddply(dat, .(column2, column3), transform,
row.no=row.no, unique=length(unique(column4)))
df[df$unique==1, -5]
And the results:
row.no column2 column3 column4
4 4 bb yy down
5 5 bb zz up
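If plyr is not available, the same count-per-group idea can probably be sketched in base R with ave(). This is only an illustration, not part of the answer above: the data frame is rebuilt here with plain character columns instead of factors, and the names codes, n_values and result are made up for the example.

```r
# Example data from the question, rebuilt with character columns
# (an assumption; the original dat uses factors)
dat <- data.frame(
  row.no  = 1:5,
  column2 = c("bb", "bb", "bb", "bb", "bb"),
  column3 = c("ee", "ee", "ee", "yy", "zz"),
  column4 = c("up", "down", "up", "down", "up"),
  stringsAsFactors = FALSE
)

# Encode column4 as integers so that ave() returns a numeric count per row
codes <- as.integer(factor(dat$column4))

# For each row: number of distinct column4 values in its column2/column3 group
n_values <- ave(codes, dat$column2, dat$column3,
                FUN = function(x) length(unique(x)))

# Keep only the rows from groups whose column4 values all agree
result <- dat[n_values == 1, ]
print(result)
```

Unlike the ddply version, this keeps the original row order and does not add a helper column to the data frame.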
Here is one potential, if somewhat inelegant, solution
out <- with(dat, split(dat, interaction(column2, column3)))
out <- lapply(out, function(x) if(NROW(x) > 1) {NULL} else {data.frame(x)})
out <- out[!sapply(out, is.null)]
do.call(rbind, out)
Which gives:
> do.call(rbind, out)
row.no column2 column3 column4
bb.yy 4 bb yy down
bb.zz 5 bb zz up
Some explanation, line by line:
- Line 1: splits the data into a list, each component of which is a data frame with rows corresponding to groups formed by unique combinations of column2 and column3.
- Line 2: iterate over the result from Line 1; if there is more than 1 row in the data frame, return NULL, if not return the 1-row data frame.
- Line 3: iterate over the output from Line 2; return only non-NULL components.
- Line 4: need to bind, row-wise, the output from Line 3, which we arrange via do.call().
This can be simplified to two lines, combining Lines 1-3 into a single line:
out <- lapply(with(dat, split(dat, interaction(column2, column3))),
function(x) if(NROW(x) > 1) {NULL} else {data.frame(x)})
do.call(rbind, out[!sapply(out, is.null)])
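Note that the split-based code drops every group with more than one row, whether or not column4 actually disagrees within it. A possible variant that tests column4 itself is sketched below. It is an illustration only: row 6 is a hypothetical extra row added to show the difference, and the names groups, consistent and out2 are made up for the example.

```r
# Example data plus one hypothetical row (row.no 6) that repeats
# column2/column3 of row 4 with the SAME column4 value
dat2 <- data.frame(
  row.no  = 1:6,
  column2 = c("bb", "bb", "bb", "bb", "bb", "bb"),
  column3 = c("ee", "ee", "ee", "yy", "zz", "yy"),
  column4 = c("up", "down", "up", "down", "up", "down"),
  stringsAsFactors = FALSE
)

# Split into one data frame per column2/column3 combination
groups <- split(dat2, interaction(dat2$column2, dat2$column3, drop = TRUE))

# Keep a group only if all of its column4 values agree
consistent <- Filter(function(g) length(unique(g$column4)) == 1, groups)

# Bind the surviving groups back together, row-wise
out2 <- do.call(rbind, consistent)
print(out2)
```

Here rows 4 and 6 survive because they agree on column4, whereas the NROW(x) > 1 test would have removed them along with rows 1 to 3.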
The above was all done with:
dat <- structure(list(row.no = 1:5, column2 = structure(c(1L, 1L, 1L,
1L, 1L), .Label = "bb", class = "factor"), column3 = structure(c(1L,
1L, 1L, 2L, 3L), .Label = c("ee", "yy", "zz"), class = "factor"),
column4 = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("down",
"up"), class = "factor")), .Names = c("row.no", "column2",
"column3", "column4"), class = "data.frame", row.names = c(NA,
-5L))
Gavin keeps raising the bar on the quality of answers. Here's my attempt.
# This is one way of importing the data into R
sally <- textConnection("row.no column2 column3 column4
1 bb ee up
2 bb ee down
3 bb ee up
4 bb yy down
5 bb zz up")
sally <- read.table(sally, header = TRUE)
# Order the data frame to make rle work its magic
sally <- sally[order(sally$column2, sally$column3, sally$column4), ]
# Build one key per column2/column3 combination
sally.key <- paste(sally$column2, sally$column3)
# Keep one line per distinct key/column4 pair, then count the runs of each key
sally.uniq <- unique(data.frame(key = sally.key, column4 = sally$column4))
sally.rle <- rle(as.character(sally.uniq$key))
# Keys that occur with more than one column4 value are contradictory
sally.can.wait <- sally.rle$values[sally.rle$lengths != 1]
# Display the lines that have no contradictory values
sally[!(sally.key %in% sally.can.wait), ]
You can try one of the following two methods. Suppose the table is called 'table1'. Both remove every row whose column2/column3 pair occurs more than once; neither looks at column4.
Method 1
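repeated_rows = c()
for (i in seq_len(nrow(table1) - 1)) {
  for (j in (i + 1):nrow(table1)) {
    # rows i and j match when both column2 and column3 are equal
    if (sum(table1[i, 2:3] == table1[j, 2:3]) == 2) {
      repeated_rows = c(repeated_rows, i, j)
    }
  }
}
repeated_rows = unique(repeated_rows)
# logical indexing also works when no repeated rows were found
table1[!(seq_len(nrow(table1)) %in% repeated_rows), ]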
Method 2
duplicates = duplicated(table1[, 2:3])
for (i in seq_along(duplicates)) {
  if (duplicates[i]) {
    # also flag every other row that matches row i on column2 and column3
    for (j in 1:nrow(table1)) {
      if (sum(table1[i, 2:3] == table1[j, 2:3]) == 2) {
        duplicates[j] = TRUE
      }
    }
  }
}
table1[!duplicates, ]
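The loops in both methods can likely be avoided by calling duplicated() twice, once from each end. The sketch below is an illustration rather than part of the answer above: table1 is assumed to hold the question's example data, and the names key and in_repeated_group are made up for the example. Like Methods 1 and 2, it ignores column4.

```r
# Assumed contents of table1: the example data from the question
table1 <- data.frame(
  row.no  = 1:5,
  column2 = c("bb", "bb", "bb", "bb", "bb"),
  column3 = c("ee", "ee", "ee", "yy", "zz"),
  column4 = c("up", "down", "up", "down", "up"),
  stringsAsFactors = FALSE
)

key <- table1[, 2:3]

# duplicated() flags the 2nd, 3rd, ... occurrence of each key;
# fromLast = TRUE flags every occurrence except the last one.
# Together they flag all rows whose key occurs more than once.
in_repeated_group <- duplicated(key) | duplicated(key, fromLast = TRUE)

table1[!in_repeated_group, ]
```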