Existing function for seeing if a row exists in a data frame?

Developer https://www.devze.com 2023-03-03 18:14 Source: Internet
Is there an existing function for determining whether a row exists within a data frame? I suppose I could do an apply/identical, but it seems like I'm missing something.

For example:

given such a data frame:

  a   b
1 1 cat
2 2 dog

Is there an existing function which will allow me to test whether the row (1, cat) exists in the data frame?

Thanks, Zach


Try match_df from plyr (using Marek's sample data):

library(plyr)
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")

match_df(X, row_to_find)


Using the data from @Marek's answer:

nrow(merge(row_to_find,X))>0 # TRUE if exists
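
The merge test can be wrapped in a small helper. This is only a sketch, and row_exists_merge is a hypothetical name, not an existing function. It passes by explicitly because merge() returns the Cartesian product of its inputs when they have no column names in common, which would make the test TRUE for any non-empty input.

```r
# Hypothetical helper: TRUE if `row` (a one-row data frame) occurs in `df`
row_exists_merge <- function(df, row) {
  # Every column of `row` must be present in `df`, otherwise the join keys are ill-defined
  stopifnot(all(names(row) %in% names(df)))
  # Inner join on the columns of `row`; any surviving row means a match
  nrow(merge(row, df, by = names(row))) > 0
}

X <- data.frame(a = 1:2, b = c("cat", "dog"))
found_cat   <- row_exists_merge(X, data.frame(a = 1, b = "cat"))
found_horse <- row_exists_merge(X, data.frame(a = 3, b = "horse"))
```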


Taking your example:

X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat") # it has to be a data.frame (not a vector) to hold different types

Then

duplicated(rbind(X, row_to_find))[nrow(X)+1]

gives you the answer.


I suggest Ben Bolker's solution, since the nrow(merge(row_to_find,X))>0 solution doesn't work for me (it always gives TRUE):

tail(duplicated(rbind(X,row_to_find)),1)>0
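
As a rough sketch, the duplicated/rbind idea can be packaged as a function. The name row_exists_dup is made up for illustration. It assumes the candidate row is a one-row data frame with the same column names as the data frame being searched, since rbind requires that.

```r
# Hypothetical helper: TRUE if `row` repeats a row already present in `df`
row_exists_dup <- function(df, row) {
  # Append the candidate row, then check whether the last row duplicates an earlier one
  tail(duplicated(rbind(df, row)), 1)
}

X <- data.frame(a = 1:2, b = c("cat", "dog"))
found_cat   <- row_exists_dup(X, data.frame(a = 1, b = "cat"))
found_horse <- row_exists_dup(X, data.frame(a = 3, b = "horse"))
```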


For fans of dplyr and the tidyverse, you can use dplyr::anti_join(). According to its documentation, dplyr::anti_join(x, y) "returns all rows from x where there are not matching values in y, keeping just columns from x." Hence, if the result of dplyr::anti_join(row, df) has zero rows, then row was in df; if it has one row, then row was not in df.

library(dplyr)

df <- tribble(~a, ~b,
              1,  "cat",
              2,  "dog")
#> # A tibble: 2 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat  
#> 2  2.00 dog

row <- tibble(a = 1, b = "cat")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat

nrow(anti_join(row, df)) == 0  # row is in df so should be TRUE
#> Joining, by = c("a", "b")
#> [1] TRUE

row <- tibble(a = 3, b = "horse")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  3.00 horse

nrow(anti_join(row, df)) == 0  # row is not in df so should be FALSE
#> Joining, by = c("a", "b")
#> [1] FALSE


For a vector y with the same number of elements as there are columns in the data frame dfrm:

apply(dfrm, 1, function(x) all( x == y) )

This should return a vector of TRUE and FALSE values, which can in turn be used as a row index in [,]:

dfrm[ apply(dfrm, 1, function(x) all( x == y) ) , ]

The identical function is probably too stringent, since it will check attributes as well.

> y=c(1,2,3)
> x = data.frame(a=1:10, b=2:11, c=3:12)
> identical(x[1,] , y)
[1] FALSE
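
To illustrate the apply approach on the same all-numeric data frame, here is a short sketch. Wrapping the result in any() turns the per-row vector into a single existence test. Note that apply() converts the data frame to a matrix first, so with mixed column types the comparison happens on character strings, which is worth checking before relying on it.

```r
x <- data.frame(a = 1:10, b = 2:11, c = 3:12)
y <- c(1, 2, 3)

# One logical value per row: TRUE where every element of the row equals y
hits <- apply(x, 1, function(r) all(r == y))

row_found <- any(hits)  # does the row exist at all?
matching  <- x[hits, ]  # the matching row(s)
```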


Another approach, using base R:

df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
any(df$a == 1 & df$b == "cat")
#> [1] TRUE
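
The same column-by-column comparison can be generalised so the column names are not hard-coded. The following is a minimal sketch with a hypothetical helper name, row_exists. It assumes the row is given as a named list whose names are columns of the data frame, and it does not treat NA values specially.

```r
# Hypothetical helper: TRUE if some row of `df` matches every value in `row`
row_exists <- function(df, row) {
  # Compare each named column of df with the corresponding value in `row`
  per_column <- Map(function(col, value) col == value, df[names(row)], row)
  # A row matches only if all of its column comparisons are TRUE
  any(Reduce(`&`, per_column))
}

df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
found_1_cat <- row_exists(df, list(a = 1, b = "cat"))
found_2_cat <- row_exists(df, list(a = 2, b = "cat"))
```

Because each column is compared in its own type, this avoids the conversion to a character matrix that apply() performs.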