Comparing vectors

开发者 https://www.devze.com 2023-03-06 16:12 Source: web
I am new to R and am trying to find a better solution for accomplishing this fairly simple task efficiently.

I have a data.frame M with 100,000 rows and many columns, two of which are relevant to this problem; I'll call them M1 and M2. I have another data.frame, V, whose column V1 has about 10,000 elements and is essential to this task. My task is this:

For each element of V1, find where it occurs in M2 and pull out the corresponding M1. I can do this with a for loop, but it is terribly slow! I am used to Matlab and Perl, and this is taking forever in R. Surely there's a better way. I would appreciate any suggestions for accomplishing this task.

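start <- numeric(length(V$V1))  # preallocate the result (assumes M1 is numeric)
for (x in seq_along(V$V1)) {
    # take the M1 value from the row where M2 equals the x-th element of V1
    start[x] <- M$M1[M$M2 == V$V1[x]]
}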

There is only one element that will match, so I can use the logical comparison to put the matching element directly into the start vector. How can I vectorize this?

Thank you!


Here is another solution using the same example by @aix.

M[match(V$V1, M$M2),]
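
If only the M1 values are needed, as with the `start` vector in the question, the same `match` call can index that column directly. A minimal sketch, reusing the small M and V from @aix's example; the value 99 is a hypothetical extra entry added here to show what happens when nothing matches:

```r
# Example data from @aix's answer; 99 is an added value with no match in M$M2
M <- data.frame(M1 = c(1, 2, 3, 4, 10, 3, 15), M2 = c(15, 6, 7, 8, -1, 12, 5))
V <- data.frame(V1 = c(-1, 12, 5, 7, 99))

# For each element of V$V1, match() returns the position of its first
# occurrence in M$M2, or NA if it does not occur at all
idx <- match(V$V1, M$M2)

# Index M1 by those positions; unmatched elements come back as NA
start <- M$M1[idx]
print(start)  # 10 3 15 3 NA
```

The result has one entry per element of V$V1, in the same order, which is what the loop in the question builds.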

To benchmark performance, we can use the R package rbenchmark.

library(rbenchmark)
f_ramnath = function() M[match(V$V1, M$M2),]
f_aix = function() merge(V, M, by.x='V1', by.y='M2', sort=F)
f_chase = function() M[M$M2 %in% V$V1,] # modified to return full data frame

benchmark(f_ramnath(), f_aix(), f_chase(), replications = 10000)
         test replications elapsed relative
2     f_aix()        10000  12.907 7.068456
3   f_chase()        10000   2.010 1.100767
1 f_ramnath()        10000   1.826 1.000000
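
The benchmark above appears to use the small example data. To compare the original loop with `match` at roughly the size described in the question, something like the following could be tried. It is only a sketch on synthetic data: the sizes come from the question, while the random values, the seed, and the assumption that every V1 value occurs exactly once in M2 are made up for illustration.

```r
set.seed(42)
n <- 100000

# Synthetic stand-ins for the question's data: M2 is unique, V1 is drawn from M2
M <- data.frame(M1 = runif(n), M2 = sample.int(n))
V <- data.frame(V1 = sample(M$M2, 10000))

# Original approach: one full scan of M$M2 for every element of V$V1
loop_time <- system.time({
  start_loop <- numeric(nrow(V))
  for (x in seq_along(V$V1)) {
    start_loop[x] <- M$M1[M$M2 == V$V1[x]]
  }
})

# Vectorized approach: a single match() call
match_time <- system.time(start_match <- M$M1[match(V$V1, M$M2)])

print(loop_time["elapsed"])
print(match_time["elapsed"])

# Both approaches should produce the same vector
stopifnot(identical(start_loop, start_match))
```

Timings will vary by machine, so none are quoted here.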


Another option is to use the %in% operator:

> set.seed(1)
> M <- data.frame(M1 = sample(1:20, 15, FALSE), M2 = sample(1:20, 15, FALSE))
> V <- data.frame(V1 = sample(1:20, 10, FALSE))
> M$M1[M$M2 %in% V$V1]
[1]  6  8 11  9 19  1  3  5
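
One thing to keep in mind: `%in%` returns the matching M1 values in the row order of M, not in the order of V$V1, and values of V$V1 with no match simply do not appear. If the result has to line up with V$V1 element by element, `match` keeps that order. A small sketch of the difference, on made-up data chosen so that the two orderings differ:

```r
# Hypothetical data; stringsAsFactors = FALSE keeps M1 as character in older R too
M <- data.frame(M1 = c("a", "b", "c", "d"), M2 = c(10, 20, 30, 40),
                stringsAsFactors = FALSE)
V <- data.frame(V1 = c(40, 10, 30))

# Rows of M whose M2 appears anywhere in V1, in M's row order
by_in <- M$M1[M$M2 %in% V$V1]

# One result per element of V1, in V1's order
by_match <- M$M1[match(V$V1, M$M2)]

print(by_in)     # "a" "c" "d"
print(by_match)  # "d" "a" "c"
```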


Sounds like you're looking for merge:

> M <- data.frame(M1=c(1,2,3,4,10,3,15), M2=c(15,6,7,8,-1,12,5))
> V <- data.frame(V1=c(-1,12,5,7))
> merge(V, M, by.x='V1', by.y='M2', sort=F)
  V1 M1
1 -1 10
2 12  3
3  5 15
4  7  3

If V$V1 might contain values not present in M$M2, you may want to specify all.x=T. This will fill in the missing values with NAs instead of omitting them from the result.
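
A short sketch of the difference, using the same example data with 99 swapped in as a hypothetical value that is missing from M$M2. Note that with sort=FALSE the row order of the result is not guaranteed to follow V, so the example checks contents rather than positions.

```r
# @aix's example data, with 99 as an added value that is absent from M$M2
M <- data.frame(M1 = c(1, 2, 3, 4, 10, 3, 15), M2 = c(15, 6, 7, 8, -1, 12, 5))
V <- data.frame(V1 = c(-1, 12, 99, 7))

# Default: the row for 99 is dropped because it has no partner in M
inner <- merge(V, M, by.x = "V1", by.y = "M2", sort = FALSE)

# all.x = TRUE: every row of V is kept, with NA in M1 where there is no match
left <- merge(V, M, by.x = "V1", by.y = "M2", sort = FALSE, all.x = TRUE)

print(nrow(inner))  # 3
print(left)
```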
