Comparing vectors

Developer https://www.devze.com 2023-03-06 16:12 Source: web

Related topics: r, vector

I am new to R and am trying to find a better solution for accomplishing this fairly simple task efficiently.

I have a data.frame M with 100,000 rows (and many columns, of which two are relevant to this problem; I'll call them M1 and M2). I have another data.frame, V, whose column V1, with about 10,000 elements, is essential to this task. My task is this:

For each element in V1, find where it occurs in M2 and pull out the corresponding M1. I am able to do this using a for-loop, but it is terribly slow! I am used to Matlab and Perl, and this is taking forever in R! Surely there's a better way. I would appreciate any suggestions for accomplishing this task.

for (x in seq_along(V$V1)) {
    start[x] = M$M1[M$M2 == V$V1[x]]  # start is assumed to exist already
}

There is only one element that will match, so I can use the logical statement to put that element directly into the start vector. How can I vectorize this?

Thank you!

Here is another solution using the same example by @aix.

M[match(V$V1, M$M2),]
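
To pull out only the M1 values asked for in the question, the same idea can be applied to the column rather than the whole data frame. The following is a minimal sketch, assuming the toy data from @aix's merge example rather than the asker's real data; idx is a name introduced here for illustration. match() returns, for each element of V$V1, the position of its first occurrence in M$M2 (or NA if there is none), so the result comes back in the order of V$V1.

```r
# Toy data copied from @aix's merge example (not the asker's real data)
M <- data.frame(M1 = c(1, 2, 3, 4, 10, 3, 15), M2 = c(15, 6, 7, 8, -1, 12, 5))
V <- data.frame(V1 = c(-1, 12, 5, 7))

# Position in M$M2 of each element of V$V1 (NA where there is no match)
idx <- match(V$V1, M$M2)

# Vectorized replacement for the for-loop: one M1 value per element of V1
start <- M$M1[idx]
start
# [1] 10  3 15  3
```

Because the lookup is done in a single call, no preallocation or explicit loop is needed.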

To benchmark performance, we can use the R package rbenchmark.

library(rbenchmark)
f_ramnath = function() M[match(V$V1, M$M2),]
f_aix = function() merge(V, M, by.x='V1', by.y='M2', sort=F)
f_chase = function() M[M$M2 %in% V$V1,] # modified to return full data frame

benchmark(f_ramnath(), f_aix(), f_chase(), replications = 10000)
         test replications elapsed relative
2     f_aix()        10000  12.907 7.068456
3   f_chase()        10000   2.010 1.100767
1 f_ramnath()        10000   1.826 1.000000

Another option is to use the %in% operator:

> set.seed(1)
> M <- data.frame(M1 = sample(1:20, 15, FALSE), M2 = sample(1:20, 15, FALSE))
> V <- data.frame(V1 = sample(1:20, 10, FALSE))
> M$M1[M$M2 %in% V$V1]
[1]  6  8 11  9 19  1  3  5
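
One thing worth checking with this approach: M$M2 %in% V$V1 is a logical vector over the rows of M, so the values come back in the row order of M rather than in the order of V$V1, and any V1 value without a match is dropped silently. The sketch below uses made-up values (not from the question) to compare it with match(); in_result and match_result are names introduced only for this illustration.

```r
# Made-up data for illustration only
M <- data.frame(M1 = c("a", "b", "c", "d"), M2 = c(10, 20, 30, 40),
                stringsAsFactors = FALSE)
V <- data.frame(V1 = c(40, 10, 99))  # 99 has no match in M$M2

in_result    <- M$M1[M$M2 %in% V$V1]     # follows the row order of M
match_result <- M$M1[match(V$V1, M$M2)]  # follows the order of V$V1

in_result
# [1] "a" "d"
match_result
# [1] "d" "a" NA
```

If the result has to line up element by element with V$V1, as the start vector in the question does, the match() form is probably the safer choice.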

Sounds like you're looking for merge:

> M <- data.frame(M1=c(1,2,3,4,10,3,15), M2=c(15,6,7,8,-1,12,5))
> V <- data.frame(V1=c(-1,12,5,7))
> merge(V, M, by.x='V1', by.y='M2', sort=F)
  V1 M1
1 -1 10
2 12  3
3  5 15
4  7  3

If V$V1 might contain values not present in M$M2, you may want to specify all.x=TRUE. Rows of V without a match are then kept, with NA in M1, instead of being omitted from the result.
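
A small sketch of the all.x behaviour, assuming the same toy M as above and a V that contains one made-up value (99) with no counterpart in M$M2; res is a name introduced here for illustration.

```r
# Same toy M as in the merge example; 99 is a made-up value absent from M$M2
M <- data.frame(M1 = c(1, 2, 3, 4, 10, 3, 15), M2 = c(15, 6, 7, 8, -1, 12, 5))
V <- data.frame(V1 = c(-1, 12, 99))

# all.x = TRUE keeps every row of V, filling M1 with NA where M has no match
res <- merge(V, M, by.x = "V1", by.y = "M2", all.x = TRUE)
res
```

The result has one row per element of V$V1, and the row for 99 carries NA in M1. Without all.x = TRUE that row would be missing.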
