开发者

Finding the minimum difference between each element of one vector and another vector

开发者 https://www.devze.com 2022-12-09 18:08 出处:网络
I have two vectors of integers, and for each element of the second vector I want to fi开发者_JAVA百科nd the minumum distance to any element of the first vector - for example

I have two vectors of integers, and for each element of the second vector I want to fi开发者_JAVA百科nd the minumum distance to any element of the first vector - for example

obj1 <- seq(0, 1000, length.out=11)
obj2 <- 30:50
min_diff <- sapply(obj2, function(x) min(abs(obj1-x)))
min_diff

returns

[1] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Is there a more efficient way? I want to scale this up to thousands (millions?) of both obj1 & obj2.

Thanks, Aaron


I would use a step function sorted on the first vector. This will avoid loops and is pretty fast in R.

x <- rnorm(1000)
y <- rnorm(1000)
sorted.x <- sort(x)
myfun <- stepfun(sorted.x, 0:length(x))

Now myfun(1) will give you the index of the largest element of sorted.x whose value is less than 1. In my case,

> myfun(1)  
[1] 842
> sorted.x[842]
[1] 0.997574
> sorted.x[843]
[1] 1.014771

So you know that the closest element is either sorted.x[myfun(1)] or sorted.x[myfun(1) + 1]. Consequently (and padding for 0),

indices <- pmin(pmax(1, myfun(y)), length(sorted.x) - 1)
mindist <- pmin(abs(y - sorted.x[indices]), abs(y - sorted.x[indices + 1]))


start by sorting obj1

then you can do a binary search in obj1 for each element of obj2. knowing where the element would be, you can compare the distance to the two nearby elements of obj1, giving you the minimum distance.

runtime (where n1 = |obj1| and n2 = |obj2|): (n1 + n2) log (n1)


To reply to J. Chang: Thank you for providing powerfull and fast solution.

For my use of your 4 lines of code, I need the minimum difference found (to check if it's acceptable):

min_diff.val = min(mindist)

And here was a 'stupid' mistake I just notice after years. I wanted to pick up the value which have the less difference... I bet you can see it coming.

min_diff.idx = which.min(mindist)
sorted.x[indices[min_diff.idx]]

As mindist check sorted.x[indices] and sorted.x[indices+1] at the same time, they could be a confusion for idx since both comparisons have the same index


Example: if indices is 55,56
mindist will check 55,56 (indices) and 56,57 (indices+1)
if mindist found the min is 56 but within indices+1 (56),57
is there a way to save the right index ?

0

精彩评论

暂无评论...
验证码 换一张
取 消