开发者

Select rows with minimum difference

开发者 https://www.devze.com 2023-03-07 06:28 出处:网络
I\'m pretty strong with SQL, but I can\'t think of good solution to this \"look-alike\" data analysis problem:

I'm pretty strong with SQL, but I can't think of good solution to this "look-alike" data analysis problem:

Given a table with a set of integers, I need to match each integer with the integer in a second table that is most similar (smallest absolute difference). Normally I'd do a Cartesian join and order by the difference in numbers, but I need to only get one pairing for each row from each table so no value from either table can be used twice.

Any idea how to accomplish this?

EDIT: Example:

TABLE_A

34
57
108
219
380
5000

TABLE_B

4000
200
400
99
40
33
34
4600

The pairing would be one row from table_a and the closest row from table_b:

RESULT

34,34
57,40
108,99
219,200
380,400
5000,4600

So no row from either table appears twice.

EDIT: more clarification: I'm trying to solve this problem where given 1 row from table_a, we find the 1 row from table_b that's closest. That becomes a pair and is removed. Then take the next row from table_a and repeat. So we're trying to find the best match for each row and optimiz开发者_StackOverflow社区e that pairing, not trying to optimize total differences.


Assuming

where given 1 row from table_a, we find the 1 row from table_b that's closest

select
   *
from
   TABLE_A a
   cross apply
   (select top 1 Number from TABLE_B b order by abs(b.Number - a.Number)) b2

This also assume rows in b can be repeated: try it and see if it does what you want. However, this should fit your sample data so it would answer your question...


select v.*
from

   (select a.value as avalue, b.value as bvalue,
   (abs(a.value - b.value)) as difference 
   from 
   TABLE_A a,
   TABLE_B b) v,

   (select a.value as avalue, b.value as bvalue,
   min((abs(a.value - b.value))) as difference 
   from 
   TABLE_A a,
   TABLE_B b
   group by a.value, b.value) m

where m.avalue = v.avalue and m.bvalue = v.value and m.difference = v.difference


You will probably need to use a cursor to handle this. Copy the data from each table to their own temp table and apply your logic one row at a time.

What makes this difficult, if not impossible without a cursor, is the fact that the order in which you handle each number from the first table will affect the end result.

If your first table looks like this

9
10

And your second table looks like this

5
6

Then your result will look like this if you process the 9 first

9,6
10,5

And the result would look like this if you processed the 10 first

10,6
9,5
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号