开发者

Checking if any of a list of values falls within a table of ranges

开发者 https://www.devze.com 2022-12-28 04:01 出处:网络
I\'m looking to check whether any of a list of integers fall in a list of ranges.The ranges are defined in a table defined something like:

I'm looking to check whether any of a list of integers fall in a list of ranges. The ranges are defined in a table defined something like:

#   Extra   Type    Field       Default Null    Key 
0           int(11) rangeid     0       NO      PRI 
1           int(11) max         0       NO      MUL 
2           int(11) min         0       NO      MUL 

Using MySQL 5.1 and Perl 5.10.

I can check whether a single value, say 7, is in any of the ranges with a statement like

SELECT 1
  FROM range
  WHERE 7 BETWEEN min AND max

If 7 is in any of those ranges, I get a single row back. If it isn't, no rows are returned.

Now I have a list of, say, 50 of these values, not stored in a table at present. I assemble them using map:

my $value_list = '('
  . ( join ', ', map { int $_ } @values )
  . ')'
  ;

I want to see if any of the items in the list fall inside of any of the ranges, but am not particularly concerned with which number nor which range. I'd like to use a syntax such as:

SELECT 1
  FROM range
  WHERE (1, 2, 3, 4, 5, 6, 7, 42, 309, 10000) BETWEEN min AND max

MySQL kindly chastises me for such syntax:

Operand should contain 1 column(s)

I pinged #mysql who were quite helpful. However, having already written this up by the time they responded and thinking it'd be helpful to fix the answer in a more permanent med开发者_运维问答ium, I figured I'd post the question anyhow. Maybe SO will provide a different solution?


This sounded like an interesting problem. I created a test range table like so:

CREATE TABLE `test_ranges` (
  `rangeid` int(11) NOT NULL,
  `max` int(11) NOT NULL,
  `min` int(11) NOT NULL,
  PRIMARY KEY  (`rangeid`),
  KEY `idx_minmax` (`min`,`max`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

I inserted 50,000 rows into that table, each a range with max-min=10, like so:

mysql> select * from test_ranges limit 2;
+---------+-----+-----+
| rangeid | max | min |
+---------+-----+-----+
|       1 |  15 |   5 | 
|       2 |  20 |  10 | 
+---------+-----+-----+
2 rows in set (0.00 sec)

My perl code to get the ranges that match a list of integers is to create a temporary table to hold the integers, and ask MySQL to do the matching for me:

$DB->do_sql("CREATE TEMPORARY TABLE test_vals ( val int NOT NULL ) ENGINE=InnoDB");
for (12, 345, 394, 1450, 999, 9999, 99999, 999999 ) {
  $DB->do_sql("INSERT INTO test_vals VALUES (?)", $_);
}
$answer = $DB->do_sql("SELECT DISTINCT * from test_vals, test_ranges WHERE val BETWEEN min AND max");

That returns me the correct list. In the mysql client that would look like:

mysql> SELECT DISTINCT * from test_vals, test_ranges WHERE val BETWEEN min AND max;
+-------+---------+--------+-------+
| val   | rangeid | max    | min   |
+-------+---------+--------+-------+
|    12 |       1 |     15 |     5 | 
|    12 |       2 |     20 |    10 | 
|   345 |      67 |    345 |   335 | 
|   345 |      68 |    350 |   340 | 
|   345 |      69 |    355 |   345 | 
|   394 |      77 |    395 |   385 | 
|   394 |      78 |    400 |   390 | 
|  1450 |     288 |   1450 |  1440 | 
|  1450 |     289 |   1455 |  1445 | 
|  1450 |     290 |   1460 |  1450 | 
|   999 |     198 |   1000 |   990 | 
|   999 |     199 |   1005 |   995 | 
|  9999 |    1998 |  10000 |  9990 | 
|  9999 |    1999 |  10005 |  9995 | 
| 99999 |   19998 | 100000 | 99990 | 
| 99999 |   19999 | 100005 | 99995 | 
+-------+---------+--------+-------+
16 rows in set (0.00 sec)

Or, for just the list of matching values:

mysql> SELECT DISTINCT val from test_vals, test_ranges WHERE val BETWEEN min AND max;
+-------+
| val   |
+-------+
|    12 | 
|   345 | 
|   394 | 
|   999 | 
|  1450 | 
|  9999 | 
| 99999 | 
+-------+
7 rows in set (0.00 sec)

MySQL (at least 5.0, which I'm on) claims via EXPLAIN that it does not use an index for the comparison in a normal way. However, it reports "Range checked for each record" which essentially means it does what you'd think: treats the values from the test_vals table as constants and looks them up in the test_ranges table using the index idx_minmax.

mysql> explain SELECT DISTINCT * from test_vals, test_ranges WHERE val BETWEEN min AND max \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_vals
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 8
        Extra: Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_ranges
         type: ALL
possible_keys: idx_minmax
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 48519
        Extra: Range checked for each record (index map: 0x2)
2 rows in set (0.00 sec)

It's quite fast, but I don't know how many more rows you'll have than the 8 and 50K that I tested with. My guess is that creating a temporary table like this would be the optimal solution if you have more than a small handful of values you're looking up.


You can construct an SQL query in Perl that will work with multiple values as follows:

sub check_range {
    'SELECT 1 FROM range WHERE ' .
        join ' OR ' =>
        map "($_ BETWEEN min AND max)" => @_
}

print check_range( 1, 2, 3, 4, 5, 6, 7, 42, 309, 10000 ), "\n";

> SELECT 1 FROM range WHERE (1 BETWEEN min AND max) OR (2 BETWEEN min AND max)
> OR (3 BETWEEN min AND max) OR (4 BETWEEN min AND max) ...


To be perfectly honest, if the list being checked is in single-digits size, i'd either loop through checking one-by-one in Perl (the check being your query), or if you are worried about connection/query start overhead, populate them into a temp table and loop over it in the SQL loop, pulling out 1 cvalue at a time into a variable, deleting that value from temp table and running - again - your own one-check query on that variable, inside the loop.

Here's Sybase code - hopefully it translates to MySQL easily

-- previously, CREATE TABLE #your_temp_table (num int)
CREATE TABLE #in_range (num int)
DECLARE @seven int -- This is a JOKE! NEVER use a variable name like that!!!
WHILE (exists (select 1 from #your_temp_table)) 
BEGIN
    SELECT @seven = min(num) from #your_temp_table
    DELETE #your_temp_table WHERE num = @seven
    INSERT #in_range
        SELECT @seven
        FROM range
        WHERE @seven BETWEEN min AND max
END
SELECT num from #in_range
DROP TABLE #in_range

I have a feeling this could be done a lot more elegantly but this at least works in the abscence of a better solution :)

0

精彩评论

暂无评论...
验证码 换一张
取 消