开发者

fix mysql query to return random row within subgroup

开发者 https://www.devze.com 2023-01-17 09:29 出处:网络
I\'m using the following query to randomly draw one row from the subset, for each ID1-ID2 pair, of records that have the minimum distance in time (YEAR and MMDD fields).

I'm using the following query to randomly draw one row from the subset, for each ID1-ID2 pair, of records that have the minimum distance in time (YEAR and MMDD fields).

CREATE TABLE temp4 AS 
     SELECT * 
      FROM temp3 
  GROUP BY ID1, ID2 
  ORDER BY DATEDIFF( CONCAT(YEAR,'-',LEFT(MMDD,2),'-',RIGHT(MMDD,2)), CONCAT(ID3_YEAR,'-',LEFT(ID3_MMDD,2),'-',RIGHT(ID3_MMDD,2)) ) ASC, RAND() 
     LIMIT 0, 1;

From a previous question I've posted here, this is how the table looks like

ID1 ID2 YEAR  MMDD  ID3 YEAR_ID3  MMDD_ID3
---------------------------------------
1   2   1991  0821  55  1991      0822    
1   2   1991  0821  57  1991      0822    
1   2   1991  0821  88  1992      0101
1   3   1990  0131  89  2000      0202    
1   3   1990  0131  89  2001      0102

FOR EACH ID1-ID2 pair, I need to select the ID3 with

THE MINIMUM DISTANCE IN TERMS OF TIME (both YEAR field and MMDD field, i.e. I need to compare YEAR and MMDD vs. YEAR_ID3 and MMDD_ID3)

IF MORE THAN ONE ID3 SATISFIES THE MINIMUM REQUIREMENT ABOVE (i.e. they both have the same YEAR_ID3 and MMDD_ID3), I NEED TO SELECT ONE RANDOMLY.

IN THE ABOVE EXAMPLE, THE QUERY SHOULD RETURN

1,2,1991,0821,55 (OR 1,2,1991,0821,57 - ACCORDING TO THE RANDOM DRAW)

1,3,1990,0131,89

THE ONE I'VE PASTED ABOVE ONLY RETURNS ONE ROW... :(

SOMEHOW THERE WAS A GREAT SOLUTION BELOW IN THE COMMENTS SECTION POSTED BY OMG... but it disappeared?!?!?

I'm pasting it here

DROP TABLE IF EXISTS temp4;
CREATE TABLE temp4 AS
SELECT x.id1,
       x.id2,
       x.YEAR,
       x.MMDD,
       x.id3,
       x.id3_YEAR,
       x.id3_MMDD
 FROM (SELECT t.*,
               ABS(DATEDIFF(CONCAT(CAST(t.id3_YEAR AS CHAR(4)),'-', LEFT(t.id3_MMDD,2),'-',RIGHT(t.id3_MMDD,2)),
                        CONCAT(CAST(t.YEAR AS CHAR(4)),'-', LEFT(t.MMDD,2),'-',RIGHT(t.MMDD,2))))  AS diff,
               CASE 
                 WHEN @id1 = t.id1 AND @id2 = t.id2 THEN @rownum := @rownum + 1
                 ELSE @rownum := 1
  开发者_如何学Python             END AS rk,
               @id1 := t.id1,
               @id2 := t.id2
          FROM temp3 t
          JOIN (SELECT @rownum := 0, @id1  := 0, @id2 := 0) r
      ORDER BY t.id1, t.id2, diff, RAND()) x
 WHERE x.rk = 1;

I'm pasting here a SQL dump of a test table

DROP TABLE IF EXISTS `temp3`;
CREATE TABLE IF NOT EXISTS `temp3` (
  `id1` char(7) NOT NULL,
  `id2` char(7) NOT NULL,
  `YEAR` year(4) NOT NULL,
  `MMDD` char(4) NOT NULL,
  `id3` char(7) NOT NULL,
  `id3_YEAR` year(4) NOT NULL,
  `id3_MMDD` char(4) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;


INSERT INTO `temp3` VALUES('1', '2', 1992, '0107', '55', 1991, '0528');
INSERT INTO `temp3` VALUES('1', '2', 1992, '0107', '57', 1991, '0701');
INSERT INTO `temp3` VALUES('1', '3', 1992, '0107', '88', 2000, '0101');
INSERT INTO `temp3` VALUES('1', '3', 1992, '0107', '44', 2000, '0101');


The usual way to extract a random record(s) from a query is

SELECT [] FROM table ORDER BY RAND() LIMIT []

0

精彩评论

暂无评论...
验证码 换一张
取 消