Speed up MySQL query containing 300k+ records_问答_开发者

I need to lookup all my products (sku's) their latest stock quantity.

I have one table (called "stock") with 315k+ records containing this information (a new batch of data is added every day, for most sku's). The reference data is in another table (called "stockfile").

This is the query to do it:

SELECT s1 . * , f1 . *
FROM stock s1
JOIN stockfile f1 ON ( s1.stockfileid = f1.stockfileid )
LEFT OUTER JOIN ( stock s2
JOIN stockfile f2 ON ( s2.stockfileid = f2.stockfileid )
) ON ( s1.sku = s2.sku
AND ( f1.date < f2.date
OR f1.date = f2.date
AND f1.stockfileid < f2.stockfileid) )
WHERE s2.sku IS NULL

These are the table definitions

SHOW CREATE TABLE stock:

CREATE TABLE `stock` (
 `stockid` bigint(20) NOT NULL AUTO_INCREMENT,
 `sku` char(25) NOT NULL,
 `quantity` int(5) NOT NULL,
 `creationdate` datetime NOT NULL,
 `stockfileid` smallint(5) unsigned NOT NULL,
 `touchdate` datetime NOT NULL,
 PRIMARY KEY (`stockid`),
 KEY `stock_sku` (`sku`),
 KEY `stock_stockfileid` (`stockfileid`)
) ENGINE=MyISAM AUTO_INCREMENT=316039 DEFAULT CHARSET=latin1

SHOW CREATE TABLE stockfile:

CREATE TABLE `stockfile` (
 `stockfileid` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
 `filename` varchar(25) NOT NULL,
 `creationdate` datetime DEFAULT NULL,
 `touchdate` datetime DEFAULT NULL,
 `date` datetime DEFAULT NULL,
 `begindate` datetime DEFAULT NULL,
 `enddate` datetime DEFAULT NULL,
 PRIMARY KEY (`stockfileid`),
 KEY `stockfile_date` (`date`)
) ENGINE=MyISAM AUTO_INCREMENT=266 DEFAULT CHARSET=latin1

Without any extra indexes it takes... forever. I added these and it sped up to about 250 seconds:

CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);

This is the EXPLAIN on the original query, with these indexes.

id  select_type  table   type    possible_keys               key        key_len  ref                     rows   Extra
1   SIMPLE 开发者_开发问答      s1      ALL    stock_stockfileid           NULL       NULL     NULL                    316038
1   SIMPLE       f1      eq_ref  PRIMARY                     PRIMARY    2        kompare.s1.stockfileid  1     
1   SIMPLE       s2      ref    stock_sku,stock_stockfileid stock_sku  25       kompare.s1.sku          12     Using where
1   SIMPLE       f2      eq_ref  PRIMARY,stockfile_date      PRIMARY    2        kompare.s2.stockfileid  1

Is there another way to speed things up?

Thanks to Bill Karwin for solving the original query!

I'm not sure I got your query right, but if it's safe to suppose that maximal date has also a maximal stockfileid (like your OR condition half-suggests) maybe something like this query would be of help:

SELECT s1.*, f1.*
 FROM
  stock s1 JOIN stockfile f1 USING (stockfileid)
  JOIN (
   SELECT sku, max(date) AS maxdate, max(stockfileid) AS maxfileid
   FROM stock JOIN stockfile USING (stockfileid)
   GROUP BY sku
  ) AS dfi ON (s1.sku,f1.date,f1.stockfileid)=(dfi.sku,maxdate,maxfileid);

Not sure whether this is what you want and whether it's faster, but it should be. On the other hand, you don't need to take date into account at all, if fileid has it all. Anyway, I think this kind prefiltering may help as a starting point.

The default values in my.cnf typically are set for systems with VERY little memory by today's standards. If you are using those default values, that may be the single best place to look for performance gains. Ensure you are allocating all of the memory you can spare to MySQL.

mysqltuner can make good starting recommendations for allocating memory between the various parts of MySQL that can use it.

If you created your indices before adding most of the data, you may see vast improvement by performing ANALYZE TABLE on your tables. I saw one query drop from 24 seconds to 1 second just by doing that.

Your EXPLAIN indicates that MySQL is doing a table scan to satisfy WHERE s2.sku IS NULL prior to narrowing the search. That's very expensive.

f1.date < f2.date
OR f1.date = f2.date

should be able to be re-written as

f1.date <= f2.date

though I doubt that matters to the optimizer.

Could you explain in plain English what you are trying to do with the query? That might help shed light on how it could be simplified.

Normalization of the data would go far in speeding up the queries, furthermore, if you are running on a slow machine this will adversely affect the speed of in which your results will be returned. Show me an example query to this table then I can better understand what your attempt is on that angle.

I'm not sure if this is something you could do with your app, but instead of computing the quantity for each sku every single time you run the query, it would be more efficient to store the sku and quantity in a separate table and then just update the data whenever a new stockfile is received. That way you incur the cost of calculating this once per scorefile and not once per query. It's a bit of an upfront cost to compute this but it saves you a lot down the line.