开发者

MySQL multiple dependent subqueries, painfully slow

开发者 https://www.devze.com 2022-12-30 15:45 出处:网络
I have a working query that retrieves the data that I need, but unfortunately it is painfully slow (runs over 3 minutes). I have indexes in place, but I think the problem is the multiple dependent sub

I have a working query that retrieves the data that I need, but unfortunately it is painfully slow (runs over 3 minutes). I have indexes in place, but I think the problem is the multiple dependent subqueries. I've been trying to rewrite the query using joins but I can't seem to get it to work. Any help would be greatly appreciated.

The tables:

Basically, I have 2 tables. The first (prices) holds the prices of items in a store. Each row is the price of an item that day, and new rows are added every day with an updated price.

The second table (watches_US) holds the item information (name, description, etc).

CREATE T开发者_如何学PythonABLE `prices` (
`prices_id` int(11) NOT NULL auto_increment,
`prices_locale` enum('CA','DE','FR','JP','UK','US') NOT NULL default 'US',
`prices_watches_ID` char(10) NOT NULL,
`prices_date` datetime NOT NULL,
`prices_am` varchar(10) default NULL,
`prices_new` varchar(10) default NULL,
`prices_used` varchar(10) default NULL,
PRIMARY KEY  (`prices_id`),
KEY `prices_am` (`prices_am`),
KEY `prices_locale` (`prices_locale`),
KEY `prices_watches_ID` (`prices_watches_ID`),
KEY `prices_date` (`prices_date`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=61764 ;

CREATE TABLE `watches_US` (
`watches_ID` char(10) NOT NULL,
`watches_date_added` datetime NOT NULL,
`watches_last_update` datetime default NULL,
`watches_title` varchar(255) default NULL,
`watches_small_image_height` int(11) default NULL,
`watches_small_image_width` int(11) default NULL,
`watches_description` text,
PRIMARY KEY  (`watches_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;   

The query retrieves the last 10 prices changes over a period of 30 hours, ordered by the size of the price change. So I have subqueries to get the newest price, the oldest price within 30 hours, and then to calculate the price change.

Here's the query:

SELECT watches_US.*, prices.*, watches_US.watches_ID as current_ID,
    ( SELECT prices_am FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price, 
    ( SELECT prices_date FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price_date, 
    ( SELECT prices_am FROM prices WHERE ( prices_watches_ID = current_ID AND prices_locale = 'US') AND ( prices_date >= DATE_SUB(new_price_date,INTERVAL 30 HOUR) ) ORDER BY prices_date ASC LIMIT 1 ) as old_price,
    ( SELECT ROUND(((new_price - old_price)/old_price)*100,2) ) as percent_change,
    ( SELECT (new_price - old_price) ) as absolute_change
FROM watches_US 
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID 
WHERE ( prices_locale = 'US' )
AND ( prices_am IS NOT NULL )
AND ( prices_am != '' )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
AND ( prices.prices_date = new_price_date )
ORDER BY absolute_change ASC
LIMIT 10

How would I rewrite this to use joins instead, or otherwise optimize this so it doesn't take over 3 minutes to get a result? Any help would be greatly appreciated!

Thank you kindly.

UPDATE

Using the answers from below, I got the query down to this, which takes 2 seconds to run:

SELECT watches_US.*, prices.*,
    ( SELECT prices_am FROM prices prices2 WHERE ( prices2.prices_watches_ID = watches_US.watches_ID AND prices2.prices_locale = 'US') AND ( prices2.prices_date >= DATE_SUB(prices.prices_date,INTERVAL 30 HOUR) ) ORDER BY prices2.prices_date ASC LIMIT 1 ) as old_price,
    ( SELECT ROUND(((prices.prices_am - old_price)/old_price)*100,2) ) as percent_change,
    ( SELECT (prices.prices_am - old_price) ) as absolute_change
FROM watches_US 
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID AND prices.prices_locale = 'US'
WHERE ( prices.prices_am IS NOT NULL )
AND ( prices.prices_am != '' )
AND ( prices.prices_date IN (SELECT MAX(prices_date) FROM prices WHERE prices_watches_ID = watches_US.watches_ID AND prices_locale = 'US' ) )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
ORDER BY absolute_change ASC
LIMIT 10

It could probably still do with some work, but it's usable as is. Thank you all for your help!


Here's a partial idea:

SELECT watches_US.*, prices.*, watches_US.watches_ID as current_ID,
    prices2.prices_am as new_price, 
    prices2.prices_date as new_price_date, 
    ( SELECT prices_am FROM prices WHERE ( prices_watches_ID = current_ID AND prices_locale = 'US') AND ( prices_date >= DATE_SUB(new_price_date,INTERVAL 30 HOUR) ) ORDER BY prices_date ASC LIMIT 1 ) as old_price,
    ( SELECT ROUND(((new_price - old_price)/old_price)*100,2) ) as percent_change,
    ( SELECT (new_price - old_price) ) as absolute_change
FROM watches_US 
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID 
LEFT OUTER JOIN prices prices2 ON prices2.prices_watches_ID = watches_US.watches_ID 
WHERE ( prices_locale = 'US' )
AND ( prices_am IS NOT NULL )
AND ( prices_am != '' )
AND ( prices2.prices_date IN (SELECT MAX(price_date) FROM prices WHERE prices_watches_ID = watches_US.watches_ID AND prices_locale = 'US' )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
AND ( prices.prices_date = new_price_date )
ORDER BY absolute_change ASC
LIMIT 10

The changes are the second join on prices that's used to get new_price and new_price_date with the a WHERE clause to select only the most recent entry. You could probably clean it up a bit but I wanted to get it out there.


There are several problems with this SQL:

  • You are performing the same query multiple times:

    ( SELECT prices_am FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price, ( SELECT prices_date FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price_date,

You should be performing the query just once, give it a name and select multiple columns from it, e.q. SELECT ... sub1.prices_am, sub1.prices_date FROM ... SELECT () sub1 if I 'm not mistaken.

  • Do not for any reason use HAVING. It kills your performance, as it makes the database retrieve all rows in your query and then filter some of them out as the HAVING clause describes.


I'd start by making sure you have numeric values where you are doing comparisons and expressions. Any index that involves a type-conversion will be non-functional. Your prices are varchars.

0

精彩评论

暂无评论...
验证码 换一张
取 消