This is a weird one. I am trying to use Views in MySQL (I'm reasonably new to MySQL with more experience with Sybase and SQL Server). Any way this new project we are using MySQL as it seems to have good performance. However to make querying for a web front end simpler we decided to create a few views, all work well, but they take forever to run.
The views are very simple, just select statements (these tables do have a few million rows in them). Say for example this query:
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
,classifier_results.objClass AS objClass
,COUNT(classifier_results.objClass) AS obj
,classifier_results.subjClass AS subjClass
,COUNT(classifier_results.subjClass) AS subj
FROM classifier_results
WHERE (classifier_results.msgDate >= (curdate() - 20))
GROUP BY
CAST(classifier_results.msgDate as DATE)
,classifier_results.objClass
,classifier_results.subjClass
ORDER BY classifier_results.msgDate DESC
When run as a normal select takes around 1.5 seconds to return a result.
However when this query is put into a view (as is) - i.e.
CREATE VIEW V1a_sentiment_AI_current AS
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
,classifier_results.objClass AS objClass
,COUNT(classifier_results.objClass) AS obj
,classifier_results.subjClass AS subjClass
,COUNT(classifier_results.subjClass) AS subj
FROM classifier_results
WHERE (classifier_results.msgDate >= (curdate() - 20))
GROUP BY
CAST(classifier_results.msgDate as DATE)
,classifier_results.objClass
,classifier_results.subjClass
ORDER BY classifier_results.msgDate DESC
The query takes about 10 times longer (22-30 seconds). So I'm thinking maybe there is some optimization or query caching that doesnt work with Views or maybe there is some setting we've missed in the MySQL config. But is there any way to speed up th开发者_运维技巧is view so its just a nice placeholder for this query?
Running EXPLAIN on the two queries: The normal select gives:
1, SIMPLE, classifier_results, ALL, idx_date, , , , 594845, Using where; Using temporary; Using filesort
The view select gives:
1, PRIMARY, , ALL, , , , , 100,
2, DERIVED, classifier_results, ALL, idx_date, , , , 594845, Using where; Using temporary; Using filesort
Try re-creating your view using this:
CREATE ALGORITHM = MERGE VIEW `V1a_sentiment_AI_current` AS
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
,classifier_results.objClass AS objClass
,COUNT(classifier_results.objClass) AS obj
,classifier_results.subjClass AS subjClass
,COUNT(classifier_results.subjClass) AS subj
FROM classifier_results
WHERE (classifier_results.msgDate >= (curdate() - 20))
GROUP BY
CAST(classifier_results.msgDate as DATE)
,classifier_results.objClass
,classifier_results.subjClass
ORDER BY classifier_results.msgDate DESC
More information on MySQL's view processing algorithms can be found here.
This is a really common problem. It can be very hard to write DRY, re-usable SQL. There is a workaround I've found though.
Firstly, as others have pointed out, you can and should use VIEWs to do this wherever possible using the set ALGORITHM = MERGE, so that any queries using them are optimised on the merged SQL statement's where clause rather than having the VIEW evaluated for the entire view which can be catastrophically large.
In this case, since you cannot use MERGE because of the group/count aspect, you might want to try using a stored procedure that creates a temporary session table as a workaround.
This technique allows you to write reusable queries that can be accessed from middleware / framework code and called from inside other stored procedures, so you can keep code contained, maintainable and reusable.
I.e. if you know in advance that the query will be filtered on certain conditions, put those in a stored procedure. (It may be more efficient to post-filter the data set, or a combination - it depends how you use the data and what common sets are needed).
CREATE PROCEDURE sp_create_tmp_V1a_sentiment_AI_current(parm1, parm2 etc)
BEGIN
drop temporary table if exists tmp_V1a_sentiment_AI_current;
create temporary table tmp_V1a_sentiment_AI_current
as
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
,classifier_results.objClass AS objClass
,COUNT(classifier_results.objClass) AS obj
,classifier_results.subjClass AS subjClass
,COUNT(classifier_results.subjClass) AS subj
FROM classifier_results
WHERE (classifier_results.msgDate >= (curdate() - 20))
-- and/or other filters on parm1, parm2 passed in
GROUP BY
CAST(classifier_results.msgDate as DATE)
,classifier_results.objClass
,classifier_results.subjClass
ORDER BY classifier_results.msgDate DESC;
END;
Now, any time you need to work with this data, you call the procedure and then either select the result (possibly with additional where clause parameters) or join with it in any other query.
The table is a session temporary table so it will persist beyond the call to the procedure. The calling code can either drop it once it's finished with the data or it'll go automatically when the session goes or a subsequent call to the sproc is made.
Hope that's helpful.
MERGE can't be used here because of the count() aggregates in the select list; it might help in these cases to specify TEMPTABLE to save the engine from having to decide between them. Once I'd decided which algorithm to use I'd look at the EXPLAIN plan and try to add an index hint or locate a missing index.
精彩评论