MySQL - Views - Super slow query_问答_开发者_运维开发者技术经验分享

This is a weird one. I am trying to use Views in MySQL (I'm reasonably new to MySQL with more experience with Sybase and SQL Server). Any way this new project we are using MySQL as it seems to have good performance. However to make querying for a web front end simpler we decided to create a few views, all work well, but they take forever to run.

The views are very simple, just select statements (these tables do have a few million rows in them). Say for example this query:

SELECT CAST(classifier_results.msgDate as DATE) AS mdate
       ,classifier_results.objClass AS objClass
       ,COUNT(classifier_results.objClass) AS obj
       ,classifier_results.subjClass AS subjClass
       ,COUNT(classifier_results.subjClass) AS subj 
FROM classifier_results 
WHERE (classifier_results.msgDate >= (curdate() - 20)) 
GROUP BY 
  CAST(classifier_results.msgDate as DATE)
  ,classifier_results.objClass
  ,classifier_results.subjClass 
ORDER BY classifier_results.msgDate DESC

When run as a normal select takes around 1.5 seconds to return a result.

However when this query is put into a view (as is) - i.e.

CREATE VIEW  V1a_sentiment_AI_current AS    
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
       ,classifier_results.objClass AS objClass
       ,COUNT(classifier_results.objClass) AS obj
       ,classifier_results.subjClass AS subjClass
       ,COUNT(classifier_results.subjClass) AS subj 
FROM classifier_results 
WHERE (classifier_results.msgDate >= (curdate() - 20)) 
GROUP BY 
  CAST(classifier_results.msgDate as DATE)
  ,classifier_results.objClass
  ,classifier_results.subjClass 
ORDER BY classifier_results.msgDate DESC

The query takes about 10 times longer (22-30 seconds). So I'm thinking maybe there is some optimization or query caching that doesnt work with Views or maybe there is some setting we've missed in the MySQL config. But is there any way to speed up th开发者_运维技巧is view so its just a nice placeholder for this query?

Running EXPLAIN on the two queries: The normal select gives:

1, SIMPLE, classifier_results, ALL, idx_date, , , , 594845, Using where; Using temporary; Using filesort

The view select gives:

1, PRIMARY, , ALL, , , , , 100,
2, DERIVED, classifier_results, ALL, idx_date, , , , 594845, Using where; Using temporary; Using filesort

Try re-creating your view using this:

CREATE ALGORITHM = MERGE VIEW `V1a_sentiment_AI_current` AS    
SELECT CAST(classifier_results.msgDate as DATE) AS mdate
   ,classifier_results.objClass AS objClass
   ,COUNT(classifier_results.objClass) AS obj
   ,classifier_results.subjClass AS subjClass
   ,COUNT(classifier_results.subjClass) AS subj 
FROM classifier_results 
WHERE (classifier_results.msgDate >= (curdate() - 20)) 
GROUP BY 
  CAST(classifier_results.msgDate as DATE)
  ,classifier_results.objClass
  ,classifier_results.subjClass 
ORDER BY classifier_results.msgDate DESC

More information on MySQL's view processing algorithms can be found here.

This is a really common problem. It can be very hard to write DRY, re-usable SQL. There is a workaround I've found though.

Firstly, as others have pointed out, you can and should use VIEWs to do this wherever possible using the set ALGORITHM = MERGE, so that any queries using them are optimised on the merged SQL statement's where clause rather than having the VIEW evaluated for the entire view which can be catastrophically large.

In this case, since you cannot use MERGE because of the group/count aspect, you might want to try using a stored procedure that creates a temporary session table as a workaround.

This technique allows you to write reusable queries that can be accessed from middleware / framework code and called from inside other stored procedures, so you can keep code contained, maintainable and reusable.

I.e. if you know in advance that the query will be filtered on certain conditions, put those in a stored procedure. (It may be more efficient to post-filter the data set, or a combination - it depends how you use the data and what common sets are needed).

CREATE PROCEDURE sp_create_tmp_V1a_sentiment_AI_current(parm1, parm2 etc)
BEGIN

  drop temporary table if exists tmp_V1a_sentiment_AI_current;

  create temporary table tmp_V1a_sentiment_AI_current
  as
  SELECT CAST(classifier_results.msgDate as DATE) AS mdate
         ,classifier_results.objClass AS objClass
         ,COUNT(classifier_results.objClass) AS obj
         ,classifier_results.subjClass AS subjClass
         ,COUNT(classifier_results.subjClass) AS subj 
  FROM classifier_results 
  WHERE (classifier_results.msgDate >= (curdate() - 20)) 
  -- and/or other filters on parm1, parm2 passed in
  GROUP BY 
    CAST(classifier_results.msgDate as DATE)
    ,classifier_results.objClass
    ,classifier_results.subjClass 
  ORDER BY classifier_results.msgDate DESC;

END;

Now, any time you need to work with this data, you call the procedure and then either select the result (possibly with additional where clause parameters) or join with it in any other query.

The table is a session temporary table so it will persist beyond the call to the procedure. The calling code can either drop it once it's finished with the data or it'll go automatically when the session goes or a subsequent call to the sproc is made.

Hope that's helpful.

MERGE can't be used here because of the count() aggregates in the select list; it might help in these cases to specify TEMPTABLE to save the engine from having to decide between them. Once I'd decided which algorithm to use I'd look at the EXPLAIN plan and try to add an index hint or locate a missing index.