My table looks like this:
`MyDB`.`Details` (
`id` bigint(20) NOT NULL,
`run_id` int(11) NOT NULL,
`element_name` varchar(255) NOT NULL,
`value` text,
`line_order` int(11) default NULL,
`column_order` int(11) default NULL
);
I have the following SELECT statement in a stored procedure
SELECT
RULE
,TITLE
,SUM(IF(t.PASSED='Y',1,0)) AS PASS
,SUM(IF(t.PASSED='N',1,0)) AS FAIL
FROM
(
SELECT
a.line_order
,MAX(CASE WHEN a.element_name = 'PASSED' THEN a.`value` END) AS PASSED
,MAX(CASE WHEN a.element_name = 'RULE' THEN a.`value` END) AS RULE
,MAX(CASE WHEN a.element_name = 'TITLE' THEN a.`value` END) AS TITLE
FROM
Details a
WHERE
run_id = runId
GROUP BY line_order
) t
GROUP BY RULE, TITLE;
*runId is an input parameter to the stored procedure.
This query takes about 14 seconds to run. The table has 214856 rows, and the particular run_id I am filtering on has 162204 records. It's not on a super high power machine, but I feel like I could be doing this more efficiently. My main goal is to summarize by Rule and Title and show Pass and Fail count columns.
The table create script:
CREATE TABLE `MyDB`.`Details` (
`id` bigint(20) NOT NULL,
`run_id` int(11) NOT NULL,
`element_name` varchar(255) NOT NULL,
`value` text,
`line_order` int(11) default NULL,
`column_order` int(11) default NULL,
KEY `report_id` (`run_id`),
KEY `element_name` (`element_name`),
CONSTRAINT `Details_ibfk_1` FOREIGN KEY (`run_id`) REFERENCES `RunHistory` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
select `t`.`RULE` AS `RULE`,`t`.`TITLE` AS `TITLE`,sum(if((`t`.`PASSED` = _utf8'Y'),1,0)) AS `PASS`,sum(if((`t`.`PASSED` = _utf8'N'),1,0)) AS `FAIL` from (select `TAA`.`Details`.`line_order` AS `line_order`,max((case when (`TAA`.`Details`.`element_name` = _utf8'PASSED') then `TAA`.`Details`.`value` end)) AS `PASSED`,max((case when (`TAA`.`Details`.`element_name` = _utf8'RULE') then `TAA`.`Details`.`value` end)) AS `RULE`,max((case when (`TAA`.`Details`.`element_name` = _utf8'TITLE') then `TAA`.`Details`.`value` end)) AS `TITLE` from `TAA`.`Details开发者_StackOverflow` where (`TAA`.`Details`.`run_id` = 66) group by `TAA`.`Details`.`line_order`) `t` group by `t`.`RULE`,`t`.`TITLE`
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 3068, 'Using temporary; Using filesort'
2, 'DERIVED', 'Details', 'ref', 'report_id', 'report_id', '4', '', 107563, 'Using where; Using temporary; Using filesort'
Wow. Here is a couple of hints:
- The first problem here is using TEXT fields. When you use TEXT or BLOB, all temporary tables will be created on disk instead of memory. Try using varchar(N) instead, keep N as small as possible. It is strange that MySQL allows GROUPing BY on TEXT fields at all.
- I would consider creation of a composite index on ('run_id', 'line_order','element_name'). Just to try, such an index may significantly affect performance in other parts of your system.
Well the biggest gain would be if you could normalize your data away from attribute-value approach and avoid transforming them. Can you do that?
Can you also do SHOW INDEXES IN Details?
EDIT: It seems the newtorder index suggestion worked for you. Could you just satisfy my curiosity and check two things:
Here is a rewrite of the query that turns the aggregate into joins, could you check how does it compare to the original one?
SELECT
RULE
,TITLE
,SUM(IF(t.PASSED='Y',1,0)) AS PASS
,SUM(IF(t.PASSED='N',1,0)) AS FAIL
FROM
(
SELECT
a.line_order,
a.value AS TITLE,
b.value AS RULE,
c.value AS PASSED
FROM
Details a INNER JOIN
Details b ON a.line_order = b.line_order AND a.run_id = runId AND a.element_name = 'PASSED' INNER JOIN
Details c ON b.line_order = c.line_order AND b.run_id = runId AND b.element_name = 'RULE' AND c.element_name = 'TITLE'
) t
GROUP BY RULE, TITLE;
A few notes regarding the join
- I assumed that for a TITLE there will be at least one RULE and PASSED value, if not turn the INNER JOINS to LEFT JOINS
- Also I have moved the where conditions inside the join condition, this can sometimes be a hint to mysql to use indexes where it normally would not, but I am not sure about that and if you have inclination you can experiment with join conditions moving them from the ON into WHERE
Finally, could you also check what happens to the speed of your query if you add index on ('run_id', 'line_order','element_name', 'value')? I am not sure it will improve things (it will increase the size of the index, but will cut access to the individual rows), so it would be interesting to see the numbers (check the plan to see if it is really used).
Finally, another note on the original query - it might be possible to at least do aggregation in one step. Would you like to investigate that further?
精彩评论