MySQL: Optimizing COUNT(*) and GROUP BY

Developer https://www.devze.com 2023-03-20 05:17 Source: Internet
I have a simple MyISAM table resembling the following (trimmed for readability -- in reality, there are more columns, all of which are constant width and some of which are nullable):

CREATE TABLE IF NOT EXISTS `history` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `time` int(11) NOT NULL,
  `event` int(11) NOT NULL,
  `source` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `event` (`event`),
  KEY `time` (`time`)
) ENGINE=MyISAM;

The table currently contains only about 6,000,000 rows (about 160,000 of which match the query below), but this is expected to grow. Given a particular event ID, and grouped by source, I want to know how many events with that ID were logged during a particular interval of time. The answer might be something along the lines of "Today, event X happened 120 times for source A, 105 times for source B, and 900 times for source C."

The query I came up with does the job, but it performs monstrously badly: it takes well over a minute when the time span is set to "all time", and more than 30 seconds for as little as one week back:

SELECT source, COUNT(*) AS count FROM history
WHERE event=2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC

This is not for real-time use, so even if the query takes a second or two that would be fine, but several minutes is not. Running EXPLAIN on the query gives the following, which troubles me for obvious reasons:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE          history ref     event,time      event   4           const   160399  Using where; Using temporary; Using filesort

I've experimented with various multi-column indexes (such as (event, time)), but with no improvement. This seems like such a common use case that I can't imagine there isn't a reasonable solution, but my Google searches all turn up versions of the query I already have, with no particular suggestions on how to avoid the temporary table (or explanations of why performance is so abysmal in the first place).

Any suggestions?


You say you have tried multi-column indexes. Have you also tried single-column indexes, one per column?

UPDATE: Also, a COUNT(*) with a GROUP BY clause is probably a lot faster if the grouped column also has an index on it. Of course, how much this helps may depend on how many NULL values that column actually contains.

For event, MySQL can do an equality lookup on the index (the "ref" access type in the EXPLAIN output above), which is quite fast, whereas for time, a range scan is applied, which is not as fast. If you use separate indexes, I'd expect better performance than with multi-column ones.
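A minimal sketch of the "index the grouped column" suggestion, assuming the table and column names from the question. The index name `source` is only an illustrative choice. Adding the index and re-running EXPLAIN shows whether MySQL actually picks it up:

```sql
-- Hypothetical index name: a single-column index on the GROUP BY column.
ALTER TABLE `history` ADD INDEX `source` (`source`);

-- Re-check the plan; whether the new index is chosen depends on
-- MySQL's cost estimates for this data distribution.
EXPLAIN
SELECT source, COUNT(*) AS count
FROM history
WHERE event = 2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC;
```

MySQL will often use only one index per table for a query like this (index merge is the exception), so it is worth comparing actual run times rather than relying on the plan alone.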

You might also gain something by partitioning the table by expected values or value ranges:

http://dev.mysql.com/doc/refman/5.5/en/partitioning-overview.html
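A minimal sketch of range partitioning on `time`, assuming MySQL 5.5 (the version of the linked manual) and the columns from the question. The table name `history_partitioned`, the partition names and the boundaries are all hypothetical; the boundaries are Unix timestamps for midnight UTC on the dates given in the comments. MySQL requires every unique key, including the primary key, to contain the partitioning column, so `time` is added to the primary key:

```sql
-- Hypothetical partitioned copy of the question's table.
CREATE TABLE `history_partitioned` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `time` int(11) NOT NULL,
  `event` int(11) NOT NULL,
  `source` int(11) DEFAULT NULL,
  -- The partitioning column must be part of the primary key.
  PRIMARY KEY (`id`, `time`),
  KEY `event` (`event`),
  KEY `time` (`time`)
) ENGINE=MyISAM
PARTITION BY RANGE (`time`) (
  PARTITION p2010   VALUES LESS THAN (1293840000),  -- before 2011-01-01 UTC
  PARTITION p2011h1 VALUES LESS THAN (1309478400),  -- before 2011-07-01 UTC
  PARTITION p2011h2 VALUES LESS THAN (1325376000),  -- before 2012-01-01 UTC
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

A query with a time range can then skip partitions outside that range (partition pruning), which EXPLAIN PARTITIONS shows in MySQL 5.5. An "all time" query still has to read every partition, so this mainly helps the shorter time spans.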


I suggest trying this multi-column index:

ALTER TABLE `history` ADD INDEX `history_index` (`event` ASC, `time` ASC, `source` ASC);

Then, if that doesn't help, try forcing the index for this query:

SELECT source, COUNT(*) AS count FROM history FORCE INDEX (history_index)
WHERE event=2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC
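The point of the three-column index is that it covers the query: `event`, `time` and `source` are all in the index, so MySQL can answer from the index alone without reading rows from the data file. A sketch for checking this, assuming the `history_index` definition above, plus an alternative column order that may be worth comparing (its index name is hypothetical):

```sql
-- Check that history_index is used and covers the query:
-- "key" should show history_index and "Extra" should include "Using index".
EXPLAIN
SELECT source, COUNT(*) AS count
FROM history FORCE INDEX (history_index)
WHERE event = 2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC;

-- Alternative column order (hypothetical name): index entries for one event
-- are stored sorted by source, so grouping can follow index order.
-- The time condition is then checked against every entry for that event.
ALTER TABLE `history` ADD INDEX `history_event_source_time` (`event`, `source`, `time`);
```

The trade-off: (event, time, source) reads fewer index entries for narrow time ranges, while (event, source, time) may avoid a temporary table for the GROUP BY but always scans every entry for the event. Either way, ORDER BY count DESC still has to sort the grouped result, but that result has only one row per source.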


If the sources are known in advance, or you only want counts for specific sources, you can try something like this (here 1 and 2 stand for the IDs of sources A and B, since source is an integer column):

SELECT COUNT(source = 1 OR NULL) AS A, COUNT(source = 2 OR NULL) AS B
FROM history
WHERE event = 2000 AND time >= 0 AND time < 1310563644;

You can do the ordering in your application code. Also try indexing event and source together.

This should be faster than the original query.
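The COUNT(expr OR NULL) trick works because `source = 1 OR NULL` evaluates to 1 when the comparison is true and to NULL otherwise, and COUNT skips NULLs. Since MySQL evaluates a comparison to 1 or 0, SUM(condition) is an equivalent spelling. A sketch, again assuming the hypothetical source IDs 1 and 2 and a hypothetical name for the suggested (event, source) index:

```sql
-- Hypothetical index name for the suggested (event, source) index.
ALTER TABLE `history` ADD INDEX `event_source` (`event`, `source`);

-- Each true comparison adds 1; false adds 0 and NULL (source IS NULL) is ignored.
-- Unlike COUNT, SUM returns NULL rather than 0 if no rows match the WHERE clause.
SELECT SUM(source = 1) AS A,
       SUM(source = 2) AS B
FROM history
WHERE event = 2000 AND time >= 0 AND time < 1310563644;
```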
