Here is the query:
select timespans.id as timespan_id, count(*) as num
from reports, timespans
where timespans.after_date >= '2011-04-13 22:08:38' and
timespans.after_date <= reports.authored_at and
reports.authored_at < timespans.before_date
group by timespans.id;
Here are the table defs:
CREATE TABLE `reports` ( `id` int(11) NOT NULL auto_increment, `source_id` int(11) default NULL, `url` varchar(255) default NULL, `lat` decimal(20,15) default NULL, `lng` decimal(20,15) default NULL, `content` text, `notes` text, `authored_at` datetime default NULL, `created_at` datetime default NULL, `updated_at` datetime default NULL, `data` text, `title` varchar(255) default NULL, `author_id` int(11) default NULL, `orig_id` varchar(255) default NULL, PRIMARY KEY (`id`), KEY `index_reports_on_title` (`title`), KEY `index_content_on_reports` (`content`(128)) CREATE TABLE `timespans` ( `id` int(11) NOT NULL auto_increment, `after_date` datetime default NULL, `before_date` datetime default NULL, `after_offset` int(11) default NULL, `before_offset` int(11) default NULL, `is_common` tinyint(1) default NULL, `created_at` datetime default NULL, `updated_at` datetime default NULL, `is_search_chunk` tinyint(1) default NULL, `is_day` tinyint(1) default NULL, PRIMARY KEY (`id`), KEY `index_timespans_on_after_date` (`after_date`), KEY `index_timespans_on_before_date` (`before_date`)
And here is the explain:
+----+-------------+-----------+-------+----------------------------------------------------------开发者_Python百科----+-------------------------------+---------+------+--------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+ | 1 | SIMPLE | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9 | NULL | 84 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | reports | ALL | NULL | NULL | NULL | NULL | 183297 | Using where | +----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
And here is the explain after I create an index on authored_at. As you can see, the index is not actually getting used (I think...)
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+ | 1 | SIMPLE | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9 | NULL | 86 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | reports | ALL | index_reports_on_authored_at | NULL | NULL | NULL | 183317 | Range checked for each record (index map: 0x8) | +----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
There are about 142k rows in the reports table, and far fewer in the timespans table.
The query is taking about 3 seconds now.
The strange thing is that if I add an index on reports.authored_at, it actually makes the query far slower, about 20 seconds. I would have thought it would do the opposite, since it would make it easy to find the reports at either end of the range, and throw the rest away, rather than having to examine all of them.
Can someone clarify? I'm stumped.
Instead of two separate indexes for the timespan table, try merging them into a single multi-column index with before_date and after_date in a single index. Then add that index to authored_at as well.
i rewrite you query like this:
select t.id, count(*) as num from timespans t
join reports r where t.after_date >= '2011-04-13 22:08:38'
and r.authored_at >= '2011-04-13 22:08:38'
and r.authored_at < t.before_date
group by t.id order by null;
and change indexes of tables
alter table reports add index authored_at_idx(authored_at);
You can used partition feature of database on column after_date
. It will help u a lot.
精彩评论