开发者

How can I optimize a Mysql query that searches for rows in a certain date range

开发者 https://www.devze.com 2023-02-25 06:00 出处:网络
Here is the query: select timespans.id as timespan_id, count(*) as num from reports, timespans wheretimespans.after_date >= \'2011-04-13 22:08:38\' and

Here is the query:

select timespans.id as timespan_id, count(*) as num
 from reports, timespans
 where  timespans.after_date >= '2011-04-13 22:08:38' and
        timespans.after_date <= reports.authored_at and
        reports.authored_at < timespans.before_date
 group by timespans.id;

Here are the table defs:

CREATE TABLE `reports` (
  `id` int(11) NOT NULL auto_increment,
  `source_id` int(11) default NULL,
  `url` varchar(255) default NULL,
  `lat` decimal(20,15) default NULL,
  `lng` decimal(20,15) default NULL,
  `content` text,
  `notes` text,
  `authored_at` datetime default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  `data` text,
  `title` varchar(255) default NULL,
  `author_id` int(11) default NULL,
  `orig_id` varchar(255) default NULL,
  PRIMARY KEY  (`id`),
  KEY `index_reports_on_title` (`title`),
  KEY `index_content_on_reports` (`content`(128))

CREATE TABLE `timespans` (
  `id` int(11) NOT NULL auto_increment,
  `after_date` datetime default NULL,
  `before_date` datetime default NULL,
  `after_offset` int(11) default NULL,
  `before_offset` int(11) default NULL,
  `is_common` tinyint(1) default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  `is_search_chunk` tinyint(1) default NULL,
  `is_day` tinyint(1) default NULL,
  PRIMARY KEY  (`id`),
  KEY `index_timespans_on_after_date` (`after_date`),
  KEY `index_timespans_on_before_date` (`before_date`)

And here is the explain:

+----+-------------+-----------+-------+----------------------------------------------------------开发者_Python百科----+-------------------------------+---------+------+--------+----------------------------------------------+
| id | select_type | table     | type  | possible_keys                                                | key                           | key_len | ref  | rows   | Extra                                        |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9       | NULL |     84 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | reports   | ALL   | NULL                                                         | NULL                          | NULL    | NULL | 183297 | Using where                                  | 
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+

And here is the explain after I create an index on authored_at. As you can see, the index is not actually getting used (I think...)

+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
| id | select_type | table     | type  | possible_keys                                                | key                           | key_len | ref  | rows   | Extra                                          |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
|  1 | SIMPLE      | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9       | NULL |     86 | Using where; Using temporary; Using filesort   | 
|  1 | SIMPLE      | reports   | ALL   | index_reports_on_authored_at                                 | NULL                          | NULL    | NULL | 183317 | Range checked for each record (index map: 0x8) | 
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+

There are about 142k rows in the reports table, and far fewer in the timespans table.

The query is taking about 3 seconds now.

The strange thing is that if I add an index on reports.authored_at, it actually makes the query far slower, about 20 seconds. I would have thought it would do the opposite, since it would make it easy to find the reports at either end of the range, and throw the rest away, rather than having to examine all of them.

Can someone clarify? I'm stumped.


Instead of two separate indexes for the timespan table, try merging them into a single multi-column index with before_date and after_date in a single index. Then add that index to authored_at as well.


i rewrite you query like this:

select t.id, count(*) as num from timespans t 
  join reports r where t.after_date >= '2011-04-13 22:08:38' 
  and r.authored_at >= '2011-04-13 22:08:38' 
  and r.authored_at < t.before_date 
group by t.id order by null;

and change indexes of tables

alter table reports add index authored_at_idx(authored_at);


You can used partition feature of database on column after_date. It will help u a lot.

0

精彩评论

暂无评论...
验证码 换一张
取 消