
Create index to optimize slow query

开发者 https://www.devze.com 2023-02-20 17:17 Source: web
There is a query that takes too long on a 250,000-row table. I need to speed it up:

create table occurrence (
occurrence_id int(11) primary key auto_increment,
client_id varchar(16) not null,
occurrence_cod varchar(50) not null,
entry_date datetime not null,
zone varchar(8) null default null
)
;

insert into occurrence (client_id, occurrence_cod, entry_date, zone)
values
('1116', 'E401', '2011-03-28 18:44', '004'),
('1116', 'R401', '2011-03-28 17:44', '004'),
('1116', 'E401', '2011-03-28 16:44', '004'),
('1338', 'R401', '2011-03-28 14:32', '001')
;

select client_id, occurrence_cod, entry_date, zone
from occurrence o
where
    occurrence_cod = 'E401'
    and
    entry_date = (
        select max(entry_date)
        from occurrence
        where client_id = o.client_id
    )
;
+-----------+----------------+---------------------+------+
| client_id | occurrence_cod | entry_date          | zone |
+-----------+----------------+---------------------+------+
| 1116      | E401           | 2011-03-28 16:44:00 | 004  |
+-----------+----------------+---------------------+------+
1 row in set (0.00 sec)

The table structure is from a commercial application and cannot be altered.

What would be the best index(es) to optimize it? Or a better query?

EDIT:

I want the last occurrence for each client, and only when that last occurrence is code E401.


The ideal indexes for such a query would be:

index #1: [client_id] + [entry_date]
index #2: [occurrence_cod] + [entry_date]

Nevertheless, those indexes can be simplified if the data happens to have certain characteristics. This will save file space, and also time when data is modified (insert/delete/update).

If there is rarely more than one occurrence record per [client_id], then index #1 can be just [client_id].

In the same way, if there is rarely more than one occurrence record per [occurrence_cod], then index #2 can be just [occurrence_cod].

It may be more useful to turn index #2 into [entry_date] + [occurrence_cod]. That would let the index also serve criteria on [entry_date] alone.
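A minimal sketch of how those two indexes could be created in MySQL (the index names are illustrative, not from the original thread):

```sql
-- Index #1: supports the correlated subquery
-- (max(entry_date) per client_id)
create index idx_client_date on occurrence (client_id, entry_date);

-- Index #2: supports the outer filter on occurrence_cod
create index idx_cod_date on occurrence (occurrence_cod, entry_date);
```

Both are plain secondary indexes, so they can be added without touching the table structure itself.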

Regards,


Unless you truly need the row with the max date per client, and only when its occurrence_cod matches, this should work:

select client_id, occurrence_cod, entry_date, zone 
from occurrence o 
where occurrence_cod = 'E401'
ORDER BY entry_date DESC
LIMIT 1;

It will return the single most recent row with occurrence_cod = 'E401' (note: one row overall, not one per client).
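If this simpler form matches the requirement, a composite index (the name below is illustrative) lets MySQL read the index in descending entry_date order and stop at the first match, instead of sorting the whole table:

```sql
create index idx_occurrence_cod_entry_date
    on occurrence (occurrence_cod, entry_date);
```

With that index, both the WHERE filter and the ORDER BY ... LIMIT 1 can be satisfied from the index alone.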


select 
  a.client_id, 
  a.occurrence_cod, 
  a.entry_date, 
  a.zone
from occurrence a 
  inner join (
    select client_id, occurrence_cod, max(entry_date) as entry_date 
    from occurrence
    group by client_id, occurrence_cod
  ) as b
on 
  a.client_id = b.client_id and 
  a.occurrence_cod = b.occurrence_cod and 
  a.entry_date = b.entry_date
where
  a.occurrence_cod = 'E401'
With this approach you avoid running the subselect once per row; joining two large sets once tends to be faster than scanning a large set again for every row.
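On MySQL 8.0 or later (an assumption; the thread does not state a version), the "latest row per client, only if it is E401" requirement from the EDIT can also be written with a window function:

```sql
select client_id, occurrence_cod, entry_date, zone
from (
  select o.*,
         row_number() over (partition by client_id
                            order by entry_date desc) as rn
  from occurrence o
) ranked
-- keep a client only when its single latest row is code E401
where rn = 1
  and occurrence_cod = 'E401';
```

Unlike the correlated-subquery form, row_number() picks exactly one row per client even when two rows share the same maximum entry_date.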


I'd re-write the query:

select client_id, occurrence_cod, max(entry_date), zone
from occurrence 
group by client_id, occurrence_cod, zone;

(assuming the other columns are indeed identical for a given client, and entry_date is the only thing that changes).


Did you try putting an index on occurrence_cod?


Try this if the other approaches are not available:

  1. Create a new table: last_occurrence.
  2. Every time an occurrence is inserted, update the corresponding row in last_occurrence.

By doing this, you only need the following SQL to get your result :)

select * from last_occurrence
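A minimal sketch of that idea, assuming triggers are acceptable (a trigger does not alter the original table's structure, though the commercial application may still object; the table and trigger names are illustrative):

```sql
create table last_occurrence (
  client_id      varchar(16) primary key,
  occurrence_cod varchar(50) not null,
  entry_date     datetime    not null,
  zone           varchar(8)  null
);

create trigger occurrence_after_insert
after insert on occurrence
for each row
  insert into last_occurrence (client_id, occurrence_cod, entry_date, zone)
  values (new.client_id, new.occurrence_cod, new.entry_date, new.zone)
  on duplicate key update
    -- assignments run left to right, so compare against the
    -- old entry_date before overwriting it
    occurrence_cod = if(new.entry_date >= entry_date, new.occurrence_cod, occurrence_cod),
    zone           = if(new.entry_date >= entry_date, new.zone, zone),
    entry_date     = greatest(entry_date, new.entry_date);
```

The slow query then becomes a simple lookup, e.g. select * from last_occurrence where occurrence_cod = 'E401'.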

