There is a query that takes too long on a 250,000-row table. I need to speed it up:
create table occurrence (
occurrence_id int(11) primary key auto_increment,
client_id varchar(16) not null,
occurrence_cod varchar(50) not null,
entry_date datetime not null,
zone varchar(8) null default null
)
;
insert into occurrence (client_id, occurrence_cod, entry_date, zone)
values
('1116', 'E401', '2011-03-28 18:44', '004'),
('1116', 'R401', '2011-03-28 17:44', '004'),
('1116', 'E401', '2011-03-28 16:44', '004'),
('1338', 'R401', '2011-03-28 14:32', '001')
;
select client_id, occurrence_cod, entry_date, zone
from occurrence o
where
occurrence_cod = 'E401'
and
entry_date = (
select max(entry_date)
from occurrence
where client_id = o.client_id
)
;
+-----------+----------------+---------------------+------+
| client_id | occurrence_cod | entry_date | zone |
+-----------+----------------+---------------------+------+
| 1116 | E401 | 2011-03-28 16:44:00 | 004 |
+-----------+----------------+---------------------+------+
1 row in set (0.00 sec)
The table structure is from a commercial application and cannot be altered.
What would be the best index(es) to optimize it? Or a better query?
EDIT:
I need the last occurrence of the E401 code for each client, and only if the client's most recent occurrence is that code.
The ideal indexes for such a query would be:
index #1: [client_id] + [entry_date]
index #2: [occurrence_cod] + [entry_date]
Nevertheless, those indexes can be simplified if the data happen to have certain characteristics. This will save file space, as well as time when data are modified (insert/delete/update).
If there is rarely more than one "occurrence" record for each [client_id], then index #1 can be only [client_id].
Likewise, if there is rarely more than one "occurrence" record for each [occurrence_cod], then index #2 can be only [occurrence_cod].
It may also be useful to turn index #2 into [entry_date] + [occurrence_cod]; this lets the index serve queries whose criteria are on [entry_date] alone.
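A sketch of the corresponding DDL, assuming plain MySQL (the index names are illustrative):
create index idx_client_date on occurrence (client_id, entry_date); -- index #1
create index idx_cod_date on occurrence (occurrence_cod, entry_date); -- index #2
Index #1 lets the correlated max(entry_date) subquery be resolved from the index alone; index #2 covers the outer filter on occurrence_cod.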
Unless you truly need the row with the max date per client, and only when that row's occurrence_cod matches, this should work:
select client_id, occurrence_cod, entry_date, zone
from occurrence o
where occurrence_cod = 'E401'
ORDER BY entry_date DESC
LIMIT 1;
It will return the single most recent row with occurrence_cod = 'E401' across the whole table (not one per client).
select
a.client_id,
a.occurrence_cod,
a.entry_date,
a.zone
from occurrence a
inner join (
select client_id, occurrence_cod, max(entry_date) as entry_date
from occurrence
group by client_id, occurrence_cod
) as b
on
a.client_id = b.client_id and
a.occurrence_cod = b.occurrence_cod and
a.entry_date = b.entry_date
where
a.occurrence_cod = 'E401'
With this approach you avoid running the subselect once per row; joining two large sets once is usually faster than executing a subquery for every row of the set.
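Whichever variant you use, it is worth checking the plan with EXPLAIN (standard MySQL) to confirm the indexes are picked up:
explain
select a.client_id, a.occurrence_cod, a.entry_date, a.zone
from occurrence a
inner join (
select client_id, occurrence_cod, max(entry_date) as entry_date
from occurrence
group by client_id, occurrence_cod
) as b
on a.client_id = b.client_id and
a.occurrence_cod = b.occurrence_cod and
a.entry_date = b.entry_date
where a.occurrence_cod = 'E401';
A key column of NULL with a rows estimate near 250,000 in the output means a full table scan.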
I'd rewrite the query:
select client_id, occurrence_cod, max(entry_date), zone
from occurrence
group by client_id, occurrence_cod, zone;
(assuming the other columns are indeed identical within each group, and entry_date is the only thing that changes).
Did you try putting an index on occurrence_cod?
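For example (the index name is illustrative):
create index idx_occurrence_cod on occurrence (occurrence_cod);
Even this single-column index would let MySQL narrow the outer query to the 'E401' rows instead of scanning all 250,000.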
Try this if the other approaches are not available:
- Create a new table: last_occurrence.
- Every time a new occurrence is recorded, update the corresponding row in this last_occurrence table.
By doing this, you just need the following SQL to get your result :)
select * from last_occurrence
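A minimal sketch of that approach, assuming an AFTER INSERT trigger is acceptable and that rows arrive in chronological order per client (table and trigger names are illustrative):
create table last_occurrence (
client_id varchar(16) not null primary key, -- one row per client; the primary key makes the upsert work
occurrence_cod varchar(50) not null,
entry_date datetime not null,
zone varchar(8) null default null
);
create trigger occurrence_after_insert
after insert on occurrence
for each row
insert into last_occurrence (client_id, occurrence_cod, entry_date, zone)
values (new.client_id, new.occurrence_cod, new.entry_date, new.zone)
on duplicate key update
occurrence_cod = new.occurrence_cod, -- assumes chronological inserts; otherwise compare entry_date first
entry_date = new.entry_date,
zone = new.zone;
Given the EDIT in the question, the final lookup would then be:
select * from last_occurrence where occurrence_cod = 'E401';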