Recently, we've noticed a particular query popping up in our slow query logs taking quite some time. I've analyzed it to the best of my ability, but can't figure out for the life of me why it's taking so long, and why the indexes we've set up aren't being used.
Here's a simplified (i.e., readable) version of the query for the purpose of example:
SELECT processstage.id AS processstage_id,
processstage.job_id AS processstage_job_id,
processstage.event_id AS processstage_event_id, ...
FROM processstage INNER JOIN jobevent ON jobevent.id = processstage.event_id
WHERE processstage.due_date <= '2009-10-28 16:07:59' AND (EXISTS (
SELECT 1 FROM job
WHERE jobevent.job_id = job.id AND job.brand_id = 1
)) ORDER BY processstage.due_date;
Also, for good measure, it's important to note the size of the table, processstage
:
mysql> SELECT COUNT(id) FROM processstage;
+-----------+
| COUNT(id) |
+-----------+
| 596183 |
+-----------+
When I run EXPLAIN on the query, I find that the processstage table is reading over a huge number of rows (see "Using where; Using filesort"), as no index is being used (that I can tell):
mysql> EXPLAIN SELECT processstage.id AS processstage_id, processstage.job_id AS processstage_job_id, processstage.event_id AS processstage_event_id FROM processstage INNER JOIN jobevent ON jobevent.id = processstage.event_id WHERE processstage.due_date <= '2009-10-28 16:07:59' AND (EXISTS (SELECT 1 FROM job WHERE jobevent.job_id = job.id AND job.brand_id = 1)) ORDER BY processstage.due_date;
+----+--------------------+--------------+--------+---------------------------------------------------+---------+---------+------------------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+--------+---------------------------------------------------+---------+---------+------------------------------+--------+-----------------------------+
| 1 | PRIMARY | processstage | ALL | ix_processstage_due_date,processstage_event_id_fk | NULL | NULL | NULL | 606045 | Using where; Using filesort |
| 1 | PRIMARY | jobevent | eq_ref | PRIMARY | PRIMARY | 4 | processstage.event_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | job | eq_ref | PRIMARY,ix_job_brand_id | PRIMARY | 4 | jobevent.job_id | 1 | Using where |
+----+--------------------+--------------+--------+---------------------------------------------------+---------+---------+------------------------------+--------+-----------------------------+
3 rows in set (0.00 sec)
Oddly enough, we've got indexes on the columns utilized in the query's WHERE clause:
mysql> SHOW INDEXES FROM processstage;
+--------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| processstage | 0 | PRIMARY | 1 | id | A | 614150 | NULL | NULL | | BTREE | |
| processstage | 1 | ix_processstage_job_id | 1 | job_id | A | 开发者_运维百科 47242 | NULL | NULL | YES | BTREE | |
| processstage | 1 | ix_processstage_stop_date | 1 | stop_date | A | 614150 | NULL | NULL | YES | BTREE | |
| processstage | 1 | ix_processstage_order | 1 | order | A | 16 | NULL | NULL | YES | BTREE | |
| processstage | 1 | ix_processstage_start_date | 1 | start_date | A | 122830 | NULL | NULL | YES | BTREE | |
| processstage | 1 | ix_processstage_milestone | 1 | milestone | A | 12794 | NULL | NULL | YES | BTREE | |
| processstage | 1 | ix_processstage_due_date | 1 | due_date | A | 51179 | NULL | NULL | YES | BTREE | |
| processstage | 1 | ix_processstage_process_id | 1 | process_id | A | 76768 | NULL | NULL | YES | BTREE | |
| processstage | 1 | processstage_event_id_fk | 1 | event_id | A | 3722 | NULL | NULL | YES | BTREE | |
+--------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Any ideas as to why our indexes are going seemingly unused? Perhaps I'm missing something very obvious or am approaching this entirely the wrong way. Any input or suggestion is very much appreciated!
try using USE INDEX. if mysql decides that it needs to read a certain fraction of the table, it will resort to a table scan. with USE INDEX, you're telling mysql to consider table scans very expensive.
How many jobevents do you get where job.brand = 1? If it's a reasonable amount you could try the following queries to alleviate some joins and the sub query.
First Query:
select distinct jobevent.id from jobevent
inner join job on job.id = jobevent.job_id
where job.brand = 1
followed by
select processstage.id as processstage_id,
processstage.job_id as processstage_job_id,
processstage.event_id as processstage_event_id, ...
from processstage
where processstage.due_date <= '2009-10-28 16:07:59' and
processstage.event_id in (list of event ids from the previous query)
order by processstage.due_date;
Previous Thoughts:
Have you tried doing a regular join to jobs instead of the dependent sub query? Something like:
SELECT processstage.id AS processstage_id,
processstage.job_id AS processstage_job_id,
processstage.event_id AS processstage_event_id, ...
FROM processstage
INNER JOIN jobevent ON jobevent.id = processstage.event_id
INNER JOIN job ON job.id = jobevent.job_id
WHERE processstage.due_date <= '2009-10-28 16:07:59' AND
job.brand = 1
ORDER BY processstage.due_date;
Do you have job events that don't have jobs?
Some ideas:
- Do an explicit type conversion from your string '2009-10-28 16:07:59' to a date. What may happen now is that your processstage.due_date are converted to strings before the comparison. Doing a type conversion (not sure about the systax in MySQL, but should be something like
CAST (<your date string> as DATE)
will help the optimizer figure out to use the index. - How many records satisfy the date condition? If most of the table fits the condition or the index is not very selective, there may not be any advantage to using the index.
精彩评论