开发者

index help for a MySQL query using greater-than operator and ORDER BY

开发者 https://www.devze.com 2022-12-22 19:32 出处:网络
I have a table with at least a couple million rows and a schema of all integers that looks roughly like this:

I have a table with at least a couple million rows and a schema of all integers that looks roughly like this:

start
stop
first_user_id
second_user_id

The rows get pulled using the following queries:

SELECT * 
  FROM tbl_name 
 WHERE stop >= M 
   AND first_user_id=N  
   AND second_user_id=N 
ORDER BY start ASC

SELECT * 
  FROM tbl_name 
 WHERE stop >= M 
   AND first_user_id=N 
ORDER BY start ASC

I cannot figure out the best indexes to speed up these queries. The problem seems to be the ORDER BY because when I take that out the queries are fast.

I've tried all different types of indexes using the standard index format:

ALTER TABLE tbl_name ADD INDEX index_name (index_col_1,index_col_2,...)

And none of them seem to speed up the queries. Does anyone have any idea what index would work? Also, should I be trying a different type of index? I can't guarantee the uniqueness of each row so I've avoided UNIQUE indexes.

Any guidance/help would be appreciated. Thanks!

Update: here are a list of the indexes, I didn't include this originally since I've taken a shotgun approach and added a ton of indexes looking for one that works:

start_index: [start, first_user_id, second_user_id]
stop_index: [stop, first_user_id, second_user_id]
F1_index: [first_user_id]
F2_index: [second_user_id]
F3_index: [another_id]
test_1_index: [first_user_id,stop,start]
test_2_index: [first_user_id,start,stop]
test_3_index: [start,stop,first_user_id,second_u开发者_如何学Pythonser_id]
test_4_index: [stop,first_user_id,second_user_id,start]
test_5_index: [stop,start]

And here is the EXPLAIN output.

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: listing
type: index_merge
possible_keys: stop_index,F1_index,F3_index,test_1_index,test_2_index,test_4_index,test_5_index
key: F1_index,F3_index
key_len: 5,5
ref: NULL
rows: 238
Extra: Using intersect(F1_index,F3_index); Using where; Using filesort

Update for posterity

We ended up completely re-evaluating how we were querying the table and chose these indexes:

index_select_1: [first_user_id,start,stop]
index_select_2: [first_user_id,second_user_id,start,stop]

and then we select on the table with queries like these:

SELECT * 
  FROM tbl_name 
 WHERE first_user_id=N
   AND start >= M 
ORDER BY start ASC

SELECT * 
  FROM tbl_name 
 WHERE first_user_id=N   
   AND second_user_id=N
   AND start >= M 
ORDER BY start ASC

Thanks to everyone that answered, you really helped me think through the problem.


Could you make your sample table and EXPLAIN results match? Because, obviously it is not a same situation and we don't know if you maybe made a mistake in abstracting your real query only by looking at the provided EXPLAIN results. If you don't want to show too much of a structure then reverse it and create the quoted table structure and provide EXPLAIN result on that (maybe you will catch the problem that way).

Now one thing is certain - sorting is using filesort, which is bad.

To simplify (we'll come back to it) - compound indexes useful for sorting need to have the sort field in front.

Example idx(ID, Start)

ID      Start
1
        5
        8
        8
        10
        25
2
        3
        9
        10
        40
        41
        42
        42
...

In the above example the index is not of much help for sorting if you don't have where condition in which ID is limited to only one value.

But, this exception is important since you have single row selectivity on one or both id fields.

So from your indexes the only indexes that have start at the beginning are

start_index: [start, first_user_id, second_user_id]
test_3_index: [start,stop,first_user_id,second_user_id]

Mysql ignores the index

start_index: [start, first_user_id, second_user_id]

because it has better choices in terms of selectivity - it would need to do an index scan with this index and it has indexes that will allow it to do index intersect jumping directly to (unsorted) results. It expects better selectivity from the intersect and selectivity drives the planer.

Once the result is obtained mysql should realize that it could use another index to sort the results, but it seems that it can not see how cheap that would be.

So to help the planer you could create an index that will capitalize on your single value selectivity with index such as:

two_ids_with_sort: [first_user_id, second_user_id, start]

I assume that above would work very well on your second query where you have conditions on both id's giving you access to presorted start record pointers. The following query should do the same for the first query:

one_id_with_sort: [first_user_id, start]

Only if you end up with a lot of records in the result sets I would look into indexing it further.

There are two paths there a) adding the field stop to the end of the index b) creating two more similar indexes with stop instead of start (index intersect could be used there and wider range of queries could benefit from it)

But do test all of the above theories.

Couple of general suggestions

  • write your conditions in most selective manner first
  • when testing indexes start with single column indexes first and then expand to compound indexes (for example for sorting on start I would add index only on start)
  • too many indexes are not so good in mysql as the query planer is not able to quickly run through all the possible combinations and can not properly estimate costs of all the operations (so it cuts corners and the best index combination and plan might be left out)
  • therefore test indexes with USE INDEX (index1) FOR ORDER BY in your select to gauge a benefit for a certain index over planer, see more here (esp FORCE option; also - aim to leave only the useful indexes and see if planer will be able to utilize them then, if not, as a last resort only, force the indexes in your queries for which performance is crucial. keep in mind that this is a bad practice in terms of administration and design).


Try to avoid using ranges (e.g. >, >=, <, <=) as the left most portion of a WHERE clause. MySQL is unable to use an index for any fields in the WHERE clause to the right of a range.

At first glance I would say to at least create an index on first_user_id,stop,second_user_id. Then specify the query accordingly:

select * from tbl_name where first_user_id=N and stop >= M and second_user_id=N

UPDATE: D'oh, so I completely contradicted myself in the above query - since incorporating second_user_id into the index is useless when specifying it in the WHERE after the stop "range", so let's try this again.

ALTER TABLE tbl_name ADD INDEX index_1 (first_user_id,stop) ALTER TABLE tbl_name ADD INDEX index_2 (first_user_id,second_user_id,stop)


The strange thing is that your query only returns 238 rows (?)

Since you stated that the query is faster without the ORDER BY, may I suggest that you do the sort after the query ?
That may be quickest way to fix the problem.

Also, don't forget to remove unused indexes afterwards :)


edit

That's a wild guess (because I'm not sure mysql won't factorize the query to its current form), but you could try to do the following:

SELECT * FROM (
    SELECT * 
      FROM tbl_name 
     WHERE stop >= M 
       AND first_user_id=N 
    ) AS derived
ORDER BY start ASC
0

精彩评论

暂无评论...
验证码 换一张
取 消