开发者

SQL Query Slow? Should it be?

开发者 https://www.devze.com 2023-01-16 07:22 出处:网络
Using SQLite, Got a table with ~10 columns. Theres ~25million rows. That table has an INDEX on \'sid, uid, area, type\'.

Using SQLite, Got a table with ~10 columns. Theres ~25million rows.

That table has an INDEX on 'sid, uid, area, type'.

I run a sel开发者_运维知识库ect like so:

SELECT sid from actions where uid=1234 and area=1 and type=2

That returns me 1571 results, and takes 4 minutes to complete.

Is that sane?

I'm far from an SQL expert, so hopefully someone can fill me in on what I'm missing. Why could this possibly take 4+ minutes with everything indexed?

Any recommended resources to learn about achieving high SQL performance? I feel like a lot of the Google results just give me opinions or anecdotes, I wouldn't mind a solid book.


Create uid+area+type index instead, or uid+area+type+sid


Since the index starts with the sid column, it must do a scan (start at the beginning, read to the end) of either the index or the table to find your data matching the other 3 columns. This means it has to read all 25 million rows to find the answer. Even if it's reading just the rows of the index rather than the table, that's a lot of work.

Imagine a phone book of the greater New York metropolitan area, organized by (with an 'index' on) Last Name, First Name.

You submit SELECT [Last Name] FROM NewYorkPhoneBook WHERE [First Name] = 'Thelma'

It has to read all 25 million entries to find all those Thelmas. Unless you either specify the last name and can then turn directly to the page where that last name first appers (a seek), or have an index organized by First Name (a seek on the index followed by a seek on the table, aka a "bookmark lookup"), there's no way around it.

The index you would create to make your query faster is on uid, area, type. You could include sid, though leave it out if sid is part of the primary key.

Note: Tables often do have multiple indexes. Just note that the more indexes, the slower the write performance. Unnecessary indexes can slow overall performance, sometimes radically so. Testing and eventually experience will help guide you in this. Also, reasoning it out as a real-world problem (like my phone book examples) can really help. If it wouldn't make sense with phone books (and separate phone book indexes) then it probably won't make sense in the database.

One more thing: even if you put an index on those columns, if your query is going to end up pulling a great percentage of the rows in the main table, it will still be cheaper to scan the table rather than do the bookmark lookup (seek the index then seek the table for each row found). The exact "tipping point" of whether to do a bookmark lookup with a seek, or to do a table scan isn't something I can tell you off the top of my head, but it is based on solid math.


  • The index is not really usefull as it does start with the wrong field... which means a table scan.

  • Looks like you have a normal computer there, not something made for databases. I run table scans over 650 million rows in about a minute on my lower end db server, but that means reading about a gigabyte per second from the discs, which are a RAID of 10k RM discs - RAID 10. Just to say that basically... that databases love IO, and that in a degree that you have never seen before. Basically larger db servers have many discs to satisfy the IOPS (IO per second) requirement. I have seen a server with 190 discs.

So, you ahve two choices: beed up your IOPS capability (means spending money), or set up indices that get used because they are "proper".

Proper means: an index only is usefull if the fields it contains are used from left to right. Not necessarily in the same order... but if a field is missed there is a chance the SQL System decides it is not worth pursuing the index and instead goes table scan (as in your case).


When you create your new index on uid, area and type, you should also do a select distinct on each one to determine which has the fewest distinct entries, then create your index such that the fewer the differences the earlier they show up in the index definition.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号