开发者

How to optimize indexes for a busy table?

开发者 https://www.devze.com 2022-12-15 18:49 出处:网络
i have this table of comments (little over 1 milion rows) that gets around 10.000 inserts and around 100.000 queries against it every day, and minor deletions and updates. the query that gets the comm

i have this table of comments (little over 1 milion rows) that gets around 10.000 inserts and around 100.000 queries against it every day, and minor deletions and updates. the query that gets the comments causes performance issues that sometimes it locks up entire database and i am getting a lot of timeouts. please help me adjust my indexes and anything else so that it performs better. below i included the info about it, if you need more please ask. i rebuild all indexes daily and run a sql server 2008 web edition on a 2008 server.

thank you :)

structure:

id (int, identity)
profile_id (int)
owner_id (int)
added_date (datetime)
comments varchar(4000)
logical_delete (datetime)

indexes:

id (PK, clustered)
profile_id (70% fill)
owner_id (70% fill)
added_date (70% fill)
profile_id + logical_delete (70%)

query:

    select 
        c.id, c.owner_id, c.comments, c.is_public, c.added_date, 
        u.first_name, u.last_name, c.profile_id
    from [profile_comment] c with(nolock) 
    inner join [user] u with(nolock) on u.id = c.owner_id 
    where c.profile_id = @profile_id and c.logical_delete is null
    order by c.added_date desc 

execution plan:

  |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[owner_id], [Expr1005]) WITH ORDERED PREFETCH)
       |--Sort(ORDER BY:([c].[added_date] DESC)) **[5%]**
       |    |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[id], [Expr1004]) WITH UNORDERED PREFETCH) **[0%]** 
       |         |--Index Seek(OBJECT:([DB].[dbo].[profile_comment].[IX_profile_comment_combined1] AS [c]), SEEK:([c].[profile_id]=(1) AND [c].[logical_delete]=NULL) ORDERED FORWARD) **[1%]**
       |         |--Clustered Index Seek(OBJECT:([JakLeci].[dbo].[profile_comment].[PK__profile_comment__primary] AS [c]), SEEK:([c].[id开发者_Python百科]=[JakLeci].[dbo].[profile_comment].[id] as [c].[id]) LOOKUP ORDERED FORWARD) **[47%]**
       |--Clustered Index Seek(OBJECT:([DB].[dbo].[user].[PK__user__id] AS [u]), SEEK:([u].[id]=[DB].[dbo].[profile_comment].[owner_id] as [c].[owner_id]) ORDERED FORWARD)  **[47%]**


A clustered index on (profile_id, added_date DESC) should do the trick. That'll give you a quick lookup by profile_id, already sorted by added_date. The only remaining operations would be filtering on logical_delete and a loop join on user (which should be clustered on user_id).

Depending on the number of rows returned though, you could still be reading quite a bit off disk. Your comments column is pretty wide. You may want to consider limiting the number of rows returned by added_date (or a TOP), or caching the results.

I can't imagine this is causing high CPU usage, and you're using NOLOCK so you shouldn't be blocking other queries. If this really is the cause of your timeouts, then it must be I/O. You may want to check out memory usage and the disk subsystem to be sure you're getting decent performance. Check logical reads and CPU time before and after to determine if you're helping.

You could also probably drop some of your indexes to speed inserts. I'm not sure that 70% fill is doing much other than wasting space either, but I might be wrong about that.


Nested Loops(Inner Join, OUTER REFERENCES:([c].[owner_id], [Expr1005]) WITH ORDERED PREFETCH)
       |--Sort(ORDER BY:([c].[added_date] DESC)) **[5%]**
       |    |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[id], [Expr1004]) WITH UNORDERED PREFETCH) **[0%]** 
       |         |--Index Seek(OBJECT:([DB].[dbo].[profile_comment].[IX_profile_comment_combined1] AS [c]), SEEK:([c].[profile_id]=(1) AND [c].[logical_delete]=NULL) ORDERED FORWARD) **[1%]**
       |         |--Clustered Index Seek(OBJECT:([JakLeci].[dbo].[profile_comment].[PK__profile_comment__primary] AS [c]), SEEK:([c].[id]=[JakLeci].[dbo].[profile_comment].[id] as [c].[id]) LOOKUP ORDERED FORWARD) **[47%]**
       |--Clustered Index Seek(OBJECT:([DB].[dbo].[user].[PK__user__id] AS [u]), SEEK:([u].[id]=[DB].[dbo].[profile_comment].[owner_id] as [c].[owner_id]) ORDERED FORWARD)  **[47%]**

Here is how I read this plan: the query starts with a seek for profile_id = @profile_id and logical_deleted is null on the IX_Profile_comment_combined, it then does a nested join loop on the clustered index, it sorts the result by added_date and then it does a nested loop on the user.

One thing that you could quickly eliminate is the SORT, by changing the definition of the IX_profile_combined to be:

CREATE INDEX IX_profile_combined
 ON profile_comment(logical_deleted, profile_id, added_date)

Because logical_deleted is a very low selectivity column, it should be the leftmost key in the index. Because the output for a specific @profile_id has to be order by added_date then added_date has to be right of profile_id in the key order.

Other than the sort, the query plan looks good to me. But I'm rather curious how come a query that supposedly is the biggest hog in the system consumes only 1% on getting all the candidate row, then whoops to 93% time on two clustered index lookups and only 5% on the sort. This doesn't add up as symptoms of a problem query. Is @profile_id a very low selectivity key? Is the plan you posted collected from a non-representative run, one that performed well?


Why don't you run your query in management studio and have it show you the actual execution plan, to try and see where the problem is?

Without that, the only thing that I could suggest is NOT making your primary, identity key the clustered index, unless you regularly query on that. You could try adding an index on profile_id and added_date (or something else to make it unique) and make that the clustered index, since you are querying by profile_id and you are probably rarely querying on an identity field.

I'd need to actually have access to the database to see what is actually going on (doing what I recommended at the top) to figure out where the true culprit lies. It could be on the join, but I highly doubt it, since I'm guessing the id column in the user table is the primary, clustered index as well.


you'll get the best performance with a clustered index over the search columns:

(profile_id, is_deleted, owner_id, added_date desc)

the drawback is that looking up a row by just by id will need to use a slower non-clustered index because you can only have one clustered index.

if that's not an option, the next best thing is to create a multi-column covering index including:

(profile_id, is_deleted, owner_id, added_date desc, id, comments, is_public)

because comments is rather large, you can include comments as a non-key column using the INCLUDE syntax:

create index idx on c (profile_id, is_deleted, owner_id, added_date desc, id,
  is_public) include (comments)

comments won't be indexed, but you'll be able to retrieve it quickly when the other keys are matched.

more details here:

http://www.sql-server-performance.com/tips/covering_indexes_p1.aspx

and of course u.id should be the primary key in users.

best to make is_deleted non-null (tinyint defaulted to 0).

with NOLOCK / READUNCOMMITTED, the select will not create any locks or block.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号