开发者

MySQL performance gain by reducing index size?

开发者 https://www.devze.com 2023-02-17 12:59 出处:网络
I have a table with ~1.2m rows in it. It has 6 columns indexed, including one varchar(255) field that contains urls.

I have a table with ~1.2m rows in it. It has 6 columns indexed, including one varchar(255) field that contains urls.

I need to be able to scan the table to see whether a url exis开发者_JS百科ts in the table, hence the index, but I'm wondering whether I would see a performance gain by reducing the index size to around 50?

Of course this would mean that it may have to scan more rows when searching for a url in the database.. but I only have to do this query about once every 30 seconds, so I'm wondering if the smaller index size would be worth it. Thoughts?


Two reasons why lowering maybe better - (Assuming your index is useful)

1) Indexes too get loaded in memory, so there maybe a rare possibility that your index size grows to an extent that it is not completely cacheable in memory. Thats when you will see a performance hit (with all the new hardware specs... hardly a possibility with 1.2M rows, but still worth noting).

2) Manytimes, just the first 'n' characters are good enough to be able to quickly identify each record. You may not need to index the whole 255 characters at all.

Two reason why you may not care -

1) As stated, you may never see your indexes growing to be out of your key buffer, so why worry.

2) You will need to determine the first 'n' characters, and even after that the performance will less than or equal to a full index... never more. Do you really need to spend time on that? Is it worth the possible loss of accuracy?


From my SQL indexing tutorial (covers MySQL as well):

Tip: Always aim to index the original data. That is often the most useful information you can put into an index.

This is a general rule I suggest until there is a very strong reason to do something different.

Space is not the issue, in most cases.

Performance wise, the index tree depth grows logarithmically with the number of index leaf nodes. That means, cutting the index size half is probably not reducing the tree depth at all. Hence, the performance gain might be limited to the improved cache-hit-rate. But you mentioned you execute that query once every 30 seconds. On a moderately loaded machine, that means you index will not be cached at all (except, maybe, you search for the same URL every 30 seconds).

After all: I don't see any reason to act against the general advice mentioned above.

If you really want to save index space, try to find redundant indexes first (e.g., those starting with the same columns). These are typically the low-hanging fruits.


Keep a md5 hash of your url that is fixed 32 length.


I doubt you would see any difference by changing the index to only use the first 50 characters.

Since it's a VARCHAR column, the indexed values will only be as long as each URL anyway, so looking at typical URL's you may only be indexing around 50 characters per URL already.

Even if the URL's are all significantly longer, reducing the index size may just increase the chance that that part of the index is already in memory, but again i doubt you would notice any difference. This might only be useful if it was very high volume and you needed to start micro-optimising for additional performance.


index size only matters on disk space, So you wont be having serious problems by that.

Having or not having an index could be based on your CRUD operations, do you have more selects or more insert/update/deletes ?

0

精彩评论

暂无评论...
验证码 换一张
取 消