I have a table of about 360,000 records and here's performing a query on two indexed fields:
SELECT COUNT(*)
FROM emails
WHERE
department_id IN(1,2,3,4)
AND category_id IN (5,6,7,8)
(Time: 0.9624802)
id: 1
select_type: SIMPLE
table: emails
type: range
possible_keys: emails_department_id_idx,emails_category_id_idx
key: emails_category_id_idx
key_len: 5
ref: NULL
rows: 54018
Extra: Using where
So only one index is being used there. (I can get an index merge to work when using simpler comparisons or range criteria, but I need to do checks like this against a list of IDs).
Here I created two new tables to map this relationship, and using JOIN's I replicated the same results:
SELECT COUNT(*)
FROM emails
LEFT JOIN email_to_department ON (email_to开发者_如何转开发_department.email_id = emails.id AND email_to_department.department_id IN (1,2,3,4))
LEFT JOIN email_to_category ON (email_to_category.email_id = emails.id AND email_to_category.category_id IN (5,6,7,8))
WHERE
email_to_department.department_id IS NOT NULL
AND email_to_category.category_id IS NOT NULL
(Time: 0.5217777)
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: email_to_category
type: range
possible_keys: PRIMARY,category_id
key: category_id
key_len: 4
ref: NULL
rows: 61282
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: email_to_department
type: ref
possible_keys: PRIMARY,department_id
key: PRIMARY
key_len: 4
ref: testdb.email_to_category.email_id
rows: 1
Extra: Using where; Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: emails
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: testdb.email_to_category.email_id
rows: 1
Extra: Using index
3 rows in set (0.38 sec)
So each query now uses an index, and trims almost half the time off. Is this bad design? Should I write the rest of these relationships in the same way?
If I add more criteria in the same way, the JOIN'd version of the query seems to get faster and the other remains more or less the same.
Doing a simple query on just a single indexed field is very very fast of course:
SELECT COUNT(*)
FROM emails
WHERE department_id IN(1,2,3,4)
Is there another strategy I might use to make these kinds of queries faster still? There are other properties that need to be filtered on as well, and in different combinations, so creating multi-column indexes won't really help.
This is really a question about database normalization. You can look for information on that topic in many places.
Basic answer Since there is a huge literature about this, and there are a lot of differences, I will just point out that it boils down to trade-offs; Speed versus storage requirements, or ease of use versus data duplication. You may want to understand what normalization is in order to understand why you would or would not want to do it.
Further reading This is a deep topic, you may want to learn more - there are hundreds of books and thousands of scholarly papers on these issues. For instance, look at this previous SO question about designing a database: Database design: one huge table or separate tables?, or this one: First-time database design: am I overengineering? or Database Normalization Basics on About.com.
I also agree with Jaitsu about creating a composite key on the two fields. As a rule, you should always put the column with the higher cardinality first in the composite index. That way you can have a more selective index.
精彩评论