Using JOIN'ed mapping tables better than multiple fields in the same table?_问答_开发者

I have a table of about 360,000 records and here's performing a query on two indexed fields:

SELECT COUNT(*)
FROM emails
WHERE
department_id IN(1,2,3,4)
AND category_id IN (5,6,7,8)

(Time: 0.9624802)

           id: 1
  select_type: SIMPLE
        table: emails
         type: range
possible_keys: emails_department_id_idx,emails_category_id_idx
          key: emails_category_id_idx
      key_len: 5
          ref: NULL
         rows: 54018
        Extra: Using where

So only one index is being used there. (I can get an index merge to work when using simpler comparisons or range criteria, but I need to do checks like this against a list of IDs).

Here I created two new tables to map this relationship, and using JOIN's I replicated the same results:

SELECT COUNT(*)
FROM emails
LEFT JOIN email_to_department ON (email_to开发者_如何转开发_department.email_id = emails.id AND email_to_department.department_id IN (1,2,3,4))
LEFT JOIN email_to_category ON (email_to_category.email_id = emails.id AND email_to_category.category_id IN (5,6,7,8))
WHERE
email_to_department.department_id IS NOT NULL
AND email_to_category.category_id IS NOT NULL

(Time: 0.5217777)

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: email_to_category
         type: range
possible_keys: PRIMARY,category_id
          key: category_id
      key_len: 4
      ref: NULL
     rows: 61282
    Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: email_to_department
         type: ref
possible_keys: PRIMARY,department_id
          key: PRIMARY
      key_len: 4
          ref: testdb.email_to_category.email_id
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: emails
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: testdb.email_to_category.email_id
         rows: 1
        Extra: Using index
3 rows in set (0.38 sec)

So each query now uses an index, and trims almost half the time off. Is this bad design? Should I write the rest of these relationships in the same way?

If I add more criteria in the same way, the JOIN'd version of the query seems to get faster and the other remains more or less the same.

Doing a simple query on just a single indexed field is very very fast of course:

SELECT COUNT(*)
FROM emails
WHERE department_id IN(1,2,3,4)

Is there another strategy I might use to make these kinds of queries faster still? There are other properties that need to be filtered on as well, and in different combinations, so creating multi-column indexes won't really help.

This is really a question about database normalization. You can look for information on that topic in many places.

Basic answer Since there is a huge literature about this, and there are a lot of differences, I will just point out that it boils down to trade-offs; Speed versus storage requirements, or ease of use versus data duplication. You may want to understand what normalization is in order to understand why you would or would not want to do it.

Further reading This is a deep topic, you may want to learn more - there are hundreds of books and thousands of scholarly papers on these issues. For instance, look at this previous SO question about designing a database: Database design: one huge table or separate tables?, or this one: First-time database design: am I overengineering? or Database Normalization Basics on About.com.

I also agree with Jaitsu about creating a composite key on the two fields. As a rule, you should always put the column with the higher cardinality first in the composite index. That way you can have a more selective index.