Any way to speed up this query?_问答_开发者_运维开发者技术经验分享

This query is really slow. Takes between 9 and 10 seconds...

SELECT DISTINCT a.*
FROM addresses a
LEFT JOIN contacts c
ON c.id = a.contact_id
LEFT JOIN organizations o
ON o.id = a.organization_id
ORDER BY c.last_name, c.first_name, o.name
LIMIT 0, 24

If I comment out the ORDER BY clause the query runs much faster -- about 5 milliseconds. But I need the ORDER BY to support paging of the search results. And the users need the addresses to be sorted by contact and organization.

Table structure

addresses
---------
id int NOT NULL
contact_id int       # could be NULL
organization_id int  # could be NULL

contacts
--------
id int NOT NULL
first_name varchar(255)
last_name varchar(255)

organizations
-------------
id int NOT NULL
name varchar(255)

They're all InnoDB tables.

I have these indexes on the contacts table:

  KEY `idx_contacts_first_name` (`first_name`),
  KEY `idx_contacts_last_name` (`last_name`),
  KEY `idx_contacts_first_name_last_name` (`first_name`,`last_name`)

And on the organizations table:

  KEY `idx_organization_name` (`name`)

Amount of data

Addresses:     22,271
Contacts:      17,906
Organizations:  8,246

DESCRIBE output

mysql> DESCRIBE
    -> SELECT DISTINCT a.*
    -> FROM addresses a
    -> LEFT JOIN contacts c
    -> ON c.id = a.contact_id
    -> LEFT JOIN organizations o
    -> ON o.id = a.organization_id
    -> ORDER BY c.last_name, c.first_name, o.name
    -> LIMIT 0, 24;
+----+-------------+-------+--------+---------------+---------+---------+--------------------------------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                                        | rows  | Extra                           |
+----+-------------+-------+--------+---------------+---------+---------+--------------------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | a     | ALL    | NULL          | NULL    | NULL    | NULL                                       | 22387 | 开发者_StackOverflowUsing temporary; Using filesort | 
|  1 | SIMPLE      | c     | eq_ref | PRIMARY       | PRIMARY | 4       | contactdb_v2_development.a.contact_id      |     1 | Distinct                        | 
|  1 | SIMPLE      | o     | eq_ref | PRIMARY       | PRIMARY | 4       | contactdb_v2_development.a.organization_id |     1 | Distinct                        | 
+----+-------------+-------+--------+---------------+---------+---------+--------------------------------------------+-------+---------------------------------+
3 rows in set (0.00 sec)

I tried your example, with similar amounts of data, and on my lowly laptop (Pentium M 1,7 GHz) the query takes less than a second (on first run, later runs even less).

Did you just by chance forget the PK on the id column? You don't mention it, so just asking... if you forget that, performance will obviously be horrible - not to mention that every DBA will cringe at tables without a PK.

Otherwise, try this:

DESCRIBE <your query>

This will give you MySQL's query plan. Post that (edit your question), and it should be clearer what's taking so long.

On further thought:

The query will always have problematic performance, because you are asking the database to read and sort all addresses and display them. The ORDER BY means it has to read everything before giving anything back, so it'll always be slow. What is even the point of diplaying the entire database like this? Will users page through several thousand records?

Consider e.g. allowing a search query. With a WHERE condition the query will be much faster.

If you're not too resource-constrained on the server side and this thing isn't going to scale up too far, you don't have a lot of data so you could simply do your ordering and paging at that level.

Try adding this index:

idx_contacts_last_name_first_name (last_name,first_name)

BTW: you can delete idx_contacts_first_name since it is duplicative and if you add this index you can delete idx_contacts_last_name.

Try changing your SQL to something like the following:

SELECT a.column1, a.column2, ...
FROM addresses a
LEFT JOIN contacts c
ON c.id = a.contact_id
LEFT JOIN organizations o
ON o.id = a.organization_id
GROUP BY a.column1, a.column2, ...
ORDER BY c.last_name, c.first_name, o.name
LIMIT 0, 24

I've found GROUP BY to be much faster than DISTINCT in general, though I don't why that would be.

Let's see.

Addresses: 22,271
Contacts: 17,906
Organizations: 8,246

addresses a LEFT JOIN contacts c gives about 20,000 * 20,000 ~ 400 million comparisons for about 20,000 results

LEFT JOIN organizations gives about 10,000 * 20,000 ~ 200 million comparisons for about 20,000 results

which we sort mainly on contact rows, then discard all but 24 of them. Seems the distinctness of the addresses is of minimal importance.

Since we're mostly sorting by contacts, how about we do a subselect on contacts, keeping somewhat more (say, by a factor of about 4) than we need:

SELECT * FROM contacts ORDER BY last_name, first_name LIMIT 100

Then Join those to their addresses keeping the top hundred or so

     SELECT a.* 
       FROM (SELECT * FROM contacts ORDER BY last_name, first_name LIMIT 0, 100) AS c
  LEFT JOIN addresses a
         ON c.id = a.contact_id
      LIMIT 0, 100

Then join these to the organizations

 SELECT * 
   FROM (
        SELECT * 
          FROM (SELECT * FROM contacts ORDER BY last_name, first_name LIMIT 0, 100) AS c
     LEFT JOIN addresses a
            ON c.id = a.contact_id
         LIMIT 0, 100
         ) AS ca LEFT JOIN organizations o
      ON o.id = ca.organization_id
ORDER BY ca.last_name, ca.first_name, o.name
   LIMIT 0, 24

I'm sure the syntax is screwed up, but I'm equally sure the principle of cutting down the results set at each stage points an instructive way. I have probably made a couple of trade offs too, such that the result closely approximates the 10 seconds answer, but gets there much quicker.