开发者

How to optimize complex query?

开发者 https://www.devze.com 2023-01-26 07:44 出处:网络
I am developing a marketing-type system.On the front page, one of the requirements is for sales staff to see the number of sales opportunities they currently have.

I am developing a marketing-type system. On the front page, one of the requirements is for sales staff to see the number of sales opportunities they currently have.

ie.

Birthdays     | 10
Anniversaries | 15
Introductions | 450
Recurring     | 249

The problem is I am UNIONing all these and the query is taking over 10s in some cases. (We have caching in place so this is only a problem the first time a user logs in for the day).

There is a lot of other criteria involved:

  • included in the count should only be the latest one per customer per type (ie. if a customer has two introductions, it should only be counted once - I am using the greatest-n-per-group method of accomplishing this)
  • for Birthdays and Anniversaries, the date should be +/- 7 days from today
  • for all of them, only records in the past 60 days should be counted
  • these records need to be joined with the customers table to make sure that the opportunity's sales person matches the customer's current sales person

Here is the generated query (It is long):

SELECT 'Birthdays' AS `type`, COUNT(*) AS `num` 
FROM `opportunities` 
INNER JOIN `customers` 
    ON `opportunities`.`customer_id` = `customers`.`customer_id` 
    AND `opportunities`.`sales_person_id` = `customers`.`sales_person_id` 
LEFT JOIN `opportunities` AS `o2` 
    ON `opportunities`.`customer_id` = `o2`.`customer_id` 
    AND `opportunities`.`marketing_message` = `o2`.`marketing_message` 
    AND opportunities.communication_alert_date < o2.communication_alert_date 
WHERE ((`opportunities`.`org_code` = ?)) 
AND (opportunities.marketing_message = 'Birthday Alert') 
AND ((opportunities.communication_alert_date BETWEEN 
    DATE_SUB(NOW(), INTERVAL 7 DAY) AND DATE_ADD(NOW(), INTERVAL 7 DAY))) 
AND (opportunities.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)) 
AND (o2.customer_id IS NULL) 

UNION ALL 

SELECT 'Anniversaries' AS `type`, COUNT(*) AS `num` 
FROM `opportunities` 
INNER JOIN `customers` 
    ON `opportunities`.`customer_id` = `customers`.`customer_id` 
    AND `opportunities`.`sales_person_id` = `customers`.`sales_person_id` 
LEFT JOIN `opportunities` AS `o2` 
    ON `opportunities`.`customer_id` = `o2`.`customer_id` 
    AND `opportunities`.`marketing_message` = `o2`.`marketing_message` 
    AND opportunities.communication_alert_date < o2.communication_alert_date 
WHERE ((`opportunities`.`org_code` = ?)) 
AND (opportunities.marketing_message = 'Anniversary Alert') 
AND ((opportunities.communication_alert_date BETWEEN 
    DATE_SUB(NOW(), INTERVAL 7 DAY) AND DATE_ADD(NOW(), INTERVAL 7 DAY))) 
AND (opportunities.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)) 
AND (o2.customer_id IS NULL) 

UNION ALL 

SELECT 'Introductions' AS `type`, COUNT(*) AS `num` 
FROM `opportunities` 
INNER JOIN `customers` 
    ON `opportunities`.`customer_id` = `customers`.`customer_id` 
    AND `opportunities`.`sales_person_id` = `customers`.`sales_person_id` 
LEFT JOIN `opportunities` AS `o2` 
    ON `opportunities`.`customer_id` = `o2`.`customer_id` 
    AND `opportunities`.`marketing_message` = `o2`.`marketing_message` 
    AND opportunities.communication_alert_date < o2.communication_alert_date 
WHERE ((`opportunities`.`org_code` = ?)) 
AND ((opportunities.Intro_Letter = 'Yes')) 
AND (opportunities.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)) 
AND (o2.customer_id IS NULL) 

UNION ALL 

SELECT 'Recurring' AS `type`, COUNT(*) AS `num` 
FROM `opportunities` 
INNER JOIN `customers` 
    ON `opportunities`.`customer_id` = `customers`.`customer_id` 
    AND `opportunities`.`sales_person_id` = `customers`.`sales_person_id` 
LEFT JOIN `opportunities` AS `o2` 
    ON `opportunities`.`customer_id` = `o2`.`customer_id` 
    AND `opportunities`.`marketing_message` = `o2`.`marketing_message` 
    AND opportunities.communication_alert_date < o2.communication_alert_date 
WHERE ((`opportunities`.`org_code` = ?)) 
AND ((opportunities.marketing_message != 'Anniv开发者_StackOverflowersary Alert' 
AND opportunities.marketing_message != 'Birthday Alert' 
AND opportunities.Intro_Letter != 'Yes')) 
AND (opportunities.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)) 
AND (o2.customer_id IS NULL)

I have indexes on the following fields in the opportunities table:

  • org_code
  • customer_id
  • Intro_Letter
  • marketing_message
  • sales_person_id
  • org_code, marketing_message
  • org_code, Intro_Letter
  • org_code, marketing_message, Intro_Letter

Any help optimizing this would be greatly appreciated. I am open to creating other tables or views if need be.


A good place to start would be removing the string comparisons and putting them in a table with assigned IDs and adding numerical columns in the place of

opportunities.marketing_message != 'Birthday Alert'

So you'd have...

[id]    [name]
1       Birthday Alert
2       Anniversary

Numerical comparisons are always much quicker even with indexing. Doing this would also allow you to easily add new types in the future.

This part is redundant, you don't need AND (opportunities.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)) because the clause right before it will do the job.

AND ((opportunities.communication_alert_date BETWEEN 
    DATE_SUB(NOW(), INTERVAL 7 DAY) AND DATE_ADD(NOW(), INTERVAL 7 DAY))) 
AND (opportunities.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY))


I agree with existing comments that the alert text needs to be in a type table, with a foreign key relationship to the OPPORTUNITIES table.

Leave it to Zend to two queries when you only need one:

   SELECT CASE
            WHEN marketing_message = 'Birthday Alert' THEN 'Birthdays'
            WHEN marketing_message = 'Anniversary Alert' THEN 'Anniversaries'
          END AS msg,
          COUNT(*)
     FROM OPPORTUNITIES o
     JOIN CUSTOMERS c ON c.customer_id = o.customer_id
                 AND c.sales_person_id = o.sales_person_id
LEFT JOIN OPPORTUNITIES o2 ON o2.customer_id = o.customer_id
                      AND o2.marketing_message = o.marketing_message
                      AND o2.communication_alert_date < o.communication_alert_date
    WHERE o.org_code ?
      AND o.marketing_message IN ('Birthday Alert', 'Anniversary Alert') 
      AND o.communication_alert_date BETWEEN DATE_SUB(NOW(), INTERVAL 7 DAY) 
                                         AND DATE_ADD(NOW(), INTERVAL 7 DAY)
      AND o.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)
      AND o2.customer_id IS NULL
 GROUP BY msg


You could make it easier to read by getting rid of all of the grouping parenthesis in the where clause. That would at least make it easier to see whats going on and to optimize


In each subquery you have:

LEFT JOIN `opportunities` AS `o2` 
    ON `opportunities`.`customer_id` = `o2`.`customer_id` 
...
AND (o2.customer_id IS NULL)

That means you want only opportunities o2 that have NULL for customer_id. Because of that these queries can be written with 2 INNER joins instead of 1 OUTER and 1 INNER join what is probably faster. Something like this:

SELECT `o1`.`Birthdays` AS `type`, COUNT(*) AS `num` 
FROM `opportunities` as `o2`
INNER JOIN `opportunities` AS `o1` 
    ON `o1`.`marketing_message` = `o2`.`marketing_message` 
    AND o1.communication_alert_date < o2.communication_alert_date 
INNER JOIN `customers` 
    ON `o1`.`customer_id` = `customers`.`customer_id` 
    AND `o1`.`sales_person_id` = `customers`.`sales_person_id` 
WHERE (o2.customer_id IS NULL)
AND (o2.marketing_message = 'Birthday Alert') 
AND ((`o1`.`org_code` = ?)) 
AND ((o1.communication_alert_date BETWEEN 
    DATE_SUB(NOW(), INTERVAL 7 DAY) AND DATE_ADD(NOW(), INTERVAL 7 DAY))) 
AND (o1.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)) 


In addition to the answers provided, I replaced the LEFT JOIN with a subquery to return only the most recent instances by type. This seemed to help immensely.

ie (for just the Birthday and Anniversary count):

SELECT 
    CASE
        WHEN marketing_message = 'Birthday Alert' THEN 'Birthdays'
        WHEN marketing_message = 'Anniversary Alert' THEN 'Anniversaries'
    END AS `type`, 
    COUNT(*) AS `num` 
FROM (
    SELECT `opp_sub`.* 
    FROM (
        SELECT `opportunities`.`marketing_message`, `opportunities`.`customer_id`
        FROM `opportunities`
        INNER JOIN `customers` 
            ON `opportunities`.`customer_id` = `customers`.`customer_id` 
            AND `opportunities`.`sales_person_id` = `customers`.`sales_person_id` 
        WHERE (opportunities.communication_alert_date >= DATE_SUB(NOW(), INTERVAL 60 DAY)) 
        AND (`opportunities`.`dealer_code` = ?)
        AND (opportunities.marketing_message IN ('Anniversary Alert', 'Birthday Alert')) 
        AND (opportunities.communication_alert_date 
            BETWEEN DATE_SUB(NOW(), INTERVAL 7 DAY) 
                AND DATE_ADD(NOW(), INTERVAL 7 DAY))
        ORDER BY `opportunities`.`communication_alert_date` DESC
    ) AS `wool_sub` 
    GROUP BY `customer_id`, `marketing_message`
) AS `c_table` 


If you look at this http://dev.mysql.com/doc/refman/5.0/en/using-explain.html You will se that checking your query with the EXPLAIN keyword gives you information about how the query executes. Then you can see exactly where the performance is poor.

0

精彩评论

暂无评论...
验证码 换一张
取 消