开发者

MySQL Subquery with User-Defined Variables

开发者 https://www.devze.com 2023-02-12 18:15 出处:网络
I\'m trying to accomplish a query that requires a calculated column using a subquery that passes the date reference via a variable. I\'m not sure if I\'m not \"doing it right\" but essentially the que

I'm trying to accomplish a query that requires a calculated column using a subquery that passes the date reference via a variable. I'm not sure if I'm not "doing it right" but essentially the query never finishes and spins for minutes on end. This is my query:

select @groupdate:=date_format(order_date,'%Y-%m'), count(distinct customer_email) as num_cust,
(
  select count(distinct cev.customer_email) as num_prev
  from _pj_cust_email_view cev
  inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @groupdate) and (cev.customer_email=prev_purch.customer_email)
  where cev.order_date > @groupdate
) as prev_cust_count
from _pj_cust_email_view
group by @groupdate;

Subquery has an inner join accomplishes the self-join that only gives me the count of people that have purchased prior to the date in @groupdate. The EXPLAIN is below:

+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| id | select_type          | table               | type | possible_keys | key       | key_len | ref                       | rows   | Extra                           |
+----+-开发者_开发知识库---------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
|  1 | PRIMARY              | _pj_cust_email_view | ALL  | NULL          | NULL      | NULL    | NULL                      | 140147 | Using temporary; Using filesort |
|  2 | UNCACHEABLE SUBQUERY | cev                 | ALL  | IDX_EMAIL     | NULL      | NULL    | NULL                      | 140147 | Using where                     |
|  2 | UNCACHEABLE SUBQUERY | prev_purch          | ref  | IDX_EMAIL     | IDX_EMAIL | 768     | cart_A.cev.customer_email |      1 | Using where                     |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+

And the structure of the table _pj_cust_email_view is as such:

'_pj_cust_email_view', 'CREATE TABLE `_pj_cust_email_view` (
  `order_date` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
  `customer_email` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  KEY `IDX_EMAIL` (`customer_email`),
  KEY `IDX_ORDERDATE` (`order_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'

Again, as I said earlier, I'm not really sure that this is the best way to accomplish this. Any criticism, direction is appreciated!

Update

I've made a little progress, and I'm now doing the above procedurally by iterating through all known months instead of months in the database and setting the vars ahead of time. I don't like this still. This is what I've got now:

Sets the user defined vars

set @startdate:='2010-08', @enddate:='2010-09';

Gets total distinct emails in the given range

select count(distinct customer_email) as num_cust
from _pj_cust_email_view
where order_date between @startdate and @enddate;

Gets the total count of customers who had purchased prior to the given range

select count(distinct cev.customer_email) as num_prev
  from _pj_cust_email_view cev
  inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @startdate) and (cev.customer_email=prev_purch.customer_email)
  where cev.order_date between @startdate and @enddate;

Where @startdate is set to the start of the month and @enddate signifies the end of that month's range.

I really feel like this still can be done in one full query.


I don't think you need to use subqueries at all, nor do you need to iterate over months.

Instead, I recommend you create a table to store all months. Even if you prepopulate it with 100 years of months, it would only have 1200 rows in it, which is trivial.

CREATE TABLE Months (
    start_date DATE, 
    end_date DATE, 
    PRIMARY KEY (start_date, end_date)
);
INSERT INTO Months (start_date, end_date) 
VALUES ('2011-03-01', '2011-03-31');

Store the actual start and end dates, so you can use the DATE data type and index the two columns properly.

edit: I think I understand your requirement a bit better, and I've cleaned up this answer. The following query may be right for you:

SELECT DATE_FORMAT(m.start_date, '%Y-%m') AS month,
  COUNT(DISTINCT cev.customer_email) AS current,
  GROUP_CONCAT(DISTINCT cev.customer_email) AS current_email,
  COUNT(DISTINCT prev.customer_email) AS earlier,
  GROUP_CONCAT(DISTINCT prev.customer_email) AS earlier_email
FROM Months AS m 
LEFT OUTER JOIN _pj_cust_email_view AS cev
  ON cev.order_date BETWEEN m.start_date AND m.end_date
INNER JOIN Months AS mprev
  ON mprev.start_date <= m.start_date
LEFT OUTER JOIN _pj_cust_email_view AS prev
  ON prev.order_date BETWEEN mprev.start_date AND mprev.end_date
GROUP BY month;

If you create the following compound index in your table:

CREATE INDEX order_email on _pj_cust_email_view (order_date, customer_email);

Then the query has the best chance of being an index-only query, and will run a lot faster.

Below is the EXPLAIN optimization report from this query. Note type: index for each table.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: NULL
         rows: 4
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: mprev
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: NULL
         rows: 4
        Extra: Using where; Using index; Using join buffer
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: cev
         type: index
possible_keys: order_email
          key: order_email
      key_len: 17
          ref: NULL
         rows: 10
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: prev
         type: index
possible_keys: order_email
          key: order_email
      key_len: 17
          ref: NULL
         rows: 10
        Extra: Using index

Here's some test data:

INSERT INTO Months (start_date, end_date) VALUES
('2011-03-01', '2011-03-31'),
('2011-02-01', '2011-02-28'),
('2011-01-01', '2011-01-31'),
('2010-12-01', '2010-12-31');

INSERT INTO _pj_cust_email_view VALUES
('ron', '2011-03-10'),
('hermione', '2011-03-15'),
('hermione', '2011-02-15'),
('hermione', '2011-01-15'),
('hermione', '2010-12-15'),
('neville', '2011-01-10'),
('harry', '2011-03-19'),
('harry', '2011-02-10'),
('molly', '2011-03-25'),
('molly', '2011-01-10');

Here's the result given that data, including the concatenated list of emails to make it easier to see.

+---------+---------+--------------------------+---------+----------------------------------+
| month   | current | current_email            | earlier | earlier_email                    |
+---------+---------+--------------------------+---------+----------------------------------+
| 2010-12 |       1 | hermione                 |       1 | hermione                         | 
| 2011-01 |       3 | neville,hermione,molly   |       3 | hermione,molly,neville           | 
| 2011-02 |       2 | hermione,harry           |       4 | harry,hermione,molly,neville     | 
| 2011-03 |       4 | molly,ron,harry,hermione |       5 | molly,ron,hermione,neville,harry | 
+---------+---------+--------------------------+---------+----------------------------------+


Although Bill has a nice query using multiple tables, this one does it with the SQL variables too, so no extra table. The inner query joins to your _pj_cust_email_view table and does a limit 10 to signify only going back 10 months from the current month. So, no hard-coding of dates, it is computed on the fly... if you want more or less months, just change the LIMIT clause.

By setting the @dt := as the LAST field in the internal query, only THEN does the date basis get assigned for the next record cycle to create your qualifying dates...

select justDates.FirstOfMonth,
       count( distinct EMCurr.customer_Email ) UniqThisMonth,
       count( distinct EMLast.customer_Email ) RepeatCustomers
   from 
      ( SELECT 
                 @dt FirstOfMonth,
                 last_day( @dt ) EndOfMonth,
                 @dt:= date_sub(@dt, interval 1 month) nextCycle
            FROM 
                 (select @dt := date_sub( current_date(), interval dayofmonth( current_date())-1 day )) vars,
                _pj_cust_email_view limit 10 
                ) JustDates
        join _pj_cust_email_view EMCurr
           on EMCurr.order_Date between JustDates.FirstOfMonth and JustDates.EndOfMonth
        left join _pj_cust_email_view EMLast
           on EMLast.order_Date < JustDates.FirstOfMonth
           and EMCurr.customer_Email = EMLast.customer_Email
    group by 
       1
0

精彩评论

暂无评论...
验证码 换一张
取 消