开发者

Is there a quicker way of doing this type of query (finding inactive accounts)?

开发者 https://www.devze.com 2023-04-03 19:33 出处:网络
I have a very large table of wagering transactions. Let\'s say for the sake of the question I want to find the accounts of people who have wagered in the last year but not wagered in the last month,so

I have a very large table of wagering transactions. Let's say for the sake of the question I want to find the accounts of people who have wagered in the last year but not wagered in the last month, so I do something like this...

--query one

select accountnumber into #wageredrecently from activity 
where _date >='2011-08-10' and tran开发者_如何学Pythonsaction_type = 'Bet'
group by accountnumber

--query two

select accountnumber,firstname,lastname,email,sum(handle)
from activity a, customers c
where a.accountnumber = c.accountno
and transaction_type = 'Bet'

and _date >='2010-09-10'
and accountnumber not in (select * from #wageredrecently)
group by accountnumber,firstname,lastname,email

The problem is, this takes ages to get the data. Is there a quicker way to acheive the same in sql?

Edit, just to be specific about the time: It takes just over 3 minutes, which is far too long for a query that is destined for a php intranet page.

Edit (11/09/2011): I've found out that the problem is the customers table. It's actually a view. It previously had good performance but now all of a sudden its performance is terrible, a simple query on it takes almost as long as the above query pair. I have therefore chosen an alternative table of customer data (that actually is a table, and not a view) and now the query pair takes about 15 seconds.


You should try to join customers after you have found and aggregated the rows from activity (I assume that handle is a column in activity).

select c.accountno, 
       c.firstname,
       c.lastname,
       c.email,
       a.sumhandle
from customers as c
  inner join (
                select accountnumber,
                       sum(handle) as sumhandle
                from activity
                where _date >= '2010-09-10' and
                      transaction_type = 'bet' and 
                      accountnumber not in (
                                              select accountnumber 
                                              from activity
                                              where _date >= '2011-08-10' and
                                                    transaction_type = 'bet'
                                           )
                group by accountnumber
             ) as a
    on c.accountno = a.accountnumber              

I also included your first query as a sub-query instead. I'm not sure what that will do for performance. It could be better, it could be worse, you have to test on your data.


I don't know your exact business need, but rarely will someone need access to innactive accounts over several months at a moments notice. Depending on when you pruge data, this may get worse.

You could create an indexed view that contains the last transaction date for each account:

max(_date) as RecentTransaction

If this table gets too large, it could be partioned by year or month of the activity.


Have you considered adding an index on _date to the activity table? It's probably taking so long because it has to do a full table scan on that column when you're comparing the dates. Also, is transaction_type indexed as well? Otherwise, the other index wouldn't do you any good.


Answering my question as the problem wasn't the structure of the query but one of the tables being used. It was a view and its performance was terrible. I change to an actual table with customer data in and reduced the execution time down to about 15 seconds.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号