I have been assigned the task of creating some graphical stats for a website, out of some saved data.
Facts: - there are 3 databases in use. dbCurrent, dbStats, dbBackup. dbCurrent is the main database of the website dbStats hold various tables of statistics and tracking data dbBackup holds the last five years stats/tracking tables.
- the data i will use come from two databases ( dbStats, dbBackup )
- the table names are: stats2006, stats2007, stats2008, etc, except the current stats which is just "stats". Each table has data for its year.
- the tabl开发者_开发知识库e structure for each year of data is the same: primaryID field is integer productID field is integer dateMonitor field is integer (unixtimestamp) pageName field is varchar (20)
- productID, dateMonitor, pageName fields have also indexes
in other words, what product was viewed on what date and from what page.
So, what i thought is create a loop out of each table and get my data. Each query of mine looks like:
Select COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions
where $tablename and $conditions are variables based on each loop. All conditions are similar to:
- dateMonitor between date1 and date2
- pageName='some val'
- productID IN ($comma_separated_values)
- combination of the above
All of these are working decently so far (for a single product).
When i try to create a report to compare 'x' products in 'y' years (chosen dynamically from admin/moderator) the script runs for more than 15 minutes.
I am looking for a way to improve the performance of the script. Logic/structure i use so far, follows:
Loop through products to find the ids to use (typical format is: x,y,z (comma separated values)
Open Loop through years/months
Execute one sql query for each affected table/database to get the number of affected rows.
Close year loop
Send data to graph script (jquery jqPlot to be exact) to print on screen
Any help/idea appreciated.
EDIT: Based on @Narf suggestion's with UNION ALL, i constructed 1 single query based on 12 sub-select statements:
SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='1167606001' AND dateMonitor<='1170284399' AND dateMonitor='test'
UNION ALL
SELECT COUNT(*) AS monthlyTotal FROM db1.table2 WHERE dateMonitor>='1170284401' AND dateMonitor<='1172703599' AND dateMonitor='test' ...
Each select statement refers to a single month duration. Demo code:
for ($m=1; $m<=12; $m++)
{
$startDate = mktime(0, 0, 1, $m, 1, $myYear);
$daysOfMonth = date("t", mktime(10, 10, 10, $m, 10, $myYear));
$endDate = mktime(23, 59, 59, $m, $daysOfMonth, $myYear);
$query_chk1 .= "SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='$startDate' AND dateMonitor<='$endDate' AND pageName='test' UNION ALL ";
}
$query_chk1 = substr($query_chk1, 0, -10);
EDIT2: after creating combined indexes (as suggested by @ypercube), I see some slight decreased time in execution time.
Now i have an average execution time of 11 min (original time was 15-17 min)
This helped a lot to decrease execution time.
Thank you.
There's not much that you can do, at least since you've indexed all your columns ... here's the best that I can come up with:
SELECT COUNT(*)
FROM `stats`
WHERE `productID IN(1,2,3)
AND `dateMonitor` >= <unixtime from>
AND `dateMonitor` <= <unixtime to>
AND `pageName`='<value>'
... and how:
- As ypercube has commented - using
COUNT(*)
is faster. - I don't know this for sure, but I believe that using
>=
and<=
instead ofBETWEEN
for integers should be faster.
Another thing that you should try is executing all the queries (if more than one) at once. It would be harder for me to explain it correctly in words, and I see that you have a good grasp of SQL, so you should be able to get the logic, so here's an example:
Let's say that we need to search for products with ids of 123, 13, 5 and 6 from May 2006 through April 2008, and pageName 'test':
We calculate the timestamps prior to generating the query and determine exactly which tables we need to search in.
SELECT COUNT(*) AS
myCounter
FROMstats2006
WHEREproductID
IN(5,6,13,123) ANDdateMonitor
>= 1146430800 ANDpageName
='test'/* Here we only need to check the timestamp against May 1st 2006, 00:00:00 */
UNION ALL
SELECT COUNT(*) AS
myCounter
FROMstats2007
WHEREproductID
IN(5,6,13,123) ANDpageName
='test'/* Here we don't need to check the
dateMonitor
field because the whole year matches our period */UNION ALL
SELECT COUNT(*) AS
myCounter
FROMstats2008
WHEREproductID
IN(5,6,13,123) ANDdateMonitor
<= 1209589199 ANDpageName
='test'/* Here we only need to check the timestamp against April 30th 2008, 23:59:59 */
When you so compare 'x' products in 'y' years why don't you use GROUP BY? Eg:
Select productID, COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions GROUP BY productID
this will cut amount of quires and should speed up the process.
精彩评论