SELECT projectID, urlID, COUNT(1) AS totalClicks, projectPage,
(SELECT COUNT(1)
FROM tblStatSessionRoutes, tblStatSessions
WHERE tblStatSessionRoutes.statSessionID = tblStatSessions.ID
AND tblStatSessions.projectID = tblAdClicks.projectID
AND (tblStatSessionRoutes.leftPageID = tblAdClicks.projectPage OR
tblStatSessionRoutes.rightPageID = tblAdClicks.projectPage)) AS totalViews
FROM tblAdClicks
WHERE projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)
GROUP BY projectID, urlID, projectPage
ORDER BY CASE projectID
WHEN 170 THEN
1
ELSE
0
END, projectID
This is by no means an especially complex query, but because the database is normalised to a good level, and we are dealing with a significant amount of data, this query can be quite slow for the user.
Does anyone have tips on how to improve the speed of it? If I strategically denormalise parts of the database would this help? Will running it in a stored proc offer significant improvements?
The way I handle the data is efficient in my code, the bottleneck really is with this quer开发者_运维问答y.
Thanks!
De-normalising your database should be a last resort since (to choose just one reason) you don't want to encourage data inconsistencies which de-normalisation will allow.
First thing is to see if you can get some clues from the query execution plan. It could be, for example, that your sub-selects are costing too much, and would be better done first into temp tables which you then JOIN in your main query.
Also, if you see lots of table-scans, you could benefit from improved indexes.
If you haven't already, you should spend a few minutes re-formatting your query for readability. It's amazing how often the obvious optimisation will jump out at you while doing this.
I would try to break up that
projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)
and use a JOIN instead:
SELECT
projectID, urlID, COUNT(1) AS totalClicks, projectPage,
(SELECT COUNT(1) ....) AS totalViews
FROM
dbo.tblAdClicks a
INNER JOIN
dbo.tblProjects p ON a.ProjectID = p.ProjectID
WHERE
p.UserID = 5
GROUP BY
a.projectID, a.urlID, a.projectPage
ORDER BY
CASE a.projectID
WHEN 170 THEN 1
ELSE 0
END, a.projectID
Not sure just how much this will help - should help a bit, I hope!
Other than that, I would check if you have indices on the relevant columns, e.g. on a.ProjectID
(to help with the JOIN), and maybe on a.urlID
and a.ProjectPage
(to help with the GROUP BY
)
If your dbms has a tool that explains its query plan, use that first. (Your first correlated subquery might be running once per row.) Than make sure every column referenced in a WHERE clause has an index.
This subquery--WHERE projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)--can surely benefit from being cut and implemented as a view. Then join to the view.
It's not unusual to treat clickstream data as a data warehouse application. If you need to go that route, I'd usually implement a separate data warehouse rather than denormalize a well-designed OLTP database.
I doubt that running it as a stored proc will help you.
I would try to remove the correlated subquery (the inner (SELECT COUNT(1) ...)
). Having to join against your session routes where either the left page or the right page matches makes things a bit tricky. Something along the lines of (but I haven't tested this):
SELECT tblAdClicks.projectID, tblAdClicks.urlID, COUNT(1) AS totalClicks, tblAdClicks.projectPage,
SUM(CASE WHEN leftRoute.statSessionID IS NOT NULL OR rightRoute.statSessionID IS NOT NULL THEN 1 ELSE 0 END) AS totalViews
FROM tblAdClicks
JOIN tblProjects ON tblProjects.projectID = tblAdClicks.projectID
LEFT JOIN tblStatSessions ON tblStatSessions.projectID = tblAdClicks.projectID
LEFT JOIN tblStatSessionRoutes leftRoute ON leftRoute.statSessionID = tblStatSessions.ID AND leftRoute.leftPageID = tblAdClicks.projectPage
LEFT JOIN tblStatSessionRoutes rightRoute ON rightRoute.statSessionID = tblStatSessions.ID AND rightRoute.rightPageID = tblAdClicks.projectPage
WHERE tblProjects.userID = 5
GROUP BY tblAdClicks.projectID, tblAdClicks.urlID, tblAdClicks.projectPage
ORDER BY CASE tblAdClicks.projectID WHEN 170 THEN 1 ELSE 0 END, tblAdClicks.projectID
If I were to add some cache tables to help this, as I indicated I'd try to reduce the two queries against tblStatSessionRoutes for both left and right page to a single query. If you know that leftPageID will never be equal to rightPageID, it should be possible to simply use a trigger to populate an additional table with the left and right views in separate rows, for example.
精彩评论