I am looking at reworking our website's reporting pages to be faster, and I am on the fence about how I should implement paging. Our database is large, > 150开发者_StackOverflow中文版 million records. Most of our reports require complicated data that comes from up to 5 to 10 tables so each may have 5 or 6 joins and some inner selects. Obviously, they are not fast queries.
To implement paging on the database side, for each web request, I need to query the database for the rows for the current page (say 100 of 10,000), but I also have to query the database again to get the total number of possible rows. As a result, I am essentially running the whole query twice because the query to get the total number of records will still need to do all of the joins and inner selects to determine the total.
Would it not be better to run the query once, return all results, cache it in session, and page it using web code? I known I am initially pulling more data, but I am only running a query that can take 30 - 60 seconds once instead of twice.
This is kind of a technology generic question, but in case it matters, I am using .net 4.0 and Oracle 11g.
From my experience, paging is always faster if left to the database. After all a database is built to query and manipulate massive amounts of data.
If you return large amounts of data in .NET and "cache" it in session you will quickly run out of memory on your server.
you could get the number of rows at the same time as your paged rows with analytics like this:
SELECT [cols], nb_rows
FROM (SELECT [cols], nb_rows, rownum r
FROM (SELECT [cols], count(*) over() nb_rows
FROM [your_query])
WHERE rownum <= :M)
WHERE r >= :N
This would make sure you only run the query once and will be less stressful to your network bandwidth.
For further analysis, see Speed of paged queries in Oracle.
Caching the result of the whole query might make sense if:
- Users regularly move forward/backward through the result set without the need to refresh the data
- Network bandwidth and available memory (application side) are sufficient (most likely this would only be possible if the number of simultaneous users is kept small)
Do it in the db. For Oracle, you might try something like:
select *
from ( select a.*, rownum r
from ( select *
from t
where x = :host_variable
order by y ) a
where rownum <= :HigherBound )
where r >= :LowerBound
where LowerBound and HigherBound define your page bounds (for page 1 showing 10 per page, you'd have lower=1 and higher=10)
The trick here is:
- Make sure your inner select with order by is fairly fast (uses proper indexes).
- Use of rownum lets Oracle do a stopkey, which helps limit rows it needs to process.
As for your situation, if you have a complicated query that takes time to run and returns a large amount of data, you'll definitely want to create a materialized view first, add an index or two to support queries on the snapshot table, and then refresh complete when needed. Your query above will be from the mat view, NOT from the base tables.
精彩评论