开发者

Is it possible to optimize these SQL queries further?

开发者 https://www.devze.com 2023-02-07 14:17 出处:网络
I have a Rails app (running on a Heroku account) which is grabbing a bunch of statistics for the home page regarding the number of records which match certain criteria. Each count is displayed as a nu

I have a Rails app (running on a Heroku account) which is grabbing a bunch of statistics for the home page regarding the number of records which match certain criteria. Each count is displayed as a number on the page. My table (listings) consists of about 22,500 records. On production it takes the page about 350ms to load (still below the threshold but not great for a home page).

Please bear with the number of queries here, I wanted to portray the redundancy of what I'm trying to do. This feels like it could be done much more efficiently. Any ideas?

SELECT COUNT(1) FROM listings WHERE (city in ('Syracuse'))
SELECT COUNT(1) FROM listings WHERE (city in ('Syracuse')) AND (created_at >= '2011-01-30 18:28:44.656702')
SELECT COUNT(1) FROM listings WHERE (city in ('Cicero', 'Clay', 'Lysander', 'VanBuren', 'Salina'))
SELECT COUNT(1) FROM listings WHERE (city in ('Cicero', 'Clay', 'Lysander', 'VanBuren', 'Salina')) AND (created_at >= '2011-01-30 18:28:44.811090')
SELECT COUNT(1) FROM listings WHERE (city in ('DeWitt', 'Manlius', 'Pompey'))
SELECT COUNT(1) FROM listings WHERE (city in ('DeWitt', 'Manlius', 'Pompey')) AND (created_at >= '2011-01-30 18:28:44.954442')
SELECT COUNT(1) FROM listings WHERE (city in ('Onondaga', 'Elbridge', 'Geddes', 'Camillus'))
SELECT COUNT(1) FROM listings WHERE (city in ('Onondaga', 'Elbridge', 'Geddes', 'Camillus')) AND (created_at >= '2011-01-30 18:28:45.105438')
SELECT COUNT(1) FROM listings WHERE (city in ('Fabius', 'Lafayette', 'Marcellus', 'Otisco', 'Skaneateles', 'Spafford', 'Tully'))
SELECT COUNT(1) FROM listings WHERE (city in ('Fabius', 'Lafayette', 'Marcellus', 'Otisco', 'Skaneateles', 'Spafford', 'Tully')) AND (created_at >= '2011-01-30 18:28:45.258860')
SELECT COUNT(1) FROM listings WHERE (city in ('West Monroe', 'Hastings', 'Constantia', 'Palermo', 'Mexico', 'Parish', 'Schroeppel'))
SELECT COUNT(1) FROM listings WHERE (city in ('West Monroe', 'Hastings', 开发者_运维百科'Constantia', 'Palermo', 'Mexico', 'Parish', 'Schroeppel')) AND (created_at >= '2011-01-30 18:28:45.411138') 

One option I considered is using the after_add and after_remove hooks on my Listing model to update a separate table with these statistics. My only concern with this is the maintenance issues involved. However, new listings are only added a few times throughout the day so updating said table shouldn't cause performance issues in itself.

Thanks!


Various approaches, not all database-oriented.

You can combine all the selects into a single query like so:

SELECT COUNT(CASE WHEN city = 'Syracuse' THEN 1 END) as syracuse,
       COUNT(CASE WHEN city = 'Syracuse' AND created_at >= '2011-01-30 18:28:44.656702' THEN 1 END) as syracuse_recent,
       /* etc... */
FROM listings

This will be just one scan over the table to collect all the stats.

Alternatively/additionally, cache the statistics extracted from the database in memory in your application, or use something like memcached. If there's no need for the statistics to be up-to-the-minute accurate, this offloads the query from the database completely, after the initial population.


First you should examine which indexes you have on the tables (try adding and removing indexes on individual fields and also composite indexes in both direction).

Also make sure to analyze exactly what does the 350ms compose of (with firebug or something like YSlow).

Finally if you really have rare updates and you want to maintain a summary table hooks are not the only way - you can also write trigger that will do this job for you.


Personally, I would add two new tables, one that contains groups of cities and the other a many to many link table between the groups and cities. You would need "city_group_id", "city_group_name", "dt_count_threshold". The second table would be "city_group_id", "city_id". Then you can perform selects against the many to many link table and join the city table with your date/time restriction.

-- unrestricted count
selec cg.city_group_name, count(*) as cnt
from dbo.city_group cg
join dbo.city_group_city cgc on cg.city_group_id = cgc.city_group_id
group by city_group_name

-- restricted
selec cg.city_group_name, count(*) as cnt
from dbo.city_group cg
join dbo.city_group_city cgc on cg.city_group_id = cgc.city_group_id
join dbo.city c on c.city_id = cgc.city_id
group by city_group_name
where c.created_at >= cg.dt_count_threshold

Keep in mind, these are untested queries so there might be some minor adjustments needed. And make sure all indexes are setup correctly to avoid table scans.

0

精彩评论

暂无评论...
验证码 换一张
取 消