I was wondering if anybody would have some input on how I could possibly optimize this MySQL query. I think I'm doing the right thing with indexes so don't think I can get this query any faster (it's currently taking more than 3 seconds to run) but would really love somebody with more experience to prove me wrong.
Here's the query:
SELECT `rooms`.*,
((IFNULL(SUM(av.host_daily_price), 0) +
rooms.host_daily_price * (4 - COUNT(DISTINCT av.id))) / 4) / 1 as 'price',
rooms.*,
(ACOS(least(1, COS(0.7115121627883911) * COS(1.291278129536698) *
COS(RADIANS(rooms.lat)) * COS(RADIANS(rooms.lng)) +
COS(0.7115121627883911) * SIN(-1.291278129536698) *
COS(RADIANS(rooms.lat)) * SIN(RADIANS(rooms.lng)) +
SIN(0.7115121627883911) * SIN(RADIANS(rooms.lat)))) * 3963.19) AS distance
FROM `rooms`
LEFT JOIN availabilities AS av
ON (av.room_id = rooms.id AND
av.date BETWEEN '2010-12-29' AND '2011-01-01')
WHERE (rooms.deleted_at IS NULL) AND
(`rooms`.`hidden` = 0) AND
(rooms.id <> 7713) AND
(rooms.city_id = 1 AND
rooms.max_guests >= 4 AND
rooms.minimum_stay <= 3 AND
rooms.room_type IN ('room','apartment','house')) AND
(av.inquiry_id IS NULL)
GROUP BY rooms.id
HAVING SUM(IFNULL(status, 0)) = 0 AND
(COUNT(*) = 4 OR `rooms`.default_available = 1)
ORDER BY distance ASC
LIMIT 12;
The output of Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE rooms ref PRIMARY,index_rooms_on_city_id,index_rooms_on_room_type,index_rooms_on_city_id_and_updated_at index_rooms_on_city_id 5 const 2412 Using where; Using temporary; Using filesort
1 SIMPLE av ref index_availabilities_on_room_id,index_availabilities_on_room_id_and_date,index_availabilities_on_room_id_and_date_and_status index_availabilities_on_room开发者_Go百科_id 5 roomorama.rooms.id 79 Using where
Let me know if any other information would be useful!
You should seriously consider storing the distance instead of calculating it, specially if you are sorting by that value.
How about:
1 - Only fetch all the data from rooms once. As @OMG Ponies noted in his comment, all columns from rooms are being fetched twice in the query.
2 - If the trig function being performed on constants in the query are truly constant (e.g. COS(0.7115121627883911)) replace them with the computed values, i.e.
COS(0.7115121627883911) = .7573753305446695179374104150422980521625
COS(1.291278129536698) = .2758925773610728508649620468976736490713
COS(0.7115121627883911) = .7573753305446695179374104150422980521625
SIN(-1.291278129536698) = -.9611884756680473394167554039882007538993
3 - This query appears to be doing the spherical trig to get the correct distance between two points on the surface of the earth. It might be faster and equally useful to compute an approximate distance by doing something like
dist = SQRT( (lat2-lat1)^2 + ((long2-long1) * COS(RADIANS(lat1+lat2)/2)))^2 ) * 60
This (should) give the distance between the points (lat1, long1) and (lat2, long2) in miles. Adjust the trailing constant for whatever distance measure you prefer.
Share and enjoy.
Without seeing how your tables are indexed, it's hard to tell if there are any specific problems. I don't know the MySQL explain output, so I'm not going to fake anything.
However, one thing you CAN do is create conditional indexes. For example, in addition to having your common indexes like
create index rooms_by_id on rooms(room_id);
you can have indexes that already have the conditions built in, thus reducing the number of index records that have to be traversed
create index rooms_by_id_usable on rooms(room_id)
WHERE (deleted_at IS NOT NULL) and (hidden <> 0)
If 20% of your rooms
records match deleted_at IS NULL and hidden <> 0)
, then this alternate index rooms_by_id_usable
will be 20% smaller than rooms_by_id
, and will take (roughly) 20% less time to traverse.
This all depends on the MySQL optimizer and how it chooses to use indexes and so on. I know in PostgreSQL that this works very nicely indeed.
Is the performance significantly improved if the distance calculation is removed? If so, it might be worth storing the sin and cos of lat and lng on the room table (and using the stored values in the query) - these functions are relatively processor-intensive, so deriving them for all of a relatively large dataset may significantly impact performance.
Switch to PostgreSQL and use PostGIS. It's a geospatial extension to PostgreSQL which does this sort of thing natively and efficiently.
If you're genuinely stuck with MySQL, then no dice, but PostgreqSQL is free, open source, easy to use, and fast, so it's an entirely viable alternative to MySQL (to say the least) if you have the ability to choose.
精彩评论