Need some help optimizing a complex SQL Query_问答_开发者

I was wondering if anybody would have some input on how I could possibly optimize this MySQL query. I think I'm doing the right thing with indexes so don't think I can get this query any faster (it's currently taking more than 3 seconds to run) but would really love somebody with more experience to prove me wrong.

Here's the query:

SELECT `rooms`.*,
       ((IFNULL(SUM(av.host_daily_price), 0) + 
         rooms.host_daily_price * (4 - COUNT(DISTINCT av.id))) / 4) / 1 as 'price', 
       rooms.*, 
       (ACOS(least(1, COS(0.7115121627883911) * COS(1.291278129536698) *
                      COS(RADIANS(rooms.lat)) * COS(RADIANS(rooms.lng)) +
                      COS(0.7115121627883911) * SIN(-1.291278129536698) *
                      COS(RADIANS(rooms.lat)) * SIN(RADIANS(rooms.lng)) +
                      SIN(0.7115121627883911) * SIN(RADIANS(rooms.lat)))) * 3963.19) AS distance 
FROM `rooms`
LEFT JOIN availabilities AS av 
  ON (av.room_id = rooms.id AND
      av.date BETWEEN '2010-12-29' AND '2011-01-01')    
WHERE (rooms.deleted_at IS NULL) AND
      (`rooms`.`hidden` = 0) AND
      (rooms.id <> 7713) AND
      (rooms.city_id = 1 AND
         rooms.max_guests >= 4 AND
         rooms.minimum_stay <= 3 AND
         rooms.room_type IN ('room','apartment','house')) AND
      (av.inquiry_id IS NULL) 
GROUP BY  rooms.id 
HAVING SUM(IFNULL(status, 0)) = 0 AND
       (COUNT(*) = 4 OR `rooms`.default_available = 1) 
ORDER BY distance ASC
LIMIT 12;

The output of Explain:

id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   SIMPLE  rooms   ref PRIMARY,index_rooms_on_city_id,index_rooms_on_room_type,index_rooms_on_city_id_and_updated_at   index_rooms_on_city_id  5   const   2412    Using where; Using temporary; Using filesort

1   SIMPLE  av  ref index_availabilities_on_room_id,index_availabilities_on_room_id_and_date,index_availabilities_on_room_id_and_date_and_status    index_availabilities_on_room开发者_Go百科_id 5   roomorama.rooms.id  79  Using where

Let me know if any other information would be useful!

You should seriously consider storing the distance instead of calculating it, specially if you are sorting by that value.

How about:

1 - Only fetch all the data from rooms once. As @OMG Ponies noted in his comment, all columns from rooms are being fetched twice in the query.

2 - If the trig function being performed on constants in the query are truly constant (e.g. COS(0.7115121627883911)) replace them with the computed values, i.e.

COS(0.7115121627883911) = .7573753305446695179374104150422980521625
COS(1.291278129536698) = .2758925773610728508649620468976736490713
COS(0.7115121627883911) = .7573753305446695179374104150422980521625
SIN(-1.291278129536698) = -.9611884756680473394167554039882007538993

3 - This query appears to be doing the spherical trig to get the correct distance between two points on the surface of the earth. It might be faster and equally useful to compute an approximate distance by doing something like

dist = SQRT( (lat2-lat1)^2 + ((long2-long1) * COS(RADIANS(lat1+lat2)/2)))^2 ) * 60

This (should) give the distance between the points (lat1, long1) and (lat2, long2) in miles. Adjust the trailing constant for whatever distance measure you prefer.

Share and enjoy.

Without seeing how your tables are indexed, it's hard to tell if there are any specific problems. I don't know the MySQL explain output, so I'm not going to fake anything.

However, one thing you CAN do is create conditional indexes. For example, in addition to having your common indexes like

create index rooms_by_id on rooms(room_id);

you can have indexes that already have the conditions built in, thus reducing the number of index records that have to be traversed

create index rooms_by_id_usable on rooms(room_id)
    WHERE (deleted_at IS NOT NULL) and (hidden <> 0)

If 20% of your rooms records match deleted_at IS NULL and hidden <> 0), then this alternate index rooms_by_id_usable will be 20% smaller than rooms_by_id, and will take (roughly) 20% less time to traverse.

This all depends on the MySQL optimizer and how it chooses to use indexes and so on. I know in PostgreSQL that this works very nicely indeed.

Is the performance significantly improved if the distance calculation is removed? If so, it might be worth storing the sin and cos of lat and lng on the room table (and using the stored values in the query) - these functions are relatively processor-intensive, so deriving them for all of a relatively large dataset may significantly impact performance.

Switch to PostgreSQL and use PostGIS. It's a geospatial extension to PostgreSQL which does this sort of thing natively and efficiently.

If you're genuinely stuck with MySQL, then no dice, but PostgreqSQL is free, open source, easy to use, and fast, so it's an entirely viable alternative to MySQL (to say the least) if you have the ability to choose.