Optimising SQL distance query

I'm running a MySQL query that returns results based on location. However, I have noticed recently that it's really slowing down my PHP app. I use CodeIgniter, and the profiler shows the query taking 4.2 seconds. The geoname table has 500,000 rows. I have some indexes on the key columns; how else can I speed up this query?

Here is my SQL:

SELECT `products`.`product_name`
     , `geoname`.`geonameid`
     , `geoname`.`latitude`
     , `geoname`.`longitude`
     , `products`.`product_id`
     , AVG(ratings.vote) as rating
     , count(comments.comment_id) as total_comments
     ,   (6371 * acos(cos(radians(38.7666667)) 
               * cos(radians(geoname.latitude)) 
               * cos(radians(geoname.longitude) - radians(-3.3833333)) 
             +   sin(radians(38.7666667)) 
               * sin(radians(geoname.latitude)))
         ) AS distance
FROM (`products`)
JOIN `geoname` ON `geoname`.`geonameid` = `products`.`geoname_id`
LEFT JOIN `ratings` 
  ON `ratings`.`var_id` = `products`.`product_id`
LEFT JOIN `comments` 
  ON `comments`.`var_id` = `products`.`product_id`
WHERE `products`.`product_id` != 82
GROUP BY `products`.`product_id`
HAVING `distance` < 99
ORDER BY `distance`
LIMIT 10


Let's start with the query itself: cos(radians(geoname.latitude)) and the other trig terms computed from stored columns never change between queries, so we can do a little preprocessing and store the calculated values in the table. (Calculating trig functions mostly involves using a series expansion, which is costly.)
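For example, a minimal sketch of that preprocessing, assuming we are free to add columns to geoname (the column names cos_lat, sin_lat and rad_lon are made up here):

ALTER TABLE geoname
  ADD COLUMN cos_lat DOUBLE,
  ADD COLUMN sin_lat DOUBLE,
  ADD COLUMN rad_lon DOUBLE;

UPDATE geoname
   SET cos_lat = COS(RADIANS(latitude)),
       sin_lat = SIN(RADIANS(latitude)),
       rad_lon = RADIANS(longitude);

-- The distance term then shrinks to (38.7666667 / -3.3833333 being the search point):
-- 6371 * ACOS(COS(RADIANS(38.7666667)) * cos_lat
--           * COS(rad_lon - RADIANS(-3.3833333))
--           + SIN(RADIANS(38.7666667)) * sin_lat)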

6371 * acos(cos(radians(38.7666667))) - taken on its own this is just radians(38.7666667) * 6371, so why use it? It costs.

Secondly, if you don't care that much about precision, you can precalculate the trig values themselves for, say, 10,000 points from 0 to pi/2 - that should give a nice approximation, good to about 4 decimal places, i.e. less than a km (a sketch of such a lookup table follows the formula below).

(6371 * acos(cos(radians(38.7666667))
 * cos(radians(geoname.latitude))
 * cos(radians(geoname.longitude) - radians(-3.3833333))
+ sin(radians(38.7666667))
* sin(radians(geoname.latitude))))

Also remember the symmetries: for pi/2 < a < pi, sin(a) equals sin(pi - a); for pi < a < 3/2 pi, it equals -sin(a - pi); and for 3/2 pi < a < 2 pi, it equals -sin(2 pi - a). Similar identities exist for cos, so the lookup table only needs to cover 0 to pi/2.
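A rough sketch of the lookup-table idea, assuming MySQL 8+ for the recursive CTE that populates it (the names trig_lookup, angle_idx, sin_val and cos_val are invented here; older servers can fill the table from application code instead):

CREATE TABLE trig_lookup (
  angle_idx INT PRIMARY KEY,   -- 0 .. 9999, angle = angle_idx * PI() / 20000
  sin_val   DOUBLE NOT NULL,
  cos_val   DOUBLE NOT NULL
);

-- The default recursion limit (1000) is too low for 10,000 rows.
SET SESSION cte_max_recursion_depth = 20000;

INSERT INTO trig_lookup (angle_idx, sin_val, cos_val)
WITH RECURSIVE seq (n) AS (
  SELECT 0
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 9999
)
SELECT n, SIN(n * PI() / 20000), COS(n * PI() / 20000) FROM seq;

-- To approximate sin(a) for 0 <= a <= pi/2, round to the nearest grid point:
--   SELECT sin_val FROM trig_lookup
--   WHERE angle_idx = LEAST(ROUND(a * 20000 / PI()), 9999);
-- Angles outside [0, pi/2] are folded into that range with the identities above.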

Try this and see if it helps. luke


If you ask MySQL to EXPLAIN the query, I think you'll find that the distance calculation is rendering your indexes useless. You're forcing the query engine to do a full table scan.

The only way to save the situation would be to put distance in a separate column and index that.
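For instance, a stripped-down check along these lines (just the geoname part of the query) should report type: ALL in the EXPLAIN output, i.e. a full scan, regardless of any index on latitude or longitude:

EXPLAIN
SELECT geonameid, latitude, longitude
FROM geoname
WHERE 6371 * ACOS(COS(RADIANS(38.7666667)) * COS(RADIANS(latitude))
                * COS(RADIANS(longitude) - RADIANS(-3.3833333))
                + SIN(RADIANS(38.7666667)) * SIN(RADIANS(latitude))) < 99;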


If you can approximate any search location to one of, say, 1,000 or 10,000 points in space, you could, in fact, store distances in a helper table along the lines of:

create table distance (
position1_id int,
position2_id int,
distance int -- probably precise enough
)

with an index on position1_id and distance. The table would have anywhere from 10^6 to 10^8 rows, but using index data, I imagine you could quickly retrieve the nearest position2_id values. Even if this wouldn't be precise enough for you (because of having to settle for limited resolution), it would let you quickly eliminate probably >99% of the locations you don't care about in a specific case.
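A sketch of the index and the lookup, assuming the search location has already been snapped to its nearest precomputed point in application code (the id 4711 is just a placeholder):

CREATE INDEX idx_position_distance ON distance (position1_id, distance);

SELECT position2_id, distance
FROM distance
WHERE position1_id = 4711
  AND distance < 99
ORDER BY distance
LIMIT 10;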


You can kick the radians() function out by simply dividing by 57.29577951 (i.e. 180/pi). That eliminates six radians() calls per row. The formula in general is not friendly for relational query joins on large sets. Nonetheless, here's a different query that attempts to shrink the views prior to joining. I'm not sure whether it will run faster or slower without testing and tweaking it. Ultimately, I would build a statistics table keyed on the primary key and configure triggers on the other tables to maintain it, so that your final distance calculation runs against a very small table; a rough sketch of such triggers follows the query below. And to be truly awesome, I would build a similar audit table against the statistics table to summarize trends.

select p.product_name,
       g.geonameid,
       g.latitude,
       g.longitude,
       p.product_id,
       r.rating,
       c.total_comments,
       g.distance
from (select product_id, geoname_id, product_name
      from products
      where product_id != 82) p
inner join
     (select geonameid, latitude, longitude,
             (6371 * acos(cos(38.7666667/57.29577951)
                        * cos(latitude/57.29577951)
                        * cos((longitude/57.29577951) - (-3.3833333/57.29577951))
                        + sin(38.7666667/57.29577951)
                        * sin(latitude/57.29577951))
             ) as distance
      from geoname
      group by geonameid
      having distance < 99) g on p.geoname_id = g.geonameid
left join
     (select var_id, avg(vote) as rating
      from ratings
      group by var_id) r on p.product_id = r.var_id
left join
     (select var_id, count(comment_id) as total_comments
      from comments
      group by var_id) c on p.product_id = c.var_id
order by g.distance
limit 10
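A rough sketch of the statistics-table idea, assuming a new table product_stats maintained by insert triggers (all names below are invented; the average is kept as a running sum and count so each trigger stays a single statement, and corresponding UPDATE/DELETE triggers would be needed in practice):

CREATE TABLE product_stats (
  product_id     INT PRIMARY KEY,
  vote_sum       DOUBLE NOT NULL DEFAULT 0,
  vote_count     INT NOT NULL DEFAULT 0,
  total_comments INT NOT NULL DEFAULT 0
);

CREATE TRIGGER ratings_after_insert
AFTER INSERT ON ratings
FOR EACH ROW
  INSERT INTO product_stats (product_id, vote_sum, vote_count)
  VALUES (NEW.var_id, NEW.vote, 1)
  ON DUPLICATE KEY UPDATE
    vote_sum   = vote_sum + NEW.vote,
    vote_count = vote_count + 1;

CREATE TRIGGER comments_after_insert
AFTER INSERT ON comments
FOR EACH ROW
  INSERT INTO product_stats (product_id, total_comments)
  VALUES (NEW.var_id, 1)
  ON DUPLICATE KEY UPDATE
    total_comments = total_comments + 1;

-- The final query then reads vote_sum / vote_count as the rating and
-- total_comments directly from product_stats instead of aggregating
-- over ratings and comments on every request.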