I am using PHP and CURL to make HTTP reverse geocoding (lat, long -> address) requests to Google Maps. I have a premier account, so we can make a lot of a requests without being throttled or blocked.
Unfortunately, I have reached a performance limit. We get approximately 500,000 requests daily that need to be reverse geocoded.
The code is quite trivial (I will write pieces in pseudo-code) for the sake of saving time and space. The following code fragment is called every 15 seconds via a job.
<?php
//get requests from database
$requests = get_requests();
foreach($requests as $request) {
//build up the url string to send to google
$url = build_url_string($request->latitude, $request->longitude);
//make the curl request
$response = Curl::get($url);
//write the response address back to the database
write_response($response);
}
class Curl {
public static function get($p_url, $p_timeout = 5) {
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $p_url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, $p_timeout);
curl_setopt($curl_handle, CURLOPT_TIMEOUT, $p_timeout);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($curl_handle);
curl_close($curl_handle);
return $response;
}
}
?>
The performance problem seems to be the CURL requests. They are extremely slow, probably because its making a full HTTP request every operations. We have a 100mbps connection, but the script running at full speed is only utilizing about 1mbps. The load on the server is essentially nothing. The server is a quad core, with 8GB of memory.
What things can we do to increase the throughput of this? Is there a way to open a persistent (keep-alive) HTTP request with Google Maps? How about exploding the work out horizonta开发者_运维技巧lly, i.e. making 50 concurrent requests?
Thanks.
some things I would do:
no matter how "premium" you are, doing external http-requests will always be a bottleneck, so for starters, cache request+response - you can still update them via cron on a regular basis
these are single http requests - you will never get "fullspeed" with them especially if request and response are that small (< 1MB) - tcp/handshaking/headers/etc. so try using multicurl (if your premium allows it) in order to start multiple requests - this should give you fullspeed ;)
add "Connection: close" in the request header you send, this will immediately close the http connection so your and google's server won't get hammered with halfopen
Considering you are running all your requests sequentially you should look into dividing the work up onto multiple machines or processes. Then each can be run in parallel. Judging by your benchmarks you are limited by how fast each Curl response takes, not by CPU or bandwidth.
My first guess is too look at a queuing system (Gearman, RabbitMQ).
精彩评论