Given the link http://bit.ly/2994js
What is the most efficient approach or library for getting the final URL of a bit.ly, fb.me, etc. link after the 302 redirects? Assume a scale of 10+ million of these a day, with the ability to scale across servers.
Java HttpClient? PHP with cURL? Something else?
The implementation language is unlikely to make much difference to performance: there's almost no work to do per URL, so nearly all of the time will be network latency. It's possible that using a customized network stack might help, but I wouldn't bother unless I really needed to.
I'm not sure whether a 302 response is still able to keep the connection alive with HTTP/1.1, but if it can, that could really be a boon. That's also an argument against invoking the cURL command-line tool once per URL (each invocation starts a new process, which requires a new connection), unless there's some way of putting cURL into a batch mode. (There may be; it's worth investigating.)
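To illustrate the connection-reuse idea, here is a minimal Python sketch using only the standard library. It sends HEAD requests, reads the `Location` header, and keeps one connection per host so it can be reused if the server leaves it open. The class name `RedirectResolver` and the `timeout` and `max_hops` defaults are my own choices for illustration, and it assumes the shortener answers HEAD requests the same way it answers GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


class RedirectResolver:
    """Follows redirects with HEAD requests, keeping one connection per host.

    Hypothetical helper for illustration. Not thread-safe: use one
    instance per worker thread.
    """

    def __init__(self, timeout=5.0, max_hops=10):
        self.timeout = timeout
        self.max_hops = max_hops
        self._conns = {}  # (scheme, netloc) -> connection object

    def _connection(self, scheme, netloc):
        key = (scheme, netloc)
        if key not in self._conns:
            if scheme == "https":
                conn = http.client.HTTPSConnection(netloc, timeout=self.timeout)
            else:
                conn = http.client.HTTPConnection(netloc, timeout=self.timeout)
            self._conns[key] = conn
        return self._conns[key]

    def resolve(self, url):
        """Return the last URL reached after at most max_hops redirects."""
        for _ in range(self.max_hops):
            parts = urlsplit(url)
            conn = self._connection(parts.scheme, parts.netloc)
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            try:
                conn.request("HEAD", path)
                resp = conn.getresponse()
                resp.read()  # finish the response so the socket can be reused
            except (http.client.HTTPException, OSError):
                conn.close()  # reset state; the next request reconnects
                raise
            location = resp.getheader("Location")
            if resp.status not in REDIRECT_CODES or not location:
                return url
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        return url  # gave up after max_hops
```

Calling `RedirectResolver().resolve("http://bit.ly/2994js")` would walk the redirect chain for the link in the question. If the server closes the connection after each response, `http.client` simply reconnects on the next request, so the sketch still works, just without the saving.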
The important thing will be to make as many requests in parallel as you can without hitting any one server so hard that it thinks you're launching a DDoS attack.
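One way to get that balance is a shared thread pool plus a per-host semaphore. The sketch below is only that, a sketch: `resolve_many` is a hypothetical name, `resolve` is whatever function you use to resolve a single URL, and the limits of 32 workers and 4 concurrent requests per host are placeholder values, not recommendations from any shortening service.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def resolve_many(urls, resolve, max_workers=32, per_host_limit=4):
    """Run resolve(url) in parallel, capping concurrent calls per host.

    Results come back in the same order as urls. The cap is keyed on the
    host of the short URL only, not on hosts reached by later redirects.
    """
    limits = defaultdict(lambda: threading.Semaphore(per_host_limit))
    limits_lock = threading.Lock()

    def task(url):
        host = urlsplit(url).netloc
        with limits_lock:  # create each host's semaphore exactly once
            semaphore = limits[host]
        with semaphore:  # blocks while per_host_limit calls are in flight
            return resolve(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, urls))
```

A worker that is waiting on a busy host's semaphore still occupies a pool slot, so interleaving URLs from different hosts in the input keeps the pool better utilized than sending them host by host.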
Note that 10,000,000 per day is only ~116 requests per second. If you've got an adequate network connection and the target servers aren't blocking you, that shouldn't be hard to achieve.
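The arithmetic behind that figure, plus a rough estimate of how many requests need to be in flight at once, can be written out as below. The 0.5 second average latency used in the comments is purely an assumed number for illustration; measure your own. The function names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(requests_per_day):
    """Average requests per second if the load is spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


def in_flight(requests_per_day, avg_latency_s):
    """Average number of concurrent requests needed to sustain that rate.

    This is Little's law: concurrency = arrival rate * time per request.
    """
    return average_rate(requests_per_day) * avg_latency_s


# average_rate(10_000_000) is about 115.7 requests per second.
# in_flight(10_000_000, 0.5) is about 58 concurrent requests,
# assuming (hypothetically) 0.5 s per resolved URL.
```

These are averages, so real traffic with peaks will need some headroom on top.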
cURL is fastest. So, if you want absolute speed, write a bash script that does it with cURL.
However, making 10+ million requests a day may get your IP banned by them pretty soon.
In the case of bit.ly, there is an API call (expand) that gets the target URL from the shortened URL. Other URL shortening services may have similar API calls. In those cases, you wouldn't have to handle the redirect.