What is the best way to handle this: large download via PHP + slow connection from client = script timeout before file is completely downloaded

My client wanted a way to offer downloads to users, but only after they fill out a registration form (basically name and email). An email is sent to the user with the links for the downloadable content. The links contain a registration hash unique to the package, file, and user, and they actually go to a PHP page that logs each download and pushes the file out by writing it to stdout (along with the appropriate headers). This solution has inherent flaws, but this is how they wanted to do it. It needs to be said that I pushed them hard to 1.) limit the sizes of the downloadable files and 2.) think about using a CDN (they have international customers but are hosted in the US on 2 mirrored servers and a load balancer that uses sticky IPs).

Anyway, it "works for me", but some of their international customers are on really slow connections (d/l rates of ~60 kB/sec) and some of these files are pretty big (150 MB). Since a PHP script is serving these files, it is bound by the script timeout setting. At first I had set this to 300 seconds (5 minutes), but that was not enough time for some of the beta users. So then I tried calculating the script timeout based on the file size divided by a 100 kB/sec connection, but some of these users are even slower than that.

Now the client wants to just up the timeout value. I don't want to remove the timeout altogether in case the script somehow gets into an infinite loop. I also don't want to keep pushing the timeout out arbitrarily for some catch-all lowest-common-denominator connection rate (most people are downloading much faster than 100 kB/sec). And I also want to be able to tell the client at some point, "Look, these files are too big to process this way. You are affecting the performance of the rest of the website with these 40-plus-minute connections. We either need to rethink how they are delivered or use much smaller files."

I have a couple of solutions in mind, which are as follows:

  1. CDN - move the files to a CDN service such as Amazon's or Google's. We can still log the download attempts via the PHP file, but then redirect the browser to the real file. One drawback with this is that a user could bypass the script and download directly from the CDN once they have the URL (which could be gleaned by watching the HTTP headers). This isn't bad, but it's not desired.
  2. Expand the server farm - Expand the server farm from 2 to 4+ servers and remove the sticky IP rule from the load balancer. Downside: these are Windows servers so they are expensive. There is no reason why they couldn't be Linux boxes, but setting up all new boxes could take more time than the client would allow.
  3. Setup 2 new servers strictly for serving these downloads - Basically the same benefits and drawbacks as #2, except that we could at least isolate the rest of the website from (and fine tune the new servers to) this particular process. We could also pretty easily make these Linux boxes.
  4. Detect the user's connection rate - I had in mind a way to detect the current speed of the user by using AJAX on the download landing page to time how long it takes to download a static file with a known size, then sending that info to the server and calculating the timeout based on it. It's not ideal, but it's better than estimating the connection speed too high or too low. I'm not sure how I would get the speed info back to the server, though, since we currently use a redirect header that is sent from the server.

Chances are #1-3 will be declined or at least pushed off. So is #4 a good way to go about this, or is there something else I haven't considered?

(Feel free to challenge the original solution.)


Use X-SENDFILE. Most web servers support it either natively or through a plugin (Apache).

Using this header, you can simply specify a local file path and exit the PHP script. The web server sees the header and serves that file instead.
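
A minimal sketch of what that could look like, assuming Apache with the mod_xsendfile module enabled; the paths and the `$hash` handling are placeholders for the existing registration logic, and nginx would use its X-Accel-Redirect header with an internal location instead:

    <?php
    // Sketch only: assumes Apache with mod_xsendfile enabled (XSendFile On).
    // The registration-hash lookup and download logging stay as in the existing script.
    $hash = $_GET['hash'] ?? '';
    // ... validate $hash, log the download, resolve the real path (placeholder below) ...
    $file = '/var/files/packages/myfile.zip';   // hypothetical absolute path outside the web root

    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . basename($file) . '"');
    header('X-Sendfile: ' . $file);             // Apache takes over and streams the file; PHP exits immediately
    exit;

The benefit is that the PHP process finishes right away, so the script timeout and the tied-up PHP worker both stop mattering, while the logging and access control stay in PHP.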


The easy solution would be to disable the timeout. You can do this on a per-request basis with:

set_time_limit(0);

If your script is not buggy, this shouldn't be a problem, unless your server is not able to handle that many concurrent connections due to slow clients.
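
If removing the limit entirely feels too risky given the infinite-loop concern, a middle ground (a sketch of my own, not part of the answer above) is to stream the file in chunks and reset the limit on each chunk, so a genuinely stuck script still dies while a slow but progressing download does not:

    <?php
    // Sketch: reset the time limit per chunk instead of disabling it outright.
    // $path is a placeholder for the file resolved by the existing registration check.
    $path = '/var/files/packages/myfile.zip';

    header('Content-Type: application/octet-stream');
    header('Content-Length: ' . filesize($path));
    header('Content-Disposition: attachment; filename="' . basename($path) . '"');

    // Turn off output buffering so each chunk actually reaches the client.
    while (ob_get_level() > 0) {
        ob_end_clean();
    }

    $fp = fopen($path, 'rb');
    while (!feof($fp)) {
        set_time_limit(60);          // 60 s to push each 8 kB chunk; a stalled script still times out
        echo fread($fp, 8192);
        flush();
        if (connection_aborted()) {  // stop tying up a worker if the client went away
            break;
        }
    }
    fclose($fp);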

In that case, #1, #2 and #3 are all good solutions, and I would go with whichever is cheapest. Your concerns about #1 could be mitigated by generating download tokens that can only be used once, or only for a short period of time.
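
As a sketch of the expiring-token idea (the secret, the 15-minute lifetime, and the link format are assumptions, not part of the original setup):

    <?php
    // Sketch: signed, expiring download links. SECRET and the 15-minute TTL are assumptions.
    const SECRET = 'replace-with-a-long-random-secret';

    function make_download_url(string $file): string {
        $expires = time() + 15 * 60;
        $sig     = hash_hmac('sha256', $file . '|' . $expires, SECRET);
        return '/download.php?file=' . urlencode($file) . '&expires=' . $expires . '&sig=' . $sig;
    }

    function is_valid_download(string $file, int $expires, string $sig): bool {
        // hash_equals() needs PHP 5.6+; older versions need a hand-rolled constant-time compare.
        $expected = hash_hmac('sha256', $file . '|' . $expires, SECRET);
        return hash_equals($expected, $sig) && time() < $expires;
    }

A leaked URL then only works until it expires, which blunts the "grab the CDN URL from the headers" problem without changing the registration flow.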

Option #4, in my opinion, is not a great one. The speed can vary greatly during a download, so any estimate made at the start has a significant chance of being wrong.


I am a bit reserved about #4. An attacker could forge the AJAX request to set your timeout to a very high value, and then they could get you into an infinite loop (if that was what you were worried about in the first place).

I would suggest a solution similar to @prodigitalson's. You can create directories named with hash values, e.g. /downloads/389a002392ag02/myfile.zip, where the hashed directory contains a symlink to the real file. Your PHP script redirects to that path, which gets served by the HTTP server, and the symlink gets deleted periodically.

The added benefit of creating a directory instead of a file is that the end user doesn't see a mangled file name.
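
A sketch of what the PHP side might look like; the paths, the random hash, and the document-root layout are assumptions, and the web server has to be allowed to follow symlinks (e.g. Options FollowSymLinks in Apache):

    <?php
    // Sketch: create a hashed directory containing a symlink to the real file,
    // then redirect so the web server serves it directly. Paths are placeholders.
    $realFile = '/var/files/packages/myfile.zip';
    $hash     = bin2hex(random_bytes(16));   // or a hash of user + package, per the original scheme
    $dir      = $_SERVER['DOCUMENT_ROOT'] . '/downloads/' . $hash;

    mkdir($dir, 0755, true);
    symlink($realFile, $dir . '/' . basename($realFile));

    header('Location: /downloads/' . $hash . '/' . basename($realFile));
    exit;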


I think the main problem is serving the file through a PHP script. It's not only the timeout problem: a web server process is also tied up for as long as the file is being sent to the client.

I would recommend some variant of #1. It doesn't have to be a CDN, but the PHP script should redirect directly to the file. You could guard against the bypass with a rewrite rule and a parameter, checking whether the parameter and the current request time match.


I think you could do something like #1, except keep it on your servers and bypass serving it via PHP directly. After whatever auth/approval needs to happen in PHP, have that script create a temporary link to the file for download via traditional HTTP. On a *nix box I'd do this via a symlink to the real file and have a cron job run every n minutes to clear old links to the file.


You may create a temp file on the disk, or a symlink, and then redirect (using header()) to that temp file. A cron job could then come along and remove the "expired" temp files. The key here is that every download should have a unique temp file associated with it.
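
The cleanup half could be a small PHP CLI script run from cron, something like this sketch; the directory layout and the one-hour expiry are assumptions:

    <?php
    // Sketch: cron-driven cleanup of download symlink directories older than one hour.
    // Run e.g. every 10 minutes: */10 * * * * php /path/to/cleanup.php
    $base   = '/var/www/html/downloads';   // hypothetical location of the hashed directories
    $maxAge = 3600;

    foreach (glob($base . '/*', GLOB_ONLYDIR) as $dir) {
        if (time() - filemtime($dir) > $maxAge) {
            foreach (glob($dir . '/*') as $link) {
                unlink($link);              // removes the symlink, not the real file
            }
            rmdir($dir);
        }
    }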
