
Accelerated downloads with HTTP byte range headers

Has anybody got any experience of using HTTP byte ranges across multiple parallel requests to speed up downloads?

I have an app that needs to download fairly large images from a web service (1MB+) and then send the modified files (resized and cropped) out to the browser. There are many of these images, so caching is likely to be ineffective - i.e. the cache may well be empty. In that case we are hit by some fairly large latencies while waiting for the image to download, 500ms+, which is over 60% of our app's total response time.

I am wondering if I could speed up the download of these images by using a group of parallel HTTP Range requests, e.g. each thread downloads 100kb of data and the responses are concatenated back into a full file.

Does anybody out there have any experience of this sort of thing? Would the overhead of the extra downloads negate a speed increase, or might this technique actually work? The app is written in Ruby, but experiences/examples from any language would help.
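Roughly, this is the sort of thing I have in mind (an untested sketch using Ruby's Net::HTTP and threads; the image URL, the 100kb chunk size, and the HEAD request used to get the length are placeholders of mine, not how the service necessarily behaves):

    require 'net/http'

    # Sketch: split a download into fixed-size byte ranges, fetch each range on
    # its own thread, then concatenate the pieces back together in order.
    def parallel_download(url, chunk_size: 100 * 1024)
      uri = URI(url)

      # Ask for the total size first (assumes the service answers HEAD requests).
      total = Net::HTTP.start(uri.host, uri.port) do |http|
        http.head(uri.path)['Content-Length'].to_i
      end

      ranges = (0...total).step(chunk_size).map do |from|
        [from, [from + chunk_size, total].min - 1]
      end

      threads = ranges.map do |from, to|
        Thread.new do
          Net::HTTP.start(uri.host, uri.port) do |http|
            req = Net::HTTP::Get.new(uri)
            req['Range'] = "bytes=#{from}-#{to}"
            res = http.request(req)
            # 206 Partial Content is what we expect when the server honours Range
            raise "range not honoured (#{res.code})" unless res.code == '206'
            res.body
          end
        end
      end

      threads.map(&:value).join   # Thread#value waits for and returns each body
    end

    image_data = parallel_download('http://images.example.com/originals/12345.jpg')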

A few specifics about the setup:

  • There are no bandwidth or connection restrictions on the service (it's owned by my company)
  • It is difficult to pre-generate all the cropped and resized images, there are millions with lots of potential permutations
  • It is difficult to host the app on the same hardware as the image disk boxes (political!)

Thanks


I found your post by Googling to see if someone had already written a parallel analogue of wget that does this. It's definitely possible and would be helpful for very large files over a relatively high-latency link: I've gotten >10x improvements in speed with multiple parallel TCP connections.

That said, since your organization runs both the app and the web service, I'm guessing your link is high-bandwidth and low-latency, so I suspect this approach will not help you.

Since you're transferring large numbers of small files (by modern standards), I suspect you are actually getting burned by the connection setup more than by the transfer speeds. You can test this by loading a similar page full of tiny images. In your situation you may want to go serial rather than parallel: see if your HTTP client library has an option to use persistent HTTP connections, so that the three-way handshake is done only once per page or less instead of once per image.
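For example, in Ruby's standard library, Net::HTTP keeps the underlying socket open for the duration of a start block, so a batch of requests like the following (the host and paths are made up) only pays for the handshake once:

    require 'net/http'

    # Sketch: one TCP connection (and one handshake) reused for many fetches.
    paths = ['/originals/1.jpg', '/originals/2.jpg', '/originals/3.jpg']  # placeholders

    bodies = Net::HTTP.start('images.example.com', 80) do |http|
      paths.map { |path| http.get(path).body }   # each get reuses the open socket
    end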

If you end up getting really fanatical about TCP latency, it's also possible to cheat, as certain major web services like to.

(My own problem involves the other end of the TCP performance spectrum, where a long round-trip time is really starting to drag on my bandwidth for multi-TB file transfers, so if you do turn up a parallel HTTP library, I'd love to hear about it. The only tool I found, called "puf", parallelizes by files rather than byte ranges. If the above doesn't help you and you really need a parallel transfer tool, likewise get in touch: I may have given up and written it by then.)


I've written the backend and services for the sort of place you're pulling images from. Every site is different so details based on what I did might not apply to what you're trying to do.

Here are my thoughts:

  • If you have a service agreement with the company you're pulling images from (which you should, because you have a fairly high bandwidth need), then preprocess their image catalog and store the thumbnails locally, either as database blobs or as files on disk with a database containing the paths to the files (a rough sketch of this is below the list).
  • Doesn't that service already have the images available as thumbnails? They're not going to send a full-sized image to someone's browser either... unless they're crazy or sadistic and their users are crazy and masochistic. We preprocessed our images into three or four different thumbnail sizes so it would have been trivial to supply what you're trying to do.
  • If your request is something they expect then they should have an API or at least some resources (programmers) who can help you access the images in the fastest way possible. They should actually have a dedicated host for that purpose.
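To make the preprocessing idea in the first bullet concrete, here is a rough sketch (the directories, the two thumbnail sizes, the ImageMagick convert call, and the flat index file standing in for a database of paths are all assumptions of mine):

    require 'fileutils'

    # Sketch: pre-generate a fixed set of thumbnail sizes for every source image
    # and keep an index of where each rendition lives. A real setup would store
    # the paths in a proper database rather than a TSV file.
    SIZES = { small: '160x160', medium: '480x480' }

    def preprocess(source_dir, thumb_dir, index_path)
      FileUtils.mkdir_p(thumb_dir)
      File.open(index_path, 'a') do |index|
        Dir.glob(File.join(source_dir, '*.jpg')).each do |src|
          SIZES.each do |name, geometry|
            dest = File.join(thumb_dir, "#{File.basename(src, '.jpg')}_#{name}.jpg")
            # The trailing '>' tells ImageMagick to only shrink images larger than the box
            system('convert', src, '-resize', "#{geometry}>", dest)
            index.puts("#{src}\t#{name}\t#{dest}")
          end
        end
      end
    end

    preprocess('/data/originals', '/data/thumbs', '/data/thumbs/index.tsv')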

As a photographer I also need to mention that there could be copyright and/or terms-of-service issues with what you're doing, so make sure you're above board by consulting a lawyer AND the site you're accessing. Don't assume everything is ok, KNOW it is. Copyright laws don't fit the general public's conception of what copyrights are, so involving a lawyer up front can be really educational, plus give you a good feeling you're on solid ground. If you've already talked with one then you know what I'm saying.


I would guess that using any p2p network would be useless, as there are far more permutations than frequently requested files.

Downloading a few parts of a file in parallel can only give an improvement on slow networks (slower than 4-10 Mbps).

To get any improvement from parallel downloads you need to ensure there is enough server power. From your current problem (waiting over 500ms for a connection) I assume you already have a problem with your servers:

  • you should add or improve load balancing,
  • you should think about changing the server software to something with better performance

And again, if 500ms is 60% of the total response time then your servers are overloaded; if you think they are not, you should look for a bottleneck in connection or server performance.
