I have made a simple HTTP client, which downloads a set of URLs parsed from a webpage.
My problem is that the download is slow compared to a real browser (IE, Firefox, Chrome), especially when the page contains many objects.
I noticed (with Wireshark) that the real browsers will often set up 5-10 TCP connections within the same millisecond, immediately after starting to load a page. Those connections then live concurrently for a period of time.
My client will also set up concurrent TCP connections (and it will reuse them), but nowhere near as aggressively. I'm guessing this is one of the reasons my client is slower.
I have tried creating several URLConnections before reading from the input stream, but this does not work for me. I am inexperienced though, so I'm probably doing it wrong.
Does anyone know of a way to do this (mimic what the browsers are doing in terms of TCP connection setup) with URLConnection?
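One way to approximate what the browsers do with plain JDK classes is to hand the URLs to a fixed-size thread pool, so several `HttpURLConnection`s are opened at once. The sketch below is a minimal, self-contained illustration (the URL list, pool size of 6, and the local test server standing in for the real site are all my assumptions, not anything from the question):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentFetch {

    // Download one URL and return its body as a String.
    public static String fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Fetch all URLs using a fixed pool of worker threads, so up to
    // poolSize TCP connections are open concurrently -- roughly what a
    // browser does when it opens several connections per page.
    public static List<String> fetchAll(List<String> urls, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String u : urls) tasks.add(() -> fetch(u));
            List<String> bodies = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) bodies.add(f.get());
            return bodies;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Local test server standing in for the real site: echoes the path.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", ex -> {
            byte[] body = ex.getRequestURI().getPath().getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, body.length);
            ex.getResponseBody().write(body);
            ex.close();
        });
        server.setExecutor(Executors.newFixedThreadPool(8));
        server.start();
        int port = server.getAddress().getPort();

        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 8; i++) urls.add("http://localhost:" + port + "/obj" + i);

        List<String> bodies = fetchAll(urls, 6); // ~6 concurrent connections
        System.out.println(bodies.size() + " objects downloaded");
        server.stop(0);
    }
}
```

Note that `HttpURLConnection` also does transparent keep-alive reuse per host, so the pool size effectively caps how many sockets are live at once; raising it toward the 5-10 the browsers use is a matter of tuning `poolSize`.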
I recommend using HttpClient:
http://hc.apache.org/httpcomponents-client-ga/
It has built-in connection management and pooling, which is exactly the sort of machinery browsers rely on.
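To give an idea of that connection management, here is a configuration sketch assuming HttpClient 4.x: a `PoolingHttpClientConnectionManager` lets you cap total and per-host connections (the limits of 50 and 8 are my own illustrative choices, not values from the answer):

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
cm.setMaxTotal(50);           // total connections across all hosts
cm.setDefaultMaxPerRoute(8);  // per-host cap, roughly what browsers use

// Share this one client across all download threads; close it when done.
CloseableHttpClient client = HttpClients.custom()
        .setConnectionManager(cm)
        .build();
```

With the per-route limit raised to browser-like levels, the pool hands out several concurrent connections to the same host and reuses them across requests.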
Things may have changed since I last used it, but URLConnection didn't work well for production apps; for example, it didn't have a clean way to shut a connection down.
I would also recommend looking at a high-performance networking library like Apache MINA. It will manage a thread pool for you automatically and save you a lot of time.