I built a proxy server and it works well, but there are some sites it cannot handle. I tried to reduce the problem to its core, and this is what I came up with. My test case is http://bits.wikimedia.org/en.wikipedia.org/load.php, which is one of the HTTP requests made for every Wikipedia page. I built a request for it and sent it over a socket like this:
String request1 =
"GET http://bits.wikimedia.org/en.wikipedia.org/load.php HTTP/1.1" +
"\r\n" +
"Host: bits.wikimedia.org" + "\r\n" +
"User-Agent: MyHttpProxy/example.java (http://stackoverflow.com/q/5924490/319266)" +
"\r\n" + "\r\n";
However, I got a 404 response, which was strange because the page does exist. After many attempts, I made a new request that differs only in the request line:
String request2 =
"GET /en.wikipedia.org/load.php HTTP/1.1" +
"\r\n" +
"Host: bits.wikimedia.org" +
"\r\n" +
"User-Agent: MyHttpProxy/example.java (http://stackoverflow.com/q/5924490/319266)" +
"\r\n" + "\r\n";
and it worked! A 200 came back with some unimportant content ("/* No modules requested. Max made me put this here */").
Can anyone tell me what the problem is here? I looked at the RFC and couldn't make sense of it.
Here is the source code for running this test and printing the results:
You would provide the full URL in the request line only if you're going via a proxy server. Direct requests to a web server need to follow the form used in request2 in your example.
Looking at the source, you send requests to port 80, which almost certainly means they are not going through a proxy. My guess is that you need to send request1 to port 8080, or whatever port your proxy is listening on.
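To make the two forms concrete, here is a minimal sketch that builds the same GET request either way. The class and method names (`RequestBuilder`, `buildGet`) are made up for illustration, and it assumes a plain `http` URL on the default port, so the `Host` header carries no port number.

```java
import java.net.URI;

public class RequestBuilder {

    /**
     * Builds a minimal HTTP/1.1 GET request (hypothetical helper).
     * viaProxy = true  -> full URL in the request line (like request1)
     * viaProxy = false -> path only in the request line (like request2)
     */
    public static String buildGet(URI url, boolean viaProxy) {
        // Path plus optional query string; an empty path becomes "/".
        String path = url.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (url.getRawQuery() != null) {
            path += "?" + url.getRawQuery();
        }

        String target = viaProxy ? url.toString() : path;

        // Assumption: default port, so Host is just the host name.
        return "GET " + target + " HTTP/1.1\r\n"
             + "Host: " + url.getHost() + "\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        URI url = URI.create("http://bits.wikimedia.org/en.wikipedia.org/load.php");
        System.out.print(buildGet(url, true));   // send this to a proxy
        System.out.print(buildGet(url, false));  // send this to bits.wikimedia.org:80
    }
}
```

The only difference between the two outputs is the request line; the `Host` header is the same in both.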
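As for the RFC, take a look at section 5.1.2 of RFC 2616. Note that the absolute URI form (absoluteURI) is used for requests to proxies, and the absolute path form (abs_path, with the host given in the Host header) is used for requests to origin servers.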
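Since the question is about writing a proxy, the practical consequence is that the proxy receives the full-URL form from the browser and has to convert it to the path-only form before forwarding to the origin server. A rough sketch of that rewrite follows. `RequestLineRewriter` and `toOriginForm` are hypothetical names, and the code only handles `http://` targets, leaving anything else (for example CONNECT requests) untouched.

```java
import java.net.URI;

public class RequestLineRewriter {

    /**
     * Turns "GET http://host/path?query HTTP/1.1" into
     * "GET /path?query HTTP/1.1". Lines whose target is not an
     * http:// URL are returned unchanged.
     */
    public static String toOriginForm(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }

        String target = parts[1];
        if (!target.regionMatches(true, 0, "http://", 0, 7)) {
            return requestLine; // already a path, or a form this sketch ignores
        }

        URI uri = URI.create(target);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // "http://host" with no path means the root
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return parts[0] + " " + path + " " + parts[2];
    }

    public static void main(String[] args) {
        String fromClient =
            "GET http://bits.wikimedia.org/en.wikipedia.org/load.php HTTP/1.1";
        // prints: GET /en.wikipedia.org/load.php HTTP/1.1
        System.out.println(toOriginForm(fromClient));
    }
}
```

The host to connect to can be taken from the same URI (`uri.getHost()`), and the remaining headers can be forwarded after the rewritten request line.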