I'm building a Java application that downloads an HTML page from a website and saves the file to my local system. I can access the web page's URL manually in a browser, but when I request the same URL from my Java program, the server returns a 503 error. Here's the scenario:
sample URL = http://content.somesite.com/demo/somepage.asp
I can access the above URL in a browser, but the Java code below fails to download the page:
StringBuffer data = new StringBuffer();
BufferedReader br = null;
try {
    br = new BufferedReader(new InputStreamReader(sourceUrl.openStream()));
    String inputLine;
    while ((inputLine = br.readLine()) != null) {
        data.append(inputLine);
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (br != null) {
        br.close();
    }
}
So, my questions are:
Am I doing anything wrong here?
Is there a way for the server to block requests from programs/bots and allow only requests coming from browsers?
You may want to try setting the User-Agent and Referer HTTP headers to something like what a normal web browser would send.
You can pick a User-Agent string from this list: Seehowitruns: User-agent strings.
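As a rough sketch, you could switch from URL.openStream() to HttpURLConnection and set the headers explicitly. The URL, User-Agent string, and Referer value below are placeholders that you would replace with your own:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PageDownloader {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; use your actual page here.
        URL sourceUrl = new URL("http://content.somesite.com/demo/somepage.asp");
        HttpURLConnection conn = (HttpURLConnection) sourceUrl.openConnection();

        // Send headers that look like a normal browser request.
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36");
        conn.setRequestProperty("Referer", "http://content.somesite.com/demo/");

        StringBuilder data = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String inputLine;
            while ((inputLine = br.readLine()) != null) {
                data.append(inputLine).append('\n');
            }
        }
        System.out.println(data);
    }
}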
In addition, if the page you are requesting is an internal page, it might also depend on cookies that were set on a previous page.
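If that turns out to be the case, one option (a sketch, not tested against your particular site) is to install a default CookieManager once at startup, before opening any connections, so cookies set by earlier responses are resent automatically:

import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Run once at application startup; after this, HttpURLConnection
// will store cookies from responses and resend them on subsequent
// requests to the same site.
CookieManager cookieManager = new CookieManager();
cookieManager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
CookieHandler.setDefault(cookieManager);

With that in place, you would first request the page that sets the cookies, then the internal page you actually want.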