
Getting HTTP 503 error while accessing a URL from my Java program [duplicate]

Developer https://www.devze.com 2022-12-16 03:41 Source: Internet
This question already has answers here: 403 Forbidden with Java but not web browser? (4 answers) Closed 4 years ago.

I'm building a Java application that downloads an HTML page from a website and saves it to my local file system. I can open the page's URL manually in a browser, but when my Java program requests the same URL, the server returns a 503 error. Here's the scenario:

sample URL = http://content.somesite.com/demo/somepage.asp

I can access the above URL in a browser, but the Java code below fails to download the page:

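import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// sourceUrl is the java.net.URL of the page to download
StringBuilder data = new StringBuilder();
// try-with-resources closes the reader automatically; the original
// finally block threw a NullPointerException when openStream() failed
try (BufferedReader br = new BufferedReader(new InputStreamReader(sourceUrl.openStream()))) {
    String inputLine;
    while ((inputLine = br.readLine()) != null) {
        // readLine() strips the line terminator, so put one back
        data.append(inputLine).append('\n');
    }
} catch (IOException e) {
    e.printStackTrace();
}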

So, my questions are:

  1. Am I doing anything wrong here?

  2. Is there a way for the server to block requests from programs/bots and allow only the requests coming from browsers?


You may want to try setting the User-Agent and Referer HTTP headers to something like what a normal web browser would send.

You can pick a User-Agent string from this list: Seehowitruns: User-agent strings.
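
As a rough sketch of the suggestion above, the request can go through HttpURLConnection so that the headers can be set before the connection is made. The class name BrowserLikeFetch, the helper names, the User-Agent value, the referer argument, and the UTF-8 charset are all assumptions made for illustration; whether this is enough depends on what the server actually checks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BrowserLikeFetch {
    // Example browser-style User-Agent (an assumption; substitute any current browser string).
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36";

    // Builds the connection and sets the headers; no network traffic happens here.
    static HttpURLConnection prepare(URL url, String referer) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Referer", referer);
        return conn;
    }

    // Sends the request and returns the response body as text.
    static String fetch(URL url, String referer) throws IOException {
        HttpURLConnection conn = prepare(url, referer);
        int status = conn.getResponseCode(); // this call sends the request
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("Unexpected HTTP status " + status);
        }
        StringBuilder data = new StringBuilder();
        // UTF-8 is assumed here; a real page may declare a different charset.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                data.append(line).append('\n');
            }
        }
        return data.toString();
    }
}
```

Calling BrowserLikeFetch.fetch(sourceUrl, "http://content.somesite.com/demo/") would then replace the openStream() call in the question; the referer shown is only a placeholder for whichever page normally links to the one being downloaded.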

In addition, if the page you are requesting is an internal page, it might also depend on cookies that were set by a previous page.
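
If cookies turn out to matter, one possible approach is to install a java.net.CookieManager as the default cookie handler and request the earlier page first, so that later HttpURLConnection requests in the same JVM send back whatever cookies that page set. This is only a sketch: the class name CookieSession, the method names, and the idea of a single "entry page" are assumptions, and the real site may need a different sequence of requests.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;

public class CookieSession {
    // Installs a JVM-wide cookie handler; HttpURLConnection consults it
    // automatically when sending requests and reading response headers.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Requests the page a browser would visit first (hypothetical entry page),
    // so that any Set-Cookie headers it returns end up in the cookie store.
    static int visitEntryPage(URL entryPage) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) entryPage.openConnection();
        try {
            return conn.getResponseCode(); // response headers are read here
        } finally {
            conn.disconnect();
        }
    }
}
```

The intended order is install(), then visitEntryPage(...) with the earlier page's URL, then the request for the internal page itself.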
