How do I repeatedly read from an HttpURLConnection?

Source: https://www.devze.com, 2023-02-16 03:09 (originally from the web)
I've written a Java program which scrapes some content from a web page. It retrieves the content by calling the readWebpage method every couple of seconds. The problem I'm having is that only the first read actually works. After the first time I read the web page, the InputStream always appears to be empty (in.ready() returns false).

Also, conn.getContentLength() returns the same value every time, even though the content on the page has changed. If I restart the program, the new content is fetched properly.

What have I missed? Do I have to perform some sort of refresh on the conn object?

private String readWebpage(HttpURLConnection conn) throws IOException {
    conn.connect();
    InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
    BufferedReader buffer = new BufferedReader(in);
    StringBuilder b = new StringBuilder(conn.getContentLength() + 5);
    String line;
    while ((line = buffer.readLine()) != null) {
        b.append(line);
    }
    in.close();
    buffer.close();
    return b.toString();
}


Are you passing in the same HttpURLConnection object every time? If so, that is the problem: each HttpURLConnection instance is used to make a single request, and its InputStream is tied to that one response. You therefore get the same, already consumed stream back every time rather than a new stream to the URL in question. Open a new connection (URL#openConnection) before each call to this method and you should be good to go.
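As a rough sketch of that suggestion, the method could take the URL instead of a connection and open a fresh HttpURLConnection on every call. The class name PageFetcher, the changed signature, the UTF-8 charset, and the setUseCaches(false) call are assumptions made for illustration, not part of the original program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is not from the original question.
class PageFetcher {

    // Opens a new connection on every call, because one HttpURLConnection
    // instance only ever performs a single request.
    static String readWebpage(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setUseCaches(false); // assumption: a cached copy is never wanted

        // Assumption: the page is UTF-8 encoded; adjust if the server says otherwise.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            // getContentLength() may be -1, so the builder is left at its default size.
            StringBuilder b = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                b.append(line).append('\n'); // keep line breaks, unlike the original
            }
            return b.toString();
        } // try-with-resources closes the reader and the underlying stream
    }
}
```

The polling loop would then keep a single URL object around and call PageFetcher.readWebpage(url) every couple of seconds. Each call issues its own request, so a changed page should show up on the next read.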


Once you've read the entire page, what more is there to read? A single GET or POST request cannot result in multiple responses from the server. It sends one response back, end of story.

If the page is still updating, then either (a) the input is not finished, or (b) the further updates are something other than HTML, for example an applet or a JavaScript function that is talking to the server.

I think BufferedReader.readLine blocks as long as there's still input coming, so I don't think it could be (a). If the situation is (b), reading more HTML isn't going to help: that's not changing.
